Reading a Complex CSV

Hello All,

I have a challenge that I think that split or explode will handle, but unfortunately I am not smart enough to figure it out.

I have a form (Jotform) that users submit a variable number of video and image clips for each scene of a video that I produce automatically.

The CSV that I receive is Double-quoted and contains the content the user uploaded.
It looks like this:

“SubmissionDate”,“Scene1”,“Scene2”,“Scene3”
“2015-12-09 08:39:18”,“[1.mp4] https://s3.amazonaws.com/jotformWidgets/dragDropUpload/53346332689968/566858f1f12151.mp4 [23.mp4] https://s3.amazonaws.com/jotformWidgets/dragDropUpload/53346332689968/566858f993e7623.mp4 [25.mp4] https://s3.amazonaws.com/jotformWidgets/dragDropUpload/53346332689968/5668590b787b525.mp4 [IMLS98596647-15.jpg] https://s3.amazonaws.com/jotformWidgets/dragDropUpload/53346332689968/5668590e75119IMLS98596647-15.jpg [IMLS98596647-16.jpg] https://s3.amazonaws.com/jotformWidgets/dragDropUpload/53346332689968/5668590fb9572IMLS98596647-16.jpg [IMLS98596647-17.jpg] https://s3.amazonaws.com/jotformWidgets/dragDropUpload/53346332689968/5668591105af2IMLS98596647-17.jpg [IMLS98596647-18.jpg] [url=https://s3.amazonaws.com/jotformWidgets/dragDropUpload/53346332689968/56685912387b8IMLS98596647-18.jpg,]https://s3.amazonaws.com/jotformWidgets/dragDropUpload/53346332689968/56685912387b8IMLS98596647-18.jpg","[/url][20.mp4] https://s3.amazonaws.com/jotformWidgets/dragDropUpload/53346332689968/566858f16bb2720.mp4 [21.mp4] https://s3.amazonaws.com/jotformWidgets/dragDropUpload/53346332689968/566858fc51d5b21.mp4 [22.mp4] https://s3.amazonaws.com/jotformWidgets/dragDropUpload/53346332689968/5668590a2e51222.mp4 [IMLS98596647-7.jpg] https://s3.amazonaws.com/jotformWidgets/dragDropUpload/53346332689968/5668590c81858IMLS98596647-7.jpg [IMLS98596647-8.jpg] https://s3.amazonaws.com/jotformWidgets/dragDropUpload/53346332689968/5668590dc1e87IMLS98596647-8.jpg [IMLS98596647-9.jpg] https://s3.amazonaws.com/jotformWidgets/dragDropUpload/53346332689968/5668590f0ff18IMLS98596647-9.jpg [IMLS98596647-10.jpg] https://s3.amazonaws.com/jotformWidgets/dragDropUpload/53346332689968/56685910836dcIMLS98596647-10.jpg [IMLS98596647-11.jpg] https://s3.amazonaws.com/jotformWidgets/dragDropUpload/53346332689968/56685912158efIMLS98596647-11.jpg [IMLS98596647-12.jpg] https://s3.amazonaws.com/jotformWidgets/dragDropUpload/53346332689968/566859138e67bIMLS98596647-12.jpg [IMLS98596647-13.jpg] https://s3.amazonaws.com/jotformWidgets/dragDropUpload/53346332689968/566859150d977IMLS98596647-13.jpg [IMLS98596647-14.jpg] [url=https://s3.amazonaws.com/jotformWidgets/dragDropUpload/53346332689968/56685916e9115IMLS98596647-14.jpg,]https://s3.amazonaws.com/jotformWidgets/dragDropUpload/53346332689968/56685916e9115IMLS98596647-14.jpg","[/url][18.mp4] https://s3.amazonaws.com/jotformWidgets/dragDropUpload/53346332689968/566858f22530018.mp4 [17.mp4] https://s3.amazonaws.com/jotformWidgets/dragDropUpload/53346332689968/566858fec7fca17.mp4 [16.mp4] https://s3.amazonaws.com/jotformWidgets/dragDropUpload/53346332689968/5668590d8e99316.mp4 [IMLS98596647-1.jpg] https://s3.amazonaws.com/jotformWidgets/dragDropUpload/53346332689968/5668591035666IMLS98596647-1.jpg [IMLS98596647-2.jpg] https://s3.amazonaws.com/jotformWidgets/dragDropUpload/53346332689968/56685911cfbe2IMLS98596647-2.jpg [IMLS98596647-3.jpg] https://s3.amazonaws.com/jotformWidgets/dragDropUpload/53346332689968/566859134cb5aIMLS98596647-3.jpg [IMLS98596647-4.jpg] https://s3.amazonaws.com/jotformWidgets/dragDropUpload/53346332689968/56685914ddbcbIMLS98596647-4.jpg [IMLS98596647-5.jpg] https://s3.amazonaws.com/jotformWidgets/dragDropUpload/53346332689968/566859168ffbbIMLS98596647-5.jpg [IMLS98596647-6.jpg] https://s3.amazonaws.com/jotformWidgets/dragDropUpload/53346332689968/56685918354d3IMLS98596647-6.jpg

Can anyone point me in the right direction? I would like to count the mp4’s and images of each scene and access them individually for processing.

I use grep “[[^]]*]” to remove the brackets and contents and replace with a pipe delimiter, but it also places one at the beginning of the string.

I feel that explode is the way to go, but I am having problems even starting this process.

Thank you all in advance.

Given your sample datas I don’t know how to split the doc in scenes chapters.
Here is a quick and dirty script extracting some wanted datas.

set csvPath to ((path to desktop as text) & "essai.csv") as alias
set rawText to read csvPath

set thisList to my decoupe(rawText, "mp4]")
set nbMp4 to (count thisList) - 1
set linksToMp4s to my extractLinksFrom(thisList)

set thisList to my decoupe(rawText, "jpg]")
set nbJpg to (count thisList) - 1
set linksToJpgs to my extractLinksFrom(thisList)

#=====

on extractLinksFrom(aList)
	set links to {}
	repeat with anItem in rest of aList
		set aLink to item 1 of my decoupe(anItem, " [")
		set end of links to aLink
	end repeat
	return links
end extractLinksFrom

#=====

on decoupe(t, d)
	local oTIDs, l
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d}
	set l to text items of t
	set AppleScript's text item delimiters to oTIDs
	return l
end decoupe

#=====


Yvan KOENIG running El Capitan 10.11.2 in French (VALLAURIS, France) mercredi 9 décembre 2015 22:43:28

I believe there’s a problem with the field separators (comma) and/or the field delimiters (double quotes).
In the 1st line (field names) the fields are correctly separated & delimited, but less so in the 1st line of data (it is one line, correct?).
The date field is delimited as it should be, and the 1st field separator is there too. The rest of the line looks like 3 fields but the delimiters & separators seem garbled: I see 2 occurrences of ;," where that should be “,”. So a double quote mysteriously changed into a semicolon.

(To complicate matters further: the semicolon is used as field separator in locales that use the decimal comma.)

There’s a CSV-to-list routine here, but it won’t help with solving this problem.
It turns a csv file into a list of lists: each sublist contains one line of data, the fields become elements of the list.

Thank you both for your input!

I am trying to go down the shell scripting route, but think I am ultimately going to employ a freelance linux coder. It seems to be a job for sed awk cat etc

If I do get the data into a tab delimited file with pipe delimited strings to seperate the file locations for each Scene is it possible to use that data in applescript?

I have also tried to use CSVkit but to no avail

I think you should try to get good data. If the separators are garbled, what else is missing or f… up?
Would you be prepared to accept responsibility for the final result, when given faulty input?

When you have good data, the routine I linked to will produce the lists you’re asking about, in a format that AppleScript can understand.

Hey Snaplash,

Your CSV data is improperly formatted, and I’m assuming this happened when you edited it for posting on MacScripter.

I had to add a couple of double-quotes to fix it for testing.

What you’re trying to do appears to be dead-simple using the Satimage.osax.


# Satimage.osax MUST be installed.
set dataLine to paragraph 2 of csvData
set csvFields to find text "\"[^\"]+\"" in dataLine with regexp, all occurrences and string result
set csvFields to splittext csvFields using "[[:blank:]](?=\\[)" with regexp

Producing a list of lists (pretty printed in Script Debugger.

{ { "2015-12-09 08:39:18" }, { "[1.mp4] https://s3.amazonaws.com/jotformWidgets . f12151.mp4", "[23.mp4] https://s3.amazonaws.com/jotformWidgets . 3e7623.mp4", "[25.mp4] https://s3.amazonaws.com/jotformWidgets . 87b525.mp4", "[IMLS98596647-15.jpg] https://s3.amazonaws.com/jotformWidgets . 647-15.jpg", "[IMLS98596647-16.jpg] https://s3.amazonaws.com/jotformWidgets . 647-16.jpg", "[IMLS98596647-17.jpg] https://s3.amazonaws.com/jotformWidgets . 647-17.jpg", "[IMLS98596647-18.jpg] https://s3.amazonaws.com/jotformWidgets . ot;" }, { "[20.mp4] https://s3.amazonaws.com/jotformWidgets . bb2720.mp4", "[21.mp4] https://s3.amazonaws.com/jotformWidgets . 1d5b21.mp4", "[22.mp4] https://s3.amazonaws.com/jotformWidgets . e51222.mp4", "[IMLS98596647-7.jpg] https://s3.amazonaws.com/jotformWidgets . 6647-7.jpg", "[IMLS98596647-8.jpg] https://s3.amazonaws.com/jotformWidgets . 6647-8.jpg", "[IMLS98596647-9.jpg] https://s3.amazonaws.com/jotformWidgets . 6647-9.jpg", "[IMLS98596647-10.jpg] https://s3.amazonaws.com/jotformWidgets . 647-10.jpg", "[IMLS98596647-11.jpg] https://s3.amazonaws.com/jotformWidgets . 647-11.jpg", "[IMLS98596647-12.jpg] https://s3.amazonaws.com/jotformWidgets . 647-12.jpg", "[IMLS98596647-13.jpg] https://s3.amazonaws.com/jotformWidgets . 647-13.jpg", "[IMLS98596647-14.jpg] https://s3.amazonaws.com/jotformWidgets . ot;" }, { "[18.mp4] https://s3.amazonaws.com/jotformWidgets . 530018.mp4", "[17.mp4] https://s3.amazonaws.com/jotformWidgets . 7fca17.mp4", "[16.mp4] https://s3.amazonaws.com/jotformWidgets . e99316.mp4", "[IMLS98596647-1.jpg] https://s3.amazonaws.com/jotformWidgets . 6647-1.jpg", "[IMLS98596647-2.jpg] https://s3.amazonaws.com/jotformWidgets . 6647-2.jpg", "[IMLS98596647-3.jpg] https://s3.amazonaws.com/jotformWidgets . 6647-3.jpg", "[IMLS98596647-4.jpg] https://s3.amazonaws.com/jotformWidgets . 6647-4.jpg", "[IMLS98596647-5.jpg] https://s3.amazonaws.com/jotformWidgets . 6647-5.jpg", "[IMLS98596647-6.jpg] https://s3.amazonaws.com/jotformWidgets . .jpg" } }
As you can see splitting this up nicely took two lines of real code.


Chris


{ MacBookPro6,1 · 2.66 GHz Intel Core i7 · 8GB RAM · OSX 10.11.2 }
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯