Manipulating CSV or Excel text

pickup · August 10, 2010, 7:27pm

Hi Everyone,
I have had a request from some people to change an Excel Spreadsheet text to another formatting type.

For example:

From this format (an Excel file):

Sunday 6/17	Monday 6/18

7:00 AM I Love Jazz SOTA 14
7:30 AM BTM 12 SOTA 15
8:00 AM BTM 13 SOTA 16

to:

2010-06-17 7:00:00 I Love Jazz
2010-06-17 7:30:00 BTM 12
2010-06-17 8:00:00 BTM 13
2010-06-18 7:00:00 SOTA 14
2010-06-18 7:30:00 SOTA 15
2010-06-18 8:00:00 SOTA 16

These files are for a server schedule that is created in Excel (the top one) and I want to create a CSV in the second format to import into the scheduler. Can Applescript manipulate through data this way?

Thank you so much!

Yvan_Koenig · August 11, 2010, 10:33am

I’m not sure that I understand well.

Is the original entry a single row of datas splitted in several ones by the MacScripter parser ?

Is the returned item a single row splitted . or really six ones ?

Which is the delimiter used in the Excel datas, TAB or chunk of spaces ?

Yvan KOENIG (VALLAURIS, France) mercredi 11 aoÃ»t 2010 12:32:15

McUsr · August 11, 2010, 3:21pm

Hello.

If you just make them change the way the are entering the data like it is in the second one, then you could just
export it as a CSV file, (the spread sheet). I can’t see that this rearrangement of the input data should lead to any practical difficulties, it would maybe be practical to have the day of the week in one column in front of the others.

You would then of course only export the columns with the data you need, and see if it is ok to do it manually, then you have figured out the steps to program it.

Yvan_Koenig · August 11, 2010, 8:46pm

Is which is described below what you really want to achieve ?

I tried to display the different steps of the process.

17 juin 2010 18 juin 2010
07:00 I Love Jazz SOTA 14
07:30 BTM 12 SOTA 15
08:00 BTM 13 SOTA 16

17 juin 2010 18 juin 2010
07:00 I Love Jazz 07:00 SOTA 14
07:30 BTM 12 07:30 SOTA 15
08:00 BTM 13 08:00 SOTA 16

17 juin 2010 18 juin 2010
17 juin 2010 07:00 I Love Jazz 18 juin 2010 07:00 SOTA 14
17 juin 2010 07:30 BTM 12 18 juin 2010 07:30 SOTA 15
17 juin 2010 08:00 BTM 13 18 juin 2010 08:00 SOTA 16

17 juin 2010 07:00 I Love Jazz
17 juin 2010 07:30 BTM 12
17 juin 2010 08:00 BTM 13
18 juin 2010 07:00 SOTA 14
18 juin 2010 07:30 SOTA 15
18 juin 2010 08:00 SOTA 16

I repeat that it would be useful to know what is used as values separator.

May be TAB
may be comma
may be semi-colon
may be chunk of spaces

I apologize but as I’m not a sooth sayer, I can’t guess this kind of thing.

Yvan KOENIG (VALLAURIS, France) mercredi 11 aoÃ»t 2010 22:45:37

pickup · August 13, 2010, 8:44pm

Hi Guys,
Sorry, to clarify, the first part is an excel spreadsheet with the data you see separated into their own respective cell. THis is done because our magazine has to be formatted this way.

Sunday 6/17 Monday 6/18
7:00 AM I Love Jazz SOTA 14
7:30 AM BTM 12 SOTA 15
8:00 AM BTM 13 SOTA 16

However, our playout scheduler takes a txt file in a tab delimited format that is somewhat different.

2010-06-17 7:00:00 I Love Jazz
2010-06-17 7:30:00 BTM 12
2010-06-17 8:00:00 BTM 13
2010-06-18 7:00:00 SOTA 14
2010-06-18 7:30:00 SOTA 15
2010-06-18 8:00:00 SOTA 16

What I want to do is export the Excel data out as a tab delimited txt file and then manipulate the data to match the data of my second example. This way, all I then would have to do is import that file into the playout schedule and avoid having to program the schedule two different ways. What I am not understanding is how to move text around using Applescript and the write it to a file.

I am sorry about the confusion and thank you so much for your suggestions and help.

Yvan_Koenig · August 14, 2010, 9:34am

I apologize but I’m not a sooth sayer.
At this time, it’s always obscure.
I don’t know if the original datas are :

Sunday 6/17 Monday 6/18 7:00 AM I Love Jazz SOTA 14 7:30 AM BTM 12 SOTA 15 8:00 AM BTM 13 SOTA 16

or

Sunday 6/17 Monday 6/18
7:00 AM I Love Jazz SOTA 14
7:30 AM BTM 12 SOTA 15
8:00 AM BTM 13 SOTA 16

Yvan KOENIG (VALLAURIS, France) samedi 14 aoÃ»t 2010 11:34:21

alastor933 · August 14, 2010, 12:06pm

Confused about this:
assuming the Excel table starts in cell A1, Is “Sunday 6/17” the value in cell A1 or in cell B1?

I’d assume it’s B1, so the dates would be the headers for the columns of server names (if that’s what they are).

I would recommend exporting as comma-delimited vars.
Then use this excellent routine by the late Kai Edwards to convert to a list of lists, which the script can write out to another file in any format you like.
Caveat: it returns an empty field as a double quote.
Its output would look like this:

set excelData to {{"", "Sunday 6/17", "Monday 6/18"}, {"7:00 AM", "I Love Jazz", "SOTA 14"}, {"another row"}, {"etc"}}

The actual field separator is of little consequence: use a script property to declare it.
This makes it easy to switch between different output formats (recycle your scripts :)).

McUsr · August 14, 2010, 1:27pm

Hello.

I have yet another solution for you, but I really haven’t the time to implement it for you.

I sense that you really wish to have the date formats as specified. (Not everything in my scheme may work as I haven’t even tested it manually.

a.) Create a new sheet.
b.) Format one column with the date according to your liking.
c.) For each date in the “date row” copy that value three times to the date column.
d.) Duplicate the process for the time values in the column besides the “date column”
e.) Transpose the songs from the row into a column besides the times. That should go.
f.) Export the sheet as CSV

Yvan_Koenig · August 14, 2010, 4:57pm

I apologize but, as far as I know, when we read a text file or a CSV one, we aren’t getting a list of lists.
We get a text item or a list of text items (paragraphs) and we must know the delimiter used to build the needed lists.

Here, I assume that the datas are in a chunk of text with three values separated by TAB character in every line.


(*
Yvan KOENIG (VALLAURIS, France)
2010/08/14
*)
--=====

on run
	set source to "	17 juin 2010	18 juin 2010
07:00	I love Jazz	SOTA 14
07:30	BTM 12	SOTA 15
08:00	BTM 13	SOTA 16"
	
	(*
may be 
set source to read file "path:to:TSV:file:containing:the:datas"
*)
	(*
may be :
set source to the clipboard as text
*)
	
	if source contains tab then
		set delim to tab (* TSV datas *)
	else if source contains "," then
		set delim to "," (* original CSV datas *)
	else if source contains ";" then
		set delim to ";" (* CSV used when the decimal separator is comma *)
	else
		error "Unknown values separator !"
	end if
	
	set les_paragraphes to paragraphs of source
	repeat with ref_d_un_paragraphe in les_paragraphes
		set contents of ref_d_un_paragraphe to my decoupe(ref_d_un_paragraphe, delim)
	end repeat
	set date_1 to my date_ISO(item 2 of item 1 of les_paragraphes)
	set date_2 to my date_ISO(item 3 of item 1 of les_paragraphes)
	set |les donnÃ©es| to items 2 thru -1 of les_paragraphes
	set liste_1 to {}
	set liste_2 to {}
	
	repeat with une_reference in |les donnÃ©es|
		set {une_heure, titre_1, titre_2} to contents of une_reference
		copy date_1 & tab & une_heure & ":00" & tab & titre_1 to end of liste_1
		copy date_2 & tab & une_heure & ":00" & tab & titre_2 to end of liste_2
	end repeat
	copy liste_2 to end of liste_1
	
	set p2d to "" & (path to desktop)
	set nom_du_fichier to (do shell script "date +_%Y%m%d-%H%M%S.txt")
	tell application "System Events" to make new file at end of folder p2d with properties {name:nom_du_fichier}
	write my recolle(liste_1, return) to file (p2d & nom_du_fichier)
end run

--=====

on date_ISO(d)
	set d to date d
	return "" & year of d & "-" & text -2 thru -1 of ("0" & (month of d as integer)) & "-" & text -2 thru -1 of ("0" & day of d)
end date_ISO

--=====

on decoupe(t, d)
	local oTIDs, l
	set oTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to d
	set l to text items of t
	set AppleScript's text item delimiters to oTIDs
	return l
end decoupe

--=====

on recolle(l, d)
	local oTIDs, t
	set oTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to d
	set t to l as text
	set AppleScript's text item delimiters to oTIDs
	return t
end recolle

--=====

Yvan KOENIG (VALLAURIS, France) samedi 14 aoÃ»t 2010 18:56:57

pickup · August 15, 2010, 9:48pm

Hi All,
Thank you so much for your help. I will give these a try and see what I come up with. I really appreciate your input.

alastor933 · August 16, 2010, 7:59am

No, obviously not.
I wasn’t aiming to provide a full recipe…

set csvData to readFile(csvFile) -- open & read CSV file
set csvList to csvToList(csvData) --  convert input to list of lists, using Kai's routine

One point about csv files in general: fields containing commas are enclosed in double quotes.
Some institutions (like my bank) want to be super-safe, and their files will have all fields in double quotes. Kai’s routine can deal with that.

Yvan_Koenig · August 16, 2010, 9:26am

Thank’s for sharing your knowledge but it has a serious hole.

At this time what I wrote remains true.

The kai’s code fail to decipher a csv file built by European products like Numbers or Excel.
In these files, the separator is the semi-+colon:

“;Date;"Mme AZERTYUIOP WXCVBN - CCHQ n° 12345678901
LibellÃ©";"DÃ©bit
euros";"CrÃ©dit
euros";
0;;solde au 31/12/2010;;12 345,67;12 345,67
1;04/01/2010;debt#1;12,34;;12 333,33
2;04/01/2010;debt #2;567,00;;11 766,33
3;05/01/2010;debt #3;88,00;;11 678,33
4;05/01/2010;debt #4;900,00;;10 778,33
5;06/01/2010;entry #1;;987,00;11 765,33”

the kai’s handler return something like :

{{“;Date;Mme AZERTYUIOP WXCVBN - CCHQ n° 12345678901
LibellÃ©;DÃ©bit
euros;CrÃ©dit
euros;"0;;solde au 31/12/2010;;12 345"67;12 345"67"1;04/01/2010;debt#1;12"34;;12 333"33"2;04/01/2010;debt #2;567"00;;11 766"33"3;05/01/2010;debt #3;88"00;;11 678"33"4;05/01/2010;debt #4;900"00;;10 778"33"5;06/01/2010;entry #1;;987"00;11 765"33”}}

when the correct result would be :

{{“"”, “"Date"”, “""Mme AZERTYUIOP WXCVBN - CCHQ n° 12345678901”}, {“LibellÃ©""”, “""DÃ©bit”}, {“euros""”, “""CrÃ©dit”}, {“euros""”, “"”}, {“0"”, “""”, “"solde au 31/12/2010"”, “""”, “"12 345,67"”, “"12 345,67”}, {“1"”, “"04/01/2010"”, “"debt#1"”, “"12,34"”, “""”, “"12 333,33”}, {“2"”, “"04/01/2010"”, “"debt #2"”, “"567,00"”, “""”, “"11 766,33”}, {“3"”, “"05/01/2010"”, “"debt #3"”, “"88,00"”, “""”, “"11 678,33”}, {“4"”, “"05/01/2010"”, “"debt #4"”, “"900,00"”, “""”, “"10 778,33”}, {“5"”, “"06/01/2010"”, “"entry #1"”, “""”, “"987,00"”, “"11 765,33”}}

Here is the code which does the correct job and the kai’s handler as I got it thru Google.


--set texte_csv to read file ("" & (path to desktop & "4tests.csv"))
set texte_csv to ";Date;\"Mme AZERTYUIOP WXCVBN - CCHQ  n° 12345678901
LibellÃ©\";\"DÃ©bit
euros\";\"CrÃ©dit
euros\";
0;;solde au 31/12/2010;;12 345,67;12 345,67
1;04/01/2010;debt#1;12,34;;12 333,33
2;04/01/2010;debt #2;567,00;;11 766,33
3;05/01/2010;debt #3;88,00;;11 678,33
4;05/01/2010;debt #4;900,00;;10 778,33
5;06/01/2010;entry #1;;987,00;11 765,33"

set texte_csv to my remplace(texte_csv, ";", quote & ";" & quote)
set liste_lignes to paragraphs of texte_csv

repeat with une_reference in liste_lignes
	set contents of une_reference to my decoupe(contents of une_reference, ";")
end repeat
(*
set en_liste to my CsvToList(texte_csv)
en_liste
*)

liste_lignes

--=====

on decoupe(t, d)
	local oTIDs, l
	set oTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to d
	set l to text items of t
	set AppleScript's text item delimiters to oTIDs
	return l
end decoupe

--=====
(*
replaces every occurences of d1 by d2 in the text t
*)
on remplace(t, d1, d2)
	local oTIDs, l
	set oTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to d1
	set l to text items of t
	set AppleScript's text item delimiters to d2
	set t to l as text
	set AppleScript's text item delimiters to oTIDs
	return t
end remplace

--=====

(* revised by kai *)
on CsvToList(t)
	set quot to "\""
	set comma to ","
	set oldDelimiters to text item delimiters
	set quoteSubstitute to ASCII character 0
	set newLineSubstitute to ASCII character 1
	set commaSubstitute to ASCII character 2
	set text item delimiters to quot & quot
	set t to t's text items
	set text item delimiters to quoteSubstitute
	set t to t as string
	
	(* suggested alternative fix *)
	
	set text item delimiters to comma & quoteSubstitute & comma
	set t to t's text items
	set text item delimiters to comma & comma
	set t to t as string
	
	set text item delimiters to quot
	script o
		property l : t's text items
	end script
	repeat with i from 1 to count o's l by 2
		set text item delimiters to comma
		set t to text items of o's l's item i
		
		(* repeat loop removed from here *)
		
		set text item delimiters to commaSubstitute
		set t to t as string
		set text item delimiters to newLineSubstitute
		set o's l's item i to t's paragraphs as string
	end repeat
	set text item delimiters to ""
	set t to o's l as string
	set text item delimiters to quoteSubstitute
	set o's l to t's text items
	set text item delimiters to quot
	set t to o's l as string
	set text item delimiters to newLineSubstitute
	set o's l to t's text items
	set text item delimiters to commaSubstitute
	repeat with i from 1 to count o's l
		set o's l's item i to text items of o's l's item i
	end repeat
	set text item delimiters to oldDelimiters
	return o's l
end CsvToList

--=====

(* original version revised by T&B *)

on Csv_ToList(t)
	set quot to "\""
	set comma to ","
	set oldDelimiters to text item delimiters
	set quoteSubstitute to ASCII character 0
	set newLineSubstitute to ASCII character 1
	set commaSubstitute to ASCII character 2
	set text item delimiters to quot & quot
	set t to t's text items
	set text item delimiters to quoteSubstitute
	set t to t as string
	set text item delimiters to quot
	script o
		property l : t's text items
	end script
	repeat with i from 1 to count o's l by 2
		set text item delimiters to comma
		set t to text items of o's l's item i
		
		-- Added this to fix empty quotes:
		if t contains quoteSubstitute then
			repeat with columnN from 1 to length of t
				if item columnN in t is quoteSubstitute then
					set item columnN in t to ""
				end if
			end repeat
		end if
		
		set text item delimiters to commaSubstitute
		set t to t as string
		set text item delimiters to newLineSubstitute
		set o's l's item i to t's paragraphs as string
	end repeat
	set text item delimiters to ""
	set t to o's l as string
	set text item delimiters to quoteSubstitute
	set o's l to t's text items
	set text item delimiters to quot
	set t to o's l as string
	set text item delimiters to newLineSubstitute
	set o's l to t's text items
	set text item delimiters to commaSubstitute
	repeat with i from 1 to count o's l
		set o's l's item i to text items of o's l's item i
	end repeat
	set text item delimiters to oldDelimiters
	return o's l
end Csv_ToList


--=====
(* kai's original version *)

on |csv as list| from t
	set d to text item delimiters
	set q to ASCII character 0
	set p to ASCII character 1
	set c to ASCII character 2
	set text item delimiters to "\"\""
	set t to t's text items
	set text item delimiters to q
	set t to t as string
	set text item delimiters to "\""
	script o
		property l : t's text items
	end script
	repeat with i from 1 to count o's l by 2
		set text item delimiters to ","
		set t to text items of o's l's item i
		set text item delimiters to c
		set t to t as string
		set text item delimiters to p
		set o's l's item i to t's paragraphs as string
	end repeat
	set text item delimiters to ""
	set t to o's l as string
	set text item delimiters to q
	set o's l to t's text items
	set text item delimiters to "\""
	set t to o's l as string
	set text item delimiters to p
	set o's l to t's text items
	set text item delimiters to c
	repeat with i from 1 to count o's l
		set o's l's item i to text items of o's l's item i
	end repeat
	set text item delimiters to d
	o's l
end |csv as list|

If you are able to edit the handler to get it doing a correct job, I will change my advice but at this time, I repeat that I must know the delimiter to be able to decipher CSV files.

Yvan KOENIG (VALLAURIS, France) lundi 16 aoÃ»t 2010 11:25:49

alastor933 · August 16, 2010, 12:49pm

Thanks for pointing that out. I have never used data from Excel, so I was not aware of this.
I suppose I just was lucky to pick a bank which delivers comma-separated data.

Yvan_Koenig · August 16, 2010, 7:33pm

I don’t use Excel but for tests, I asked a friend to send to me a ‘CSV’ created on a system using the decimal comma.
Like those created by FileMaker, Bento and Numbers (among numerous ones) they use the semi-colon to reduce the frequency of valus needing to be enclosed between quotes (every decimal numbers in such a case°.
In my own life this is not a problem, I use TSV files but many users are sticked to what I feel as a ridiculous format.

Yvan KOENIG (VALLAURIS, France) lundi 16 aoÃ»t 2010 21:33:05