Convert XLS files into CSV preserving UTF-8 encoding

I’ll mend it! :smiley:

This is the script to merge it with.

I think my last script should work, and this to merge the resulting files with. I used DJ Bazzie Wazzie’s code as a starting point.

set theFiles to choose file with multiple selections allowed
set saveFile to POSIX path of (((path to desktop) as text) & "MergedCSVFile.csv")

do shell script "cp " & quoted form of POSIX path of item 1 of theFiles & space & quoted form of saveFile

if (count of theFiles) = 1 then return
set x to quoted form of POSIX path of item 2 of theFiles
set theFiles to items 2 thru -1 of theFiles
repeat with fileToAdd in theFiles
	set x to x & " " & quoted form of POSIX path of fileToAdd
end repeat

-- return
do shell script ("for i in  " & x & " ; do cat $i |sed 1d >>" & quoted form of saveFile & " ;done ")

I mended the applescript addressing the idosyncracies of “;” in a “;” delimited CSV file above.

WOW! These last 2 work perfectly! :smiley:

How can we combine all of this in 1 script now?

1 “ Select files to merge — WE HAVE IT :slight_smile:
2 - If they are xls|xlsx files convert them to UTF-8 csv — WE HAVE THE SCRIPT BUT HOW TO PROCESS THEM ALL IN LOOP?
3 - Type output filename and select output folder — MISSING
4 - Merge keeping the header row only of the first file — WE HAVE THE SCRIPT
5 - Save in selected output folder with selected filename — MISSING

Thanks so much for your help :slight_smile:

If there are semicolons in your input, it may break, the only thing to do, is to replace them. Tools that looks for field separators, doesn’t consider escaping in-field field separators!

This one converts and merge in a single call.


set lesFichiersExcel to choose file of type {"org.openxmlformats.spreadsheetml.sheet", "com.microsoft.excel.xls"} with multiple selections allowed

character 2 of (0.5 as text)
if result is "," then
	set delim to ";"
	# set sep1000 to character id 160
else
	set delim to ","
	# set sep1000 to ","
end if

tell application "System Events" to tell disk item (item 1 of lesFichiersExcel as text)
	set {outFile, fileName, ext} to {path of container, name, name extension}
end tell
set outFile to outFile & (text 1 thru -(1 + (length of ext)) of fileName) & "csv"
set openFile to open for access file outFile with write permission
set eof openFile to 0

my convertAfile(item 1 of lesFichiersExcel, 1, delim, openFile)
repeat with i from 2 to count lesFichiersExcel
	my convertAfile(item i of lesFichiersExcel, 2, delim, openFile)
end repeat
close access openFile
end

on convertAfile(xPath, startRow, delim_, openFile_)
	local lastCol, lastRow, r, cellVal, rowStr, e, n, sep, errmsg
	
	tell application "Microsoft Excel"
		activate
		try
			open xPath
			set lastCol to count of columns of used range of active sheet
			set lastRow to count of rows of used range of active sheet
			
			repeat with r from startRow to lastRow
				set cellVal to (value of cell r of column 1 of active sheet) as text
				if cellVal contains delim_ then set cellVal to quote & cellVal & quote # ADDED
				set rowStr to cellVal
				repeat with c from 2 to lastCol
					set cellVal to (value of cell r of column c of active sheet) as text
					if cellVal contains delim_ then set cellVal to quote & cellVal & quote # ADDED
					set rowStr to rowStr & delim_ & cellVal
				end repeat
				write rowStr & return to openFile_ as «class utf8»
			end repeat
			
		on error e number n
			
			-- Chris Stone
			set sep to "------------------------------------------"
			set errmsg to sep & return & "Error: " & e & return & sep & return & "Error 
		Number: " & n & return & sep
			tell application "SystemUIServer"
				activate
				display dialog errmsg
			end tell
			set fail to true
		end try
	end tell
end convertAfile

I don’t understand what is the variable fail supposed to achieve.

Yvan KOENIG (VALLAURIS, France) samedi 8 septembre 2012 18:51:26

Yep Yvan, that works.

Just a couple of questions.

1 - Can it accept CSV files as well, so that it’s: select files, if they are Excel files convert them to CSV, if they are CSV do nothing, merge files ?

2 - Is it possible to select the output filename and folder destination?

3 - Can the Excel conversion be done without having Excel open all the files?

Thanks

(1) I have an idea which would allow it to achieve that.
the scheme would be :
if a file is an Excel one, treat it in the existing handler.
if a file is a csv (supposed to be encoded as utf8,) treat it in an other handler.
Of course, it would be useful to build a special case for the first selected file.

(2) just require a call to display dialog to define the filename
and a call to choose folder to define the folder.

(3) I apologizes but I don’t know a way to convert an Excel file into a csv file without opening the app.

Yvan KOENIG (VALLAURIS, France) samedi 8 septembre 2012 19:14:52

Your point (1) sounds great :slight_smile:

(2) is it outFile for destination folder? Hmmm not sure where to define it eheh

No problem about (3), it can open Excel, it’s fine :slight_smile:

It is to error out outside the try block, as I try to exit on the level I entered. :slight_smile:

(The code is yanked in from something else, I’ll go back and remove it.)

Seems to work ok but I get a “Can’t get end” error at the end? :confused:

I tried splitting out this line to let the user select the outFile and fileName but it breaks it, I’m dumb :o

I apologizes but it was dinner time :slight_smile:

Here is the script :


on run
	local lesFichiersExcel, theFolder, ext, theName, thePathname, openFile, uneSource, i
	# I inserted the type identifier "public.comma-separated-values-text" because it's the one wore by the csv generated by the export scripts.
	set lesFichiersExcel to choose file with prompt "Select the files to merge" of type {"org.openxmlformats.spreadsheetml.sheet", "com.microsoft.excel.xls", "com.apple.traditional-mac-plain-text", "public.comma-separated-values-text"} with multiple selections allowed
	
	# Extract the container, the name, the name extension of the first item in the list of selected ones
	tell application "System Events" to tell disk item (item 1 of lesFichiersExcel as text)
		set {theFolder, theName, ext} to {path of container, name, name extension}
	end tell
	set theName to (text 1 thru -(1 + (length of ext)) of theName) & "csv"
	
	# Define the name of the merged file
	display dialog "Define the merged file name" default answer theName
	set theName to text returned of result
	if theName does not end with ".csv" then set theName to theName & ".csv"
	
	# Select the folder in which the merged file will be saved
	choose folder with prompt "Location of the merged file" default location (theFolder as alias)
	
	# Build the pathname of the merged file
	set thePathname to (result as text) & theName
	
	tell current application to set openFile to open for access file thePathname with write permission # EDITED
	set eof openFile to 0
	
	# Treats the first source file (we keep the first row)
	set uneSource to item 1 of lesFichiersExcel
	tell application "System Events" to tell disk item (uneSource as text)
		type identifier
	end tell
	if result is in {"org.openxmlformats.spreadsheetml.sheet", "com.microsoft.excel.xls"} then
		my treatAnExcelFile(uneSource, 1, openFile)
	else
		my treatACSVFile(uneSource, 1, openFile)
	end if
	
	# Treat the other source file(s) (we drop the first row)
	repeat with i from 2 to count lesFichiersExcel
		set uneSource to item i of lesFichiersExcel
		tell application "System Events" to tell disk item (uneSource as text)
			type identifier
		end tell
		if result is in {"org.openxmlformats.spreadsheetml.sheet", "com.microsoft.excel.xls"} then
			my treatAnExcelFile(item i of lesFichiersExcel, 2, openFile)
		else
			my treatACSVFile(uneSource, 2, openFile)
		end if
	end repeat
	tell current application to close access openFile # EDITED
end run

--=====

# CAUTION, I removed a parameter so that the call to the two handlers use the same syntax !

on treatAnExcelFile(a_Pathname, start_Row, open_File)
	local delim, lastCol, lastRow, r, cellVal, rowStr, e, n, sep, errmsg
	character 2 of (0.5 as text)
	if result is "," then
		set delim to ";"
		# set sep1000 to character id 160
	else
		set delim to ","
		# set sep1000 to ","
	end if
	tell application "Microsoft Excel"
		activate
		try
			open a_Pathname
			set lastCol to count of columns of used range of active sheet
			set lastRow to count of rows of used range of active sheet
			
			repeat with r from start_Row to lastRow
				set cellVal to (value of cell r of column 1 of active sheet) as text
				if cellVal contains delim then set cellVal to quote & cellVal & quote # ADDED
				set rowStr to cellVal
				repeat with c from 2 to lastCol
					set cellVal to (value of cell r of column c of active sheet) as text
					if cellVal contains delim then set cellVal to quote & cellVal & quote # ADDED
					set rowStr to rowStr & delim & cellVal
				end repeat
				tell current application to write rowStr & return to open_File as «class utf8» # EDITED
			end repeat
			
		on error e number n
			tell current application to close access open_File # EDITED
			-- Chris Stone
			set sep to "------------------------------------------"
			set errmsg to sep & return & "Error: " & e & return & sep & return & "Error 
		Number: " & n & return & sep
			tell application "SystemUIServer"
				activate
				display dialog errmsg
			end tell
			set fail to true
		end try
		close active workbook saving no
	end tell
end treatAnExcelFile

--=====

on treatACSVFile(a_Pathname, start_Row, open_File)
	local desLignes, rowStr, e, n, sep, errmsg
	# EDITED
	try
		set desLignes to paragraphs start_Row thru -1 of (read a_Pathname as «class utf8»)
	on error
		set desLignes to paragraphs start_Row thru -1 of (read a_Pathname)
	end try
	
	try
		repeat with rowStr in desLignes
			write (rowStr as text) & return to open_File as «class utf8»
		end repeat
		
	on error e number n
		tell current application to close access open_File # EDITED
		-- Chris Stone
		set sep to "------------------------------------------"
		set errmsg to sep & return & "Error: " & e & return & sep & return & "Error 
		Number: " & n & return & sep
		tell application "SystemUIServer"
			activate
			display dialog errmsg
		end tell
		set fail to true
	end try
	
end treatACSVFile

I added a close access instruction in the handlers because leaving a file in openfor access state is weird.

As you see, I changed a lot of varnames because I was not at ease with some of them, mainly with outFile naming the container of the file.
I am accustomed to put an underscore in the name of parameters received by the handlers.

I hope that I forgot nothing.

Yvan KOENIG (VALLAURIS, France) samedi 8 septembre 2012 21:01:30

I hope you had a nice dinner Yvan :smiley:

Wow yes that works perfectly, however the special characters are again garbage now? They look like: Akusticky vylepšené pro kvalitu.

Sorry, my fault, I selected 12 xlsx files and 1 csv with them, didn’t see the csv :smiley:

Not your fault.

The script behave well with Excel files and with the CSV which I used in tests…
If we start with a CSV created by the late script exporting a single file, it required a bit of edition.
If we start with a CSV created by Excel, the underscores are already embedded in it. The script can’t repair them.

In the script above, I edited several instructions to get rid of error messages : error number -10004

When you receive CSV files, are there correctly encoded ?

May I know which tool is supposed to work upon the merged file ?

Yvan KOENIG (VALLAURIS, France) samedi 8 septembre 2012 22:40:24

Thanks Yvan, I will keep testing the script this week.

I hope you don’t mind if I come back for some tweak :smiley:

The tool is a custom built CMS at work, where translated text is imported.

No problem.
For safe, I will subscribe to this thread :slight_smile:

Yvan KOENIG (VALLAURIS, France) samedi 8 septembre 2012 23:21:28

Merci mon ami, bonne nuit :smiley:

Hello

I brought some small changes :slight_smile:


#=====

on run
	local lesFichiersExcel, theFolder, theExt, theName, thePathname, theOpenFile, uneSource, i
	
	# "com.apple.traditional-mac-plain-text" is the type identifier of the csv generated by the export scripts.
	# "public.comma-separated-values-text" is the type identifier of the csv generated by the export feature of Numbers.
	set lesFichiersExcel to choose file with prompt "Select the files to merge" of type {"org.openxmlformats.spreadsheetml.sheet", "com.microsoft.excel.xls", "com.apple.traditional-mac-plain-text", "public.comma-separated-values-text"} with multiple selections allowed
	
	# Extract the container, the name, the name extension of the first item in the list of selected files
	tell application "System Events" to tell disk item (item 1 of lesFichiersExcel as text)
		set {theFolder, theName, theExt} to {path of container, name, name extension}
	end tell
	set theName to (text 1 thru -(1 + (length of theExt)) of theName) & "csv"
	
	# Define the name of the merged file
	display dialog "Define the merged file name" default answer theName
	set theName to text returned of result
	if theName does not end with ".csv" then set theName to theName & ".csv"
	
	# Select the folder in which the merged file will be saved
	choose folder with prompt "Location of the merged file" default location (theFolder as alias)
	
	# Build the pathname of the merged file
	set thePathname to (result as text) & theName
	
	# Applying close access in the error handler was a bad idea.
	# So I try to close access here in case of a file remaining open
	try
		close access file thePathname
	end try
	set theOpenFile to open for access file thePathname with write permission # EDITED
	set eof theOpenFile to 0
	
	# Treats the first source file keeping its first row
	set uneSource to item 1 of lesFichiersExcel
	tell application "System Events" to tell disk item (uneSource as text)
		type identifier
	end tell
	if result is in {"org.openxmlformats.spreadsheetml.sheet", "com.microsoft.excel.xls"} then
		my treatAnExcelFile(uneSource, 1, theOpenFile)
	else
		my treatAcsvFile(uneSource, 1, theOpenFile)
	end if
	
	# Treat the other source file(s) dropping the first row
	repeat with i from 2 to count lesFichiersExcel
		set uneSource to item i of lesFichiersExcel
		tell application "System Events" to tell disk item (uneSource as text)
			type identifier
		end tell
		if result is in {"org.openxmlformats.spreadsheetml.sheet", "com.microsoft.excel.xls"} then
			my treatAnExcelFile(uneSource, 2, theOpenFile)
		else
			my treatAcsvFile(uneSource, 2, theOpenFile)
		end if
	end repeat
	try
		close access theOpenFile # EDITED
	end try
end run

#=====

# CAUTION, I removed a parameter so that both handlers use the same parameters !

on treatAnExcelFile(a_Pathname, start_Row, the_OpenFile)
	local delim, lastCol, lastRow, r, cellVal, rowStr, e, n
	character 2 of (0.5 as text)
	if result is "," then
		set delim to ";"
	else
		set delim to ","
	end if
	
	tell application "Microsoft Excel"
		activate
		try
			open a_Pathname
			tell active sheet
				set lastCol to count of columns of used range # of active sheet
				set lastRow to count of rows of used range # of active sheet
				
				repeat with r from start_Row to lastRow
					tell row r
						set cellVal to (value of cell 1) as text
						if cellVal contains delim then set cellVal to quote & cellVal & quote # ADDED
						set rowStr to cellVal
						repeat with c from 2 to lastCol
							set cellVal to (value of cell c) as text
							if cellVal contains delim then set cellVal to quote & cellVal & quote # ADDED
							set rowStr to rowStr & delim & cellVal
						end repeat
					end tell
					tell me to write rowStr & return to the_OpenFile as «class utf8» # EDITED
					(*
# Without the « tell me  » statement we would get this kind of behavior :

(1) write "blah;blah;blah
" to 1632 as «class utf8»
--> error number -1708
(2) «event ascrgdut»
--> error number -1708
(3) write "blah;blah;blah
" to 1632 as «class utf8»
--> error number -10004
end tell

tell current application 
write "blah;blah;blah
" to 1632 as «class utf8»
end tell

# Hardcoding the « tell current application  » statement, we spare the three operations issuing a non fatal error
# As I am lazy, I typed the shorter version « tell me »
*)
				end repeat
			end tell # active sheet
			
		on error e number n
			my ErrorHandler(e, n)
		end try
		close active workbook saving no
	end tell
end treatAnExcelFile

#=====

on treatAcsvFile(a_Pathname, start_Row, the_OpenFile)
	local rowStr, e, n
	# EDITED
	try
		paragraphs start_Row thru -1 of (read a_Pathname as «class utf8»)
	on error
		paragraphs start_Row thru -1 of (read a_Pathname)
	end try
	
	try
		repeat with rowStr in result
			# Here we aren't in a tell application . end tell block so there is no need for the « tell me  » statement 
			write (rowStr as text) & return to the_OpenFile as «class utf8»
		end repeat
	on error e number n
		my ErrorHandler(e, n)
	end try
end treatAcsvFile

#=====

on ErrorHandler(e_, n_)
	local sep
	# Chris Stone
	set sep to "------------------------------------------"
	my getLocalString("Finder", "PE26")
	item 1 of my decoupe(result, " ^0)")
	item 2 of my decoupe(result, "(")
	sep & return & my getLocalString("Finder", "NE86") & ": " & e_ & return & sep & return & result & ": " & n_ & return & sep
	tell application (path to frontmost application as string)
		display dialog result buttons {my getLocalString("Finder", "BN62"), my getLocalString("Finder", "AL3")} # default button 2 (or 1) as you want
	end tell
	# set fail to true
end ErrorHandler

#=====

on getLocalString(the_App, the_Key)
	tell application the_App to return localized string the_Key
end getLocalString

#=====

on decoupe(t, d)
	local oTIDs, l
	set oTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to d
	set l to text items of t
	set AppleScript's text item delimiters to oTIDs
	return l
end decoupe

#=====

Yvan KOENIG (VALLAURIS, France) lundi 10 septembre 2012 10:31:38

Good morning Yvan :slight_smile:

What changes are they? :slight_smile:

Hello

Open both scripts side by side and compare the codes. :wink:

Yvan KOENIG (VALLAURIS, France) lundi 10 septembre 2012 10:42:36