Convert XLS files into CSV preserving UTF-8 encoding

Hi,

I often try to convert Excel files (xls, xlsx) into CSV files maintaining the accented and special characters.

The only way I can do it is to copy and paste the content into Numbers and Export it as CSV with UTF-8 encoding.

Is there a way to use AppleScript to convert Excel files into CSV files preserving the UTF-8 encoding?

Thanks

If the characters are all exporting OK as MacRoman, you could use something like this:

set fileRef to (open for access (choose file) with write permission)
set theData to read fileRef as text
set eof fileRef to 0
write theData to fileRef as «class utf8»
close access fileRef

Hello Shane

As the original file is described as xls or xlsx, your scheme will return a lot of garbage.

Yvan KOENIG (VALLAURIS, France) samedi 8 septembre 2012 09:18:29

You’re right, of course. I was thinking Excel .csv files, and assuming they encoded MacRoman correctly.

This seems to work but it breaks the column if it finds a comma, hmmm


tell application "Microsoft Excel"
    activate

    set outFile to (path of active workbook)
    set fileName to (name of active workbook)
    set outFile to (outFile & ":" & fileName & ".csv")
    set openFile to open for access file outFile with write permission
    set eof openFile to 0

    set lastCol to count of columns of used range of active sheet
    set lastRow to count of rows of used range of active sheet

    repeat with r from 1 to lastRow
        set rowStr to (value of cell r of column 1 of active sheet)
        repeat with c from 2 to lastCol
            set cellVal to (value of cell r of column c of active sheet)
            set rowStr to rowStr & "," & cellVal
        end repeat
        write rowStr & return to openFile as «class utf8»
    end repeat
    close access openFile    
end tell

Excel has a HUGE AppleScript dictionary (which I never used). Have you looked at it?
It has a ‘save as’ command, with a ‘file format’ property. Looks like it might be what you need.
You’d avoid composing the CSV formatted text in the script.

A comment on that scriptlet: it has read/write commands within an application tell block.
It’s considered ‘good practice’ not to do that. First gather all data from Excel, then do the writing.

Thanks, I tried doing the Save As and selecting UTF-8 but it will still spit out garbage
 Hmmm

How should the block above be without read/write inside the tell block?

And why is it breaking the column when it finds a comma? :slight_smile:

Thanks

Hi. :slight_smile:

Save Clean CSV file from Excel using Applescript - Stack Overflow

Darn


This method should work, did not test

set theText to {} -- gather output

tell application "Microsoft Excel"
	activate
	set outFile to (path of active workbook)
	set fileName to (name of active workbook)
	set outFile to (outFile & ":" & fileName & ".csv")
	
	set lastCol to count of columns of used range of active sheet
	set lastRow to count of rows of used range of active sheet
	
	repeat with r from 1 to lastRow
		set rowStr to (value of cell r of column 1 of active sheet)
		repeat with c from 2 to lastCol
			set cellVal to (value of cell r of column c of active sheet)
			set rowStr to rowStr & "," & cellVal
		end repeat
		-- what have we?
		log rowStr
		-- add this line to the list
		set end of theText to rowStr & return
		
	end repeat
end tell

-- list to text
set {TID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, return}
set theText to theText as text
set AppleScript's text item delimiters to TID

-- write text
set openFile to open for access file outFile with write permission
set eof openFile to 0
write theText & return to openFile as «class utf8»
close access openFile

I added a log statement; check the events pane to see output lines as they are built. An empty cell should show up as an empty string.

Edit:

Ah, McUsr found a scriplet for Excel. Yes, garbage might be the result of ‘misspelled’ code


That works but does not keep UTF-8 encoding, all the extra characters are garbled :frowning:

This gives this error:
Can’t make file {missing value, “:”, missing value, “.csv”} into type «class fsrf».

Um, to which script does this apply??

It would help to know which line causes the error.
But it looks like Excel didn’t return the name & path of the workbook - which suggests there was no open workbook.
The result is a list, because not all parts of ‘outfile’ are text.

Sorry, it shows this line


set openFile to open for access file outFile with write permission

and gives this error


Result:
error "Can't make file {missing value, \":\", missing value, \".csv\"} into type «class fsrf»." number -1700 from file {missing value, ":", missing value, ".csv"} to «class fsrf»

Hello

Excel dictionary states :

save – Save an object.
reference – the object for the command
[as Unicode text] – The file type in which to save the data.
[in alias] – The file in which to save the object.

So I guess that an instruction like :


tell application "Microsoft Excel"
	activate
	tell active workbook
		set dossier to path & ":"
		set fileName to name
	end tell
end tell
tell application "System Events"
	name extension of disk item (dossier & fileName)
	set csvName to (text 1 thru -(1 + (length of result)) of fileName) & "csv"
	set csvPath to (dossier & csvName)
	if not (exists disk item csvPath) then
		make new file at end of folder dossier with properties {name:csvName}
	end if
end tell
tell application "Microsoft Excel"
	save active workbook as "file type for csv" in alias csvPath
end tell
 

would do the trick.
Alas I don’t know which is the string to use as file type for csv

Yvan KOENIG (VALLAURIS, France) samedi 8 septembre 2012 12:29:00

Yes, that was the only possibility, in this case. Can only occur when no workbook is open: Excel will return ‘missing value’ for name & path.

Well
 I have Office 2004, and that entry is quite different - no “Unicode text” anywhere. 2004 seems not to know about text encoding.
Pendolo, what version yours? Regulus’s script presumably works with 2008.

Hello.

CSV Mac file format is the correct string from Excel 2008’s dictionary, try that!

Hopefully it works!

if not, we’ll use iconv on the darn file! :slight_smile:

McUsr reminds me a bit of Sarah Palin, if she had taken a sudden interest in scripting.

Hi. :slight_smile:

This saves an Exel worksheet as csv with utf8 encoding for me.

set fn to ((path to desktop folder as text) & "arb1.csv")
set pxn to quoted form of POSIX path of fn
do shell script "touch " & pxn
tell application "Microsoft Excel"
	save active sheet in fn as CSV Mac file format
end tell

Sarah Palin
 LOL :smiley:

Nop it still breaks the special characters

I’m testing with some Czech language, in one of the columns the sentence is:

And in the CSV it spits out this:

Thanks

On my side I wrote :


tell application "Microsoft Excel"
	activate
	tell active workbook
		set dossier to path & ":"
		set fileName to name
	end tell
end tell

tell application "System Events"
	name extension of disk item (dossier & fileName)
	set csvName to (text 1 thru -(1 + (length of result)) of fileName) & "csv"
	set csvPath to (dossier & csvName)
	if not (exists disk item csvPath) then
		make new file at end of folder dossier with properties {name:csvName}
	end if
end tell

tell application "Microsoft Excel"
	(*
	tell active workbook
		save workbook as filename csvPath file format CSV file format
	end tell
	*)
	save active sheet in csvPath as CSV file format
	close workbook 1 saving no
end tell

Alas mine and yours export a file which is perfectly treated when I open them with Numbers but, if I open them in Excel, I get odd results which you may see at :
https://www.box.com/s/tjsdv0gx38p8qlqckkra

Yvan KOENIG (VALLAURIS, France) samedi 8 septembre 2012 14:15:19