Convert XLS files into CSV preserving UTF-8 encoding

Hmm I still get broken characters

Hello!

This is with the former iconv conversion to utf-8, I just assumed the conversion was done correctly earlier, as I opened a file with my accented characters, and they displayed correctly, in macroman. :slight_smile:


set {thp, thn} to {(path to desktop folder as text), "arb1.csv"}

set fn to (thp & thn)
set pxn to quoted form of POSIX path of fn
do shell script "touch " & pxn
tell application "Microsoft Excel"
	save active sheet in fn as CSV Mac file format
end tell
set newfn to uniqueFilename(thp, thn)
set newpxfn to quoted form of POSIX path of (thp & newfn)
do shell script "iconv -f macroman -t utf-8 " & pxn & " >" & newpxfn

to uniqueFilename(pthToFol, fileName)
	-- http://macscripter.net/edit.php?id=154515
	
	local fnend, mfile, msuf, i, numext, newname
	set fnend to (length of fileName) - (offset of "." in (reverse of characters of fileName as text))
	
	set mfile to text 1 thru fnend of fileName
	
	set msuf to text (fnend + 1) thru -1 of fileName
	
	set i to 0
	set numext to ""
	try
		repeat
			
			set tf to (pthToFol & mfile & numext & msuf)
			set asa to tf as alias
			set i to i + 1
			set numext to " " & i
		end repeat
		
	on error
		set newname to (mfile & numext & msuf)
	end try
	return newname
end uniqueFilename

For sure, it’s what I wrote :

Alas mine and yours export a file which is perfectly treated when I open them with Numbers but, if I open them in Excel, I get odd results which you may see at :
https://www.box.com/s/tjsdv0gx38p8qlqckkra

The created csv use macRoman encoding.
I opened it with TextEdit
Saved it as a text file using Utf8 encoding.
I renamed the text file as .csv
You may compare the contents here :
https://www.box.com/s/d6cuqbep8zigvnz725kj

Excel treats the modified file wrongly too.
Numbers treats the modified file correctly too.

Yvan KOENIG (VALLAURIS, France) samedi 8 septembre 2012 14:29:25

Ehehe same result with special characters replaced by _ :mad: :lol:

Hello McUsr

You build an Utf8 csv file but alas, Excel, oPenOffice and probably libreOffice open it flawlessly.

The value :

/Important/Æ’ Script utiles/Æ’ récupÚre chemin/récupÚre/récupÚreChemin.app

become

/Important/Æ’ Script utiles/Æ’ rĂƒÆ’Â©cupĂƒÆ’Âšre chemin/rĂƒÆ’Â©cupĂƒÆ’Âšre/rĂƒÆ’Â©cupĂƒÆ’ÂšreChemin.app

I repeat that given my practice, only Numbers open it correctly.

Yvan KOENIG (VALLAURIS, France) samedi 8 septembre 2012 14:59:17

Hi!

I think the macroman is an abbreviaton for western mac roman, I think Chech should use MACCENTRALEUROPE so please substitute that for macroman!

If that doesn’t work, look at This post. :slight_smile:

Hello Yvan. If you open it in Excel again then it may show the wrong result, as it expects macroman, since it saves in macroman, but try opening the csv file with text wrangler, using encoding utf-8, and you will see that it displays the result correct. :slight_smile:

I know that McUsr but csv files aren’t built to be opened in a Word Processor which is unable to display them in columns.
They are built to be opened in spreadsheet applications.

What is foolish (at least from my point of view) is the fact that when I export a spreadsheet from Numbers to Excel, Excel ask me which is the used encoding but it doesn’t ask when I export as csv. :rolleyes:

If the asker want to keep its local characters, he may work this way:

Use the script(s) which he prefer.
Open the resulting csv with Numbers which costs only 16ñ‚¬ then export from Numbers to Excel.
This way, he will have a correct Excel file.

I will write a script doing the trick.

Yvan KOENIG (VALLAURIS, France) samedi 8 septembre 2012 15:15:33

Would it be wrong to tweak this script? :slight_smile:

I don’t understand why it breaks the column when a comma is found


tell application "Microsoft Excel"
    activate

    set outFile to (path of active workbook)
    set fileName to (name of active workbook)
    set outFile to (outFile & ":" & fileName & ".csv")
    set openFile to open for access file outFile with write permission
    set eof openFile to 0

    set lastCol to count of columns of used range of active sheet
    set lastRow to count of rows of used range of active sheet

    repeat with r from 1 to lastRow
        set rowStr to (value of cell r of column 1 of active sheet)
        repeat with c from 2 to lastCol
            set cellVal to (value of cell r of column c of active sheet)
            set rowStr to rowStr & "," & cellVal
        end repeat
        write rowStr & return to openFile as «class utf8»
    end repeat
    close access openFile    
end tell

Or from programs like awk or perl, that use ascii data. I just wanted you to open it in text wrangler to see the encoding, nothing more. I have hunch that the op wants to process his csv-files with awk. :wink:

Hello.

Try inserting:

set AppleScript's text item delimiters to "" 

on top of your script, and see if that helps :wink:

Nop, still breaks at the comma :smiley:

Would be fine to know exactly what the asker wish to do with the file.

Here is the announced script which convert a csv file generated by one of our scripts into an Excel file treating correctly the non ASCII characters :


set csvFile to choose file of type {"public.comma-separated-values-text"}
tell application "System Events"
	tell disk item (csvFile as text)
		set {ext, fileName, dossier} to {name extension, name, path of container}
	end tell
	set xlsName to (text 1 thru -(1 + (length of ext)) of fileName) & "xls"
	set xlsPath to (dossier & xlsName)
	if not (exists disk item xlsPath) then
		make new file at end of folder dossier with properties {name:xlsName}
	end if
end tell

tell application "Numbers"
	open csvFile
	save document 1 in (xlsPath as alias)
	close document 1 without saving
end tell

Yvan KOENIG (VALLAURIS, France) samedi 8 septembre 2012 15:38:49

That is all it takes, and 16ñ‚¬ :smiley:

Now, if the op inserts MACCENTRALEUROPE for MACROMAN or CP1250 if that doesn’t work, his problems should be solved.

I come to think of it, that if he meets a comma, and the values are comma separated, then this may lead to problems, , may he is better off using “;” for “,” when generating the csv.

He can choose his field separator in awk anyways. with FS=“;”.

Sorry guys, maybe I didn’t make clear enough what I need to do.

I often get Excel files that contain the same text in different languages, so I need to merge them to create an import file that needs to be CSV UTF-8.

So the workflow would be:
1 “ Select files to merge
2 - If they are xls|xlsx files convert them to UTF-8 csv
3 - Type output filename and select output folder
4 - Merge keeping the header row only of the first file

Point 4 can be achieved using the following script; points 1, 2 and 3 are still a question mark eheh :slight_smile:

set theFiles to choose file with multiple selections allowed
set fileName to text returned of (display dialog "Type file name:" default answer ".csv")
set saveFile to POSIX path of (((path to desktop) as text) & fileName)

do shell script "cp " & quoted form of POSIX path of item 1 of theFiles & space & quoted form of saveFile
if (count of theFiles) = 1 then return
repeat with fileToAdd in theFiles
	set x to quoted form of POSIX path of fileToAdd
	do shell script "cat " & x & " | awk 'BEGIN {RS=\"\\r\\n\"; getline}{print $0}' >> " & quoted form of saveFile
end repeat

(1) I clearly wrote the cost in my precedent message.

(2) I made my tests running the system in French so the csv files were using the semi-colon as delimiter :slight_smile:

I also tried to replace the semi-colons by TAB characters and save the edited file as xyz.txt.

Excel import it wrongly too.
It seams that Microsoft didn’t fully understand what the switch to Unicode imply.

Yvan KOENIG (VALLAURIS, France) samedi 8 septembre 2012 15:54:20

What are you wanting to do with the merged CSV file ?

If you want to open it with Excel, you will always get odd results.

Exporting the csv as xls with Numbers do the wanted trick (at least with the encoding used by Excel in French).
I’m not sure with the one used for Czech.

If you really need to open the final file in Excel, I will try to get a link allowing you to download the trial version of iWork '09 (Apple removed it from its servers).

[b]You may get the trial version thru :

https://discussions.apple.com/message/19118374

the wanted link is in Peggy’s message dated 31 juil. 2012 18:33[/b]

Yvan KOENIG (VALLAURIS, France) samedi 8 septembre 2012 16:01:34

Hello!

Try accumulating all your output in one string pendolo, that you write out when your reading is done, and see if that helps!

Enjoy your day, I’m out! :slight_smile:

it needs to be imported into a content management system, it doesn’t need to open in Excel. Only some some source files are received in xls so they need to be converted into csv.

No need to open the final file in Excel

Opening the file in Numbers, which I have, and exporting works, but I want to automate this with a script, because there are from 10 to 30 files to process every time.

:slight_smile:

I’m using this file as a test, can you try it with this?

http://www.sendspace.com/file/hq55j9

Hello!

After some testing I found this to work!

tell application "Microsoft Excel"
	activate
	
	set outFile to (path of active workbook)
	set fileName to (name of active workbook)
	set outFile to (outFile & ":" & fileName & ".csv")
	try
		set openFile to open for access file outFile with write permission
		set eof openFile to 0
		
		set lastCol to count of columns of used range of active sheet
		set lastRow to count of rows of used range of active sheet
		
		repeat with r from 1 to lastRow
			set cellVal to (value of cell r of column 1 of active sheet) as text
			if cellVal is not "" then
				if text 1 of cellVal as text is not "," then
					set rowStr to cellVal
				else
					set rowStr to "0" & cellVal
				end if
			else
				set rowStr to ""
			end if
			repeat with c from 2 to lastCol
				set cellVal to (value of cell r of column c of active sheet) as text
				if cellVal is not "" then
					if text 1 of cellVal is not "," then
						set rowStr to rowStr & "," & cellVal
					else
						set rowStr to rowStr & "," & "0" & cellVal
					end if
				else
					set rowStr to rowStr & ","
				end if
				
			end repeat
			write rowStr & return to openFile as «class utf8»
		end repeat
		close access openFile
	on error e number n
		
		-- Chris Stone
		set {cr, sep} to {return, "------------------------------------------"}
		set errmsg to sep & cr & "Error: " & e & cr & sep & cr & "Error 
		Number: " & n & cr & sep
		tell application "SystemUIServer"
			activate
			display dialog errmsg
		end tell
		set fail to true
	end try
end tell