Converting 1800+ CSV files to HTML via Applescript and Excel

I have 1800+ CSV contact files that I must convert to HTML files in Excel. The orginal source files vary widely in length and structure, but I have been able to structure them using TextWrangler and a few other programs.

To date, only by using Excel’s conversion engine can I automatically get the superior consistency and error-tolerance I need to process what are poorly-formed and inconsistent CSV files into well-formed HTML output. With it, I can then will run each file through a really nifty program called Anthracite by Metafy in order to export the tables into arrays. These arrays will then import into Filemaker or some other database for final normalization. A better technical person than I could find a better way to do this, but I know this will work if I can

But of course I have hit the limit of operator knowledge. I’m an analyst, not a programmer. I’ve been working on this stuff for weeks.The learning curve has been steep, not the least of which is due to the fact I’ve had to use a bunch of different new tools to get to where I’m at now. Upstream there was a bunch of preprocessing that had to happen just to convert my old files into this format and get rid of the unnecessary tags and other information, but that’s another story. There may be other ways to do this, but I am focused on doing it this way, and have invested huge hours to get it 95% done. It’s the final 5% that’s killing me.

I have cobbled together a little Applescript that will automate the process of opening the CSV file in Excel 2004, then converting and saving it as an HTML file. It works beautifully, as simple as it is. However, the AppleScript as written will only convert one file at a time. I need a way to have this script loop through the chosen directory, choose each successive file, convert it, and go back to the next in line in the same folder. I’m sure its a simple fix, but I can’t find any reference or example to help me chunk through this directory one file at a time. I’ve been scouring for reference for this but just can’t fnd it.

Here’s the entirety of my script, to date. I hope those knowledgeable folks out there will understand just getting to this point is harder than it looks for a non-scripter like me. And in fact, I got this far only by scouring news groups and picking up tips here and code left there by extremely talented and generous people whose prior work allowed me to finally write this script for Excel 2004 using OS X 10.4.7:

set the_file to “Users:mynameesktop:TESTfolder:Uniquecontactdata 1.csv”
tell application “Microsoft Excel”
activate
open the_file
save active workbook in the_file & “_new.html” as HTML
close active workbook without saving
end tell

I need to revise the AppleScript in order to have it run through, alter, re-name and save-as all the files to HTML.

It may be important to note that I have 1800+files, sequentially numbered “1.csv” through “2200.csv”. Because there are numbers missing from this sequence, my gut tells me I also need some type of “auto skip if file not present” function and/or auto error-handling.

Can anyone help me in this quest? Any help of any sort would greatfully appreciated.

Thanks !

Hello

I hope that this old skeleton may help.

–[Script Enregistre à la source]

(*

Yvan KOENIG
*)
property english : true

property msg99 : “” – globale

if english is true then
set _ to choose file with prompt “Choose the file(s)” with multiple selections allowed (* of type {“CWGR”, “CWWP”, “CWSS”, “CWDB”, “CWPT”, “CWPR”} )
else
set _ to choose file with prompt “Choisir le(s) fichier(s)” with multiple selections allowed (
of type {“CWGR”, “CWWP”, “CWSS”, “CWDB”, “CWPT”, “CWPR”} *)

end if

open (_ as list)

– ================

on open (sel)

if msg99 = "" then my prepareMessages()

tell application "Finder"
	try
		set sel to entire contents of folder sel (* utile si on a déposé un dossier
		¢ useful if we dropped a folder *)
	end try
	
	repeat with elem in sel
		set laClasse to (class of elem)
		if (laClasse is not folder) and (laClasse is not application file) then my TraiteLeFichier(elem)
	end repeat
	
end tell -- to Finder

end open

– ================

on TraiteLeFichier(zefile_)
try

	(* replace my code by yours *)
	
	-- from HERE
	
	tell application "Finder" to set leDossier to (container of zefile_) as text
	
	tell application "AppleWorks 6"
		open zefile_
		set leNom to name of document 1
		if leNom does not end with ".cwk" then ¬
			set leNom to leNom & ".cwk"
		try
			tell document 1 to save document 1 in file (leDossier & leNom)
			(* enregistre dans le dossier Documents,

¢ save in the “Documents” folder *)
end try
close document 1
end tell – to AppleWorks

	-- thru HERE
	
on error MsgErr number NroErr
	if NroErr is not -128 then
		beep 2
		tell application (path to frontmost application as string) to ¬
			display dialog "" & NroErr & " : " & MsgErr with icon 0 ¬
				buttons {msg99} giving up after 20
	end if
	return
end try

end TraiteLeFichier

– ================

on prepareMessages()
if english is true then
set msg99 to " Oops "
else
set msg99 to " Vu "
end if
end prepareMessages

–[/SCRIPT]

Yvan KOENIG (from FRANCE mercredi 20 septembre 2006 17:18:08)

Yvan

Thank you very much for your contribution. I’m afraid my Applescript skills are so limited that I have been struggling to integrate my Excel script into your looping solution. This morning another source has provided me the following script, the syntax and logic of which are better matched to my limited skills. However, I cannot make this script integrate with my Excel conversion module either. The problem may have something to do with the need to set and/or open the first file in a directory as an alias, which my Mac currently is refusing to do. There is also lots of chatter on the various newsgroups and forums about the word “item” being troublesome in Applescript. I am running down both of these issues and will report back to update you.

In the meantime, here is the most recent non-working version of that script:

tell application “Finder”
set theFolder to folder “Users:iamtheuser:Desktop:TESTfolder” --put your folder path in the quotes
set theNumber to N (the number of items in theFolder)
repeat with i from 1 to theNumber
set theFile to the name of item i of theFolder
tell application “Microsoft Excel”
activate
open the_file
save active workbook in the_file & “_new.html” as HTML
close active workbook without saving
end tell
end repeat
end tell

Thank you again for your help, and I will return to update you on the progress I make.

Rob @ decisioninformatics

Hello

Here is the message I posted to respond to the same question on an other forum:

–[Script CSV2HTML]

(*

Yvan KOENIG
21 septembre 2006
*)
property english : true

property msg99 : “” – globale

if english is true then
set _ to choose folder with prompt “Choose the folder storing files to ‘repair’”
else
set _ to choose folder with prompt “Choisir le dossier contenant les fichiers à corriger”
end if

tell application “Finder” to set laListe to every file in folder _

open (laListe)

– ================

on open (listeFichiers)

if msg99 = "" then my prepareMessages()
tell application "Finder"
	repeat with leFichier in listeFichiers
		set laClasse to (class of leFichier)
		if (laClasse is not folder) and (laClasse is not application file) then my TraiteLeFichier(leFichier)
	end repeat
end tell -- to Finder

end open

– ================

on TraiteLeFichier(the_file)

try
set tf to the_file as text
if tf ends with “.csv” then set tf to text 1 thru -5 of tf
(* removes the .csv extension *)

tell application “Microsoft Excel”
activate
open the_file
save active workbook in tf & “_new.html” as HTML
close active workbook without saving
end tell

on error MsgErr number NroErr
if NroErr is not -128 then
beep 2
tell application (path to frontmost application as string) to ¬
display dialog “” & NroErr & " : " & MsgErr with icon 0 ¬
buttons {msg99} giving up after 20
end if
return
end try
end TraiteLeFichier

– ================

on prepareMessages()
if english is true then
set msg99 to " Oops "
else
set msg99 to " Vu "
end if
end prepareMessages

–[/SCRIPT]

You may use it from the Script Editor, it will ask you to choose the folder containing your files.
You may save it as an application and run it. It will also ask you to choose.
Last not least, when saved as an application, you may drag and drop the folder’s icon on the script’s icon and it will do its duty.

CAUTION

As it is written, the script removes the “.csv” extension from the file path before adding “_new.html”.

Yvan KOENIG

The original poster wants to save as HTML. In Excel X, I don’t see that in the dictionary; ‘as’ doesn’t include HTML and there is no other form. This may require GUI scripting the File menu to select “Save as Web Page” if the OP is using Excel X (rather than 2004, which I don’t have). This script, like Yvan’s, will open the file in Excel and save the workbook in a separate folder (I called ‘htmls’), but it doesn’t convert them.

Is there an Excel dictionary entry for saving as web page in the latest version?

set theFolder to ((path to desktop as text) & "TESTfolder") as alias --put your folder path in the quotes
tell application "Finder"
	set theFiles to files of theFolder
	set fileCount to count theFiles
	repeat with i from 1 to fileCount
		set tName to (path to desktop folder as text) & "htmls:" & (name of item i of theFiles) & "_new.html"
		-- doesn't remove the ".csv" extention
		tell application "Microsoft Excel"
			Activate
			Open (item i of theFiles) as text
			Save document 1 In tName
			Close document 1 without saving
		end tell
	end repeat
end tell

Hello

The piece of code included in my script was the one passed by Decisioninformatics in a message he sent on AppleScript-user@lists.apple.com:

[i]My core processing script is below. I hope those knowledgeable folks out
there will understand just getting to this point is harder than it looks for
a non-scripter like me. And in fact, I got this far only by scouring news
groups and picking up tips here and code left there by extremely talented
and generous people whose prior work allowed me to finally write this script
for Excel 2004 using OS X 10.4.7:

set the_file to “Users:mynameesktop:TESTfolder:Uniquecontactdata 1.csv”
tell application “Microsoft Excel”
activate
open the_file
save active workbook in the_file & “_new.html” as HTML
close active workbook without saving
end tell[/i]

As I don’t own this Microsoft product, I simply encapsulated it in a droplet skeleton.

Yvan KOENIG

We’ll have to wait for him, Yvan.

Yvan and Adam:

I’m fortunate you guys are willing to help me on this. Thank you for your assistance.

I am using Excel 2004, Adam, and it does perform the necessary function on a standalone basis with the scriptlet I provided earlier, which Yvan refers to. Yes Excel 2004 allows for saving files to HTML. I understand from various lists that the Excel X Applescripting functionality is much different from E2004; E2004 supposedly maps almost 1:1 to VisualBasic. (And now, MSFT has announced it will no longer support VBA in the new Mac Office products.)

Anyway.

I am trying to get your script to function on my setup. The stopper right now is that Excel wants to open the .DS_Store file, and doesn’t know what to do with it. Neither do I. It is currently marked as “invisible” but is clearly visible in the folder structure. I don’t know how to make it go away yet. Seems like I should be able to put some global filter on the system directory services, or, filter out by file types at the top of the Applescript, or otherwise filter out invisibles and any other file formats than CSVs. E2004 offers something like this but I don’t know how to implement it, or if it is the right solution.

I will be working on this later tonight and all day tomorrow if need be. I will post more information here as it becomes available. Thank you again for your support in this. (Is it always this difficult, or is it, as they say, “operator error”? lol.)

(By the way, Yvan is correct about my posts on other boards. This is the first time I’ve ever used boards in this manner, or at all, really, and didn’t realize I may have been committing a faux pas with multiple posts. I have had input from several people, and I will do my best to try to keep all of them updated on this effort, including you. And I will gladly share everything I know and learn with the people who are helping me. I hope I can pull it together with your help and do something of value to share with you. thanks again.)

Rob at Decisioninformatics (-at-) gmail.com

Perhaps something like this then:


set theFolder to ((path to desktop as text) & "TESTfolder") as alias
set theDestination to (path to desktop folder as text) & "htmls:"

set theFiles to list folder theFolder without invisibles
set fileCount to count theFiles
repeat with i from 1 to fileCount
	tell application "Finder" to set tName to theDestination & (name of item i of theFiles) & "_new.html"
	-- doesn't remove the ".csv" extention
	tell application "Microsoft Excel"
		Activate
		Open (item i of theFiles) as text
		Save document 1 In tName As HTML
		Close document 1 without saving
	end tell
end repeat

Untested because I don’t have 2004.

Hello

Removing the invisible files is of good practice but if the folder contains files which Excel can’t open it the script will faill too.

In the revised version I added a filter ignoring every file whose name extension is not “csv”.
With such a filter we may assume that all will works fine.
If you don’t agree with my new name syntax you may remove the line which strips the name extension.

--[script CSV2HTML]

(*

Yvan KOENIG
21 septembre 2006
  *)
property english : true

property msg99 : "" -- globale

if english is true then
	set _ to choose folder with prompt "Choose the folder storing files to 'repair'"
else
	set _ to choose folder with prompt "Choisir le dossier contenant les fichiers à corriger"
end if

tell application "Finder" to set laListe to every file in folder _

open (laListe)

-- ================

on open (listeFichiers)
	
	if msg99 = "" then my prepareMessages()
	tell application "Finder"
		repeat with leFichier in listeFichiers
			set laClasse to (class of leFichier)
			if (laClasse is not folder) and (laClasse is not application file) and (name extension of leFichier is "csv") then my TraiteLeFichier(leFichier)
		end repeat
	end tell -- to Finder
end open

-- ================

on TraiteLeFichier(the_file)
	try
		set tf to the_file as text
		if tf ends with ".csv" then set tf to text 1 thru -5 of tf (* you may remove it if you want to keep the ".csv" in the name*)
		(* removes the .csv extension *)
		
tell application "Microsoft Excel"
activate
open the_file
save active workbook in tf & "_new.html" as HTML
close active workbook without saving
end tell        

	on error MsgErr number NroErr
		if NroErr is not -128 then
			beep 2
			tell application (path to frontmost application as string) to ¬
				display dialog "" & NroErr & " : " & MsgErr with icon 0 ¬
					buttons {msg99} giving up after 20
		end if
		return
	end try
end TraiteLeFichier

-- ================

on prepareMessages()
	if english is true then
		set msg99 to " Oops "
	else
		set msg99 to " Vu "
	end if
end prepareMessages

--[/SCRIPT]

Yvan KOENIG (from FRANCE vendredi 22 septembre 2006 13:51:11)

Adam, and Yvan, thanks again for your help. Comments on Adam’s two most recent posts follow below; I hope to attempt to implement Yvan’s solution in the next hour and will update under separate post.

Adam, the script below named in your honor directs Excel to attempt to open the .DS_Store file and then both Excel and Applescript choke on the result. I am heartened by the fact that Excel is responding properly by trying to open a file, but don’t know how to filter out the invisibles as stated earlier.

SSS.AdamBell092106.scpt

set theFolder to ((path to desktop as text) & “testcopy2”) as alias --put your folder path in the quotes
tell application “Finder”
set theFiles to files of theFolder
set fileCount to count theFiles
repeat with i from 1 to fileCount
set tName to (path to desktop folder as text) & “htmls:” & (name of item i of theFiles) & “_new.html”
– doesn’t remove the “.csv” extention
tell application “Microsoft Excel”
activate
open (item i of theFiles) as text
save document 1 in tName
close document 1 without saving
end tell
end repeat
end tell

Adam’s follow-up script below returns the error “can’t get the name of “file1.csv”.” and highlights in grey tint the word name in "(name of item i…). Now, File1.csv is the actual name of the first file, so Finder is actually finding that name (and skipping the invisibles) but doesn’t seem to know what to do with it after that point.

SSS.AdamBell092206.scpt

set theFolder to ((path to desktop as text) & “testcopy2”) as alias
set theDestination to (path to desktop folder as text) & “htmls:”

set theFiles to list folder theFolder without invisibles
set fileCount to count theFiles
repeat with i from 1 to fileCount
tell application “Finder” to set tName to theDestination & (name of item i of theFiles) & “_new.html”
– doesn’t remove the “.csv” extention
tell application “Microsoft Excel”
activate
open (item i of theFiles) as text
save document 1 in tName as HTML
close document 1 without saving
end tell
end repeat

I am going to try to run down these errors over the next few hours (it’s 1040am EST here) and also try to implement Yvan’s script again. I will check back here after noon EST.

Thanks again for your help, Adam and Yvan.

Rob at DecisionInformatics at gmail dot com

Model: Powerbook G4 1.5ghz
AppleScript: 1.10.7
Browser: Safari 419.3
Operating System: Mac OS X (10.4)

This works for me (except the As HTML part - fix that to the proper form):


set theFolder to ((path to desktop as text) & "TESTfolder") as alias --put your folder path in the quotes
tell application "Finder" to set theFiles to (files of theFolder)
set fileCount to count theFiles
repeat with i from 1 to fileCount
	set fName to text 1 thru -5 of ((name of item i of theFiles) as text)
	if fName does not start with "." then
		set tName to (path to desktop folder as text) & "htmls:" & fName & "_new.html"
		tell application "Microsoft Excel"
			Activate
			Open (item i of theFiles) as text
			Save document 1 In tName As HTML -- fix this line to proper format for Excel 2004
			Close document 1 without saving
		end tell
	end if
end repeat

The problem with the last script was that I forgot that getting the files as list returns the names, not references.

Gentlemen, the results are in.

Yvan, I copied your script into my ScriptEditor, and it ran flawlessly, the first time, no changes required. It processed the first batch of 1886 files in about 22 minutes, or about 85 files per minute on my G4 1.5ghz Powerbook. That work would have taken me dozens of hours to try to perform manually (as well as repetitive motion injuries to my fingers). As it was, it took me dozens of hours to get to this solution, thanks to the work of you, Adam and several others, but now it is repeatable. I like the fact it has stylish Gallic comments throughout, which I don’t fully understand but which look very cool. I am a bit depressed that the solution seems so very far above my abilities; you must have been working in Applescript for some time, or have a natural talent for it. Thank you very much for your assistance in this effort. This is a very big win. (Please let me know if you would approve of me reposting your script on the other two boards I have been posting to – AppleInsider and Apple’s applescript mailing list. If you approve, I would like to attribute it to you as you have written in the script itself. Please let me know your thoughts on this.)

Adam, I continued to try your script revs for some time, but there were multiple errors in retrieving the file name, counting, etc. I admire the compactness and simplicity of your approach, and I will post here again tomorrow to show you the errors I was receiving.

I will also post some more information about how I intend to use the data Yvan’s script is helping me manage. Basically, I am creating an interactive visualization using the data to show the distribution and location of the companies in the contact database. It is driven by a program called iMap, which is developing into a great tool for me – now more so because I will be able to use my data to drive it. I’m also exploring the use of Google maps, which provides some similar and different features.

Thank you both again, and I hope you will both email your contact information to me at decisioninformatics (at) gmail (dot) com.

Check back tomorrow for more updates on this project.

Rob at DecisionInformatics at decisioninformatics (at) gmail (dot) com.

No need to repair my version Rob if Yvan’s is doing the task perfectly for you, though I’m certainly willing to help if you want to understand what’s going on.

My version labours under the handicap of being tested using Excel X, so the Excel portion of Yvan’s won’t compile on my machine. I also have XSpell.osax installed for which ‘english’ is a dictionary word and Yvan uses it as a property name, so that fails unless I change the name of the property everywhere. Though I could easily fix that, I can’t do anything about the Excel part. With the ‘as HTML’ part of the Excel portion of mine commented out, the rest runs perfectly for me, identifying the .csv files, counting them, opening them one after the other in Excel X and saving them in a separate folder with the new name, so I probably have some other conflict or you do. Mine assumes that all the .csv files are in a folder on the desktop, Yvan’s makes you choose it, so the simplest possible test would be to comment out my first line and substitute a choose folder for it.

Contact info for both of us is in the column on the left - just click Email. In addition, vis-a-vis this board, I can always be reached at adam at macscripter dot net.

Finally, keep on plugging with AppleScript. I wrote my first “hello, world” script just over 18 months ago and didn’t really know why more complex things didn’t work. This board helped immensely. Once you get the ‘hang’ of AppleScript you learn to ‘think’ in it, but I think the biggest hurdle is learning to read dictionaries and in spite of the crummy explanations there, begin to understand them. When you really get tangled, folks like Nigel Garvey and Kai (and others I don’t mean to insult by leaving them out) understand in depth what’s happening - they are the ‘scholars’ of AS.

Hello

I will respond to two messages.

  • 1 - Of course, you may post my script to the two other forums where you asked your original question.
    I add that most often, with this kind of script, I use the first entry point (the one opening a dialog) only during the tests. When finished, I uses it as a droplet which, from my point of view, if easier.

  • 2 - Thank you Adam, it’s the first time that I am alerted about the conflict between the “english” property name.
    I posted many scripts with it but no comment so I was unaware of this oddity.
    In new scripts, I will replace it by englishMsg.
    Like you, I didn’t compile the Excel code.
    When writing the script, I commented the Excel block. Our friend wrote that he worked in the single file version so I had no reason to doubt what he said.

Now I add comments which are not responding to any question.
If you wish that the script treats also files stored in subfolders of the main directory, it’s quite easy to add this feature.
In fact the initial skeleton was doing this kind of things.

Yvan KOENIG (from FRANCE samedi 23 septembre 2006 13:35:23)