I keep tripping on AS. Help with this script please

I’ve learned a lot from this site, and have received some great help from the forums and the tutorials. But there is some basic stuff that I keep getting caught on, and it is taking me forever to do seemingly simple things. If you can, please help me put the final touches on this script.

I have an Automator action that gets some .xml files and places them in ~/Desktop/gcal. The files’ names are all similar to: “P31625.xml”.

Now I have to convert these into an .ics format. Right now the script is working on each file separately, which doesn’t make sense. I should probably concatenate them and then do the editing. I looked at the ‘cat’ command, but I haven’t found how to pass the list of files to the command (syntax: cat file1.xml file2.xml > outputfile.xml).

The script that I have loops through each file, fixing the line endings to CRLF, then deleting a few tags that I don’t need. But there are things that I need to add to the file, and I can’t figure out how to get it to do that at a specified place. I have used Adam Bell’s excellent tutorial and key snippets to frankenstein some code to search for text within specified tags because some of the text needs to be duplicated (some URL links, for example).

The final file should be a combination of all the individual .xml files, but in the .ics format. I’ve included a short sample of the .xml file, a basic .ics file (line endings are CRLF), and the script I currently have (don’t laugh).

Current AppleScript:


set myFolder to ((path to desktop as Unicode text) & "gcal")
tell application "Finder" to set theFiles to (every file of folder myFolder whose name extension is "xml")
set myFolder2 to "Xamego:Users:mlafleur:Desktop:gcal:" --this is needed for TextWrangler to open below

--probably combine all files here to avoid the Repeat loop:  cat file1.txt file2.txt > file1and2.txt
	#set joinCommand to "cat " & quoted form of theFiles & " > joined.xml"  --doesn't work
	#do shell script joinCommand

repeat with j from 1 to the count of theFiles
	tell application "Finder" to ¬
		set thisFilePP to quoted form of POSIX path of (item j of theFiles as alias)
	tell application "Finder" to ¬
		set thisName to (name of item j of theFiles)
	
	set theCommand to "sed -i .1bak -e 's/\"/\\\\\"/g' " & thisFilePP  --I think this is no longer needed given the Find/Replace below
	do shell script theCommand
	
	tell application "Finder" to ¬
		set myFile2 to myFolder2 & thisName as alias
	
	tell application "TextWrangler"
		open myFile2
		tell text 1 of text window 1
			set currentDoc to (read (test1))
			set ex1 to extractBetween(currentDoc, "<xls>", "</xls>") of me -- extract the URL
			set tURL to "http://www.cepal.org" & ex1
			#now find a way to insert this into the text, with \\r\\n
			
			set searchOptions to {search mode:literal, starting at top:true}
			replace "><" using ">\\r\\n<" options searchOptions
			------------
			#at beginning of file add "BEGIN:VCALENDAR\\r\\nMETHOD:PUBLISH\\r\\nX-WR-TIMEZONE:America/Santiago\\r\\nPRODID:-//Apple Inc.//iCal 3.0//EN\\r\\nCALSCALE:GREGORIAN\\r\\nX-WR-CALNAME:CEPALTEST\\r\\nVERSION:2.0\\r\\nX-APPLE-CALENDAR-COLOR:#B027AE"  --how?
			---------
			#add "BEGIN:VEVENT\\r\\nSEQUENCE:0" before each event  --how?
			replace "<pagina>" using "" options searchOptions
			replace "<abstracto>" using "" options searchOptions
			replace "</abstracto>" using "" options searchOptions
			replace "<noticia>" using "" options searchOptions
			replace "<autor>1</autor>" using "" options searchOptions
			replace "<idioma>ES</idioma>" using "" options searchOptions
			replace "<Contacto>" using "" options searchOptions
			replace "</Contacto>" using "" options searchOptions
			replace "<agrupadores>" using "" options searchOptions
			replace "</agrupadores>" using "" options searchOptions
			replace "</noticia>" using "" options searchOptions
			replace "</pagina>" using "" options searchOptions
			replace "<?xml version=\\\"1.0\\\" encoding='ISO-8859-1'?>" using "" options searchOptions --update this
			
			replace "<Titulo>" using "DESCRIPTION:" options searchOptions
			replace "</Titulo>" using "" options searchOptions
			
			replace "<link>" using "UID:http://www.cepal.org" options searchOptions
			replace "</link>" using "" options searchOptions
			
			replace "</jdoid>" using "\\r\\nEND:VEVENT" options searchOptions --to end the event
			------------
			#need to add "END:VCALENDAR" at the EOF
			---------
		end tell
		
		save text document thisName
		close text window 1
		
	end tell
end repeat

to extractBetween(SearchText, startText, endText)
	set tid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to startText
	set endItems to text of text item -1 of SearchText
	set AppleScript's text item delimiters to endText
	set beginningToEnd to text of text item 1 of endItems
	set AppleScript's text item delimiters to tid
	return beginningToEnd
end extractBetween

I think I would take a different approach. This technique of trying to extract stuff from XML by hand seems like it would be error prone. I tried it using the XML support in the System Events application and this is what I ended up with:

set gcalPath to (path to desktop folder as Unicode text) & "gcal:"

set aFile to alias (gcalPath & "test0.xml")

set outputFilePath to gcalPath & "test0.ics"

set vEventLines to {}

--repeat for each XML file
tell application "System Events"
	set xmlFile to XML file (aFile as Unicode text)
	set noticia to XML element "noticia" of XML element "pagina" of contents of contents of xmlFile
end tell
-- repeat for each noticia element (are there multiple per file?)
extractNoticiaData(noticia)
set vEventLines to vEventLines & buildVEventLines(result)
-- end repeat
--end repeat

writeVEventsLines(vEventLines, outputFilePath)

--tell application "Aquamacs Emacs" to open file outputFilePath -- Just for my verification purposes

to extractNoticiaData(noticia)
	using terms from application "System Events"
		set titulo to value of XML element "Titulo" of noticia
		set fecha_descrip to value of XML element "fecha_descrip" of noticia
		set link to value of XML element "link" of noticia
		set ciudad to value of XML element "Ciudad" of noticia
	end using terms from
	{titulo:titulo, fechas:fecha_descrip, link:link, ciudad:ciudad}
end extractNoticiaData

to buildVEventLines(noticiaData)
	-- Parse dates
	set {startDate, endDate} to splitOn(fechas of noticiaData, " al ")
	set startYMD to convertDate(startDate)
	set endYMD to convertDate(endDate)
	set vEventLines to {"BEGIN:VEVENT", "SEQUENCE:0", "DESCRIPTION:" & titulo of noticiaData, "UID:http://www.cepal.org" & link of noticiaData, "TRANSP:OPAQUE", "URL;VALUE=URI:http://www.cepal.org" & link of noticiaData, "DTSTART:" & startYMD, "SUMMARY:" & titulo of noticiaData, "DTEND:" & endYMD, "LOCATION:" & ciudad of noticiaData, "END:VEVENT"}
end buildVEventLines

to convertDate(dt)
	set {d, m, y} to splitOn(dt, "/")
	return pad(y, 4, "0") & pad(m, 2, "0") & pad(d, 2, "0")
end convertDate

to pad(str, width, char)
	set prefix to ""
	repeat with i from length of str to (width - 1)
		set prefix to prefix & char
	end repeat
	prefix & str
end pad

to splitOn(str, sep)
	set {otid, text item delimiters} to {text item delimiters, {sep}}
	set parts to text items of str
	set text item delimiters to otid
	parts
end splitOn

to writeVEventsLines(vEventsLines, outputFilePath)
	set outputFileRef to open for access outputFilePath with write permission
	try
		set eof of outputFileRef to 0
		set allLines to {"BEGIN:VCALENDAR", "METHOD:PUBLISH", "X-WR-TIMEZONE:America/Santiago", "PRODID:-//Apple Inc.//iCal 3.0//EN", "CALSCALE:GREGORIAN", "X-WR-CALNAME:CEPALTEST", "VERSION:2.0", "", "X-APPLE-CALENDAR-COLOR:#B027AE"} & vEventsLines & {"END:VCALENDAR"}
		repeat with l in allLines
			set l to l & (ASCII character 13) & (ASCII character 10)
			write l to outputFileRef as «class utf8»
		end repeat
		close access outputFileRef
		true
	on error m number n from o partial result r to t
		close access outputFileRef
		error m number n from o partial result r to t
	end try
end writeVEventsLines
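
To make the date handling above concrete, here is a small standalone sketch that runs the same helpers on made-up values. The sample text "3/12/2007 al 7/12/2007" is only an assumed shape for fecha_descrip (day/month/year, joined by " al "), and parseFechas is a hypothetical wrapper, not part of the script above. It also guesses that a lone date with no " al " means a one-day event, a case where the two-item assignment in buildVEventLines would otherwise raise an error.

```applescript
-- Assumed sample values; the real fecha_descrip text may look different.
set rangeYMD to parseFechas("3/12/2007 al 7/12/2007")
-- rangeYMD is {"20071203", "20071207"}
set singleYMD to parseFechas("3/12/2007")
-- singleYMD is {"20071203", "20071203"}

-- Hypothetical wrapper: returns {startYMD, endYMD} for one or two dates.
to parseFechas(fechas)
	set parts to splitOn(fechas, " al ")
	-- Assumption: a lone date describes a one-day event.
	if (count of parts) is 1 then set parts to parts & parts
	{convertDate(item 1 of parts), convertDate(item 2 of parts)}
end parseFechas

-- The three handlers below are copied unchanged from the script above.
to convertDate(dt)
	set {d, m, y} to splitOn(dt, "/")
	return pad(y, 4, "0") & pad(m, 2, "0") & pad(d, 2, "0")
end convertDate

to pad(str, width, char)
	set prefix to ""
	repeat with i from length of str to (width - 1)
		set prefix to prefix & char
	end repeat
	prefix & str
end pad

to splitOn(str, sep)
	set {otid, text item delimiters} to {text item delimiters, {sep}}
	set parts to text items of str
	set text item delimiters to otid
	parts
end splitOn
```

If the feed turns out to use a different separator or date order, only parseFechas and convertDate would need to change.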

I am not completely happy with the code, but I thought I would post it as an example of an alternative approach. It is the first time I have used System Events’ XML support so I am not sure if I made any rookie mistakes with that part of the code. It seemed to work for me with the sample data you included (I closed the noticia and pagina tags that seemed to lack closing tags).
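
On the original cat question: the commented-out attempt fails because theFiles is a list of Finder references, and quoted form of only applies to text. One possible way around it, sketched below, is to skip the Finder list and let the shell expand the wildcard. The output name joined.txt is made up for this sketch; it deliberately does not end in .xml so that a second run does not feed the output back into cat.

```applescript
-- Folder location taken from the original post; "joined.txt" is a made-up output name.
set gcalFolder to POSIX path of ((path to desktop folder as Unicode text) & "gcal:")

-- Quote the folder part, but leave *.xml outside the quotes so the shell expands it.
set joinCommand to "cat " & quoted form of gcalFolder & "*.xml > " & quoted form of (gcalFolder & "joined.txt")
do shell script joinCommand
```

Note that the joined file would contain several XML declarations and root elements, so it is not well-formed XML and an XML parser such as the one in System Events would probably reject it. That is one reason to keep a per-file loop when parsing.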

Wow, there is a lot to chew on there. I will have to go back to the drawing board and work on this code. I’ll see how far it gets me, but it looks promising and a lot less clunky than what I had. There is a lot here that I had never seen, and I’ll have to play with it to learn a bit.

Thanks a bunch, I’ll post an update after I get it working well.

What a great Christmas present. My wife will be happy that I will spend less time trying to get this working. Here is the final version of the script, with the Repeat loop to parse every file. Only thing I need to do in the future is script the addition/update in iCal, but that is easy to do. Thanks again!

set gcalPath to (path to desktop folder as Unicode text) & "gcal:"

set outputFilePath to gcalPath & "test0.ics"

set vEventLines to {}

set myFolder to ((path to desktop as Unicode text) & "gcal")
tell application "Finder" to set theFiles to (every file of folder myFolder whose name extension is "xml")
repeat with j from 1 to the count of theFiles
	tell application "Finder" to ¬
		set thisName to (name of item j of theFiles)
	set aFile to alias (gcalPath & thisName)
	
	tell application "System Events"
		set xmlFile to XML file (aFile as Unicode text)
		set noticia to XML element "noticia" of XML element "pagina" of contents of contents of xmlFile
	end tell
	extractNoticiaData(noticia)
	set vEventLines to vEventLines & buildVEventLines(result)
end repeat

writeVEventsLines(vEventLines, outputFilePath)

to extractNoticiaData(noticia)
	using terms from application "System Events"
		set titulo to value of XML element "Titulo" of noticia
		set fecha_descrip to value of XML element "fecha_descrip" of noticia
		set link to value of XML element "link" of noticia
		set ciudad to value of XML element "Ciudad" of noticia
	end using terms from
	{titulo:titulo, fechas:fecha_descrip, link:link, ciudad:ciudad}
end extractNoticiaData

to buildVEventLines(noticiaData)
	-- Parse dates
	set {startDate, endDate} to splitOn(fechas of noticiaData, " al ")
	set startYMD to convertDate(startDate)
	set endYMD to convertDate(endDate)
	set vEventLines to {"BEGIN:VEVENT", "SEQUENCE:0", "DESCRIPTION:" & titulo of noticiaData, "UID:http://www.cepal.org" & link of noticiaData, "TRANSP:OPAQUE", "URL;VALUE=URI:http://www.cepal.org" & link of noticiaData, "DTSTART:" & startYMD, "SUMMARY:" & titulo of noticiaData, "DTEND:" & endYMD, "LOCATION:" & ciudad of noticiaData, "END:VEVENT"}
end buildVEventLines

to convertDate(dt)
	set {d, m, y} to splitOn(dt, "/")
	return pad(y, 4, "0") & pad(m, 2, "0") & pad(d, 2, "0")
end convertDate

to pad(str, width, char)
	set prefix to ""
	repeat with i from length of str to (width - 1)
		set prefix to prefix & char
	end repeat
	prefix & str
end pad

to splitOn(str, sep)
	set {otid, text item delimiters} to {text item delimiters, {sep}}
	set parts to text items of str
	set text item delimiters to otid
	parts
end splitOn

to writeVEventsLines(vEventsLines, outputFilePath)
	set outputFileRef to open for access outputFilePath with write permission
	try
		set eof of outputFileRef to 0
		set allLines to {"BEGIN:VCALENDAR", "METHOD:PUBLISH", "X-WR-TIMEZONE:America/Santiago", "PRODID:-//Apple Inc.//iCal 3.0//EN", "CALSCALE:GREGORIAN", "X-WR-CALNAME:CEPALTEST", "VERSION:2.0", "", "X-APPLE-CALENDAR-COLOR:#B027AE"} & vEventsLines & {"END:VCALENDAR"}
		repeat with l in allLines
			set l to l & (ASCII character 13) & (ASCII character 10)
			write l to outputFileRef as «class utf8»
		end repeat
		close access outputFileRef
		true
	on error m number n from o partial result r to t
		close access outputFileRef
		error m number n from o partial result r to t
	end try
end writeVEventsLines
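
One thing the script above does not handle is escaping. In iCalendar text values such as SUMMARY, DESCRIPTION and LOCATION, a backslash, semicolon or comma has to be preceded by a backslash (RFC 5545, section 3.3.11). A minimal sketch of how that could be done with text item delimiters follows; replaceText and escapeICSText are hypothetical helper names, and the sample title is invented.

```applescript
-- Invented sample title, only to show the effect.
set escapedTitle to escapeICSText("Reunión en Santiago, Chile; sesión 2")
-- escapedTitle now contains: Reunión en Santiago\, Chile\; sesión 2

-- Hypothetical helper: escapes the characters that are special in iCalendar TEXT values.
to escapeICSText(str)
	-- Backslash goes first so the backslashes added afterwards are not doubled.
	set str to replaceText(str, "\\", "\\\\")
	set str to replaceText(str, ";", "\\;")
	set str to replaceText(str, ",", "\\,")
	str
end escapeICSText

-- Hypothetical helper: replaces every occurrence of findText in str.
to replaceText(str, findText, replacementText)
	set {otid, text item delimiters} to {text item delimiters, {findText}}
	set parts to text items of str
	set text item delimiters to {replacementText}
	set joined to parts as text
	set text item delimiters to otid
	joined
end replaceText
```

In buildVEventLines, the title and city could then be wrapped, for example "SUMMARY:" & escapeICSText(titulo of noticiaData). Embedded line breaks would also need converting to a literal \n, which this sketch leaves out.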

Browser: Safari 523.10
Operating System: Mac OS X (10.5)