Writing HTML with international characters

Hi,

My iTunes AppleScript (using AppleScript 2.0.1 under Mac OS X 10.5.5) writes out a list of my albums to an HTML file.

My code includes all the necessary HTML tags to format the page properly.

However, when I bring the page up in Safari (3.1.2), any albums with international characters (e.g., special Spanish characters like é, á, ñ, etc.) have those characters appear as strange symbols instead.

I have come to understand that this has to do with the AppleScript 2.0 text encoding (Unicode: utf-16) as opposed to the older ISO 8859-1 text encoding apparently expected as a default by browsers (at least relative to my Automatic locale configuration on my iMac).

I tried writing the text out as «class utf8», but that did not do the trick.

Can anyone point me to AppleScript code to convert text (utf-16) to ISO 8859-1 when writing to a text file (e.g., HTML page)?

Thanks in advance,

Steve

Hi,

Have you also changed the encoding of your HTML code to UTF-8? ISO 8859-1 is outdated.

No, I did not. I have not specified an encoding for any of the pages I create. ISO 8859-1 may be outdated, but it still seems to be the default for most browsers.

In any case, can you tell me the proper modern way to define the encoding in an HTML page? I am not sure I should use the META tag method.

Thanks for the quick reply :slight_smile:

All modern browsers can read UTF-8 encoded pages automatically if the encoding is specified in the header.
I’m not an HTML specialist, but the HTML code could look like this:

[code]

[/code]
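
For anyone following along, here is a minimal sketch of the idea in AppleScript. It assumes the META tag form of the charset declaration, and the file name and page content are hypothetical; it is not necessarily the snippet from the post above. The point is that the declared charset and the encoding used by the write command have to agree:

```applescript
-- Hypothetical file name and page content, for illustration only.
set outputPath to (path to desktop as text) & "albums.html"
set htmlText to "<html>" & linefeed & ¬
	"<head>" & linefeed & ¬
	"<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">" & linefeed & ¬
	"<title>My Albums</title>" & linefeed & ¬
	"</head>" & linefeed & ¬
	"<body><p>Corazón, Más allá, Año nuevo</p></body>" & linefeed & ¬
	"</html>" & linefeed

set outFileRef to open for access file outputPath with write permission
try
	set eof of outFileRef to 0 -- truncate any earlier contents
	write htmlText to outFileRef as «class utf8» -- the bytes on disk are UTF-8
	close access outFileRef
on error errMsg number errNum
	-- Make sure the file is closed before passing the error on.
	close access outFileRef
	error errMsg number errNum
end try
```

If the page is written as UTF-8 but carries no charset declaration, the browser may fall back to its default encoding, which would explain the strange symbols described in the first post.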

That did the trick!

Thanks so much, Stefan!

I will change all my HTML-writing scripts to comply with the newer encoding :slight_smile:

sbondi, if it helps any…

I’ve just posted an “iTunes Library Lister” app that does similar work to your project. If you dig into its AppleScript you’ll see how to write your output files as either MacRoman (say, for a text editor or further processing that doesn’t like Unicode) or as UTF-8 (for web pages).

Shameless plug: iTLL uses Perl for the heavy lifting, so it’s fast; listing 9 fields for each item of a 10,000-item library in HTML format takes about 6 seconds. Maybe you’ll find something in it you can use for your work…

Hi S2,

Wow, yours does most of what mine does … and it sounds like it is super fast! I guess this would be the case since you enlist Perl to read right from the source (the iTunes XML file) and I go the slower way through the iTunes API.

Thanks for the info,

Steve

BTW, you may wanna check your link, since it comes up with an error on your page. Note: I ignored the error, then just clicked Downloads then iTunes scripts and eventually found it :slight_smile:

FWIW, there’s a sample script, iTunes_playlist_to_html.py, included in the appscript source distribution that renders the names, artists, albums and durations of 10,000 tracks in about 3 seconds on an MBP, so I don’t think your bottleneck is the iTunes scripting interface. More likely causes of poor performance:

  1. Iterating over large numbers of elements and getting their properties one at a time will be extremely slow; instead, ask for properties of all elements at once, e.g.:

     set {trackNames, trackArtists, trackAlbums} to {name, artist, album} of every track of current playlist

  2. Your own code is inefficient; for example, building up a large string via concatenation is much slower than appending each substring to a list and then coercing that list to a string afterwards.

  3. The AppleScript interpreter is dog slow at pretty much everything except sending commands to scriptable applications. Some bottlenecks can be ameliorated, e.g. there are well-known hacks for making it iterate over lists in linear instead of quadratic time, but if you’ve a lot of data to crunch and performance is important then either find an application or shell utility to push the work on to, or use a faster language.
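
As a rough sketch of the second point (the album names and variable names below are hypothetical stand-ins for data fetched from iTunes), the pieces are appended to a list and the list is coerced to text once at the end:

```applescript
-- Hypothetical data, standing in for values fetched from iTunes.
set albumNames to {"Corazón", "Más allá", "Año nuevo"}

-- Collect the pieces in a list instead of growing one string with "&".
set rowList to {}
repeat with albumName in albumNames
	set end of rowList to "<li>" & (contents of albumName) & "</li>"
end repeat

-- Coerce the list to text once, using linefeed as the separator.
set savedDelimiters to AppleScript's text item delimiters
set AppleScript's text item delimiters to linefeed
set htmlRows to rowList as text
set AppleScript's text item delimiters to savedDelimiters
```

Restoring the saved text item delimiters afterwards keeps the change from leaking into the rest of the script.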

Hi hhas,

Thanks for that info!

FYI, my script performance breaks down as follows:

a. 20 seconds

  1. loop through 9,500 tracks
  2. gather 4 properties of the track field
  3. create ~80-character sort string
  4. QSort the 9,500 strings (I think this takes about 10 seconds)

b. 60 seconds

  1. loop through 9,500 tracks
  2. gather 10 properties of track field
  3. create a playlist for each of my 700 albums
  4. conditionally create special playlists for “tracks requiring attention” (e.g., missing artwork, more than 1 artwork)
  5. write out 2 HTML files (one as an album title summary and another with album/track details)

I will review my code for any inefficiencies based on your suggestions.

  1. “Iterating over large numbers of elements and getting their properties one at a time will be extremely slow; instead, ask for properties of all elements at once”

I had thought about this, and decided on first loading the desired properties into a list; e.g., the statement for task b.2 above is:

	tell application "iTunes"
		-- put fields into temporary list for ideally better performance with all "gets" in one statement
		set trackItemFields to {¬
			(album artist of track trackIndex of library playlist 1 as text), ¬
			(album of track trackIndex of library playlist 1 as text), ¬
			(bondi's addLeadingZeros(track number of track trackIndex of library playlist 1 as integer, 1) as text), ¬
			(name of track trackIndex of library playlist 1 as text), ¬
			(artist of track trackIndex of library playlist 1 as text), ¬
			(genre of track trackIndex of library playlist 1 as text), ¬
			(bondi's addLeadingZeros(year of track trackIndex of library playlist 1 as integer, 3) as text), ¬
			(time of track trackIndex of library playlist 1 as text), ¬
			(date added of track trackIndex of library playlist 1 as date), ¬
			(count artworks of track trackIndex of library playlist 1)}
	end tell

Do you think this addresses the “get all elements at once”?

  2. “Your own code is inefficient, for example, building up a large string via concatenation is much slower than appending each substring to a list and then coercing that list to a string afterwards.”

Although I have no large strings and put some thought into efficiencies, inefficient code is definitely possible, especially with my being new to AppleScript.

  3. “The AppleScript interpreter is dog slow at pretty much everything except sending commands to scriptable applications. Some bottlenecks can be ameliorated, e.g. there are well-known hacks for making it iterate over lists in linear instead of quadratic time, but if you’ve a lot of data to crunch and performance is important then either find an application or shell utility to push the work on to, or use a faster language.”

I think AppleScript performance will be acceptable for most of my Mac scripting needs. Especially if I focus on efficiencies :slight_smile:

Thanks again,

Steve

Sorry, but all the coercions in your script are completely unnecessary!
Properties like name, artist, genre and time are text anyway, date added is a date, and track number is an integer, so no coercion is required at all.
And I guess that your addLeadingZeros handler also returns text.


tell application "iTunes"
	-- put fields into temporary list for ideally better performance with all "gets" in one statement
	tell track trackIndex of library playlist 1
		set trackItemFields to {album artist, album, bondi's addLeadingZeros(track number, 1), ¬
			name, artist, genre, bondi's addLeadingZeros(year, 3), time, date added, count artworks}
	end tell
end tell

Thanks again, Stefan.

I had suspected it was a bit of overkill, but I was uncomfortable with how the data types were represented in the iTunes structures. From the Dictionary and your comments, I now understand them to be clearly defined (most as Text).

In any case, your suggestion is much cleaner, and actually appeared to shave about 5 seconds off of the last 60 seconds!

I will review my code for unnecessary coercions.

BTW, can you say off the top if the “single statement” which gets 9 properties is a lot faster than getting the properties in separate “set” statements (e.g., using 9 individual variables instead of a little list)?

The single statement is indeed faster, but if you want to assign several properties to individual variables, you can also use this syntax:


tell application "iTunes"
	set {album artist:_albumArtist, album:_album, track number:_trackNumber, name:_name, artist:_artist, genre:_genre, year:_year, time:_time, date added:_dateAdded} to track trackIndex of library playlist 1
end tell

No. You are still getting a single property from a single track one at a time. That’s fine if you’re only dealing with a single track or handful of tracks, but if you want it to scale to entire iTunes libraries then you need to get a single property from all tracks at once.

tell application "iTunes"
    tell library playlist 1
        set allAlbumArtists to album artist of every track
        set allAlbums to album of every track
        set allTrackNumbers to track number of every track
        ...
    end tell
end tell

Yes, it’s weird and counterintuitive (welcome to AppleScript’s world), but if you’re dealing with hundreds or thousands of track elements it’s the only way you’re going to wrangle decent performance out of the iTunes scripting interface. Of course, you still have the problem of AppleScript’s atrocious lack of efficiency in iterating over those lists, but like I say there are standard hacks for coping with that.
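
One commonly used form of that workaround is to hold the lists in properties of a script object and refer to each item through it. The sketch below assumes that approach; the script object name trackStore and its property names are hypothetical, and it fetches only two properties to keep things short:

```applescript
-- Sketch only: the script object and its property names are hypothetical.
script trackStore
	property trackNames : {}
	property trackAlbums : {}
end script

tell application "iTunes"
	tell library playlist 1
		-- One request per property, however many tracks there are.
		set trackStore's trackNames to name of every track
		set trackStore's trackAlbums to album of every track
	end tell
end tell

set outputLines to {}
repeat with i from 1 to (count trackStore's trackNames)
	-- Reaching each item through the script object is the usual trick
	-- for speeding up access to items of long lists.
	set end of outputLines to (item i of trackStore's trackAlbums) & " - " & (item i of trackStore's trackNames)
end repeat
```

The two lists are parallel, so item i of each one belongs to the same track.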

I did this, and all went well initially, but I have run into another snag.

First, I write out a file “as «class utf8»”, and can confirm in TextWrangler that indeed the international characters are being saved correctly :slight_smile:

However, when I then read back in that file (also “as «class utf8»”), the international characters get out of whack.

Even simply writing out what I just read in yields incorrect international characters (the output file seen in TextWrangler does not match the input file seen in TextWrangler):

set inFileRef to open for access inputPathedFileName
set textLines to read inFileRef as «class utf8» using delimiter linefeed
close access inFileRef

set outFileRef to open for access outputPathedFileName with write permission
set eof of outFileRef to 0
repeat with textLine in textLines
  write (textLine & linefeed) to outFileRef as «class utf8»
end repeat
close access outFileRef

In the end, the html pages that I generate from re-processed utf-8 text files now end up displaying whacky characters in Safari :frowning:

Any help is greatly appreciated :slight_smile:

Well, after testing a bunch of failed theories, I finally came upon a solution that works!

It seems like there is something in the “read” command above that causes incorrectly coerced characters when the “using delimiter linefeed” is used to create the list of lines.

To get around this, I used the following restructured code to accomplish the same thing as the “read” lines above, but this time with correctly coerced characters that produce the desired output in the rewritten file :slight_smile:

set inFileRef to open for access inputPathedFileName
set allText to read inFileRef as «class utf8»
close access inFileRef
set savedDelimiters to AppleScript's text item delimiters
set AppleScript's text item delimiters to {linefeed}
set textLines to get every text item of allText
set AppleScript's text item delimiters to savedDelimiters

This should do the same


set textLines to paragraphs of (read inputPathedFileName as «class utf8»)

Thanks, Stefan. That does the trick too! (Note: I am sure you meant “inFileRef” and not “inputPathedFileName”).

Any reason why the first statement works and the second does not work?

set textLines to paragraphs of (read inFileRef as «class utf8»)
set textLines to read inFileRef as «class utf8» using delimiter linefeed

I did mean inputPathedFileName. The open for access step is not necessary, and therefore there is no inFileRef.
However, inputPathedFileName must be an alias in this syntax.

I have no idea why the delimiter doesn’t work in conjunction with «class utf8».
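
Putting the pieces of this thread together, a round trip might look roughly like the sketch below. The output file name is hypothetical, and choose file is used only as a convenient way to obtain an alias. The whole file is read as UTF-8 first and split into lines in memory, which sidesteps the using delimiter parameter entirely:

```applescript
-- Hypothetical paths; choose file returns an alias, as the read syntax above needs.
set inputFile to choose file with prompt "Choose a UTF-8 text file:"
set outputPath to (path to desktop as text) & "roundtrip.txt"

-- Read the whole file as UTF-8, then split it into lines in memory.
set textLines to paragraphs of (read inputFile as «class utf8»)

-- Rejoin the lines with linefeeds and write everything back in one call.
set savedDelimiters to AppleScript's text item delimiters
set AppleScript's text item delimiters to linefeed
set outputText to textLines as text
set AppleScript's text item delimiters to savedDelimiters

set outFileRef to open for access file outputPath with write permission
try
	set eof of outFileRef to 0 -- truncate any earlier contents
	write outputText to outFileRef as «class utf8»
	close access outFileRef
on error errMsg number errNum
	-- Make sure the file is closed before passing the error on.
	close access outFileRef
	error errMsg number errNum
end try
```

Note that paragraphs splits on carriage returns as well as linefeeds, so a file with other line endings comes back out with linefeeds only.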