List efficiency tricks -- Wow!

General · February 4, 2005, 2:42am

[Edit: Hmmm… since writing this message, I retried the code that I thought was slow, realizing that I’d missed one extra use of “my” in front of “theTracks”, and that seems to be the real key to speed, not having the list pre-sized.

I’d still be curious about pre-sizing a list if possible, and I’m even more curious now about why the “my” keyword does the speed magic that it does.]

I’m wondering if there’s a way to automatically create a list with N items, sort of like declaring an array a[N] in another language. I wouldn’t really care if each item is a zero or an empty string or whatever.

Why?

I just discovered how changing all of the items in a long list is much, MUCH faster than building a new list by sequentially doing “set end of aList to…”.

For instance, this code to create a list which pairs each iTunes track with its database IDs:

tell application "iTunes"
  set lib to library playlist 1
  set tracksByID to every file track in lib
  set trackCount to count tracksByID
  set theIDs to database ID of every file track of lib

  repeat with i from 1 to trackCount
    set item i of my tracksByID to {item i of my theIDs, item i of my tracksByID}
    --test
    if i mod 100 = 0 then my reportProgress("" & i)
  end repeat

-- blah, blah
end tell

…is much faster (especially with nearly 4000 songs in iTunes) than this code which I originally started with:

tell application "iTunes"
  set lib to library playlist 1
  set theTracks to every file track in lib
  set trackCount to count theTracks
  set theIDs to database ID of every file track of lib
  set tracksByID to {}

  repeat with i from 1 to trackCount
    set end of my tracksByID to {item i of my theIDs, item i of my theTracks}
    --test
    if i mod 100 = 0 then my reportProgress("" & i)
  end repeat

-- blah, blah
end tell

How much faster? From around 30 seconds down to less than one second.

It seems kind of silly for tracksByID to start out as a list of tracks without IDs, but the way I wrote the code was the easiest way I could see to make sure tracksByID was the right size for what I wanted to store in it. If I could have done something like “set count of tracksByID to trackCount” or ‘set tracksByID to “” repeated trackCount times’ – I would have done that.

Any way to get where I’m trying to go here?

Nigel_Garvey · February 4, 2005, 10:08pm

See the end of this recent thread for a brief discussion about referencing list variables.

There’s no built-in function for declaring N-item lists in AppleScript. For your present purpose, reusing an old list of the same length is a good idea. If you needed to preserve the old list for any reason, you could simply duplicate it and “reuse” the duplicate:

copy theTracks to tracksByID -- NB. 'copy', not 'set'.
repeat with i from 1 to trackCount
  set item i of my tracksByID to {item i of my theIDs, item i of my theTracks}
end

But it’s very easy to write a list-declaration handler if you need one:

on createList(n)
  script o
    property newList : {}
  end script
  
  set mv to missing value
  repeat n times
    set end of o's newList to mv
  end repeat
  
  return o's newList
end createList

hhas · February 5, 2005, 12:58pm

AppleScript uses vector lists [1] which have a constant lookup time, but adds some extra code to check for circular references [IIRC] which reduces this to linear lookup time since each item in the list is being checked. The reference trick circumvents the additional code somehow, giving you constant lookup time again - plus the ability to crash AS if you don’t take care.

(This is just sloppy design by AppleScript’s designers, btw: other languages manage fixed-time lookups on vector lists without any trouble. But Apple are unlikely to fix this given they’ve finite time and resources and much more important work to do, so it’s just something you’ll have to live with.)

As for appending to lists, the longer the list the longer it takes to grow [2]. AppleScript was designed to run on machines with just a few MB of RAM, so optimises for low memory consumption at the cost of poorer performance (that’s their excuse anyway, and again don’t hold your breath waiting for it to change:p).

tell application "iTunes"
  set lib to library playlist 1
  set tracksByID to every file track in lib
  set trackCount to count tracksByID
  set theIDs to database ID of every file track of lib

  repeat with i from 1 to trackCount
    set item i of my tracksByID to {item i of my theIDs, item i of my tracksByID}
  end repeat

I’d suggest using AppleMods’ List library which provides a very efficient recomposeList() command for just this sort of task:

-- auto-generated library loading code
property _Loader : run application "LoaderServer"

----------------------------------------------------------------------
-- DEPENDENCIES

property _List : missing value

on __load__(loader)
	set _List to loader's loadLib("List")
end __load__

----------------------------------------------------------------------

-- main code

__load__(_Loader's makeLoader())

tell application "iTunes"
	set lib to library playlist 1
	set tracksByID to every file track in lib
	set theIDs to database ID of every file track of lib
end tell
set tracksByID to _List's recomposeList({theIDs, tracksByID})

If you’ve not used AppleMods’ libraries before, you’ll need to download and install AppleMods’ Loader system first. Run the Loader installer, then download and add the List library to the ASLibraries folder. You can use the LoaderWizard applet to generate the library loading code to paste at the top of your script.

BTW, the List library also includes a highly-optimised makeList() command for those times where you do need to pre-build an empty list for speed, but in this case recomposeList() will do everything you need.

HTH

has

[Technical explanations below if you really want to know why AppleScript’s lists behave as they do…]

[1] A vector list is just a block of memory at address M, containing the memory addresses of each item of the list. To look up the item at index N in the list, the AppleScript interpreter just adds N to M and looks up the memory location for that number to find the address of the item it wants. And because locating the item takes only simple addition (rather than, say, counting along each item in the list till it gets to the one it wants), it will always takes the same amount of time to do regardless of how big or small the list is. Unfortunately, AppleScript’s rather dump error prevention code spoils all this by then iterating across each item anyway (unless you deliberately bypass it using the magical reference kludge, though this introduces some ugliness of its own).

[2] When the AppleScript interpreter allocates a block of memory to hold a list, it adds a bit of free space at the end, allowing some room for future growth. Once that spare space fills up, however, the only way to add another item is by allocating a new, larger block of memory somewhere else and moving the existing list data over to that. This means copying each item one at a time, making it a linear-time operation. Smarter languages increase the amount of extra space proportional to the size of the list, so the bigger the list the more items can be added before it has to do another move. AppleScript always adds the same amount of spare space, however (enough for an additional 15 items), so that the amount of copying being done increases quite drastically as the list grows larger.