XML parser running slowly

technomorph · May 25, 2018, 2:25am

All ENTRY elements
that have INFO children
whose RATING attribute is “Search In Playlists”

My original XPath that I finally figured out was this:
//COLLECTION/ENTRY/.[//@RATING='Search In Playlists']

Of those ENTRY/INFO elements, very few actually have RATING attributes,
so I figured I would try an XPath that only finds the ones that actually have a rating:
//ENTRY/INFO[@RATING]/..

here is my trimmed code:

set {theResults, theError} to (theXMLDoc's nodesForXPath:"//ENTRY/INFO[@RATING]/.." |error|:(specifier))

set theResults2 to {}
repeat with aResults in theResults		
	set trackRATING to (aResults's nodesForXPath:"//INFO/attribute::RATING" |error|:(missing value))'s firstObject()'s stringValue() as text
	
	if trackRATING is "Search In Playlists" then
		set end of theResults2 to aResults
	end if
	aResults's detach()
end repeat

which still gives no real results after 4-plus hours.

PS I am running SD6.0.8 on 10.10
I have a system install of 10.11 on a drive. So I’ve yet to try this all on SD7 and 10.11.
I’m guessing this could make a huge difference?

One other thought I'm having: am I possibly colliding with the iTunes item object subclass
Track through the naming of my trackRATING variable?

thanks

Shane_Stanley · May 25, 2018, 3:51am

You don’t want to be doing XPath queries in a loop – the point of using XPath is to avoid things like loops as much as possible.

It’s still not clear what you’re after, but if it’s actually:

All ENTRY elements that have INFO children whose RATING attribute is “Search In Playlists”

something like this should do it:

set {theResults, theError} to (theXMLDoc's nodesForXPath:"//ENTRY/INFO[@RATING='Search In Playlists']/.." |error|:(reference))

But you don’t say what you want to do with the ENTRY elements when you’ve found them.
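
As a minimal sketch of the single-query approach, the script below runs the same XPath against a tiny in-memory document. The sample XML, its TITLE values, and the variable names xmlText and matchedTitles are made up for illustration; a real collection.NML would be loaded from disk instead.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical sample data; a real collection.NML is far larger.
set xmlText to "<NML><COLLECTION>" & ¬
	"<ENTRY TITLE='A'><INFO RATING='Search In Playlists'/></ENTRY>" & ¬
	"<ENTRY TITLE='B'><INFO/></ENTRY>" & ¬
	"<ENTRY TITLE='C'><INFO RATING='Other'/></ENTRY>" & ¬
	"</COLLECTION></NML>"

set {theXMLDoc, theError} to (current application's NSXMLDocument's alloc()'s initWithXMLString:xmlText options:0 |error|:(reference))
if theXMLDoc is missing value then error (theError's localizedDescription() as text)

-- One query does all the filtering; no XPath calls per node.
set {theResults, theError} to (theXMLDoc's nodesForXPath:"//ENTRY/INFO[@RATING='Search In Playlists']/.." |error|:(reference))
if theResults is missing value then error (theError's localizedDescription() as text)

-- Reading an attribute from each match is a direct lookup, not another tree search.
set matchedTitles to {}
repeat with anEntry in (theResults as list)
	set end of matchedTitles to ((anEntry's attributeForName:"TITLE")'s stringValue() as text)
end repeat
matchedTitles
```

With the sample data above, only the ENTRY titled A should end up in matchedTitles, because B has no RATING and C has a different one.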

technomorph · May 27, 2018, 10:38am

ame:

Hi Nigel,

just for info for all (you and Shane know this) I tested the following script:


property myArray : missing value

set myString to LoopWithString() -- test changing the handler

on LoopWithString()
	set startDate to current date
	set myString to ""
	repeat with j from 1 to 50000
		set myString to myString & j
	end repeat
	set endDate to current date
	set elapsedTime to endDate - startDate
	return elapsedTime -- 6 seconds on MacBook Pro 2015
end LoopWithString

on LoopWithArraySlow()
	set startDate to current date
	set myArray to {}
	repeat with j from 1 to 50000
		copy j to end of myArray -- no "my" here: this is the slow version
	end repeat
	set myString to myArray as string
	set endDate to current date
	set elapsedTime to endDate - startDate
	return elapsedTime -- 37 seconds on MacBook Pro 2015 
end LoopWithArraySlow

on LoopWithArrayFast()
	-- Note the use of word "my" to access the array
	set startDate to current date
	set myArray to {}
	repeat with j from 1 to 50000
		copy j to end of my myArray
	end repeat
	set myString to myArray as string
	set endDate to current date
	set elapsedTime to endDate - startDate
	return elapsedTime -- 1 second on MacBook Pro 2015
end LoopWithArrayFast

These are the results with a loop of 50,000.
I remember when I discovered the trick about the “magic” word “my” many years ago…

Stefano - Ame

Hey Ame,
Thanks for this.
Can you explain to me the advantage of using “my”?
Also, I’m trying to understand and analyze your code, and I’m confused about the last two handlers.
It seems like the code is exactly the same.
What am I missing that makes the last one execute so fast?

Thanks

Nigel_Garvey · May 27, 2018, 12:29pm

Hi technomorph.

The AppleScript Language Guide describes how (but not why!) using a reference to a list variable, instead of just using the variable directly, can speed up access to the items and properties of the list if it’s very large. (https://developer.apple.com/library/content/documentation/AppleScript/Conceptual/AppleScriptLangGuide/reference/ASLR_classes.html#//apple_ref/doc/uid/TP40000983-CH1g-DontLinkElementID_587)

In the ASLG example, the ‘a reference to’ operator is used to set another variable containing a reference to the list variable; but it’s also possible (and slightly faster still) to write the reference directly into the script code by including the owner of the variable in the code. eg.:

item 1000 of bigList -- Using a list variable directly.
item 1000 of my bigList -- Referencing the list variable as something belonging to the current script.
item 1000 of its bigList -- Referencing a list variable in another script.

It’s not possible to reference local variables, only properties, globals, or run-handler variables. But if you need to use the technique inside a handler, you can set up a temporary script object with a property set to the list and use references to that property:

on myHandler(myList)
	script o
		property bigList : myList
	end script
	
	item 1000 of o's bigList
end myHandler

Referencing list variables only speeds up access to the lists’ items and properties — ie. the list variable references must be parts of references to items or properties of those lists. It doesn’t speed up operations on the lists themselves, such as counting, ‘contains’, or concatenation.
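
To make the handler technique concrete, here is a small sketch in plain AppleScript. The handler name sumList and the variable total are invented for this example; the script object follows the same pattern as myHandler above, and the loop reads each item through the o's bigList reference.

```applescript
on sumList(theList)
	-- Temporary script object so the list can be reached through a reference.
	script o
		property bigList : theList
	end script
	
	set total to 0
	repeat with i from 1 to (count theList)
		-- The item access goes through "o's bigList", not the local variable.
		set total to total + (item i of o's bigList)
	end repeat
	return total
end sumList

sumList({1, 2, 3, 4}) -- 10
```

The count is taken from the plain variable, since referencing only helps with access to items and properties.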

Shane_Stanley · May 28, 2018, 12:13am

You’re making it hard to follow. What are you planning to do with these attribute names and values?

IAC, you don’t need all that stuff. Try this:

set propNames to theResults's valueForKeyPath:"attributes.name"
set propValues to theResults's valueForKeyPath:"attributes.stringValue"
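
One property of this approach is that the names and values come back grouped per element (one sublist for each node in theResults), so an element that lacks an attribute does not shift the others out of alignment. The sketch below pairs the two results into one NSDictionary per ENTRY. The sample XML and the names xmlText, entryDicts and secondAudioID are hypothetical, and it assumes every ENTRY has at least one attribute.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical sample data: the second ENTRY has no AUDIO_ID attribute.
set xmlText to "<COLLECTION>" & ¬
	"<ENTRY TITLE='A' AUDIO_ID='x1'/>" & ¬
	"<ENTRY TITLE='B'/>" & ¬
	"</COLLECTION>"

set {theXMLDoc, theError} to (current application's NSXMLDocument's alloc()'s initWithXMLString:xmlText options:0 |error|:(reference))
if theXMLDoc is missing value then error (theError's localizedDescription() as text)
set {theResults, theError} to (theXMLDoc's nodesForXPath:"//ENTRY" |error|:(reference))
if theResults is missing value then error (theError's localizedDescription() as text)

-- Each result is an array of arrays: one inner array per ENTRY.
set propNames to theResults's valueForKeyPath:"attributes.name"
set propValues to theResults's valueForKeyPath:"attributes.stringValue"

-- Pair names with values per ENTRY; this loop makes no XPath calls.
set entryDicts to {}
repeat with i from 0 to ((propNames's |count|()) - 1)
	set theDict to (current application's NSDictionary's dictionaryWithObjects:(propValues's objectAtIndex:i) forKeys:(propNames's objectAtIndex:i))
	set end of entryDicts to theDict
end repeat

-- A missing attribute shows up as missing value rather than shifting the lists.
set firstTitle to ((item 1 of entryDicts)'s objectForKey:"TITLE") as text
set secondAudioID to (item 2 of entryDicts)'s objectForKey:"AUDIO_ID"
```

Looking up an absent key with objectForKey: gives missing value, which can be tested for before the value is used.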

technomorph · May 29, 2018, 5:44am

Hi, yes, I guess I should make that clear.

I’m working with the DJ software Traktor (and a bit of iTunes).
Its own library system has some flaws that I’m trying to work around on my own.
Their library system is XML 1.0 based and they name them with extension .NML
The main library system is called collection.NML
You can export out a single playlist playlist.NML from the software that carries pretty much
the same structure as the main collection.NML but just a great deal smaller.
You can then reimport that playlist.NML into the software and it will merge into the
main library and update as need be (I still need to do some investigation on exactly what
gets updated and what potentially gets lost, but that’s in the future)

1) relocating missing ENTRY files:
(A little background: when you change a Track Name, Artist, Album, etc. in iTunes,
depending on your iTunes management preferences, it renames and relocates
the file based on the Artist Name/LP Name/Track Name. I like this management.
iTunes has a far better library management system than Traktor does, so most
folks use iTunes for their main management and then play in Traktor. Once
you’ve changed a track in iTunes and it relocates the file, Traktor will say it is missing.
Traktor has a relocate function and you can point it to the new file. But if you have
numerous missing files, it can automatically try to find them again, though this is very
slow and can be inaccurate.)

want to figure out a way to do better searching with the option to ask the user
“I found these? Which one should I replace it with”
also I’ll be looking into using: iTunes NSPredicate, mdls with mdfind and kMD, your FileTagsLib
also some of Doug’s iTunes scripts (or my modified versions)

2) dealing with duplicates: both physical files, and duplicate XML entries

finding possible duplicates
helping select which ones would be best to keep (based on bit rate, and other quality factors)
transferring some of the meta data (attributes) between each ENTRY
possibly recalculating (adjusting) some of the attributes based on different lengths that
may be different between each of the ENTRY’s
removing those “old” ENTRY’s
updating the library PLAYLISTS that had those old ENTRY’s with the selected ENTRY
saving all the new info to the master LIBRARY

I’ve accomplished some of these things by using:

iTunes Side

Doug’s Scripts Dupin software
my own custom applescripts for iTunes
Traktor Side:
manually figuring out the selects and what to copy
export the smaller PLAYLIST.NML file
using an XML editor to edit and copy/paste ELEMENTS and ATTRIBUTES
reimporting that edited PLAYLIST.NML file into TRAKTOR and having it update the main LIBRARY

I’ve really been digging what I’ve been able to create on my own via AppleScript for other
tasks on the iTunes side, and I would love to develop AppleScripts to accomplish what I’m looking
to do and share them with the Traktor community.

Stages I’m going to work thru:

Parsing XML and creating an array or list of the ENTRYs I wish to process
Present a dialog to ask the user to select which ATTRIBUTES they would like to collect
Gather all of the selected ATTRIBUTES for the ENTRYs into a list or a record (partially DONE)
Show this data in a table
then depending on the different tasks I listed above I will need to:
- analyze, compare, sort, evaluate and create new lists/sublists/record/sub-records
- present the new data to the user and ask them to manual select some choices (filters)
- create a new array based on the choices made by the user
- update the other PLAYLIST node replacing removed ENTRYS with the new ENTRYS
- update the master XML with the new data
- oh and of course create a backup of the master XML in case anything goes wrong

All of these are great ways for me to learn more about AppleScript, which I’m loving.
I’m also running into issues, figuring out how to get around them, and then figuring
out how to make it all more efficient and user friendly at the same time.

All of your input has been so amazingly helpful!

thanks again

technomorph · May 29, 2018, 6:46am

Yes this did it. It’s what I had originally started with but 10.10.5 didn’t like it.
Definitely needed to make sure SD was in Source view mode.

the Xpath now takes 3 seconds to complete!

I will now look into using what you’ve suggested here now:

I was trying to use the xPath method you had in your final code above:

few problems.

I found I did have to put it into a loop. I could not just use:

set trackTITLE to ((theResults's nodesForXPath:"//ENTRY/@TITLE" |error|:(missing value))'s valueForKey:"stringValue") as list

What won’t work for me is the lists that this creates: I won’t be able to merge them properly.
For example, the trackAUDIO_ID list that was created was not the same length (item count) as the other lists, because one of the ENTRYs did not have that attribute. This will be the case for many of the other attributes I’d be working with.
And as you mentioned before, running multiple XPath queries inside the loop was slow.

I will use your valueForKeyPath: method and report back!

thanks again

Kerry

technomorph · May 29, 2018, 7:55am

Works great, but it only gives me the attribute names and attribute values
of the ENTRY element itself.

how can I also get the same for the children of the ENTRY
that include:

LOCATION
INFO
BPM
CUES_V2

Edit: one thought I have is that I could set up a variable that adds
the node name to the XPath and then run it again,
doing the same for each element?

Or can I XPath my results?
From ADC:
The NSXMLNode class defines an XPath method that can be quite useful when making XPath queries. As the name suggests, you can send an XPath message to any node object to get an XPath string describing that node’s location in a tree.

Shane_Stanley · May 29, 2018, 9:49am

I suspect you can do it with a different XPath query. You want to avoid loops as much as possible.

technomorph · May 30, 2018, 11:30am

Yes, I set up a separate XPath query for each subelement and it works great.
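
For anyone following along, this is roughly what a separate query per subelement can look like. It is only a sketch: the sample XML, the FILE attribute, and the names childNames, childAttributes and tagName are made up for illustration. The loop runs once per child element name, not once per ENTRY.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical sample data.
set xmlText to "<COLLECTION>" & ¬
	"<ENTRY TITLE='A'><LOCATION FILE='a.mp3'/><INFO RATING='Search In Playlists'/></ENTRY>" & ¬
	"<ENTRY TITLE='B'><LOCATION FILE='b.mp3'/><INFO/></ENTRY>" & ¬
	"</COLLECTION>"

set {theXMLDoc, theError} to (current application's NSXMLDocument's alloc()'s initWithXMLString:xmlText options:0 |error|:(reference))
if theXMLDoc is missing value then error (theError's localizedDescription() as text)

set childNames to {"LOCATION", "INFO"}
set childAttributes to {}
repeat with childName in childNames
	-- Same ENTRY filter each time; only the last step of the path changes.
	set thePath to "//ENTRY[INFO/@RATING='Search In Playlists']/" & childName
	set {theNodes, theError} to (theXMLDoc's nodesForXPath:thePath |error|:(reference))
	if theNodes is missing value then error (theError's localizedDescription() as text)
	set attrNames to (theNodes's valueForKeyPath:"attributes.name") as list
	set attrValues to (theNodes's valueForKeyPath:"attributes.stringValue") as list
	set end of childAttributes to {tagName:(childName as text), attrNames:attrNames, attrValues:attrValues}
end repeat
childAttributes
```

Each record in childAttributes holds the attribute names and values for one kind of child element across all matching ENTRYs.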

thanks
