XML management...

Rashef · November 21, 2009, 9:23pm

Hey,
I’m not a developer… but I’d like to learn something more!
I’m playing around with AppleScript and I’ve already made some useful (for me) scripts. Now I need to do something more complex…

The steps:

Download an XML file from the internet (periodically updated);
Parse the XML file to extract two pieces of data: a title and a URL (video stream);
Show a dropdown list with titles…
Choosing an item opens the related video stream with VLC or QuickTimeX.

I guess this would need Xcode but I really don’t know where to start. Do you think I could get something useful with AppleScript?

I’m trying to build a script step by step…

I use this to download the needed XML (and erase the previous version):

do shell script "rm -f /tmp/file.xml &>/dev/null"
do shell script "curl http://www.myurl.com/file.xml --user-agent '" & UserAgent & "' --retry 5 --connect-timeout 10 --output /tmp/file.xml &>/dev/null"

Then I check if the file exists, otherwise alert and exit…

tell application "Finder"
	if exists POSIX file "/tmp/file.xml" then
		--> file parsing, listing, dropdown, etc...
	else
		display alert "List is unavailable!"
	end if
end tell

The XML file starts with the tags and : inside lots of are listed, with this schema:

The only data I need are the value of name in the first tag and the related URL between the <url> tags.
For example:

For each set I need a “The first link” item that opens http://www.myfirstlink.com/servlet.htm?n=1234.

Can I hope to succeed?

Fenton · November 22, 2009, 5:05pm

I think you could do all this with AppleScript, though I don’t know much about playing video streams. But likely the part you’re most worried about is the 1st part, getting AppleScript list(s) from the xml. There is an xml/xsl transformation tool, which you can run via do shell script. You may not know xsl, but what you want is not all that complex. I’d do it, but I’m supposed to be working this morning (argh). This AppleScript shows the basic use:


set xml_file to quoted form of POSIX path of (choose file with prompt "XML File")
set xsl_file to quoted form of POSIX path of (choose file with prompt "XSL File")

-- return results here
do shell script "xsltproc " & xsl_file & " " & xml_file

-- write xml file with results
-- set out_file to "~/Desktop/Result.xml"
-- do shell script "xsltproc -o " & out_file & " " & xsl_file & " " & xml_file

Imagine if you had 2 xsl stylesheets, one to extract the video names, one to extract the URLs. Use 2 separate transformations. The transformation results would be 2 text files. You could then read the files, get the paragraphs, to produce 2 AppleScript lists (of equal # of items). You’d use the 1st list for a “choose from list,” then get the item from the 2nd list at the same position. I’m not sure the most efficient method to do that; but hopefully someone does.
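The two-list lookup described above can be sketched as follows. This is only an illustration in Python rather than AppleScript, assuming the two transformation results are already available as return- or linefeed-separated text. The sample titles, the second URL and the helper name `url_for_title` are made up for the example.

```python
def url_for_title(titles_text, urls_text, chosen_title):
    """Return the URL at the same position as chosen_title, or None."""
    # splitlines() accepts \r, \n and \r\n as line breaks, much like
    # reading "paragraphs of" a text file in AppleScript.
    titles = titles_text.splitlines()
    urls = urls_text.splitlines()
    if len(titles) != len(urls):
        raise ValueError("the two lists must have the same number of items")
    if chosen_title not in titles:
        return None
    # index() returns the position of the first matching title.
    return urls[titles.index(chosen_title)]


# Hypothetical sample data standing in for the two xsltproc outputs.
sample_titles = "The first link\rThe second link\r"
sample_urls = (
    "http://www.myfirstlink.com/servlet.htm?n=1234\r"
    "http://www.example.com/second\r"
)

print(url_for_title(sample_titles, sample_urls, "The second link"))
```

If two titles are identical, this lookup always returns the URL of the first one, so the approach works best when titles are unique.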

Rashef · November 22, 2009, 6:54pm

Thanks for your help… xsltproc seems to be able to convert from XSL to XML only.
Following your suggestion I used some bash knowledge to parse the XML file:


do shell script "grep -B 1 '<url>' /tmp/file.xml|sed /--/d | sed s/^\ *// > /tmp/file.tmp"

UPDATE: it works in terminal, but AppleScript warns me about an unknown token inside the second sed - it seems it doesn’t like the space in here s/^\ *//.

Apart from the warning, in Terminal I get something like this (for each entry):

According to what you wrote, it would be easier to work with two separate lists (title and URL).
I don’t know if this works:


do shell script "while read line ; do echo -en $line|awk -F '\"' '{print $2}'; echo \"==\"; done < /tmp/file.tmp > /tmp/title.tmp"
do shell script "while read line ; do echo -en $line|awk -F '<url>' '{print $2}'|awk -F '</url>' '{print $1}'; echo \"==\"; done < /tmp/file.tmp > /tmp/url.tmp"

This way I should have two files (title.tmp and url.tmp)… but I don’t know how to proceed…

Thanks again!

Fenton · November 22, 2009, 9:54pm

No, you’re misunderstanding what xsl does. It determines the output. It can be xml, it can be html, it can be text. It can get all of or any part of the xml, including its elements (if xml output) or not. Example of the beginning of the xsl (not the declaration tho), for a text output:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" version="1.0" encoding="utf-8"/>

It can produce a text file of only the names, then another (using a slightly different xsl file) of the URLs. There is no need at all to resort to Unix or AppleScript parsing techniques. XSL does that, and does it better, if the source is XML.

But you need to learn some XSL to use it, or get someone else to. What you need is simple enough. I’m just too busy (still) to do it.

Fenton · November 22, 2009, 10:14pm

XML

XSL for Names (return-separated)

XSL for URLs

Inside the tags are supposed to be the ampersand#13semicolon (return character)
It gets translated, so can’t show as is.

Fenton · November 22, 2009, 10:17pm

P.S. You could use #10 instead, for a line feed. AppleScript doesn’t care which. It can read the paragraphs of a text file with either.

Rashef · November 23, 2009, 11:31am

Oh thanks… Now it’s clear!
I followed your suggestion and I tried this:

where /tmp/name.xsl contains what you wrote as “XSL for Names”, but the output file is not created… using verbose mode I got:

I don’t know if it’s important: the source XML file misses the first line

and it starts/ends with .

Rashef · November 23, 2009, 1:52pm

I got it working this way (RETURN key stripped away):

XSL for names:

XSL for URLs:

I don’t know why but there is an extra return for each and the first line of each group of urls/names starts with 4 spaces.

I noticed that each set of tags contains some with different url for the same stream: what if I had to parse only the url of the second for each ?
This will simplify things as I got 287 URLs and about 29 sets: I guess a 29-item list is more user-friendly…

Anyway, how do I fill a list with those items?

Thanks, you will be my god hereafter…

Fenton · November 23, 2009, 3:55pm

Sorry, I didn’t see the line above the XML where you stated the enclosing tags. The following will not have blanks (hopefully), nor an extra return at the end. It uses a couple of XSL functions, which are pretty self-evident. XSL doesn’t have a lot of functions, but it has some basic ones. XSLT 2 has more, but I mostly use FileMaker, which only supports XSLT 1 (sigh). This is the name one; you can modify it for the url one.

Fenton · November 23, 2009, 3:59pm

An alternative way to get an item is to use the “find anywhere” syntax. It is useful if the item’s tag might exist at different levels of the hierarchy. But I seldom use it. I think it is slower.

<xsl:for-each select="//videounit">

Rashef · November 24, 2009, 10:06am

Sorry, but it doesn’t produce any output. I tried this:

and it works!

Before talking about option lists… is there an XSL function that allows selecting only the first (or the 2nd/3rd/4th…) videoUnit URL for each set? Some URLs repeat twice or more within the same (with different titles, so I cannot trim them if I want to preserve the same order of titles and URLs).

Fenton · November 25, 2009, 4:37pm

“blank”? But you said “block” earlier. But it doesn’t matter, as you’ve learned how to do it, and could now build your own. Basic XSL is not that difficult, just different from “programming”.

Yes, as far as specifying which occurrence of an element to get, XSL supports basic “array” syntax.

<xsl:for-each select="block/sets/set/items/item/units/videounit[1]">

would only get data from the 1st instance (XSL node counting is not zero-based).
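The 1-based positional predicate can be tried outside XSL as well. The sketch below uses Python's standard `xml.etree.ElementTree`, which supports the same `[1]` syntax in its limited XPath subset. The sample XML is hypothetical and only mimics the path used in this thread; the second videounit and its URL are invented. Because the parsed root is already the `block` element, the path here starts at `sets`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like block/sets/set/items/item/units/videounit.
SAMPLE_XML = """<block><sets><set><items><item><units>
  <videounit name="The first link"><url>http://www.myfirstlink.com/servlet.htm?n=1234</url></videounit>
  <videounit name="Same stream, other server"><url>http://www.example.com/alt</url></videounit>
</units></item></items></set></sets></block>"""

root = ET.fromstring(SAMPLE_XML)  # root is the <block> element

# Without a predicate, every <videounit> is returned.
all_units = root.findall("sets/set/items/item/units/videounit")

# [1] keeps only the first <videounit> child of each <units> parent.
first_units = root.findall("sets/set/items/item/units/videounit[1]")

first_names = [unit.get("name") for unit in first_units]
first_urls = [unit.findtext("url").strip() for unit in first_units]
print(first_names, first_urls)
```

The predicate applies per parent, so with several `units` elements in the file you get the first videounit of each one, not just the first in the whole document.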

But I don’t know if that would solve your problem of duplicates. You would have to go thru and find the duplicate urls, then remove them and their corresponding name in the other list, as the two lists must remain synchronized.
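Keeping the two lists synchronized while removing duplicates could look like the sketch below. It is written in Python purely as an illustration and uses a "seen" set instead of sorting, so the original order is preserved. The function name `drop_duplicate_urls` and the sample data are assumptions, and it also assumes that whitespace around a URL should be ignored when comparing.

```python
def drop_duplicate_urls(names, urls):
    """Keep the first occurrence of each URL together with its name."""
    seen = set()
    kept_names, kept_urls = [], []
    # zip() walks both lists in step, so a name is dropped exactly
    # when its URL is dropped.
    for name, url in zip(names, urls):
        key = url.strip()  # stray whitespace would defeat the comparison
        if key in seen:
            continue
        seen.add(key)
        kept_names.append(name)
        kept_urls.append(key)
    return kept_names, kept_urls


# Hypothetical sample data: the second URL repeats the first.
sample_names = ["Title A", "Title B", "Title C"]
sample_urls = ["http://www.example.com/1", " http://www.example.com/1 ", "http://www.example.com/2"]

unique_names, unique_urls = drop_duplicate_urls(sample_names, sample_urls)
print(unique_names, unique_urls)
```

When a URL appears under several titles, only the first title survives, which may or may not be the one you want to show.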

Rashef · November 25, 2009, 11:14pm

Oops… only a typo here…

I guess it should, since most of the duplicates occur in the same block…

Thank you teacher!

Rashef · November 27, 2009, 5:23pm

Hi again!

Looking at XML I found out that the exact title of the video is in this form:

How do I extract “name” from that CDATA tag?

thanks again

Rashef · November 27, 2009, 11:15pm

Done:

But sometimes there are unicode chars inside CDATA. Since I’m using the text method, how do I convert them?
I found the translate() function but I cannot understand how to apply it within the other functions…

Fenton · November 28, 2009, 5:30am

I’m a little confused by your last post. I thought you wanted to output XML to import into FileMaker. In which case I believe you’d want to keep the CDATA sections, which are a way of protecting text from the xml parser (or vice versa). The xml author is saying, “there could be characters in this element’s contents which would get translated by the parser, or could be illegal in xml content data, so I’m telling the parser to skip this stuff” (like “>”, “<”, “&”, pretty common characters).

Since you are not outputting text (?) but xml, you’ll want to keep the CDATA. There is an output element attribute for this, where you provide a whitespace-separated list of elements to encase (or keep encased) in CDATA.

<xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes" cdata-section-elements="title" />

I have not used that for quite a while however.

What I did figure out was a way to remove duplicate url elements. It is pretty much the same method you’d use in any language. Sort the things, if the next one is the same as the previous, it’s a duplicate. I looked at the previous one (url in this case). I also ran into that same “extra whitespace” problem, which was screwing up the comparison until I removed it.

This shows the rather verbose but effective XSL version of “If this, this, Else If that, that, Else whatever”
(actually AppleScript needs a 2 step Else, If)

<?xml version="1.0" encoding="utf-8"?>

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes" />
<xsl:template match="/">

<xsl:for-each select="block/sets/set/items/item/units/videounit">
<xsl:sort select="url" order="ascending" />
<xsl:choose>
<xsl:when test="position() = 1">

<xsl:value-of select="@name" />

<xsl:value-of select="url" />

</xsl:when>
<xsl:otherwise>
<xsl:if test="url != normalize-space(preceding-sibling::*)">

<xsl:value-of select="@name" />

<xsl:value-of select="url" />

</xsl:if>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>

</xsl:template>
</xsl:stylesheet>

Rashef · November 28, 2009, 7:15am

Never said that!
Sorry if I wasn’t clear…
I need to extract (video streaming related) URLs from an XML file. The goal is to use an AppleScript that allows the user to simply select a URL from the list (drop-down or whatever can be useful) to get it opened with either VLC (if installed) or QuickTime X.
Since it’s quite impossible to remember what the URL points to, I thought to get the “title” of the streaming too.

This is an example and I asked how to extract “verbose title” and the content between the tag.
You suggested I use XSL to convert the data into two different text files, to use with “choose from list”.
Thanks to your help I was able to get these data.

But “verbose title” seems to be more of a “short description” of the stream. Looking at this loooong, long XML file I noticed a tag which is parallel to . The third occurrence of this tag (in each “set”) contains the exact title of the stream between the tag:

I was able to get the “name” inside CDATA too.
Now I have a list of titles and a list of URLs (without duplicates) and I need to build up a drop-down or select box with titles (that open the related URLs).

Since the titles contain lots of unicode chars (& #39; - without the space - for ', & #224; for à etc…) I wonder whether AppleScript will show them as unicode or as ASCII…

Fenton · November 30, 2009, 2:40am

You are probably past this by now. I’m afraid I got a little confused there for a while. For some reason I got it stuck in my head that you were producing XML (mostly because that’s what I often do, to import into FileMaker). But in your case you’re just producing text files.

An easy to use tool I’ve used for this kind of conversion is the TextCommands scripting addition. It has a simple decode URL command that takes care of a lot of character entities from web text.
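For comparison, here is how the numeric character references mentioned above (such as the ones for the apostrophe and for à) decode in Python's standard library. This is not the TextCommands addition, just a stand-in to show what the decoded titles should look like; the sample title and the helper name `decode_entities` are made up.

```python
import html


def decode_entities(title):
    """Turn references such as &#39; and &#224; into real characters."""
    return html.unescape(title)


# Hypothetical title containing an apostrophe and an a-grave as references.
sample_title = "L&#39;universit&#224;"

decoded_title = decode_entities(sample_title)
print(decoded_title)
```

Text without any references passes through unchanged, so it is safe to run every title through the same step.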