Identical HTML tags - choosing which one to parse

OK, so using cURL and AppleScript’s text item delimiters I can fairly easily parse, say, my global rank in Halo from the bungie.net site.

Here’s the catch: in Halo 3 you have your Social stats and your Ranked stats. I’m trying to get my Kills in Social, but it just keeps grabbing the Ranked kills because they come first in the HTML source code. Both are embedded in identical tags, and the only difference is the actual number I’m trying to extract. Is there a way to choose which one I want?

Would you mind posting the HTML source that contains both?

Cheers,

Craig

trying to extract the social kills


set y to do shell script "echo " & quoted form of html_text & " | /usr/bin/ruby -e 'puts STDIN.read.grep(/Kills/)[1].scan(/textWrap\">(\\d+)<.*/)[0][0]'"

Cheers,

Craig

Awesome, could you explain to me how that works? Thanks so much!

Sure,

The html_text is “sent” to ruby using the “|” pipe character.
The “-e” lets ruby know this is one line of code.
“puts” is one of the ruby print commands.
“STDIN.read” does exactly that, it reads what is being “piped” in.
“grep” is finding every line in the text that contains “Kills”.
“[1]” is the second item in the array that was returned by grep.
“scan” searches the text, in this case only the one line, and returns what
is between the “()”

"/textWrap">(\d+)<.*/" this is a regular expression.
It searches the text for ‘textWrap">’ and then it looks for ‘\d+’ digits
followed by a ‘<’
The ‘+’ after the ‘\d’ lets regex know there must be one or more digits.
The ‘.’ matches any character and the ‘*’ matches zero or more of the
previous expression.
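
To see the regular expression on its own, here is a small sketch that runs it against a single line shaped like the stats rows in this thread. The sample strings are typed in by hand for illustration; only the pattern itself comes from the one-liner.

```ruby
# One line shaped like the right-hand cell of a stats row (hand-typed sample).
line = '<td class="statTableRight"><p class="textWrap">10433</p></td></tr>'

# scan returns one inner array per match, holding that match's capture groups.
matches = line.scan(/textWrap">(\d+)<.*/)
puts matches.inspect

# [0][0] picks the first capture group of the first match.
digits = matches[0][0]
puts digits

# \d+ needs at least one digit, so a label cell such as "Kills:" is skipped.
label_matches = '<p class="textWrap">Kills:</p>'.scan(/textWrap">(\d+)<.*/)
puts label_matches.inspect
```

The trailing ‘.*’ just swallows the rest of the line, so each line can produce at most one match.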

If you want to learn regular expressions I would suggest the book
“Mastering Regular Expressions” by Jeffrey E.F. Friedl.

Here is a breakdown of what each return value looks like.


#!/usr/bin/env ruby -w

input_text = IO.read("/Users/craig/Desktop/MacScripterSolutions/hendo.html")

kills_array = input_text.grep(/Kills/)

# => ["<tr><td class=\"statTableLeft\"><p class=\"textWrap\">Kills:</p></td><td class=\"statTableRight\"><p class=\"textWrap\">2979</p></td></tr>\n", "<tr><td class=\"statTableLeft\"><p class=\"textWrap\">Kills:</p></td><td class=\"statTableRight\"><p class=\"textWrap\">10433</p></td></tr>\n"]

digits_array = kills_array[1].scan(/textWrap">(\d+)<.*/)

# => [["10433"]]

digits = digits_array[0][0]

# => "10433"

Cheers,

Craig
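
A note for anyone trying the same steps on a newer Ruby: calling grep directly on a string only works in Ruby 1.8. From Ruby 1.9 on, a string is no longer enumerable, so the lines have to be split out first. Below is a rough, self-contained sketch of the same idea. The helper name kills_for and the SECTION_INDEX table are made up for illustration, and the ranked-first, social-second order is taken from what was described earlier in this thread.

```ruby
# The two rows are the ones quoted in the thread.
SAMPLE_HTML = <<~HTML
  <tr><td class="statTableLeft"><p class="textWrap">Kills:</p></td><td class="statTableRight"><p class="textWrap">2979</p></td></tr>
  <tr><td class="statTableLeft"><p class="textWrap">Kills:</p></td><td class="statTableRight"><p class="textWrap">10433</p></td></tr>
HTML

# Hypothetical lookup: Ranked appears first in the page source, Social second.
SECTION_INDEX = { ranked: 0, social: 1 }.freeze

# Hypothetical helper: returns the kills for the given section as a string,
# or nil when the expected row is missing.
def kills_for(html, section)
  rows = html.lines.grep(/Kills/)           # every line mentioning "Kills"
  row = rows[SECTION_INDEX.fetch(section)]  # pick the row by position
  return nil if row.nil?

  row[/textWrap">(\d+)</, 1]                # first capture group, or nil
end

puts kills_for(SAMPLE_HTML, :social)
puts kills_for(SAMPLE_HTML, :ranked)
```

Picking by position is fragile if the page layout changes, so checking for nil before using the result is probably worthwhile.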

Thanks for the awesome explanation. I understand it now, thanks a lot :)

Hm, I’m getting an error when I try this:
This is my code:

property gt : "AceHENDO13"
set theSource to (do shell script "/usr/bin/curl 'http://www.bungie.net/stats/Halo3/CareerStats.aspx?player=" & gt & "'")
set kills to do shell script "echo " & quoted form of theSource & " | /usr/bin/ruby -e 'puts STDIN.read.grep(/Kills/)[1].scan(/textWrap\">(\\d+)<.*/)[0][0]'"
display dialog kills

I get the error:

You can try this.

Save this in a file on your desktop named ‘retrieveStats.rb’
If you save it somewhere else make sure to change the path in
the calling AppleScript.

Call it with this.


set pathToRubyFile to ((path to desktop) as string) & "retrieveStats.rb"
do shell script "ruby " & quoted form of POSIX path of pathToRubyFile

I prefer to call an external file rather than try to cram it in a one liner through do shell script.

Let me know how it goes,

Craig

That may not work because AceHENDO13 is actually dynamic and should be able to be any gamertag. This is all part of an AS Studio application but I figured I’d post it here because the problem wasn’t really relevant to AS Studio itself. But basically how it works is there is a text field where you type the gamertag and hit enter, it gets the stats. If the selected segment on a segmented control is social it uses your method Craig and if it’s on ranked it just uses simple html parsing to get the info.

Change to this:

And


set userNumber to "AceHENDO13"
set pathToRubyFile to ((path to desktop) as string) & "retrieveStats.rb"
do shell script "ruby " & quoted form of POSIX path of pathToRubyFile & space & quoted form of userNumber

Since you are putting this in an app you may want to add a little protection
to make sure a user number was supplied to the ruby script.
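
As a rough idea of what that protection could look like on the Ruby side, here is a minimal sketch. The helper name gamertag_from and the usage message are invented for illustration; in the real script the argument list passed in would be ARGV.

```ruby
# Hypothetical guard for the top of the Ruby script.
# Returns the trimmed gamertag, or raises if none was supplied.
def gamertag_from(args)
  tag = args.first.to_s.strip   # nil.to_s is "", so a missing argument is safe
  raise ArgumentError, "usage: retrieveStats.rb GAMERTAG" if tag.empty?

  tag
end

# In the real script this would be: gamertag = gamertag_from(ARGV)
puts gamertag_from(["AceHENDO13"])

# Demonstrate the failure path without stopping the script.
begin
  gamertag_from([])
rescue ArgumentError => e
  warn e.message
end
```

If the error is left unrescued, ruby exits with a non-zero status, and do shell script reports that to AppleScript as an error that a try block can catch.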

Thanks Craig. I need to do this task with 4 different text fields (kills, deaths, games, kill/death ratio). Is there a quick way to do this, or should I make 4 separate Ruby files?

This is not the most graceful solution but I am running out of time tonight.
Change the paths to match your configuration.

set theSource to (do shell script "/usr/bin/curl 'http://www.bungie.net/stats/Halo3/CareerStats.aspx?player=AceHENDO13'")
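set theFile to ((path to desktop) as string) & "html.txt"
-- open for access creates the file if it does not exist yet
set fileRef to open for access file theFile with write permission
set eof fileRef to 0
write theSource to fileRef
close access fileRef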

set theValues to paragraphs of (do shell script "ruby ~/Desktop/retrieveStats.rb")
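
For reference, one Ruby file can print several values, one per line, so that “paragraphs of” splits them into a list on the AppleScript side. The sketch below is only a guess at the shape of such a script, not the actual retrieveStats.rb. Only the “Kills:” rows are confirmed by the page source quoted in this thread; the “Deaths” label and its numbers are invented sample data, and the helper name social_stats is hypothetical.

```ruby
# Assumed labels: only "Kills" is confirmed; "Deaths" is a guess at the page text.
LABELS = ["Kills", "Deaths"].freeze

# Hypothetical helper: for each label, take the second matching row
# (Social comes after Ranked in the source) and pull out its number.
def social_stats(html, labels = LABELS)
  labels.map do |label|
    rows = html.lines.grep(/#{Regexp.escape(label)}:/)
    row = rows[1]
    row && row[/textWrap">([\d.]+)</, 1]   # nil when the row is missing
  end
end

# Sample input: the Kills rows are from the thread, the Deaths rows are made up.
sample = <<~HTML
  <tr><td class="statTableLeft"><p class="textWrap">Kills:</p></td><td class="statTableRight"><p class="textWrap">2979</p></td></tr>
  <tr><td class="statTableLeft"><p class="textWrap">Deaths:</p></td><td class="statTableRight"><p class="textWrap">2500</p></td></tr>
  <tr><td class="statTableLeft"><p class="textWrap">Kills:</p></td><td class="statTableRight"><p class="textWrap">10433</p></td></tr>
  <tr><td class="statTableLeft"><p class="textWrap">Deaths:</p></td><td class="statTableRight"><p class="textWrap">9000</p></td></tr>
HTML

# puts prints each array element on its own line.
puts social_stats(sample)
```

In the real script the input would come from the saved html.txt file instead of the sample string, and the values arrive in AppleScript in the same order as the labels.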

I’ll give that a shot! Thanks :)