text item delimiters: Search HTML output from curl - I'm stumped.

I’m stumped. I download the HTML, store it in a variable, and pass that to this handler. It works the way I want: it essentially weeds out a large portion of the HTML, leaving the section I want. Because of how the HTML is coded on the site, I can’t run this once and get the value I want, so I need to run the handler a second time on the output of the first pass to cut it down further. It works just fine the first time, but not the second. The second call fails with “Can’t get text item 2 of (insert HTML gibberish here)”.

I used macscripter.net/articles/447_0_10_29_C/ as a guide

The Handler:

on search_page(pageHTML, StartFilter, EndFilter)
	log "--StartFilter: " & StartFilter & return & return
	log "--EndFilter: " & EndFilter & return & return
	set Txt1 to pageHTML as string
	set Search1 to StartFilter as string
	set Search2 to EndFilter as string
	
	set TID to AppleScript's text item delimiters
	set text item delimiters to Search1
	set Rslt1 to text item 2 of Txt1
	set text item delimiters to Search2
	set Rslt2 to text item 1 of Rslt1 as string
	
	set ResultTxt to Rslt2 as string
	
	set AppleScript's text item delimiters to TID
	
	return ResultTxt
end search_page

Calling the Handler:


set LargeBlockSearch_FIRST to "<td colspan=6 class=\"notered\">Pending Property Values</td></tr>"
set LargeBlockSearch_LAST to "<!-- Current Use Values -->"


set smallBlockSearch_FIRST to "<td width=\"100\" align=\"center\" style=\"border-bottom: 1px inset\">" & return & "							<span class=\"notesans\">Market Total</span></td>" & return & return & "						<td width=\"60\" align=\"right\" style=\"border-right: 1px inset; border-bottom: 1px inset\">" & return & " 							<span class=\"emphless\">"
set smallBlockSearch_LAST to "</span></td>

						</tr>

						<!-- Current Use Values -->"

set ValueURL to "http://web5.co.snohomish.wa.us/propsys/asr-tr-propinq/PrpInq02-ParcelData.asp?PN=" & ParcelID  -- ParcelID is "29071600400600"
set URLresults to (do shell script "curl " & ValueURL)

set LargeBlock to search_page(URLresults, LargeBlockSearch_FIRST, LargeBlockSearch_LAST)
set SmallBlock to search_page(LargeBlock, smallBlockSearch_FIRST, smallBlockSearch_LAST)

I want SmallBlock to give me the “Market Total” value under Assessor’s Property Data → Property Values → Pending Property Values. SmallBlock should be equal to $268,500 for this parcel. I will eventually feed a bunch of parcel IDs through this in a loop. The value I’m looking for is about 2/3 of the way down the page. I don’t want to use Safari, and I can’t rely on the values or positions of most of the content because it all changes with each parcel.

I’m stumped and would really appreciate any help you guys could give me.

Model: MBP 15" (1stGen)
Browser: Safari 419.3
Operating System: Mac OS X (10.4)

I ran into a similar problem today where I had to change tab to “\t” for my script to work. This was not a problem until very recently.

Try changing return to “\r” in your script (wherever return means carriage return).

Nope, thanks for the thought, but it didn’t work. I get the same error, and after I click OK to dismiss it, the editor highlights “text item 2” in “set Rslt1 to text item 2 of Txt1”.
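
One way to narrow down an error like this is to check whether each delimiter occurs in the text at all before asking for text item 2. When a delimiter never matches, the text has only one text item, which is exactly the situation that produces “Can’t get text item 2”. The following is only a sketch: safe_search_page is a hypothetical name, not part of the original script, and the sample string in the last line is made up for illustration.

```applescript
-- Hypothetical variant of search_page that reports which filter failed to match
on safe_search_page(pageHTML, StartFilter, EndFilter)
	set Txt1 to pageHTML as string
	set TID to AppleScript's text item delimiters
	try
		set AppleScript's text item delimiters to (StartFilter as string)
		-- If the delimiter never occurs, the text has only one text item
		if (count of text items of Txt1) < 2 then error "StartFilter not found in text"
		set Rslt1 to text item 2 of Txt1
		set AppleScript's text item delimiters to (EndFilter as string)
		if (count of text items of Rslt1) < 2 then error "EndFilter not found after StartFilter"
		set ResultTxt to text item 1 of Rslt1
	on error errMsg
		-- Restore the delimiters before passing the error along
		set AppleScript's text item delimiters to TID
		error errMsg
	end try
	set AppleScript's text item delimiters to TID
	return ResultTxt
end safe_search_page

-- Made-up sample text; the result is "two"
safe_search_page("<b>one</b><i>two</i>", "<i>", "</i>")
```

With this version the error message at least says which of the two filters did not match, which is a starting point for comparing it character by character against the downloaded HTML.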

Do I have to escape ?

Hi,

this works for the specified ParcelID

property ParcelID : "29071600400600"
set ValueURL to "http://web5.co.snohomish.wa.us/propsys/asr-tr-propinq/PrpInq02-ParcelData.asp?PN=" & ParcelID
set txt to (do shell script "curl " & ValueURL)

set {TID, text item delimiters} to {text item delimiters, "Pending Property Values"}
set txt to text item 2 of txt
set text item delimiters to "Current Use Values"
set txt to text item 1 of txt
set text item delimiters to "Market Total"
set txt to text item 2 of txt
set text item delimiters to "emphless\">"
set txt to text item 2 of txt
set text item delimiters to "<"
set txt to text item 1 of txt
set text item delimiters to TID
txt

It looks like StefanK beat me to it, but here’s what I came up with…

set ParcelID to "29071600400600"

set Search1 to "Pending Property Values"
set Search2 to "Market Total"
set Search3 to "emphless\">"
set Search4 to "<"

set ValueURL to "http://web5.co.snohomish.wa.us/propsys/asr-tr-propinq/PrpInq02-ParcelData.asp?PN=" & ParcelID -- ParcelID is "29071600400600"
set URLresults to (do shell script "curl " & ValueURL)

set the_result to my search_URL(URLresults, Search1, Search2, Search3, Search4)

on search_URL(pageHTML, Search1, Search2, Search3, Search4)
	set Txt1 to pageHTML as string
	set TID to AppleScript's text item delimiters
	set text item delimiters to Search1
	set Rslt1 to text item 2 of Txt1
	set text item delimiters to Search2
	set Rslt2 to text item 2 of Rslt1
	set text item delimiters to Search3
	set Rslt3 to text item 2 of Rslt2
	set text item delimiters to Search4
	set Rslt4 to text item 1 of Rslt3
	set AppleScript's text item delimiters to TID
	return Rslt4 as string
end search_URL
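
Both replies above use the same pattern: set a delimiter, keep one text item, repeat. If the number of steps varies between pages, that pattern could be written as a loop over a list of steps. This is a sketch only: extract_by_steps is a hypothetical handler name, and sampleHTML is a made-up fragment shaped like the markup described in this thread, not the real page.

```applescript
-- Hypothetical handler: apply {delimiter, item number} steps in order
on extract_by_steps(theText, theSteps)
	set TID to AppleScript's text item delimiters
	try
		repeat with aStep in theSteps
			set AppleScript's text item delimiters to (item 1 of aStep)
			set theText to text item (item 2 of aStep) of theText
		end repeat
	on error errMsg
		-- Restore the delimiters even when a step fails to match
		set AppleScript's text item delimiters to TID
		error errMsg
	end try
	set AppleScript's text item delimiters to TID
	return theText
end extract_by_steps

-- Made-up fragment, not the real page
set sampleHTML to "Pending Property Values <span class=\"notesans\">Market Total</span> <span class=\"emphless\">$268,500</span>"
extract_by_steps(sampleHTML, {{"Pending Property Values", 2}, {"Market Total", 2}, {"emphless\">", 2}, {"<", 1}})
-- result: "$268,500"
```

Short, single-line delimiters like these do not depend on the page's indentation or line breaks, so they have fewer ways to fail than a delimiter that spans several lines.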

Stefan;
On my machine, “txt” is a reserved word, but the script works beautifully if I change txt to something else. [scriptDB.osax] /Adam

Your responses work great for me. Thank you all for your input. What was causing the error? Was I trying to use delimiters that were too large or too complex?

Most likely the line endings in the HTML and the line endings in your delimiters didn’t match, or the text encoding didn’t match for some of the symbols (ASCII, UTF-8, or UTF-16).
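
If a line-ending mismatch is the cause, one possible workaround is to rewrite both the downloaded text and the delimiter to a single line-ending style before searching. AppleScript’s return constant is a carriage return (ASCII 13), while text from other sources may contain line feeds (ASCII 10) or CR+LF pairs. The handler below is a sketch: normalize_line_endings is a hypothetical name, and it only addresses line endings, not differing tabs, spaces, or text encodings.

```applescript
-- Hypothetical helper: rewrite CRLF and CR line endings as LF
on normalize_line_endings(theText)
	set TID to AppleScript's text item delimiters
	set crChar to ASCII character 13
	set lfChar to ASCII character 10
	-- Handle CRLF first so it does not turn into two line feeds
	repeat with oldEnding in {crChar & lfChar, crChar}
		set AppleScript's text item delimiters to (contents of oldEnding)
		set theParts to text items of theText
		set AppleScript's text item delimiters to lfChar
		set theText to theParts as string
	end repeat
	set AppleScript's text item delimiters to TID
	return theText
end normalize_line_endings

-- Mixed line endings in, "a", "b" and "c" separated by line feeds out
normalize_line_endings("a" & (ASCII character 13) & (ASCII character 10) & "b" & (ASCII character 13) & "c")
```

Passing both the HTML and each multi-line search string through the same handler before calling search_page would rule out line endings as the cause. Any remaining mismatch would then have to come from whitespace or encoding.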

Thanks Adam,

Unfortunately I have a new problem that should be easy to fix.

I have a text file that has a bunch of delimited values as below:

cat|AREA|PERIMETER|TAXACCT_|TAXACCT_ID|TAX_KEY|PARCEL_ID|STATUS|TAX_YEAR|LRSN|SOURCE|COMMENT|CREATEDATE|EDITDATE|DELETEDATE|EDITOR|GIS_ACRES
131|30886.87667|828.26329|217887|217887|220812|00611300009100|A|2007|370895|13||2000/09/25|1850/01/01|1850/01/01||0.71
132|31350.19241|808.51993|217888|217888|220813|00611300009300|A|2007|370896|13||2000/09/25|1850/01/01|1850/01/01||0.72

filePath is a file reference from a choose file dialog box elsewhere in the script. rowNum is there to ignore the first row. I’m trying to get the seventh value in each row. The script only gets the first, “00611300009100”, and ignores “00611300009300” in the second row. Both should be added to the ParcelIDList list. It looks like the issue is with the paragraphs part: when I count the paragraphs of the file, the result is 1.

-- Filter ParcelID from string from line of text
on PopulateParcelIDs(filePath)
    set fileID to open for access file (filePath as string)
    
    set fileLines to paragraphs of (read fileID)
    set rowNum to 1
    repeat with this_row in fileLines
        if rowNum is 0 then
            display dialog this_row as string
            set rowNum to rowNum + 1
        else if rowNum > 0 then
            set OldDelimiters to AppleScript's text item delimiters
            set text item delimiters to "|"
            set ParcelIdFromFile to text item 7 of this_row
            set AppleScript's text item delimiters to OldDelimiters
            set end of ParcelIDList to ParcelIdFromFile
            set rowNum to rowNum + 1
        end if
    end repeat
    close access fileID
end PopulateParcelIDs

Thanks again…

Try this:

set filePath to choose file
set ParcelIDList to my PopulateParcelIDs(filePath)

on PopulateParcelIDs(filePath)
	set filePath to filePath as string
	set fileLines to paragraphs 2 thru -1 of (read file filePath)
	set ParcelIDList to {}
	set OldDelimiters to AppleScript's text item delimiters
	set text item delimiters to "|"
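	repeat with this_row in fileLines
		-- Skip blank lines, such as the empty paragraph left by a trailing line break
		if (count of text items of (contents of this_row)) > 6 then
			set ParcelIdFromFile to text item 7 of this_row
			set end of ParcelIDList to ParcelIdFromFile
		end if
	end repeat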
	set AppleScript's text item delimiters to OldDelimiters
	return ParcelIDList
end PopulateParcelIDs
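
To see what the handler does with a single row, here is a small worked example that uses one data row copied from the sample file above. The variable names sampleRow and rowFields are made up for the illustration.

```applescript
-- One data row copied from the sample file in this thread
set sampleRow to "131|30886.87667|828.26329|217887|217887|220812|00611300009100|A|2007|370895|13||2000/09/25|1850/01/01|1850/01/01||0.71"

set OldDelimiters to AppleScript's text item delimiters
set AppleScript's text item delimiters to "|"
-- Empty fields (two adjacent "|" characters) still count as text items
set rowFields to text items of sampleRow
set AppleScript's text item delimiters to OldDelimiters

{count of rowFields, item 7 of rowFields}
-- result: {17, "00611300009100"}
```

The row splits into 17 fields, matching the 17 column names in the header, and the seventh is the PARCEL_ID value. Setting the delimiter once before the loop and restoring it once afterwards, as the handler above does, avoids switching it back and forth on every row.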