Reading Addresses from Online Yellow Pages?

I’m looking to populate a mailing list database from an online yellow pages site, like Yahoo! Yellow Pages. Basically, I want to mail a brochure to every print shop in the area. The idea is to do the search manually, run a script, click “next” manually (for multi-page results), run the script again, and so on to populate a FileMaker database.

I’m only familiar with Yahoo! Yellow Pages, but if you know an online yellow pages that is easier for AppleScript to “read”, I’m open to suggestions.

I know how to get data to FileMaker, but I’m not sure how to go about reading data from a web page in a predictable way. Not sure which browser to use (I use Firefox, but maybe Safari scripts better, or some Unix-a-ma-gig would work better?).

Looking for suggestions on technique and methodology, but code snippets are never turned down. :wink:

Don’t worry, this isn’t to “steal” from an online page to generate junk mail…it’s to keep me from having to key in a bunch of addresses by hand for my small freelance graphics business. I just did a batch of 75 addresses and that was tedious enough…I’m looking at a few hundred or even over a thousand for the next batch. Yuck. :frowning:

Thanks in advance,

Kevin;

The best way to get stuff from any web page is to use cURL to fetch the page text directly and then parse the HTML (using text item delimiters) for the data you want. I wrote a tutorial on getting the weather some time ago that illustrates the ideas.
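As a rough illustration of the text item delimiters idea, here is a minimal sketch of a handler that pulls out whatever sits between two markers in a block of HTML. The handler name `textBetween` and the sample markup are made up for this example; a real results page will need its own markers.

```applescript
-- Return the text between the first startMarker and the next endMarker,
-- or "" when either marker is missing
on textBetween(sourceText, startMarker, endMarker)
	set savedDelims to AppleScript's text item delimiters
	set foundText to ""
	set AppleScript's text item delimiters to startMarker
	set pieces to text items of sourceText
	if (count of pieces) > 1 then
		-- Rejoin everything that follows the first start marker
		set remainder to (items 2 thru -1 of pieces) as text
		set AppleScript's text item delimiters to endMarker
		set pieces to text items of remainder
		if (count of pieces) > 1 then set foundText to item 1 of pieces
	end if
	set AppleScript's text item delimiters to savedDelims
	return foundText
end textBetween

-- Hypothetical markup, not copied from any real yellow pages site
set sampleHTML to "<td class=\"name\">Acme Printing</td><td class=\"street\">123 Main St</td>"
set streetAddress to textBetween(sampleHTML, "<td class=\"street\">", "</td>")
```

Saving and restoring the delimiters keeps the handler from disturbing any other part of a script that relies on them.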

Reading it now…one question though…in your example it looks like you have a specific, repeatable URL in mind. What if I wanted to do something like this where a human being (me) determines the URL to parse via the browser. In other words, I navigate to a page, then trigger the script. Then navigate to the next page I want parsed, then trigger the script.

And if there are multiple pieces of information to get…multiple records per page with multiple fields each…how would you loop/parse that?

cURL looks cool, just not sure how I’d take advantage of it for:

a) browser-driven parse (maybe some “get the current URL in the browser” command?)
b) multiple-record parse (maybe use the delimiters trick to build an array somehow for each page then load FileMaker with the data in “batches” rather than record-by-record?)
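For (b), one possible shape is to split the page text into one chunk per listing, split each chunk into fields, and collect the results in a list that can be handed to FileMaker in one pass. This is only a sketch: the handler names, the record labels, and the `|` and linefeed markers are assumptions for illustration, not anything a real site is known to use.

```applescript
-- Split sourceText on theDelimiter and return the pieces as a list
on splitText(sourceText, theDelimiter)
	set savedDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to theDelimiter
	set pieces to text items of sourceText
	set AppleScript's text item delimiters to savedDelims
	return pieces
end splitText

-- Turn one page of text into a list of {name, street, city} records
on parseListings(pageText, recordMarker, fieldMarker)
	set listings to {}
	repeat with chunk in splitText(pageText, recordMarker)
		set fieldList to splitText(contents of chunk, fieldMarker)
		-- Skip chunks that do not carry all three fields
		if (count of fieldList) >= 3 then
			set end of listings to {companyName:item 1 of fieldList, companyAddress:item 2 of fieldList, companyCity:item 3 of fieldList}
		end if
	end repeat
	return listings
end parseListings

-- Hypothetical page text: one listing per line, fields separated by "|"
set samplePage to "Acme Printing|123 Main St|San Leandro" & linefeed & "Best Copy|9 Elm Ave|Oakland"
set pageListings to parseListings(samplePage, linefeed, "|")
```

Each item of `pageListings` could then be written to a FileMaker record inside a single `tell` block.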

Thanks Adam!

May turn out to be impossible, Kevin; some web sites seem to block cURL downloads that don’t have appropriate headers, and I couldn’t figure out what Yahoo! Yellow Pages wanted:

“http://yahoo.yellowpages.ca/search/?ei=UTF-8&stype=si&src=yahoo&what=Restaurants&where=Halifax+NS” works from a browser, but not with cURL, even if I use the -A ‘Mozilla/4.0’ option.

What’s worse, Yahoo! is blocking downloading of the source of the page in both Camino and Safari, so I can’t even get the HTML that way.

I think your only route may be to use another source.

What do you mean by this? I just did a “View Source” in the browser and Safari and Firefox both show it just fine. Or are you using an AppleScript command?

ADDENDUM:

I got Safari to spit out the source code from Yahoo! Yellow Pages for the current window simply with this:

tell application "Safari"
	set pageHTML to source of document 1
end tell

If it helps, looks like www.superpages.com has HTML source that is cleaner to work with and more straightforward to search.

cURL does work if you use it as in this example.
The -L option tells curl to follow any redirect.

set url2 to "http://yahoo.yellowpages.ca/search/?stype=si&src=yahoo&what=Restaurants&where=Halifax+NS&x=43&y=13"
set locationURL to do shell script "curl -L " & quoted form of url2 & " -o ~/documents/searchYahoo.html"

You do not need to shunt the result to a file like this; that’s just so you can easily see the results in this example.
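Without the -o option, `do shell script` simply returns whatever curl prints, so the HTML can go straight into a variable. A minimal sketch, assuming the handler names `curlCommand` and `fetchPage` (both made up here) and adding curl’s -s and --max-time options so a slow server cannot stall the script indefinitely:

```applescript
-- Build a curl command that follows redirects (-L), hides the progress
-- meter (-s), and gives up after maxSeconds (--max-time)
on curlCommand(pageURL, maxSeconds)
	return "curl -L -s --max-time " & maxSeconds & " " & quoted form of pageURL
end curlCommand

-- Return the page HTML as text, or "" if curl reports a failure
on fetchPage(pageURL, maxSeconds)
	try
		return do shell script curlCommand(pageURL, maxSeconds)
	on error
		-- do shell script raises an error when curl exits with a non-zero status
		return ""
	end try
end fetchPage
```

Calling `fetchPage(url2, 30)` with the `url2` from the example above would then hand back the page source directly, or empty text if the request failed or timed out.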

If I have this URL open in Safari…
http://yp.yahoo.com/py/ypResults.py?&city=San+Leandro&state=CA&zip=94578-2725&uzip=94578&country=us&msa=5775&cs=9&ed=kYIreK160SxlUFZDYiXmKO0Ko3kegDBrEF9bzGtUJ.FeaBchsuTXW6fbocPVqw_O8nw1aNYUVKHG&stx=95479435&stp=y&desc=Printing+Facilities&offset=0&FBoffset=80&sp=1&doprox=1&sorttype=distance

…and run this script…

tell application "Safari"
	set currentURL to URL of document 1
end tell

set url2 to currentURL
set locationURL to do shell script "curl -L " & quoted form of url2 & " -o ~/documents/searchYahoo.html"

…then Script Debugger and Script Editor both hang. Script Debugger acts like it’s involved in a long task or an infinite loop and can be cancelled. Script Editor gets hung with no indication it’s doing anything and has to be Force Quit.

Can I safely assume I’m missing something about cURL?

So far it seems more complicated than just having Safari itself spit-back the source since I have to browse to the page manually anyway. What is the purpose of using cURL when manually browsing to a page and AppleScript can acquire source from an open web page?

My SD4 doesn’t hang with this version (thanks for the -L switch Mark) and it does get the HTML for the page, but it doesn’t produce results for restaurants by name - it’s database driven so it links to the names.

Well, this isn’t elegant, but it does the trick through brute force:

tell application "Safari"
	set pageHTML to source of document 1
end tell

tell application "BBEdit"
	activate
	make new text window with properties {contents:pageHTML}
	
	--strip HTML Markup
	replace "<[^<>]*>" using "" searching in text 1 of window 1 options {search mode:grep, starting at top:true}
	
	--fix carriage returns
	replace "
" using "
" searching in text 1 of window 1 options {search mode:grep, starting at top:true}
	
	--fix special characters
	replace "&nbsp;" using "" searching in text 1 of window 1 options {search mode:literal, starting at top:true}
	
	--remove non-data top and bottom stuff
	find "Miles**" searching in text 1 of window 1 options {search mode:literal, starting at top:true} with selecting match
	set selectionOffset to (offset of selection) + 7
	select characters 1 thru selectionOffset of text window 1
	delete selection
	
	find "** Distances" searching in text 1 of window 1 options {search mode:literal, starting at top:true} with selecting match
	set selectionOffset to (offset of selection) - 12
	select characters selectionOffset thru -1 of text window 1
	delete selection
	
	--strip superfluous space, tabs, and other artifacts and useless data
	replace "Map" using "" searching in text 1 of window 1 options {search mode:literal, starting at top:true}
	replace "See reviews on Local" using "" searching in text 1 of window 1 options {search mode:literal, starting at top:true}
	replace "  " using "" searching in text 1 of window 1 options {search mode:literal, starting at top:true}
	replace "	" using "" searching in text 1 of window 1 options {search mode:literal, starting at top:true}
	replace " 
" using "" searching in text 1 of window 1 options {search mode:literal, starting at top:true}
	replace "^\\(.*
" using "" searching in text 1 of window 1 options {search mode:grep, starting at top:true}
	replace "




" using "" searching in text 1 of window 1 options {search mode:literal, starting at top:true}
	replace ", CA" using "" searching in text 1 of window 1 options {search mode:literal, starting at top:true}
	replace "

[0-9]*\\.[0-9]*

" using "" searching in text 1 of window 1 options {search mode:grep, starting at top:true}
	
	--go to beginning of file
	select insertion point before character 1 of text window 1
end tell

All that remains is to move the data into FileMaker. I’ll post the finished code when I’m done. Maybe in the meantime someone else will find a prettier way to do this. :wink:
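For anyone without BBEdit, the tag-stripping step can be approximated in plain AppleScript. The sketch below (the handler name `stripTags` is made up) drops everything between each `<` and the next `>`, which is roughly what the grep pattern `<[^<>]*>` does above; it makes no attempt to handle entities, scripts, or comments.

```applescript
-- Remove anything that looks like <...> from htmlText
on stripTags(htmlText)
	if htmlText is "" then return ""
	set savedDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "<"
	set pieces to text items of htmlText
	-- The first piece is whatever comes before the first tag
	set plainText to item 1 of pieces
	set AppleScript's text item delimiters to ">"
	repeat with i from 2 to (count of pieces)
		set piece to item i of pieces
		-- Each piece looks like "tag attributes>text after the tag"
		if (count of text items of piece) > 1 then
			set plainText to plainText & ((text items 2 thru -1 of piece) as text)
		end if
	end repeat
	set AppleScript's text item delimiters to savedDelims
	return plainText
end stripTags

set sampleText to stripTags("<b>Acme</b> Printing<br>123 Main St")
```

As with the grep version, line breaks implied by tags such as `<br>` are lost, so any line-based cleanup still has to happen afterwards.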

It’s rough and it’s ugly, but it works and it’s fast enough for my purposes. I can clean it up gradually over the next few days as I’m using it.

tell application "Safari"
	set pageHTML to source of document 1
end tell

tell application "BBEdit"
	activate
	make new text window with properties {contents:pageHTML}
	
	--strip HTML Markup
	replace "<[^<>]*>" using "" searching in text 1 of window 1 options {search mode:grep, starting at top:true}
	
	--fix carriage returns
	replace "
" using "
" searching in text 1 of window 1 options {search mode:grep, starting at top:true}
	
	--fix special characters
	replace "&nbsp;" using "" searching in text 1 of window 1 options {search mode:literal, starting at top:true}
	
	--remove non-data top and bottom stuff
	find "Miles**" searching in text 1 of window 1 options {search mode:literal, starting at top:true} with selecting match
	set selectionOffset to (offset of selection) + 7
	select characters 1 thru selectionOffset of text window 1
	delete selection
	
	find "** Distances" searching in text 1 of window 1 options {search mode:literal, starting at top:true} with selecting match
	set selectionOffset to (offset of selection) - 12
	select characters selectionOffset thru -1 of text window 1
	delete selection
	
	--strip superfluous space, tabs, and other artifacts and useless data
	replace "Map" using "" searching in text 1 of window 1 options {search mode:literal, starting at top:true}
	replace "See reviews on Local
" using "" searching in text 1 of window 1 options {search mode:literal, starting at top:true}
	replace "Web Site" using "" searching in text 1 of window 1 options {search mode:literal, starting at top:true}
	replace "  " using "" searching in text 1 of window 1 options {search mode:literal, starting at top:true}
	replace "	" using "" searching in text 1 of window 1 options {search mode:literal, starting at top:true}
	replace " 
" using "" searching in text 1 of window 1 options {search mode:literal, starting at top:true}
	replace "^\\(.*
" using "" searching in text 1 of window 1 options {search mode:grep, starting at top:true}
	replace "




" using "" searching in text 1 of window 1 options {search mode:literal, starting at top:true}
	replace ", CA" using "" searching in text 1 of window 1 options {search mode:literal, starting at top:true}
	replace "

[0-9]*\\.[0-9]*

" using "" searching in text 1 of window 1 options {search mode:grep, starting at top:true}
end tell

tell application "BBEdit"
	activate
	
	--get addresses from file
	repeat with addressNumber from 1 to 20
		tell text window 1
			set addressBlockLineStart to ((addressNumber - 1) * 7) - (addressNumber - 2)
			
			select insertion point before line addressBlockLineStart
			set lineBeginOffset to offset of selection
			select insertion point after line addressBlockLineStart
			set lineEndOffset to (offset of selection) - 1
			set companyName to (characters lineBeginOffset thru lineEndOffset) as text
			
			select insertion point before line (addressBlockLineStart + 1)
			set lineBeginOffset to offset of selection
			select insertion point after line (addressBlockLineStart + 1)
			set lineEndOffset to (offset of selection) - 1
			set companyAddress to (characters lineBeginOffset thru lineEndOffset) as text
			
			select insertion point before line (addressBlockLineStart + 2)
			set lineBeginOffset to offset of selection
			select insertion point after line (addressBlockLineStart + 2)
			set lineEndOffset to (offset of selection) - 1
			set companyCity to (characters lineBeginOffset thru lineEndOffset) as text
		end tell
		
		tell application "FileMaker Pro Advanced"
			activate
			open file "OSXT:Users:kquosig:Desktop:Test.fp7"
			create new record
			go to last record --go to new record
			
			set cell "Company" of current record to companyName
			set cell "Street Address" of current record to companyAddress
			set cell "City" of current record to companyCity
		end tell
		
	end repeat
end tell

tell application "BBEdit"
	close window 1 without saving
end tell

tell application "FileMaker Pro Advanced"
	close document 1
end tell

EDIT: Made some changes based on some problems I ran into this morning.
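The select-and-offset steps in the loop above could probably be avoided by working on the cleaned text itself. The line formula there works out to lines 1, 7, 13 and so on, that is, six lines per listing. A sketch under that assumption, with a made-up handler name and record labels, using AppleScript’s `paragraphs` to split the text into lines:

```applescript
-- Walk the cleaned text in blocks of linesPerRecord lines and keep the
-- first three lines of each block as name, street, and city
on recordsFromLines(cleanedText, linesPerRecord)
	set allLines to paragraphs of cleanedText
	set listings to {}
	set firstLine to 1
	repeat while (firstLine + 2) <= (count of allLines)
		set end of listings to {companyName:item firstLine of allLines, companyAddress:item (firstLine + 1) of allLines, companyCity:item (firstLine + 2) of allLines}
		set firstLine to firstLine + linesPerRecord
	end repeat
	return listings
end recordsFromLines
```

Because the loop stops when the lines run out, a final page with fewer than 20 listings would not need special handling.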

Here’s the final script I’ve been using. Works really well, have done thousands of records with no problems. If anyone has any ideas to do this better, let me know!

I added a check so that records without addresses from Yahoo! are not added to the database, and a check to make sure duplicates aren’t entered. The duplicate check uses FileMaker-side scripts to do quick Finds, whose results load global FileMaker variables that AppleScript can check against. Works great, decently fast.
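A cheap extra guard on the AppleScript side is to remember which addresses the script has already handled and skip blanks and repeats before FileMaker is asked anything. This is only a sketch with made-up names (`seenAddresses`, `isNewAddress`); it only knows about addresses seen by the script itself, so it would complement the FileMaker-side Find rather than replace it.

```applescript
property seenAddresses : {}

-- Return true the first time an address is seen; false for blank
-- addresses and for repeats (text comparison ignores case by default)
on isNewAddress(addressText)
	if addressText is in {"", return, linefeed} then return false
	if seenAddresses contains addressText then return false
	set end of seenAddresses to addressText
	return true
end isNewAddress
```

In the main loop, `if isNewAddress(companyAddress) then` could wrap the block that talks to FileMaker.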

Made extensive use of BBEdit’s ability to switch between literal and GREP searches, which allowed me to “drill down” into various removals of sets of HTML code and excess fluff that gets in the way of pulling the data. Probably could break this code into sub-handlers, but was just lazy and needed to get this project done ASAP. :wink:
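One way the literal removals might be broken out into sub-handlers is to keep the throwaway strings in a list and loop over them. The sketch below does this in plain AppleScript with text item delimiters; the handler names are made up, and it only covers literal replacements, so the grep steps would still need BBEdit.

```applescript
-- Replace every occurrence of findText in sourceText with replacementText
on replaceText(findText, replacementText, sourceText)
	set savedDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to findText
	set pieces to text items of sourceText
	set AppleScript's text item delimiters to replacementText
	set resultText to pieces as text
	set AppleScript's text item delimiters to savedDelims
	return resultText
end replaceText

-- Remove each string in fluffList from sourceText, in order
on removeAll(fluffList, sourceText)
	repeat with fluff in fluffList
		set sourceText to replaceText(contents of fluff, "", sourceText)
	end repeat
	return sourceText
end removeAll

set cleanedLine to removeAll({"Map", "See reviews on Local", "Web Site", ", CA"}, "San LeandroMap, CA")
```

Adding another piece of fluff then means adding one string to the list instead of another full `replace` line.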

To use script:

–Select an area in Yahoo! Yellow pages to search
–Do a search, drill into a category as needed.
–Switch from “Sponsored Businesses” to “Distance”
–run script
–click “next” link in Yahoo! YP
–run script again, and so on

If anyone wants details on the FileMaker side of things, let me know.

tell application "Safari"
	set pageHTML to source of document 1
end tell

tell application "BBEdit"
	activate
	make new text window with properties {contents:pageHTML}
	
	--strip HTML Markup
	replace "<[^<>]*>" using "" searching in text 1 of window 1 options {search mode:grep, starting at top:true}
	
	--fix carriage returns
	replace "
" using "
" searching in text 1 of window 1 options {search mode:grep, starting at top:true}
	
	--fix special characters
	replace "&nbsp;" using "" searching in text 1 of window 1 options {search mode:literal, starting at top:true}
	
	--remove non-data top and bottom stuff
	find "Miles**" searching in text 1 of window 1 options {search mode:literal, starting at top:true} with selecting match
	set selectionOffset to (offset of selection) + 7
	select characters 1 thru selectionOffset of text window 1
	delete selection
	
	find "** Distances" searching in text 1 of window 1 options {search mode:literal, starting at top:true} with selecting match
	set selectionOffset to (offset of selection) - 12
	select characters selectionOffset thru -1 of text window 1
	delete selection
	
	--strip superfluous space, tabs, and other artifacts and useless data to standardize data "records"
	replace "Map" using "" searching in text 1 of window 1 options {search mode:literal, starting at top:true}
	replace "See reviews on Local
" using "" searching in text 1 of window 1 options {search mode:literal, starting at top:true}
	replace "Web Site" using "" searching in text 1 of window 1 options {search mode:literal, starting at top:true}
	replace "  " using "" searching in text 1 of window 1 options {search mode:literal, starting at top:true}
	replace "	" using "" searching in text 1 of window 1 options {search mode:literal, starting at top:true}
	replace " 
" using "" searching in text 1 of window 1 options {search mode:literal, starting at top:true}
	replace "^\\(.*
" using "" searching in text 1 of window 1 options {search mode:grep, starting at top:true}
	replace "




" using "" searching in text 1 of window 1 options {search mode:literal, starting at top:true}
	replace ", CA" using "" searching in text 1 of window 1 options {search mode:literal, starting at top:true}
	replace "

[0-9]*\\.[0-9]*

" using "" searching in text 1 of window 1 options {search mode:grep, starting at top:true}
end tell

tell application "BBEdit"
	activate
	
	--get addresses from file
	repeat with addressNumber from 1 to 20
		tell text window 1
			set addressBlockLineStart to ((addressNumber - 1) * 7) - (addressNumber - 2)
			
			select insertion point before line addressBlockLineStart
			set lineBeginOffset to offset of selection
			select insertion point after line addressBlockLineStart
			set lineEndOffset to (offset of selection) - 1
			set companyName to (characters lineBeginOffset thru lineEndOffset) as text
			
			select insertion point before line (addressBlockLineStart + 1)
			set lineBeginOffset to offset of selection
			select insertion point after line (addressBlockLineStart + 1)
			set lineEndOffset to (offset of selection) - 1
			set companyAddress to (characters lineBeginOffset thru lineEndOffset) as text
			
			select insertion point before line (addressBlockLineStart + 2)
			set lineBeginOffset to offset of selection
			select insertion point after line (addressBlockLineStart + 2)
			set lineEndOffset to (offset of selection) - 1
			set companyCity to (characters lineBeginOffset thru lineEndOffset) as text
		end tell
		
		tell application "FileMaker Pro 8"
			activate
			open file "OSXT:Users:kquosig:Desktop:Print Shops, 070601.fp7"
			
			--check for duplicate address
			--skip listings whose address line is empty or only a line break
			if companyAddress is not in {"", return, linefeed} then
				set cell "g_Address_Search" of current record to "\"" & companyAddress & "\""
				do script "Duplicate Pre-Check"
				
				if cell "g_IsDupe" of current record is "no" then
					set newRecord to create new record
					
					set cell "Company" of newRecord to companyName
					set cell "Street Address" of newRecord to companyAddress
					set cell "City" of newRecord to companyCity
				end if
			end if
		end tell
		
	end repeat
end tell

tell application "BBEdit"
	close window 1 without saving
end tell

tell application "FileMaker Pro 8"
	close document 1
end tell