Getting source code of multiple tabs in Safari

Hello clever people!

I’m new to AppleScript and I’m currently writing a script that will do a substantial amount of the work of producing my weekly email newsletter. What the script should do is load the What’s New page of the website I edit, ask me how many new articles I’ll be covering in the email, load the required number of articles into new tabs (by following the top n links in the What’s New list) and then extract information from each of those tabs.

Getting the URLs and loading the tabs is fine. And if I deal with a tab at a time I can get the source code of each page and extract the data I need (by dumping the source into TextWrangler). It seems to me that it would be more elegant to do this using a repeat: cycle through the tabs, put the URLs into a list, copy the source code and do something with it (can I put the entire source code of a page as a list item?).

But here’s my problem. Getting the source doesn’t seem to work if it’s included within a repeat statement. I get a message telling me the variable (itemSource) is undefined. Any suggestions?

Here’s the relevant section of the script:

-- open safari and load the CW What's new page
tell application "Safari"
	activate
	set whatsnew to "http://www.damaris.org/cm/index.php?type=43&site_id=17"
	open location whatsnew
	
	-- Input number of CW items to be included
	display dialog "How many items?" default answer ""
	set ItemCount to text returned of result as integer
end tell

-- open tabs for  required number of CW items
tell application "Safari"
	set itemURLs to {}
	set tabcount to count of tabs of window 1
	repeat with i from 1 to ItemCount

		tell document 1
			-- First CW article URL is link number 57
			-- so link number set to 55 plus 2x the counter (i) 
			-- so as to skip all navigation links and the linked CW logo in each line of the table on What's New page
			set TabURL to (do JavaScript "document.links[" & (55 + (i * 2)) & "].href")
		end tell
		
		tell window 1
			set itemURLs to itemURLs & TabURL
			make new tab with properties {URL:TabURL}
			set tabcount to tabcount +1
			set itemsource to source of tab tabcount
		end tell
		
	end repeat

	-- i'm just putting the variable here to check the result in Script Editor rather than doing anything with it yet
	itemSource

end tell

Hi,

The first thing is that your variable itemsource should be the same in all instances, since AppleScript is case sensitive. itemsource and ItemSource would be considered two different variables.

Thanks T-Rex but unfortunately that’s not the problem. It still doesn’t work.

Reading around, it seems that AppleScript is not normally case-sensitive (though with some things it is - system voices, for example). This is easily verified with my example by getting the value of TABCOUNT at the end instead of tabcount - it still gives the right answer.
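
For anyone who wants to check the case point without Safari, here is a minimal sketch. The variable names (viaOtherCase, looseMatch, strictMatch) are made up for illustration and appear nowhere in the script above.

```applescript
-- Identifiers are case-insensitive: all three spellings name the same variable.
-- (Script Editor normally rewrites them to match the first spelling when it compiles.)
set tabCount to 3
set TABCOUNT to tabCount + 1
set viaOtherCase to tabcount -- 4

-- Text comparison also ignores case by default; "considering case" makes it strict.
set looseMatch to ("Safari" is equal to "SAFARI") -- true
considering case
	set strictMatch to ("Safari" is equal to "SAFARI") -- false
end considering

return {viaOtherCase, looseMatch, strictMatch} -- {4, true, false}
```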

Hi, not sure if I understood what you are trying to achieve, but certainly you can create a list of source codes (if you don’t need the entire code maybe it makes sense to filter the source code first, otherwise the list elements will be quite huge pieces of text):

set sourceCodeList to {}
tell application "Safari"
	set c to count of tabs of window 1
	repeat with i from 1 to c
		set s to source of tab i of window 1
		set end of sourceCodeList to s
	end repeat
end tell
sourceCodeList

Ciao
Farid

Hi Tony,

If there is a case discrepancy in a variable, it’s fixed automatically when you compile; this is not your issue. Your variable, itemsource, has no value because the page has not been given time to load. A delay of several seconds is essential.

Thanks Farid for clarifying the list thing. The only reason I thought about putting the source codes into a list was to streamline the code, and then to dump the successive items of the list into a text document for manipulation before moving onto the next one.
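
A rough sketch of that "dump each list item into a text document" step, in case it helps. The handler name writeTextToFile, the /tmp output paths and the stand-in list contents are all assumptions for illustration; in the real script the list would hold the page sources collected from Safari.

```applescript
-- Hypothetical helper: write text to the file at a POSIX path, replacing any old contents.
on writeTextToFile(theText, posixPath)
	set fileRef to open for access (POSIX file posixPath) with write permission
	try
		set eof fileRef to 0 -- empty the file before writing
		write theText to fileRef as «class utf8»
		close access fileRef
	on error errMsg number errNum
		close access fileRef -- never leave the file open after a failure
		error errMsg number errNum
	end try
end writeTextToFile

-- Stand-in data; the real list would come from "source of tab i of window 1".
set sourceCodeList to {"<html>first page</html>", "<html>second page</html>"}

repeat with i from 1 to (count of sourceCodeList)
	-- Assumed location: one file per article under /tmp
	set outPath to "/tmp/item-source-" & i & ".html"
	my writeTextToFile(item i of sourceCodeList, outPath)
end repeat
```

Writing one file per item keeps each page separate, so each can be opened in a text editor in turn.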

Marc Anthony, thanks for pointing out the need for a delay. I’m sure you’re right that this is my problem. In the meantime I’ve done the task in a more long-winded way but I ran into the same problem of the page not loading. I finally figured out that was the issue there, but hadn’t then worked back to see that it was the cause of my original problem. Am I right in thinking I just need something as simple as ‘delay 5’?

Yes, inserting that delay (prior to “set itemsource to source of tab tabcount”) will probably work in most situations. How long a delay is enough depends on the server’s response time, though. You can use a ready-made subroutine from Apple that will loop until the page is done.
The code below came from:
www.apple.com/applescript/archive/safari/jscript.01.html

--Sub-routine for determining when the front document has loaded:

-- call the sub-routine and pass the maximum number of seconds to wait for the page to be loaded:
if page_loaded(20) is false then error number -128

on page_loaded(timeout_value)
	delay 2
	repeat with i from 1 to the timeout_value
		tell application "Safari"
			if (do JavaScript "document.readyState" in document 1) is "complete" then
				return true
			else if i is the timeout_value then
				return false
			else
				delay 1
			end if
		end tell
	end repeat
	return false
end page_loaded

Thanks again Marc Anthony. I have tried using this but I keep getting this message:

Is it because it comes after an instruction to create new tabs? This is how the relevant bit of the script looks now:

-- open safari and load the CW What's new page
tell application "Safari"
	activate
	
	-- open What's new
	set whatsnew to "http://www.damaris.org/cm/index.php?type=43&site_id=17"
	open location whatsnew
	
	display dialog "How many items?" default answer "2" -- Input number of items to be included
	set ItemCount to text returned of result as integer
end tell

-- wait until page loaded
if page_loaded(20) is false then return

-- open tabs for  required number of CW items
tell application "Safari"
	set itemURLs to {}
	set itemSources to {}
	repeat with i from 1 to ItemCount
		-- First CW item URl should be link 57, but may change if page structure altered
		tell document 1
			-- link number set to 55 plus 2x the counter (i) so as to skip CW logo link on What's New page
			set TabURL to (do JavaScript "document.links[" & (55 + (i * 2)) & "].href")
		end tell
		
		tell window 1			
			set end of itemURLs to TabURL -- List of URLs
			make new tab with properties {URL:TabURL}
			if page_loaded(20) is false then error number -128
			set s to source of tab (i + 1)
			set end of itemSources to s -- creates list of page sources 
		end tell
	end repeat
end tell

on page_loaded(timeout_value)
	delay 2
	repeat with i from 1 to the timeout_value
		tell application "Safari"
			if (do JavaScript "document.readyState" in document 1) is "complete" then
				return true
			else if i is the timeout_value then
				return false
			else
				delay 1
			end if
		end tell
	end repeat
	return false
end page_loaded

Hi,

Handlers called from within an application tell block have to be addressed to the script itself, using the keyword my:


.
if my page_loaded(20) is false then error number -128
.

Note: the page_loaded handler always checks document 1, i.e. the active tab of the front window.
After creating a new tab you should select it, otherwise the page_loaded handler will test the wrong page.
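
A small stand-alone sketch of the "my" rule, using the Finder only as a convenient application to tell. The handler name doubleOf is made up for illustration.

```applescript
-- Hypothetical handler defined in the script itself
on doubleOf(n)
	return n * 2
end doubleOf

tell application "Finder"
	-- Inside a tell block a bare doubleOf(21) is sent to the Finder,
	-- which does not know the handler and reports an error.
	-- "my" directs the call back to the script.
	set viaMy to my doubleOf(21)
end tell

return viaMy -- 42
```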

Excellent, thanks StefanK. You people are all so helpful to beginners like me. Next question then: how do I select the tab that has just been created?

This is quite easy:


.
set current tab to make new tab with properties {URL:TabURL}
.

Thanks. I think I should have known that. I’ve seen ‘current tab’ in other people’s scripts but having not used it yet it had leaked out of my porous brain.
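
For reference, here is one way the pieces from this thread might fit together. It is a sketch under stated assumptions, not a tested final script: it assumes the What’s New page is already loaded in the front tab, it hard-codes ItemCount as a placeholder, and it reads all the link URLs before opening any tabs, on the assumption (implied by the note above) that document 1 follows whichever tab is current. Recent Safari versions may also need "Allow JavaScript from Apple Events" switched on in the Develop menu.

```applescript
set ItemCount to 2 -- placeholder; the real script asks for this with display dialog
set itemURLs to {}
set itemSources to {}

tell application "Safari"
	-- Pass 1: read every link while the What's New page is still document 1
	repeat with i from 1 to ItemCount
		set end of itemURLs to (do JavaScript "document.links[" & (55 + (i * 2)) & "].href" in document 1)
	end repeat
	
	-- Pass 2: open each link in a new tab, select it, wait for it, then grab the source
	tell window 1
		repeat with i from 1 to ItemCount
			set current tab to (make new tab with properties {URL:(item i of itemURLs)})
			-- "my" sends the call to the script rather than to Safari
			if my page_loaded(20) is false then error number -128
			set end of itemSources to (source of current tab)
		end repeat
	end tell
end tell

return itemSources

-- Apple's handler as quoted earlier in the thread
on page_loaded(timeout_value)
	delay 2
	repeat with i from 1 to the timeout_value
		tell application "Safari"
			if (do JavaScript "document.readyState" in document 1) is "complete" then
				return true
			else if i is the timeout_value then
				return false
			else
				delay 1
			end if
		end tell
	end repeat
	return false
end page_loaded
```

Collecting the URLs in a separate first pass means the link lookup never runs against one of the newly opened article tabs.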