I am trying to take the page source of a website in Safari and place it into Excel, because I want to pull the URLs of the site as well as other information. I have already written formulas in Excel that convert the page source. However, I would like an AppleScript that pulls up the page sources of a couple hundred websites that I already have logged.
I am a bit new to AppleScript, but I have written a script that seems to work for some websites and not others. I think it might have to do with how the data is being transferred to Excel, because I can transfer the data to TextEdit fine.
tell application "Safari"
	open location "http://www.microsoft.com/mac/developers/default.mspx?CTT=PageView&clr=99-21-0&target=b2656752-dbbb-494e-ad09-031d08e7bc8e1033&srcid=cdb9a274-738f-4e4a-9e8c-83feda0485241033&ep=7"
	set pSource to the source of the front document
	set readSource to read file pSource
end tell
tell application "Finder"
	activate
	open document file "Sorting Info.xls" of folder "Research Data" of folder "Desktop" of folder "grashapa1" of folder "Users" of startup disk
end tell
tell application "Microsoft Excel"
	activate
	activate object workbook "Sorting Info.xls"
	activate object worksheet "Sheet1"
	set theRange to range "A1:A100" of sheet 1 of active workbook
	set value of active cell to pSource
end tell
The website I have used above works; however, if I use a site such as http://finance.google.com/finance?q=ABC, the script fails to do the job.
Also, is there a way to make each line of the page source appear in its own cell?
Or is there a better way of pulling URLs, such as the one for the Income Statement, without having to use Excel?
Browser: Firefox 2.0.0.14
Operating System: Mac OS X (10.5)
Hi grashapa1,
When you get the page source of a website from Safari, you should always make sure that the page has finished loading; otherwise you get only part of the source, not all of it:
(This is not perfect code…)
set weblocurls to {"http://www.surtec.de", "http://www.atotech.com", "http://www.umicore.com"}
tell application "Safari"
	repeat with weblocurl in weblocurls
		make new document with properties {URL:weblocurl}
		delay 3
		set docloaded to false
		repeat 10 times
			delay 1
			set docstate to (do JavaScript "document.readyState" in document 1)
			if docstate is "complete" then
				set docloaded to true
				exit repeat
			end if
		end repeat
		if docloaded is true then
			tell me
				activate
				display dialog "Completely loaded:" & return & return & weblocurl
			end tell
		end if
		close document 1
	end repeat
end tell
Moreover, you might be better off using the «curl» command to batch-download the source of the websites:
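set weburl to "http://www.apple.com"
-- quoted form keeps the shell from interpreting characters such as "?" and "&" in a URL
set source to do shell script "curl " & quoted form of weburl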
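One caveat with plain curl: if a site answers with an HTTP redirect, curl returns the redirect response rather than the page it points to. Here is a minimal sketch of a batch download that asks curl to follow redirects. The variable name allSources is only for illustration, and I am assuming the sites use ordinary HTTP redirects; curl will not follow a redirect done with a meta refresh tag or with JavaScript.

```applescript
set weblocurls to {"http://www.surtec.de", "http://finance.google.com/finance?q=ABC"}
set allSources to {}
repeat with weblocurl in weblocurls
	try
		-- -s: no progress meter; -L: follow HTTP redirects
		-- quoted form keeps the shell from interpreting "?" and "&" in the URL
		set pSource to do shell script "curl -sL " & quoted form of (contents of weblocurl)
		set end of allSources to pSource
	on error errMsg
		-- curl exited with an error (for example, host not found); log it and keep going
		log "Could not fetch " & weblocurl & ": " & errMsg
	end try
end repeat
```

After the loop, allSources holds one text item per page that could be fetched, in the same order as the URLs that succeeded.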
I think I figured out one of the reasons that the page source doesn't transfer.
For example, when I open the website http://www.surtec.de/ it is redirected to http://www.surtec.de/IndexE.html, so the AppleScript reads the wrong page source when I use the curl command.
I have given up on transferring the data to Excel and am instead combining it all in TextEdit, which I can do.
So I guess I no longer have an AppleScript question, but more of an Excel question, haha. When I copy the source code into Excel, it changes back to how it would look on the web.
By the way, here is my new script, with the addition that checks whether the page has finished loading.
set weblocurls to {"http://finance.google.com/finance?q=ABC", "http://finance.google.com/finance?q=AGN", "http://finance.google.com/finance?q=AMGN", "http://finance.google.com/finance?q=BIIB", "http://finance.google.com/finance?q=CAH", "http://finance.google.com/finance?q=CELG", "http://finance.google.com/finance?q=CEPH", "http://finance.google.com/finance?q=CVD", "http://finance.google.com/finance?q=DNA", "http://finance.google.com/finance?q=ELN", "http://finance.google.com/finance?q=FRX", "http://finance.google.com/finance?q=GENZ", "http://finance.google.com/finance?q=GILD", "http://finance.google.com/finance?q=HSP", "http://finance.google.com/finance?q=MCK", "http://finance.google.com/finance?q=NVO", "http://finance.google.com/finance?q=PPDI", "http://finance.google.com/finance?q=SHPG", "http://finance.google.com/finance?q=SNY", "http://finance.google.com/finance?q=TEVA"}
tell application "TextEdit"
	activate
	make new document
end tell
tell application "Safari"
	activate
	repeat with weblocurl in weblocurls
		make new document with properties {URL:weblocurl}
		delay 3
		set docloaded to false
		repeat 10 times
			delay 1
			set docstate to (do JavaScript "document.readyState" in document 1)
			if docstate is "complete" then
				set docloaded to true
				exit repeat
			end if
		end repeat
		if docloaded is true then
			set pSource to the source of the front document
			tell application "TextEdit"
				activate
				set paragraph 1 of the front document to pSource
			end tell
		end if
		close document 1
	end repeat
end tell
Thanks a bunch
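On the earlier question of getting each line of the page source into its own cell: one possible approach is to skip copy and paste and set the cell values directly from the script, which should also keep Excel from rendering the HTML. This is only a rough sketch. It assumes a workbook is already open in Excel with a worksheet named "Sheet1", and the names sourceLines and rowNumber are made up for the example. Note that Excel may treat a line that begins with "=" as a formula.

```applescript
-- Fetch one page (URL taken from the list above)
set pSource to do shell script "curl -sL " & quoted form of "http://finance.google.com/finance?q=ABC"
-- "paragraphs" splits the text at each line ending
set sourceLines to paragraphs of pSource
tell application "Microsoft Excel"
	activate
	set rowNumber to 0
	repeat with sourceLine in sourceLines
		set rowNumber to rowNumber + 1
		-- write line N of the source into cell A<N>
		set value of range ("A" & rowNumber) of worksheet "Sheet1" of active workbook to (contents of sourceLine)
	end repeat
end tell
```

Setting the cells one at a time is slow for long pages, but it keeps the sketch simple and makes it easy to see which line ends up in which row.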