Hi guys,
I found a script that will download all of the URLs from a page. It works perfectly. I need help with the next step: filtering those results to exclude “x”, “y”, and “z”. What would the code be to do that?
set site_url to "http://www.apple.com"
tell application "Safari"
activate
open location site_url
end tell
-- wait until page loaded
if page_loaded(20) is false then return
-- get number of links
set theLinks to {}
tell application "Safari" to set num_links to (do JavaScript "document.links.length" in document 1)
set linkCounter to num_links - 1
-- retrieve the links
repeat with i from 0 to linkCounter
tell application "Safari" to set end of theLinks to do JavaScript "document.links[" & i & "].href" in document 1
end repeat
theLinks
on page_loaded(timeout_value)
delay 2
repeat with i from 1 to the timeout_value
tell application "Safari"
if (do JavaScript "document.readyState" in document 1) is "complete" then
return true
else if i is the timeout_value then
return false
else
delay 1
end if
end tell
end repeat
return false
end page_loaded
The next step would be to open the remaining links in new Safari or Firefox tabs. Is this possible?
I really appreciate all of your help!
Thanks!
Hi, welcome to the forum.
Try this:
set site_url to "http://www.apple.com"
tell application "Safari"
if not (exists document 1) then reopen
set URL of document 1 to site_url
-- wait until page loaded
if my page_loaded(20) is false then return
-- collect the href of every link on the page
set myLinks to do JavaScript "var linkList = [];
for (var i = 0; i < document.links.length; i++)
{
linkList.push(document.links[i].href);
}
linkList;" in document 1
repeat with alink in myLinks
if alink contains "itunes" then
set URL of document 1 to alink
--insert your code
end if
end repeat
end tell
on page_loaded(timeout_value)
delay 2
repeat with i from 1 to the timeout_value
tell application "Safari"
if (do JavaScript "document.readyState" in document 1) is "complete" then
return true
else if i is the timeout_value then
return false
else
delay 1
end if
end tell
end repeat
return false
end page_loaded
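For the more general case of keeping links that contain one string while dropping those that contain any of several others, the contains test in the loop above could be moved into a handler. This is only a sketch: filterLinks, its parameter names, and the sample addresses are made up for illustration and are not part of Safari or of the script above.

```applescript
-- Hypothetical helper: keep only the links that contain requiredText
-- and contain none of the strings in excludedTexts.
on filterLinks(linkList, requiredText, excludedTexts)
	set keptLinks to {}
	repeat with aLink in linkList
		-- "contents of" turns the loop reference into the plain string
		set linkText to contents of aLink
		if linkText contains requiredText then
			set isWanted to true
			repeat with badText in excludedTexts
				if linkText contains (contents of badText) then
					set isWanted to false
					exit repeat
				end if
			end repeat
			if isWanted then set end of keptLinks to linkText
		end if
	end repeat
	return keptLinks
end filterLinks

-- Example with placeholder addresses
set sampleLinks to {"http://xxxx.com/store/zzz", "http://xxxx.com/store/zzz=ref", "http://xxxx.com/store/zzz?update", "http://xxxx.com/about"}
filterLinks(sampleLinks, "xxxx.com/store/zzz", {"zzz=ref", "zzz?update"})
-- result: {"http://xxxx.com/store/zzz"}
```

If the handler is called from inside a tell application "Safari" block, it needs to be written as my filterLinks(...), the same way page_loaded is called above.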
Thanks!
How do I limit the list to only the links that contain “xxxx.com/store”? And after that, how do I filter those results so that they include xxxx.com/store/zzz but not xxxx.com/store/zzz=ref or xxxx.com/store/zzz?update?
Thanks so much!
I actually figured out how to do that
yay!!
The next part is: how do I open the remaining links in document 1 as multiple tabs in Safari (or Firefox)?
Thanks!
if alink contains "xxxx.com/store" then
if alink contains "xxxx.com/store/zzz" and alink does not contain "xxxx.com/store/zzz=ref" and alink does not contain "xxxx.com/store/zzz?update" then beep
end if
repeat with alink in myLinks
if alink does not contain "ipad" then
tell window 1 to set newTab to make new tab
set URL of newTab to alink
--insert your code
end if
end repeat
end tell
For some reason that isn’t working.
What do I need to do to get Firefox to open all of the links as separate tabs?
set site_url to "https://www.etsy.com/your/orders/sold?ref=si_ys_dd_sold_orders"
tell application "Safari"
activate
open location site_url
end tell
-- wait until page loaded
if page_loaded(30) is false then return
-- get number of links
set theLinks to {}
tell application "Safari" to set num_links to (do JavaScript "document.links.length" in document 1)
set linkCounter to num_links - 1
-- retrieve the links
repeat with i from 0 to linkCounter
tell application "Safari" to set end of theLinks to do JavaScript "document.links[" & i & "].href" in document 1
end repeat
theLinks
set nonExcludedURLs to {}
repeat with i from 1 to length of theLinks
if item i of theLinks does not contain "home" and item i of theLinks does not contain "sell" and item i of theLinks contains "https://www.etsy.com/your/orders/" and item i of theLinks does not contain "sold_orders" and item i of theLinks does not contain "open" and item i of theLinks does not contain "completed" and item i of theLinks does not contain "all" and item i of theLinks does not contain "canceled" and item i of theLinks does not contain "open" and item i of theLinks does not contain "unpaid" and item i of theLinks does not contain "unshipped" and item i of theLinks does not contain "from_user_id" and item i of theLinks does not contain "update" and item i of theLinks does not contain "your-notes" then
set end of nonExcludedURLs to item i of theLinks
end if
end repeat
nonExcludedURLs
on page_loaded(timeout_value)
delay 2
repeat with i from 1 to the timeout_value
tell application "Safari"
if (do JavaScript "document.readyState" in document 1) is "complete" then
return true
else if i is the timeout_value then
return false
else
delay 1
end if
end tell
end repeat
return false
end page_loaded
Firefox is not scriptable. However, if it is your default browser you can open new tabs with:
open location "http://www.apple.com/itunes/"
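If the links are already in a list, the same command could be run in a loop. A minimal sketch, assuming Firefox is the default browser and is set to open new pages in tabs rather than in new windows. The linksToOpen list and its two addresses are placeholders, and the one-second pause is an arbitrary value.

```applescript
-- placeholder list; in a real script this would be the filtered link list
set linksToOpen to {"http://www.apple.com/itunes/", "http://www.apple.com/mac/"}

repeat with aLink in linksToOpen
	-- "contents of" turns the loop reference into the plain URL string
	open location (contents of aLink)
	-- short pause so the browser can keep up (arbitrary value)
	delay 1
end repeat
```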
Okay… since Firefox is not scriptable how do I open multiple tabs with safari?
To open multiple tabs in Safari read post #6
To open multiple tabs in Firefox read post #8
Hello.
Firefox is so barely scriptable that I guess its use cannot be endorsed by governments that have laws requiring that disabled people be able to work with computers. In that respect, Firefox doesn’t support UI scripting at all!
Here is a handler that opens a new tab in Safari’s frontmost window with the URL specified.
loadUrlInNewSafariTab for "http://www.macscripter.net"
to loadUrlInNewSafariTab for anUrl
tell application "Safari"
tell its first window
set current tab to tab (index of (make new tab))
set URL of current tab to anUrl
end tell
end tell
end loadUrlInNewSafariTab
You’ll need to activate Safari when you are done. 
Hi,
Thanks. That worked. But how do I get it to open the links that are in document 1, as opposed to the fixed macscripter link?
Hello.
I surmise that the theLinks list in your post #1 contains just the URLs. Then something like this would do the trick:
tell application "Safari"
-- your stuff for finding the links here
repeat with aLink in theLinks
loadUrlInNewSafariTab of me for aLink
end repeat
activate
end tell
-- the handler here..
I tried to rewrite the code with aLink but it was not working, so I went back to my original code. I am able to list all of the URLs and then remove the ones that I don’t want. It won’t work for you because you won’t have access to the page, but where would I put the tab code, and what would it need to be to work with what I have? I really appreciate your help.
set site_url to "https://www.etsy.com/your/orders/sold?ref=si_ys_dd_sold_orders"
tell application "Safari"
activate
open location site_url
end tell
-- wait until page loaded
if page_loaded(30) is false then return
-- get number of links
set theLinks to {}
tell application "Safari" to set num_links to (do JavaScript "document.links.length" in document 1)
set linkCounter to num_links - 1
-- retrieve the links
repeat with i from 0 to linkCounter
tell application "Safari" to set end of theLinks to do JavaScript "document.links[" & i & "].href" in document 1
end repeat
theLinks
set nonExcludedURLs to {}
repeat with i from 1 to length of theLinks
if item i of theLinks does not contain "home" and item i of theLinks does not contain "sell" and item i of theLinks contains "https://www.etsy.com/your/orders/" and item i of theLinks does not contain "sold_orders" and item i of theLinks does not contain "open" and item i of theLinks does not contain "completed" and item i of theLinks does not contain "all" and item i of theLinks does not contain "canceled" and item i of theLinks does not contain "open" and item i of theLinks does not contain "unpaid" and item i of theLinks does not contain "unshipped" and item i of theLinks does not contain "from_user_id" and item i of theLinks does not contain "update" and item i of theLinks does not contain "your-notes" then
set end of nonExcludedURLs to item i of theLinks
end if
end repeat
nonExcludedURLs
on page_loaded(timeout_value)
delay 2
repeat with i from 1 to the timeout_value
tell application "Safari"
if (do JavaScript "document.readyState" in document 1) is "complete" then
return true
else if i is the timeout_value then
return false
else
delay 1
end if
end tell
end repeat
return false
end page_loaded
Hello and well done! 
Just insert it where you return the value of nonExcludedURLs, that is, at the line containing just nonExcludedURLs.
-- nonExcludedURLs
tell application "Safari"
repeat with aLink in nonExcludedURLs
loadUrlInNewSafariTab of me for aLink
end repeat
activate
end tell
-- the handlers here..
When I do that I get the error: «script» doesn’t understand the loadUrlInNewSafariTab message.
Hello, you’ll have to add this handler to your script and try again.
to loadUrlInNewSafariTab for anUrl
tell application "Safari"
tell its first window
set current tab to tab (index of (make new tab))
set URL of current tab to anUrl
end tell
end tell
end loadUrlInNewSafariTab
PERFECT!! Thanks so much!!
Now my last question: is there a way to print all open tabs? I have an extension in Firefox that does that, but since I can’t script Firefox, is there a way to do that in Safari?
Thanks!
There are tons of scripts here that use UI scripting to print a web page.
You’ll need something to iterate over the tabs in the front window, which is what I am providing below.
tell application "Safari"
tell window 1
set tc to count its tabs
repeat with tbn from 1 to tc
try
set current tab to tab tbn
-- code for printing the current tab here...
on error
beep
end try
end repeat
end tell
end tell
thanks!
Last questions (promise)! How do I exclude the first tab? The first tab is the page where all of the URLs are grabbed from, and I don’t need to print that page. EDIT: I think I figured it out… change 1 to 2???
Also, how do I pause this section of the script so that it runs only after, say, 20 seconds have passed? That will give all of the tabs time to load, and then there won’t be a problem with the print…
Thanks!
tell application "Safari"
tell window 1
set tc to count its tabs
repeat with tbn from 1 to tc
try
set current tab to tab tbn
tell application "Safari"
print document 1
end tell
on error
beep
end try
end repeat
end tell
end tell
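Both points from the question above could be handled with small changes to this loop: a delay before it starts, and a repeat that begins at tab 2. This is a sketch under those assumptions, not a tested answer. The 20 seconds is the figure from the question, and a fixed delay is only a guess at how long the tabs need to load. The print lines are taken unchanged from the script above.

```applescript
-- give the newly opened tabs time to load (figure taken from the question)
delay 20

tell application "Safari"
	tell window 1
		set tc to count its tabs
		-- start at 2 so the first tab (the page the links came from) is skipped;
		-- if there is only one tab, the loop body never runs
		repeat with tbn from 2 to tc
			try
				set current tab to tab tbn
				tell application "Safari"
					print document 1
				end tell
			on error
				beep
			end try
		end repeat
	end tell
end tell
```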