Sunday, May 26, 2013

#1 2013-03-05 11:29:50 am

jwaldmann
Member
Registered: 2013-03-05
Posts: 11

Script to download urls and filter

Hi guys,

I found a script that will download all of the URLs from a page. It works perfectly. I need help with the next step: filtering those results to exclude "x", "y", and "z". What would the code be to do that?

Applescript:

set site_url to "http://www.apple.com"
tell application "Safari"
   activate
   open location site_url
end tell
-- wait until page loaded
if page_loaded(20) is false then return
-- get number of links
set theLinks to {}
tell application "Safari" to set num_links to (do JavaScript "document.links.length" in document 1)
set linkCounter to num_links - 1
-- retrieve the links
repeat with i from 0 to linkCounter
   tell application "Safari" to set end of theLinks to do JavaScript "document.links[" & i & "].href" in document 1
end repeat
theLinks

on page_loaded(timeout_value)
   delay 2
   repeat with i from 1 to the timeout_value
       tell application "Safari"
           if (do JavaScript "document.readyState" in document 1) is "complete" then
               
               return true
           else if i is the timeout_value then
               return false
           else
               delay 1
           end if
       end tell
   end repeat
   return false
end page_loaded

The next step would be to open the remaining links in new Safari or Firefox tabs. Is this possible?

I really appreciate all of your help!

Thanks!


#2 2013-03-05 12:44:30 pm

adayzdone
Member
From: New York
Registered: 2011-01-24
Posts: 408

Re: Script to download urls and filter

Hi, welcome to the forum.

Try this:

Applescript:

set site_url to "http://www.apple.com"
tell application "Safari"
   if not (exists document 1) then reopen
   set URL of document 1 to site_url
   
   -- wait until page loaded
   if my page_loaded(20) is false then return
   -- collect every link's href in a single JavaScript call
   set myLinks to do JavaScript "var linkList = [];
for (var i = 0; i < document.links.length; i++)
{
linkList.push(document.links[i].href);
}
linkList;" in document 1
   
   repeat with alink in myLinks
       if alink contains "itunes" then
           set URL of document 1 to alink
           --insert your code
       end if
   end repeat
end tell

on page_loaded(timeout_value)
   delay 2
   repeat with i from 1 to the timeout_value
       tell application "Safari"
           if (do JavaScript "document.readyState" in document 1) is "complete" then
               
               return true
           else if i is the timeout_value then
               return false
           else
               delay 1
           end if
       end tell
   end repeat
   return false
end page_loaded


#3 2013-03-05 12:48:58 pm

jwaldmann
Member
Registered: 2013-03-05
Posts: 11

Re: Script to download urls and filter

Thanks!

How do I limit the list to contain only "xxxx.com/store"? Then, after that, how do I filter those results so that the list includes only xxxx.com/store/zzz and excludes xxxx.com/store/zzz=ref and xxxx.com/store/zzz?update?

Thanks so much!


#4 2013-03-05 12:56:56 pm

jwaldmann
Member
Registered: 2013-03-05
Posts: 11

Re: Script to download urls and filter

I actually figured out how to do that :) yay!!

The next part is, how do I open the remaining links in document 1 as multiple tabs in safari (or firefox)?

Thanks!


#5 2013-03-05 01:00:36 pm

adayzdone
Member
From: New York
Registered: 2011-01-24
Posts: 408

Re: Script to download urls and filter

How do I limit the list to contain only "xxxx.com/store"

Applescript:

if alink contains "xxxx.com/store" then

how do I filter those results so that it only includes xxxx.com/store/zzz but does not contain xxxx.com/store/zzz=ref or xxxx.com/store/zzz?update

Applescript:

if alink contains "xxxx.com/store/zzz" and alink does not contain "xxxx.com/store/zzz=ref" and alink does not contain "xxxx.com/store/zzz?update" then beep
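
If the list of unwanted strings grows, the chained "does not contain" tests get hard to read. Here is a rough sketch of the same idea as a handler. Note that filterLinks, sampleLinks and the parameter names are made up for illustration and are not from the script above, so treat this as one possible approach rather than the way to do it.

```applescript
-- Hypothetical helper (not part of the original script): keep every link that
-- contains requiredText and contains none of the strings in excludedTexts.
on filterLinks(theLinks, requiredText, excludedTexts)
	set keptLinks to {}
	repeat with aLink in theLinks
		-- aLink is a reference to a list item; take its contents to get the string
		set linkText to contents of aLink
		if linkText contains requiredText then
			set isExcluded to false
			repeat with badText in excludedTexts
				if linkText contains (contents of badText) then
					set isExcluded to true
					exit repeat
				end if
			end repeat
			if not isExcluded then set end of keptLinks to linkText
		end if
	end repeat
	return keptLinks
end filterLinks

-- Sample data only; in the real script this would be the list returned by Safari.
set sampleLinks to {"http://xxxx.com/store/zzz", "http://xxxx.com/store/zzz=ref", "http://xxxx.com/store/zzz?update", "http://xxxx.com/about"}
filterLinks(sampleLinks, "xxxx.com/store/zzz", {"zzz=ref", "zzz?update"})
-- result: {"http://xxxx.com/store/zzz"}
```

Keep in mind that "contains" ignores case by default in AppleScript, so "Store" and "store" are treated the same unless the call is wrapped in a "considering case" block.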


#6 2013-03-05 01:17:01 pm

adayzdone
Member
From: New York
Registered: 2011-01-24
Posts: 408

Re: Script to download urls and filter

how do I open the remaining links in document 1 as multiple tabs in safari (or firefox)?

Applescript:

   -- this replaces the repeat loop inside the tell application "Safari" block of post #2
   repeat with alink in myLinks
       if alink does not contain "ipad" then
           tell window 1 to set newTab to make new tab
           set URL of newTab to alink
           --insert your code
       end if
   end repeat
end tell


#7 2013-03-05 01:23:04 pm

jwaldmann
Member
Registered: 2013-03-05
Posts: 11

Re: Script to download urls and filter

For some reason that isn't working.

What do I need to do to get Firefox to open all of the links as separate tabs?

Applescript:

set site_url to "https://www.etsy.com/your/orders/sold?ref=si_ys_dd_sold_orders"
tell application "Safari"
   activate
   open location site_url
end tell
-- wait until page loaded
if page_loaded(30) is false then return
-- get number of links
set theLinks to {}
tell application "Safari" to set num_links to (do JavaScript "document.links.length" in document 1)
set linkCounter to num_links - 1
-- retrieve the links
repeat with i from 0 to linkCounter
   tell application "Safari" to set end of theLinks to do JavaScript "document.links[" & i & "].href" in document 1
end repeat
theLinks
set nonExcludedURLs to {}


repeat with i from 1 to length of theLinks
   if item i of theLinks does not contain "home" and item i of theLinks does not contain "sell" and item i of theLinks contains "https://www.etsy.com/your/orders/" and item i of theLinks does not contain "sold_orders" and item i of theLinks does not contain "open" and item i of theLinks does not contain "completed" and item i of theLinks does not contain "all" and item i of theLinks does not contain "canceled" and item i of theLinks does not contain "open" and item i of theLinks does not contain "unpaid" and item i of theLinks does not contain "unshipped" and item i of theLinks does not contain "from_user_id" and item i of theLinks does not contain "update" and item i of theLinks does not contain "your-notes" then
       
   end if
end repeat

nonExcludedURLs



on page_loaded(timeout_value)
   delay 2
   repeat with i from 1 to the timeout_value
       tell application "Safari"
           if (do JavaScript "document.readyState" in document 1) is "complete" then
               return true
           else if i is the timeout_value then
               return false
           else
               delay 1
           end if
           
       end tell
   end repeat
   return false
end page_loaded


#8 2013-03-05 01:35:56 pm

adayzdone
Member
From: New York
Registered: 2011-01-24
Posts: 408

Re: Script to download urls and filter

Firefox is not scriptable. However, if it is your default browser you can open new tabs with:

Applescript:


open location "http://www.apple.com/itunes/"
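
To hand several links to the default browser, the same open location command can go in a loop. This is only a sketch: the two URLs are placeholders standing in for the filtered list, the one-second delay is an assumed value, and whether each link lands in a new tab or a new window depends on the browser's own preferences.

```applescript
-- Placeholder list; in the real script this would be the filtered links.
set nonExcludedURLs to {"http://www.apple.com/itunes/", "http://www.apple.com/ipad/"}

repeat with aLink in nonExcludedURLs
	-- open location sends the URL to whichever browser is set as the default
	open location (contents of aLink)
	delay 1 -- assumed pause so the browser can keep up
end repeat
```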


#9 2013-03-05 01:38:50 pm

jwaldmann
Member
Registered: 2013-03-05
Posts: 11

Re: Script to download urls and filter

Okay. Since Firefox is not scriptable, how do I open multiple tabs with Safari?


#10 2013-03-05 01:46:50 pm

adayzdone
Member
From: New York
Registered: 2011-01-24
Posts: 408

Re: Script to download urls and filter

Firefox is not scriptable. However, if it is your default browser, you can still open new tabs in it with open location.

To open multiple tabs in Safari read post #6
To open multiple tabs in Firefox read post #8

Last edited by adayzdone (2013-03-05 01:47:19 pm)


#11 2013-03-05 01:48:58 pm

McUsrII
Member
Registered: 2012-11-20
Posts: 949
Website

Re: Script to download urls and filter

Hello.

Firefox is so barely scriptable that I guess its use cannot be endorsed by governments that have laws on letting disabled people work with computers. In that respect, Firefox doesn't support UI scripting at all! :(

Here is a handler that opens a new tab in Safari's frontmost window with the URL specified.

Applescript:


loadUrlInNewSafariTab for "http://www.macscripter.net"
to loadUrlInNewSafariTab for anUrl
   tell application "Safari"
       tell its first window
           set current tab to tab (index of (make new tab))
           set URL of current tab to anUrl
       end tell
   end tell
end loadUrlInNewSafariTab

You'll need to activate Safari when you are done. :)

Last edited by McUsrII (2013-03-05 01:51:20 pm)


Filed under: safari


#12 2013-03-05 02:01:41 pm

jwaldmann
Member
Registered: 2013-03-05
Posts: 11

Re: Script to download urls and filter

Hi,

Thanks. That worked. But how do I get it to open the links that are in document 1, as opposed to the hard-coded macscripter link?


#13 2013-03-05 02:11:46 pm

McUsrII
Member
Registered: 2012-11-20
Posts: 949
Website

Re: Script to download urls and filter

Hello.

I surmise that the theLinks list you have in your post #1 just contains the URLs. Then something like this would do the trick:

Applescript:

tell application "Safari"
   -- your stuff for finding the links here
   
   repeat with aLink in theLinks
       loadUrlInNewSafariTab of me for aLink
   end repeat
   activate
end tell

-- the handler here..

Last edited by McUsrII (2013-03-05 02:12:52 pm)


Filed under: safari, scrape


#14 2013-03-05 03:50:33 pm

jwaldmann
Member
Registered: 2013-03-05
Posts: 11

Re: Script to download urls and filter

I tried to rewrite the code with alink, but it was not working, so I went back to my original code. I am able to list all of the URLs and then remove the ones that I don't want. It won't work for you because you won't have access to the link, but where would I put the tab code, and what would it need to be to work with what I have? I really appreciate your help :)

Applescript:

set site_url to "https://www.etsy.com/your/orders/sold?ref=si_ys_dd_sold_orders"
tell application "Safari"
   activate
   open location site_url
end tell
-- wait until page loaded
if page_loaded(30) is false then return
-- get number of links
set theLinks to {}
tell application "Safari" to set num_links to (do JavaScript "document.links.length" in document 1)
set linkCounter to num_links - 1
-- retrieve the links
repeat with i from 0 to linkCounter
   tell application "Safari" to set end of theLinks to do JavaScript "document.links[" & i & "].href" in document 1
end repeat
theLinks
set nonExcludedURLs to {}


repeat with i from 1 to length of theLinks
   if item i of theLinks does not contain "home" and item i of theLinks does not contain "sell" and item i of theLinks contains "https://www.etsy.com/your/orders/" and item i of theLinks does not contain "sold_orders" and item i of theLinks does not contain "open" and item i of theLinks does not contain "completed" and item i of theLinks does not contain "all" and item i of theLinks does not contain "canceled" and item i of theLinks does not contain "open" and item i of theLinks does not contain "unpaid" and item i of theLinks does not contain "unshipped" and item i of theLinks does not contain "from_user_id" and item i of theLinks does not contain "update" and item i of theLinks does not contain "your-notes" then
       
       set end of nonExcludedURLs to item i of theLinks
   end if
end repeat
nonExcludedURLs


on page_loaded(timeout_value)
   delay 2
   repeat with i from 1 to the timeout_value
       tell application "Safari"
           if (do JavaScript "document.readyState" in document 1) is "complete" then
               return true
           else if i is the timeout_value then
               return false
           else
               delay 1
           end if
           
       end tell
   end repeat
   return false
end page_loaded



Last edited by jwaldmann (2013-03-05 03:58:22 pm)


#15 2013-03-05 04:16:05 pm

McUsrII
Member
Registered: 2012-11-20
Posts: 949
Website

Re: Script to download urls and filter

Hello and well done! :)

Just insert it where you return the value of nonExcludedURLs, that is, at the line containing just nonExcludedURLs.

Applescript:

-- nonExcludedURLs
tell application "Safari"
   repeat with aLink in nonExcludedURLs
       loadUrlInNewSafariTab of me for aLink
   end repeat
   activate
end tell

-- the handlers here..


Filed under: safari, tabs


#16 2013-03-05 04:17:55 pm

jwaldmann
Member
Registered: 2013-03-05
Posts: 11

Re: Script to download urls and filter

When I do that I get the error: «script» doesn’t understand the loadUrlInNewSafariTab message.


#17 2013-03-05 04:24:18 pm

McUsrII
Member
Registered: 2012-11-20
Posts: 949
Website

Re: Script to download urls and filter

Hello, you'll have to add this handler to your script and try again. :)

Applescript:

to loadUrlInNewSafariTab for anUrl
   tell application "Safari"
       tell its first window
           set current tab to tab (index of (make new tab))
           set URL of current tab to anUrl
       end tell
   end tell
end loadUrlInNewSafariTab


Filed under: safari, tabs


#18 2013-03-05 04:25:57 pm

jwaldmann
Member
Registered: 2013-03-05
Posts: 11

Re: Script to download urls and filter

PERFECT!! Thanks so much!!

Now my last question. Is there a way to print all open tabs? I have an extension in Firefox that does that, but since I can't script Firefox, is there a way to do that in Safari?

Thanks!


#19 2013-03-05 04:47:07 pm

McUsrII
Member
Registered: 2012-11-20
Posts: 949
Website

Re: Script to download urls and filter

There are tons of scripts here that use UI scripting to print a web page.

You'll need something to iterate over the tabs in the front window, which is what I am providing below.

Applescript:

tell application "Safari"
   tell window 1
       set tc to count its tabs
       repeat with tbn from 1 to tc
           
           try
               set current tab to tab tbn
               -- code for printing the current tab here...
           on error
               beep
           end try
       end repeat
   end tell
   
end tell


Filed under: safari, print, tabs


#20 2013-03-05 04:56:05 pm

jwaldmann
Member
Registered: 2013-03-05
Posts: 11

Re: Script to download urls and filter

thanks!

Last questions (promise)! How do I exclude the first tab? The first tab is the page where all of the URLs are grabbed from, and I don't need to print that page. EDIT: I think I figured it out... change 1 to 2???

Also, how do I pause this section of the script so that it runs only after, say, 20 seconds have passed? That will give all of the tabs time to load, and then there won't be a problem with the print...

Thanks!

Applescript:

tell application "Safari"
   tell window 1
       set tc to count its tabs
       repeat with tbn from 1 to tc
           
           try
               set current tab to tab tbn
               tell application "Safari"
                   print document 1
               end tell
               
               
           on error
               beep
           end try
       end repeat
   end tell
   
end tell

Last edited by jwaldmann (2013-03-05 05:02:46 pm)


#21 2013-03-05 05:05:26 pm

McUsrII
Member
Registered: 2012-11-20
Posts: 949
Website

Re: Script to download urls and filter

Hello.

Applescript:

repeat with tbn from 2 to tc

Just ask, but maybe you should have a glance at the AppleScript Language Guide? It is free, and well written! :)

You will find it over at ADC.
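
As for the pause asked about in post #20, one simple option is a plain delay before the print loop. The sketch below puts that together with the "start at tab 2" change. The 20-second and 1-second values are assumptions taken from the question rather than measured numbers, and whether print shows a dialog may depend on the Safari version.

```applescript
-- Sketch only: 20 seconds is an assumed value, adjust it to your connection.
delay 20 -- give the newly opened tabs time to load

tell application "Safari"
	tell window 1
		set tc to count its tabs
		-- start at 2 to skip the first tab (the page the links came from)
		repeat with tbn from 2 to tc
			try
				set current tab to tab tbn
				delay 1 -- assumed short pause so the tab becomes the front document
				tell application "Safari" to print document 1
			on error
				beep
			end try
		end repeat
	end tell
end tell
```

A fixed delay is the simplest choice. Checking document.readyState per tab, the way page_loaded does for the first page, would be more robust but takes more code.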


Filed under: safari, ASLG

