Scraping select content from a web page in Firefox

Hi all,

Using BBEdit, I would like to scrape (with cURL?) select content from a web page in Firefox, and then paste that info in a given format at the cursor location.

(Note: TextWrangler is BBEdit’s little brother, so it would do fine if you don’t have BBEdit.)

  1. The info is located at the same place, every time.
  2. There are generally two lines

I have to pull the device requirements from an iTunes page and format them in a very specific way, then go back again and grab the application size, then return and paste it at the end.

Is this possible? Please contact me; I’ll be delighted to pay anyone for their time.

Thanks!

Hello.

It is hard to achieve that with Firefox (15.01). Do you have to use Firefox? I see no way of doing this, but I may lack creativity.

You can’t get access to the URL of the Firefox window, and the document behind the window shows up as missing value in Script Debugger. :slight_smile:

So you can’t execute JavaScript to “get” the selection programmatically, though javascript:document.getSelection() works when entered manually in the URL bar. And as long as you can’t do that, keystroking the copy won’t work. :frowning:

Good Luck!

Wow, I was just doing some data mining here. I can certainly use a JavaScript first, even if only to pull the content itself from the web page. Then (he thinks) it should be a simple matter of getting BBEdit to format the text once it’s been pasted in.

In fact, I can do this as a two-step operation, and I’m definitely OK with that.

What I need to do is copy this info from an iTunes page:

https://itunes.apple.com/app/cute-savio/id606413378

(note the URL will be different every time)

I need to copy this info:

Compatible with iPhone, iPod touch, and iPad. Requires iOS 4.3 or later. This app is optimized for iPhone 5.

and …

Size: 14.3 MB

Note these are two physical locations on the same web page.

As long as I can click an insertion point inside BBEdit and get the stuff “in there”, that would be miles ahead of where we are now. :slight_smile:

I can see a javascript to call the contents of forms from a web page. The Applescript that does the rest of the grunt work could be stashed back (parked) in the scripts folder.

Does this help in any way?

Hi Ray,

Quick and dirty, without any browser:


set theSource to do shell script "curl https://itunes.apple.com/app/cute-savio/id606413378"
set {TID, text item delimiters} to {text item delimiters, "<span class=\"label\">Size: </span>"}
set theSize to trimLessThan(text item 2 of theSource)
set text item delimiters to "<span class=\"app-requirements\">Requirements: </span>"
set theRequirements to trimLessThan(text item 2 of theSource)
set text item delimiters to TID

display dialog "Size: " & theSize & return & "Requirements: " & theRequirements buttons {"Cancel", "OK"} default button "OK"

on trimLessThan(theText)
	set lessThanOffset to offset of "<" in theText
	return text 1 thru (lessThanOffset - 1) of theText
end trimLessThan
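
For anyone who wants to try the split-and-trim idea outside Script Editor, here is a minimal shell sketch of the same logic as the script above: take everything after a marker, then cut at the next "<". The function name text_after and the sample HTML are made up for illustration, and the real iTunes markup may differ.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the text that follows
# the first occurrence of marker $2 in string $1, up to the next "<".
text_after() {
    rest=${1#*"$2"}               # drop everything through the first marker
    printf '%s\n' "${rest%%<*}"   # keep only what precedes the next "<"
}

# Made-up sample resembling the spans the AppleScript looks for.
sample='<li><span class="label">Size: </span>14.3 MB</li><p><span class="app-requirements">Requirements: </span>Requires iOS 4.3 or later.</p>'

text_after "$sample" '<span class="label">Size: </span>'
text_after "$sample" '<span class="app-requirements">Requirements: </span>'
```

One difference worth knowing: if the marker is missing, this sketch quietly returns whatever precedes the first "<", whereas text item 2 in the AppleScript would raise an error.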

ok Stefan, as always you save my bacon. :slight_smile:

May I ask for two additions?

  1. Instead of having the iTunes URL buried inside the script, can the script invoke a dialog so that I can paste in the URL?

  2. Instead of showing the info in a dialog, it would be awesome to get it inside BBEdit; failing that, could it just go on the clipboard? Then I could simply paste it into BBEdit.

Great Job!

If the URL is on the pasteboard anyway, use


set theURL to the clipboard
try
	set theSource to do shell script "curl " & quoted form of theURL
	
	set {TID, text item delimiters} to {text item delimiters, "<span class=\"label\">Size: </span>"}
	set theSize to trimLessThan(text item 2 of theSource)
	set text item delimiters to "<span class=\"app-requirements\">Requirements: </span>"
	set theRequirements to trimLessThan(text item 2 of theSource)
	set text item delimiters to TID
	set the clipboard to theSize & return & theRequirements
	
on error e
	display dialog e buttons {"Cancel", "OK"} default button "OK"
end try

on trimLessThan(theText)
	set lessThanOffset to offset of "<" in theText
	return text 1 thru (lessThanOffset - 1) of theText
end trimLessThan

BBEdit is pretty well scriptable, so it’s probably not necessary to copy & paste.
For now, though, the result is copied to the clipboard, with the size and the requirements separated by a return character.

bless you! :slight_smile:

When I run it as an app, I do get the contents (and they are sent to the clipboard), but there is an error telling me it can’t curl the page?

If the clipboard does not contain a URL right before you run the script, you get that error.
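
One way to catch that case before curl ever runs is a small guard like the sketch below. It is only an illustration: looks_like_url is a made-up name, and the check is deliberately loose (it only tests for an http:// or https:// prefix).

```shell
#!/bin/sh
# Hypothetical guard (not from the thread): succeed only if $1 starts
# with http:// or https:// and has at least one character after it.
looks_like_url() {
    case $1 in
        http://?*|https://?*) return 0 ;;
        *) return 1 ;;
    esac
}

if looks_like_url "https://itunes.apple.com/app/cute-savio/id606413378"; then
    echo "looks like a URL"
fi
```

On a Mac the clipboard text can be read with pbpaste, so the check could be run as looks_like_url "$(pbpaste)" before fetching anything.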

Here is a slightly more elaborate curl command that hopefully works. It identifies the request as coming from a browser.

curl --user-agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/534.55.3 (KHTML, like Gecko) Version/5.1.3 Safari/534.53.10" https://itunes.apple.com/app/cute-savio/id606413378

Sorry about that Stefan. I only tried to help. :slight_smile:

oh, you DID help! :slight_smile:

It actually worked fine on the next go around.

Thank you!

What kind of user agent is that? :stuck_out_tongue:

Hello.

One that works. :stuck_out_tongue: I guess it is the long form of Safari’s user agent string; it works with both wget and curl.

This works for me:

set theURL to (the clipboard)

set theInfo to (do shell script ("curl " & quoted form of theURL & " | sed -En '/<span class=\\\"(app-requirements\\\">Requirements|label\\\">Size): </ { h ; s|^.*<span class=\\\"app-requirements\\\">Requirements: </span>([^>]+)<.*$|\\1|p ; g ; s|^.*<span class=\\\"label\\\">Size: </span>([^>]+)<.*$|Size: \\1|p ; }'"))

tell application "TextWrangler"
	activate
	set contents of selection of front window to theInfo -- Presumably the same code for BBEdit.
end tell
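
To see what the sed filter above does without fetching a live page, it can be run against a few sample lines. This is a sketch under assumptions: the sample markup is made up, sample_page and extract_info are hypothetical names, and the filter is a copy of the one in the script with the AppleScript string escaping removed and the commands spread over several lines.

```shell
#!/bin/sh
# Made-up three-line sample; the real page has far more markup.
sample_page() {
    printf '%s\n' \
        '<p><span class="app-requirements">Requirements: </span>Requires iOS 4.3 or later.</p>' \
        '<li><span class="label">Size: </span>14.3 MB</li>' \
        '<li><span class="label">Category: </span>Games</li>'
}

# On lines containing either span: save the line (h), try to print the
# requirements text, restore the line (g), then try to print the size.
extract_info() {
    sed -En '/<span class="(app-requirements">Requirements|label">Size): </ {
        h
        s|^.*<span class="app-requirements">Requirements: </span>([^>]+)<.*$|\1|p
        g
        s|^.*<span class="label">Size: </span>([^>]+)<.*$|Size: \1|p
    }'
}

sample_page | extract_info
```

The h and g pair is what lets both substitutions run against the original line, so a single line that held both spans would still yield both values.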

Awesome Nigel! It just keeps getting better and better.

Now I know why I love this site so much. :smiley: