Help with Offset Selection (or more?!)

Hi everyone,

I’m fairly new to AppleScript and I’m trying to write a program that may be a bit over my head, but so far I’ve made good headway (my script looks very amateur, as I haven’t coded in almost 4 years).

Ultimately I am trying to take a string of text (a software title) and return the company name for a database I’m compiling. You can probably get the gist of what I’m doing from what I’ve written thus far, which is listed below. I’ve used TextWrangler and BBEdit in several instances due to their extensive library of commands.

The problem I’m having now is that I’m trying to select just the company name, or at worst the lines containing the company name, in the footer of almost all websites (e.g. in “© Copyright 2002–2005 Rickard Andersson”, I would like to select just “Rickard Andersson”).

Any help would be greatly appreciated. P.S. the Automator script I run simply views the source of the Safari webpage and copies it to the clipboard.

Thanks!

Peter Leenhouts


tell application "TextWrangler"
	activate
	set countNum to {count lines of window 1}
	repeat with i from 1 to countNum
		tell window 1
			select line i
		end tell
		copy selection
		set textEntry to selection
		set theURL to "http://www.google.com/search?hl=en&q=" & textEntry & "&btnI=I%27m+Feeling+Lucky"
		tell application "Safari"
			activate
			make new document with properties {URL:theURL}
			delay 10
			tell application "Automator Launcher"
				set workflow to "/Users/pleenhou/Desktop/Coding Project/view_source.workflow"
				set macWorkflow to POSIX file workflow as text
				open macWorkflow
				delay 2
			end tell
			tell application "BBEdit"
				activate
				paste clipboard in window 1
				find "&copy" searching in text 1 of window 1 options {search mode:literal, starting at top:false, wrap around:false, backwards:true, case sensitive:false, match words:true, extend selection:false} selecting match 1
				set countChar to contents of selection
				if (count countChar) ≥ 1 then
					add suffix selection suffix "
"
					copy selection
				else
					find "copyright" searching in text 1 of window 1 options {search mode:literal, starting at top:false, wrap around:false, backwards:true, case sensitive:false, match words:true, extend selection:false} selecting match 1
					set countChar2 to contents of selection
					if (count countChar2) ≥ 1 then
						add suffix selection suffix "
"
						set offset selection
						copy selection
					else
						find "©" searching in text 1 of window 1 options {search mode:literal, starting at top:false, wrap around:false, backwards:true, case sensitive:false, match words:true, extend selection:false} selecting match 1
						add suffix selection suffix "
"
						copy selection
					end if
				end if
				tell application "BBEdit"
					activate
					set countLine to {count lines of window 1}
					select lines 1 thru countLine of window 1
					delete selection
					paste clipboard in window 2
				end tell
				delay 2
				set countLine to {count lines of window 1}
				select lines 1 thru countLine of window 1
				delete selection
			end tell
			tell application "Safari"
				close window 1
			end tell
		end tell
	end repeat
end tell

Model: Macbook Pro C2D 2.16
AppleScript: 1.10.7
Browser: Safari 419.3
Operating System: Mac OS X (10.4)

If I understand what you’re trying to do, you can do most of it within a script without resorting to all of the external apps and Automator actions. Can you give an example of an Application you might be searching for and what the answer might look like so I don’t have to sort out what the Automator action portion of your script does? Perhaps the best illustration would be what you would do without the script.

The Automator portion of the code is used to view the source code of the loaded page and then copy that to the clipboard. I wasn’t exactly sure how to do that in AppleScript, but found an easy solution in Automator.

Hi Peter,

First, welcome to MacScripter.

As Adam mentioned, it would be very helpful to get an example of a real search term and the expected result.

Here is an example of a search term that would be used: “Pinnacle Instant Copy”

::obviously, the company name is included in this one, but occasionally it is not::

This term is then used in a Google - “I’m Feeling Lucky” Search

The desired result appears at the bottom of the page that Google returns, which is
http://www.pinnaclesys.com/PublicSite/us/Home/
It is “Pinnacle Systems, Inc.”, and it occurs after the text “©2007”.
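
Since a term like “Pinnacle Instant Copy” contains spaces, it may be worth encoding them before the term is pasted into the Google URL. The following is only a minimal sketch, assuming that spaces are the only characters needing attention; the handler name plusEncode is made up for illustration. Terms containing characters such as “&” or “#” would need proper percent-encoding instead.

```applescript
-- Hypothetical helper: replace every space in a search term with "+"
-- so the term can be placed in the query part of a URL.
on plusEncode(searchTerm)
	set savedDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to " "
	set theParts to text items of searchTerm
	set AppleScript's text item delimiters to "+"
	set encodedTerm to theParts as text
	set AppleScript's text item delimiters to savedDelims
	return encodedTerm
end plusEncode

set theURL to "http://www.google.com/search?hl=en&q=" & plusEncode("Pinnacle Instant Copy") & "&btnI=I%27m+Feeling+Lucky"
-- theURL now ends in "q=Pinnacle+Instant+Copy&btnI=I%27m+Feeling+Lucky"
```

When the handler is called from inside a tell block, it has to be written as my plusEncode(...).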

Here’s how to get the HTML text of your URL as text in a variable called tHTML. After that you have to search it, but in my trials, that’s not easy because some sites use the word “copyright” more than once, some use the © symbol, etc.


set Ghead to "http://www.google.com/search?hl=en&q="
set Gtail to "&btnI=I%27m+Feeling+Lucky"
set tEntry to "NetNewsWire" -- you would put your grabbed entry here: perhaps "set tEntry to the clipboard".
tell application "Safari"
	open location Ghead & tEntry & Gtail
	delay 2
	repeat with t from 1 to 5 -- this is to wait for the page to be complete.
		if (do JavaScript "document.readyState" in document 1) is "complete" then
			exit repeat
		else
			delay 1
		end if
	end repeat
	set tHTML to source of front document
end tell
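
Once the page source is in tHTML, one possible way to search it is to walk the lines from the bottom up and stop at the first one that mentions a copyright marker, since the footer is usually where the notice sits. This is a rough sketch, not a definitive solution: the handler name lastLineContaining and the list of markers are assumptions, and the sample text stands in for real page source.

```applescript
-- Hypothetical helper: return the last paragraph of theText that contains
-- any of the given markers, or "" if no paragraph does.
-- "contains" ignores case by default, so "copyright" also matches "Copyright".
on lastLineContaining(theText, markers)
	set theLines to paragraphs of theText
	repeat with i from (count theLines) to 1 by -1
		set thisLine to item i of theLines
		repeat with m in markers
			if thisLine contains (contents of m) then return thisLine
		end repeat
	end repeat
	return ""
end lastLineContaining

-- Sample text standing in for tHTML
set sampleHTML to "<p>Copyright policy</p>" & return & "<div>&copy; 2007 Pinnacle Systems, Inc.</div>"
lastLineContaining(sampleHTML, {"&copy;", "©", "copyright"})
-- returns the second line, because the search starts at the bottom
```

The returned line still carries its HTML tags, so it would need further cleaning before the company name can be read off.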

Thanks Adam!

That definitely helps speed up the process and now I don’t have to sit and let that Automator workflow run.

For searching the documents for “copyright”, &copy, etc., I have been doing a reverse search starting at the bottom and taking the first instance as that seems to be where most occur.

The only problem now is how do I select the company name or at the very least the line it occurs on?

Thanks again for your help!

Hi,

Here is a similar approach, which filters for lines containing either “©” or “Copyright”.
The problem is that each company may write its copyright line differently.

set copyRightLines to ""
tell application "TextWrangler"
	-- activate
	set countNum to {count lines of window 1}
	repeat with i from 1 to countNum
		tell window 1
			select line i
		end tell
		copy selection
		set textEntry to selection
		set theURL to "http://www.google.com/search?hl=en&q=" & textEntry & "&btnI=I%27m+Feeling+Lucky"
		tell application "Safari"
			open location theURL
			my page_loaded(20)
			set s to text of document 1
		end tell
		set p to paragraphs of (do shell script "echo " & quoted form of s & " | grep 'Copyright\\|©'")
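		if (count p) > 1 then
			-- fall back to the first match if no line contains a year,
			-- so every search term adds exactly one line
			set foundLine to item 1 of p
			repeat with j in p
				if j contains "20" then
					set foundLine to contents of j
					exit repeat
				end if
			end repeat
			set copyRightLines to copyRightLines & foundLine & return
		else
			set copyRightLines to copyRightLines & item 1 of p & return
		end if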
		tell application "Safari" to close window 1
	end repeat
end tell

display dialog copyRightLines

on page_loaded(timeout_value)
	delay 2
	repeat with i from 1 to timeout_value
		tell application "Safari"
			if (do JavaScript "document.readyState" in document 1) is "complete" then
				return true
			else if i is timeout_value then
				return false
			else
				delay 1
			end if
		end tell
	end repeat
	return false
end page_loaded
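
Given a plain-text copyright line such as “© Copyright 2002–2005 Rickard Andersson”, the company name is often whatever follows the year. A hedged sketch of that idea is below: it keeps the text after the last digit in the line. The handler name textAfterLastDigit is invented for illustration, and the approach is only a heuristic. It will fail for notices that put the name before the year, or for company names that themselves contain digits.

```applescript
-- Hypothetical helper: return the text that follows the last digit in theLine,
-- with leading spaces removed. Returns "" if there is no digit or nothing after it.
on textAfterLastDigit(theLine)
	set lastDigitPos to 0
	repeat with i from 1 to (count theLine)
		if character i of theLine is in "0123456789" then set lastDigitPos to i
	end repeat
	if lastDigitPos is 0 or lastDigitPos = (count theLine) then return ""
	set tailText to text (lastDigitPos + 1) thru -1 of theLine
	-- strip leading spaces one at a time
	repeat while tailText begins with " "
		if (count tailText) is 1 then return ""
		set tailText to text 2 thru -1 of tailText
	end repeat
	return tailText
end textAfterLastDigit

textAfterLastDigit("© Copyright 2002–2005 Rickard Andersson")
-- returns "Rickard Andersson"
```

Trailing phrases such as “All rights reserved.” are not removed by this sketch and would have to be trimmed separately.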

Stefan you are a miracle worker!!

This is amazing! Only one thing. Is it possible to take the results and put them into TextEdit or something to create a list?

Thank you everyone so much! I really appreciate your help.

Never mind, I got it to work!

Again thank you so much!

Peter

Replace the display dialog line with this;
it writes the result to a file named copyright.txt on your desktop:

set ff to open for access file ((path to desktop as Unicode text) & "copyright.txt") with write permission
set eof of ff to 0 -- empty the file first so results from an earlier run are not left behind
write copyRightLines to ff
close access ff

One other problem I’m having is that occasionally there are websites that do not contain any company information. Is there a way I can include a blank line for those sites?

I guess you get an error message if there is no copyright information.
Try this:

set copyRightLines to ""
tell application "TextWrangler"
	-- activate
	set countNum to {count lines of window 1}
	repeat with i from 1 to countNum
		tell window 1
			select line i
		end tell
		copy selection
		set textEntry to selection
		set theURL to "http://www.google.com/search?hl=en&q=" & textEntry & "&btnI=I%27m+Feeling+Lucky"
		tell application "Safari"
			open location theURL
			my page_loaded(20)
			set s to text of document 1
		end tell
		try
			set p to paragraphs of (do shell script "echo " & quoted form of s & " | grep 'Copyright\\|©'")
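			if (count p) > 1 then
				-- fall back to the first match if no line contains a year,
				-- so every search term adds exactly one line
				set foundLine to item 1 of p
				repeat with j in p
					if j contains "20" then
						set foundLine to contents of j
						exit repeat
					end if
				end repeat
				set copyRightLines to copyRightLines & foundLine & return
			else
				set copyRightLines to copyRightLines & item 1 of p & return
			end if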
		on error
			set copyRightLines to copyRightLines & textEntry & ": no copyright information" & return
		end try
		tell application "Safari" to close window 1
	end repeat
end tell

set ff to open for access file ((path to desktop as Unicode text) & "copyright.txt") with write permission
set eof of ff to 0 -- empty the file first so results from an earlier run are not left behind
write copyRightLines to ff
close access ff

on page_loaded(timeout_value)
	delay 2
	repeat with i from 1 to timeout_value
		tell application "Safari"
			if (do JavaScript "document.readyState" in document 1) is "complete" then
				return true
			else if i is timeout_value then
				return false
			else
				delay 1
			end if
		end tell
	end repeat
	return false
end page_loaded

Alternatively:

tell application "TextWrangler"
	set theseItems to contents of lines of front text document
end tell

set copyrightLines to {}

repeat with thisItem in theseItems
	try
		do shell script "/usr/bin/python -c 'import sys, urllib; print urllib.quote(unicode(sys.argv[1], \"utf8\"))' " & quoted form of thisItem -- encode query
		do shell script "/usr/bin/curl --silent --show-error --location --user-agent '' " & quoted form of ("http://www.google.com/search?hl=en&q=" & result & "&btnI=I%27m+Feeling+Lucky") & ¬
			" | /usr/bin/ruby -e 'print $stdin.read.gsub(/<br ?\\/?>/, \"\\n\").gsub(%r{</?[^>]+?>}, \"\")' | /usr/bin/grep 'Copyright\\|©'"
		
		set end of copyrightLines to last paragraph of result
	on error errMsg number errNum
		if errMsg is "The command exited with a non-zero status." then
			set end of copyrightLines to thisItem & ": no copyright information"
		else
			error errMsg number errNum
		end if
	end try
end repeat

set ASTID to AppleScript's text item delimiters
set AppleScript's text item delimiters to ASCII character 10
set copyrightLines to "" & copyrightLines
set AppleScript's text item delimiters to ASTID

writeFile from copyrightLines into ((path to desktop as Unicode text) & "copyright.txt") without appending



on writeFile from someData into someFile given appending:appending
	try
		open for access someFile with write permission
		set fileRef to result
		if not appending then set eof of fileRef to 0
		
		write someData to fileRef as (class of someData) starting at eof
		close access fileRef
		return true
	on error errMsg number errNum
		try
			close access fileRef
		end try
		
		error errMsg number errNum
	end try
end writeFile

I’ve been fooling around trying to get the scripts to return just one line per search, so that if I line them up in Excel they match properly. However, some titles have been returning 2 or 3 lines of results. Is there a way to limit the results to a single line?

Thanks,

Peter
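
One possible way to keep the output to a single line per search, assuming the extra lines come from grep matching more than one line of the page, is to pass each grep result through a small handler that returns only its first non-empty line, or a placeholder when nothing matched. This is just a sketch; the handler name singleLine and the placeholder text are made up for illustration.

```applescript
-- Hypothetical helper: reduce a multi-line result to exactly one line.
-- Returns the first non-empty paragraph, or fallbackText if there is none.
on singleLine(grepOutput, fallbackText)
	set theLines to paragraphs of grepOutput
	repeat with p in theLines
		set thisLine to contents of p
		if thisLine is not "" then return thisLine
	end repeat
	return fallbackText
end singleLine

set sampleOutput to "© 2007 Pinnacle Systems, Inc." & return & "Copyright notice"
singleLine(sampleOutput, "Pinnacle Instant Copy: no copyright information")
-- returns "© 2007 Pinnacle Systems, Inc."
```

Because every search term then contributes exactly one line, the output file should line up row for row with the list of titles.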