Parse Safari Webpage and save to file

I’m looking for a script that will take the results of a dictionary search in Safari (dictionary.com, for example) and append the word and its definition to a file. I want to keep a list of all the new words I learn without having to cut and paste every time. Any suggestions, or ideas of where to look as a starting point?

Hi Bryce,

Recently I wrote an HTML parsing script for a similar task. I made some modifications so that it works with ‘dictionary.reference.com’.

It mainly does this:

  • loads the HTML page for a search string into a temp file
  • uses a Perl script to parse the HTML and generate a ‘filtered’ version
  • opens the filtered version in Safari
property pl1 : "#! /usr/bin/perl
$test = `cat "
property pl2 : "`;
@array = split(/\\n/,$test);
$printthis = 0;
print \"<meta http-equiv=\\\"content-type\\\" content=\\\"text/html; charset=utf-8\\\">\";
print \"search string = $ARGV[0]<br><br>\\n\";
print \"results:<br><span class=\\\"line\\\"></span>\";

foreach (@array) {
	if ($_ =~ /\\s*<!-- begin .* -->\\s*/) {
		$printthis = 1;
	} elsif ($_ =~ /\\s*<!-- end .* -->\\s*/) {
		print \"<span class=\\\"line\\\"></span>\";
		$printthis = 0;
	} elsif ($_ =~ /<link.*text.css.*/) {	
		print \"$_\\n\"; 
	} elsif ($printthis == 1) {
		print \"$_\\n\";
	}	
}"
property plpath : "/usr/bin/parse_result.pl"
set tmp to POSIX path of (path to temporary items)
set tmpResult to tmp & "result_tmp.htm"
set plscript to pl1 & tmpResult & pl2

try
	get alias (plpath as POSIX file)
on error
	-- the Perl script is not installed yet
	display dialog "This application installs a Perl script in your /usr/bin/ directory. You will be asked to enter your administrator password."
	do shell script "printf '%s\\n' " & quoted form of plscript & " > " & quoted form of plpath & "; chmod 755 " & quoted form of plpath with administrator privileges
end try

set tmp2 to tmp & "result_tmp2.htm"

set searchString to (text returned of (display dialog "Search for?" default answer ""))
do shell script "curl -s -L " & quoted form of ("http://dictionary.reference.com/browse/" & searchString) & " -o " & quoted form of tmpResult

do shell script quoted form of plpath & " " & quoted form of searchString & " > " & quoted form of tmp2

tell application "Safari"
	open tmp2 as POSIX file
end tell
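For anyone who doesn’t read Perl, the filtering that parse_result.pl performs can be sketched in Python. This is a rough translation for illustration only, not the script that gets installed: it assumes, as the Perl regexes do, that the result blocks on the page are wrapped in ‘<!-- begin ... -->’ / ‘<!-- end ... -->’ comments, and the names filter_page and SEPARATOR are hypothetical. The line breaks in the output differ slightly from the Perl version, which doesn’t matter for HTML.

```python
import re

# Same patterns as the Perl script: a "begin" comment switches copying on,
# an "end" comment switches it off, and stylesheet <link> lines are always kept.
BEGIN = re.compile(r"<!-- begin .* -->")
END = re.compile(r"<!-- end .* -->")
CSS_LINK = re.compile(r"<link.*text.css")

SEPARATOR = '<span class="line"></span>'


def filter_page(html: str, search_string: str) -> str:
    """Return a cut-down HTML page containing only the marked result blocks."""
    out = [
        '<meta http-equiv="content-type" content="text/html; charset=utf-8">',
        f"search string = {search_string}<br><br>",
        f"results:<br>{SEPARATOR}",
    ]
    copying = False
    for line in html.split("\n"):
        if BEGIN.search(line):
            copying = True  # the begin marker itself is not copied
        elif END.search(line):
            out.append(SEPARATOR)
            copying = False
        elif CSS_LINK.search(line):
            out.append(line)
        elif copying:
            out.append(line)
    return "\n".join(out)
```

For example, filter_page(page, "serendipity") on a page whose definition sits between the two marker comments returns just the header lines, any stylesheet links and the definition block.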


If you don’t want to store HTML, have a look at the shell command ‘textutil’, which can convert HTML to plain text (in Terminal: ‘man textutil’).
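textutil only ships with macOS. As a rough, cross-platform illustration of the same idea (not a description of how textutil works internally), Python’s standard html.parser module can reduce a page such as the filtered result to plain text. This minimal sketch keeps text nodes, starts a new line at <br>, <p>, <div> and <li>, and skips the contents of <script> and <style>; the names _TextCollector and html_to_text are hypothetical.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}
    BREAKS = {"br", "p", "div", "li"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; etc. are decoded
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BREAKS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```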

Note: the search string needs to be URL-encoded. If it contains spaces or accented characters, you will have to add a routine for this.
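A minimal sketch of such a routine, shown in Python with the standard library’s urllib.parse.quote. With safe="" every reserved character (including "/") is percent-encoded, and non-ASCII characters are encoded as UTF-8 first. The function name dictionary_url is hypothetical; the base URL is the one used in the script above.

```python
from urllib.parse import quote

BASE_URL = "http://dictionary.reference.com/browse/"


def dictionary_url(search_string: str) -> str:
    """Build the lookup URL, percent-encoding spaces, slashes and accents."""
    return BASE_URL + quote(search_string, safe="")
```

For example, dictionary_url("café au lait") returns "http://dictionary.reference.com/browse/caf%C3%A9%20au%20lait".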

Hope that helps,

D.

Hi,

Here is an earlier script of mine that looks up a word on dict.org, writes the definition to a file and opens the file in TextEdit:


-- Variables
set working_folder to path to desktop folder from user domain as string
set doc_name to "dictionary.txt"
set doc_path to working_folder & doc_name

--  For input
set the_word to text returned of (display dialog "Enter word." default answer "")

-- Do work
set theDef to get_definition(the_word)

-- Extract the definition only: drop the DICT status lines (4 at the top, 3 at the bottom)
set TIDs to AppleScript's text item delimiters
try
	set AppleScript's text item delimiters to {return}
	set theDef to ((paragraphs 5 thru -4) of theDef & "----" & return) as text
on error
	set theDef to the_word & return & "  No definition found." & return & "----" & return
end try
-- Always restore the delimiters, even when no definition was found
set AppleScript's text item delimiters to TIDs

-- Write def.
WriteToFile(doc_path, theDef, false)

-- Open in TextEdit.
tell application "TextEdit"
	activate
	open file doc_path
end tell

-- Handler to fetch the word's definition.
on get_definition(the_word)
	try
		-- Note: ":wn" restricts the lookup to the WordNet database
		return do shell script "curl " & (quoted form of ("dict://dict.org/d:" & the_word & ":wn"))
	on error the_err
		return the_err
	end try
end get_definition

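-- Handler to append text to a text file; if clearFlag is true, the file is emptied first
on WriteToFile(theFile, theText, clearFlag)
	-- open for access creates the file if it does not exist yet
	set fileRef to open for access file theFile with write permission
	try
		if clearFlag then set eof of fileRef to 0
		write (theText & return) to fileRef starting at eof
		close access fileRef
	on error errMsg number errNum
		-- Close the file before passing the error on, so it is not left open
		close access fileRef
		error errMsg number errNum
	end try
end WriteToFile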
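The ‘paragraphs 5 thru -4’ line relies on the layout of the DICT response that curl returns: server status lines (each starting with a three-digit code) surround the definition text, which a line holding a single ‘.’ terminates. Here is a minimal Python sketch of the same extraction plus the append step. Instead of counting lines, it keys on the 151 status line that announces a definition, and it undoes RFC 2229 dot-stuffing (a text line that begins with ‘.’ is sent with the dot doubled). The function names are hypothetical, and the sketch parses a response string you already have; it does not contact dict.org.

```python
from pathlib import Path
from typing import Optional


def extract_definition(response: str) -> Optional[str]:
    """Return the definition text from a raw DICT response, or None if absent."""
    lines = response.replace("\r\n", "\n").split("\n")
    body = []
    in_text = False
    for line in lines:
        if not in_text:
            # A "151" status line announces a definition; its text starts on the next line.
            if line.startswith("151 "):
                in_text = True
        elif line == ".":
            in_text = False  # a lone "." ends the definition text
        else:
            # Undo dot-stuffing: a leading ".." stands for a single ".".
            body.append(line[1:] if line.startswith("..") else line)
    return "\n".join(body) if body else None


def append_entry(path: Path, word: str, response: str) -> None:
    """Append the definition (or a 'not found' note) followed by a separator line."""
    definition = extract_definition(response)
    if definition is None:
        definition = f"{word}\n  No definition found."
    with path.open("a", encoding="utf-8") as f:
        f.write(definition + "\n----\n")
```

Opening the file in append mode ("a") plays the same role as calling WriteToFile with clearFlag set to false: each new word lands at the end of the list.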

Best wishes

John M