Searching HTML for string

Can anyone help me out with searching a web page’s HTML for a certain string? Is it possible to do it without downloading the HTML? I tried with downloading:

set HTML to (do shell script "curl http://www.google.com")

set displaystring to "Not Found"
set stringtofind to "google"
if stringtofind is in HTML then set displaystring to "Found"
display dialog displaystring

and it always comes up with "Not Found".

I would love code which doesn’t download.

I’m trying to convert from AutoIt for Windows to AppleScript.

Thanks for the help!

Your code works just fine here on my Mac with an active internet connection.

Therefore you should log the HTML output; maybe in your case curl is being redirected to another site:


set HTML to (do shell script "curl http://www.google.com")
log HTML
set displaystring to "Not Found"
set stringtofind to "google"
if stringtofind is in HTML then set displaystring to "Found"
display dialog displaystring

Just activate the Event Log tab at the bottom of the Script Editor window and have a look at what was downloaded.

If the string I’m searching for is too far down the page, then my code fails. I seem to remember reading about some size limitation for curl. What can I do to download the entire page?

You could also try to use URL Access Scripting instead of curl:


on run
	-- getting an unused temp file path
	set tmpfilepath to my gettmpfilepath()
	-- downloading the HTML to the temp file
	tell application "URL Access Scripting"
		set tmpfile to download "http://www.apple.com" to tmpfilepath
	end tell
	-- opening and reading the content of the temp file
	try
		set fileobj to open for access tmpfile
		set filecont to read fileobj
		close access fileobj
	on error
		try
			close access fileobj
		end try
	end try
	-- searching for the string
	set searchstring to "iPod"
	if searchstring is in filecont then
		tell me
			activate
			display dialog "Search string found!"
		end tell
	end if
	-- removing the temporary file
	do shell script ("rm " & quoted form of (POSIX path of tmpfilepath))
end run

-- I am returning a file path to an unused temporary file
on gettmpfilepath()
	set tmpfolderpath to (path to temporary items folder from user domain) as Unicode text
	repeat
		set randnum to random number from 1000 to 9999
		set tmpfilepath to (tmpfolderpath & randnum & ".tmp")
		try
			set tmpfilealias to tmpfilepath as alias
		on error
			exit repeat
		end try
	end repeat
	return tmpfilepath
end gettmpfilepath

Also Safari is scriptable and lets you access the source of a loaded website:


tell application "Safari"
	set htmlsource to source of document 1
end tell

Moreover DEVONagent/DEVONthink have powerful AppleScript libraries to process web items:


tell application "DEVONagent"
	set htmlsource to download markup from "http://www.apple.com"
end tell

Thank you so much for your detailed response. Do you have any thoughts on which would be the fastest?

My favourite solutions:

  1. curl

  2. URL Access Scripting

  3. Python script using urllib

  4. DEVONthink/DEVONagent

  5. Safari

Solutions 1 to 3 stay completely in the background, therefore I recommend them. I guess 1 to 3 are also the fastest solutions.
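
The Python/urllib option in the list above is the only one without any code in this thread, so here is a minimal sketch of what it might look like. The function name page_contains and its timeout parameter are my own inventions for illustration, not something from the posts above. It assumes Python 3 and compares case-insensitively on purpose, since AppleScript's "is in" ignores case by default.

```python
from urllib.request import urlopen


def page_contains(url, needle, timeout=10):
    """Return True if `needle` occurs in the text fetched from `url`.

    Hypothetical helper for illustration. The comparison ignores case,
    mirroring the default behaviour of AppleScript's `is in`.
    """
    with urlopen(url, timeout=timeout) as response:
        # Use the charset declared in the response headers;
        # fall back to UTF-8 when none is declared.
        charset = response.headers.get_content_charset() or "utf-8"
        # read() returns the whole body; undecodable bytes are replaced
        # rather than raising an error.
        html = response.read().decode(charset, errors="replace")
    return needle.lower() in html.lower()
```

Calling page_contains("http://www.apple.com", "iPod") would then give True or False, much like the "Found" / "Not Found" check in the curl version. Like curl, this still fetches the page into memory; it only avoids writing a temporary file.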