GetSource of an IE document

I have been trying to use GetSource in Internet Explorer, but with no luck. I have also tried curl, but I get "access denied".

What I want to do is get the source and check whether it contains a word; if so, do something.

So:


tell application "Internet Explorer"
	OpenURL "http://www.google.co.uk/"
	set thesource to do shell script "curl http://www.google.co.uk"
	if thesource contains "Advertise" then
		display dialog "hello"
	end if
	
end tell

This displays "hello" in a dialog, and it works, but I can't get it to work with GetSource. Any ideas? curl won't seem to work at all.

Do you have to use Internet Explorer? Its dictionary is seriously FUBAR’d.

I’m not sure why the curl version doesn’t work (it seems to work fine here), but if you need to see the page, then Safari would be a better bet (despite its apparent lack of a dictionary):

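tell application "Safari"
    set URL of document 1 to "http://www.google.co.uk"
    -- give the page time to load, otherwise the source may still be the previous page's
    -- (the 5 seconds is a guess; adjust as needed)
    delay 5
    set theSource to source of document 1
    if theSource contains "Advertise" then
        display dialog "hello"
    end if
end tell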

Is there any way to extract text from this source between a couple of keywords, or text that is matched by a regex, and save it to another text file?

Thanks.
SA :smiley:

Once you’ve got the source in an AppleScript variable, you can do anything to it you like. There are a number of third-party regex additions that will let you perform regular expression parsing on the text, or you can use text item delimiters to break the text up, like:

set theSource to do shell script "curl http://www.google.co.uk/"
set {oldDelims, AppleScript's text item delimiters} to {AppleScript's text item delimiters, "<input>"}
set theParts to text items of theSource
-- set the TIDs back for safety:
set AppleScript's text item delimiters to oldDelims

At this point, theParts will be a list of text items corresponding to theSource split at each occurrence of the string “<input>”, so the first item will be the text up to the first “<input>”, the second item will be the text between the first and second “<input>” tags, and so on.
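
To get just the text between two keywords and save it to a file, the same delimiter trick can be applied twice. The following is only a rough sketch: the handler name textBetween, the "<title>" and "</title>" markers, and the output file "extract.txt" on the desktop are all assumptions for illustration, not anything from the original page.

```applescript
-- Hypothetical helper: returns the text between the first startMarker
-- and the next endMarker, or "" if startMarker is not found.
-- If endMarker is missing, everything after startMarker is returned.
on textBetween(theText, startMarker, endMarker)
	set oldDelims to AppleScript's text item delimiters
	set found to ""
	try
		set AppleScript's text item delimiters to startMarker
		set theParts to text items of theText
		if (count of theParts) > 1 then
			-- rejoin everything after the first startMarker
			set afterStart to (items 2 thru -1 of theParts) as text
			set AppleScript's text item delimiters to endMarker
			set found to text item 1 of afterStart
		end if
	end try
	-- always set the TIDs back for safety
	set AppleScript's text item delimiters to oldDelims
	return found
end textBetween

set theSource to do shell script "curl http://www.google.co.uk/"
set theExtract to textBetween(theSource, "<title>", "</title>")

-- Write the result to a file on the desktop (the file name is an assumption)
set outPath to (path to desktop as text) & "extract.txt"
set fileRef to open for access file outPath with write permission
try
	set eof fileRef to 0 -- truncate any existing contents
	write theExtract to fileRef
end try
close access fileRef
```

The handler rejoins the items after the first start marker using the start marker itself as the delimiter, so later occurrences of that marker in the page are preserved rather than dropped. For pattern matching beyond fixed keywords, a regex addition would still be needed.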