Wikipedia...

If I get text returned of a dialog box, how do I then search Wikipedia for the article, get the right page and then display the first line of the first paragraph of the printable version in a dialog box?

All I have is…

set theSearchText to text returned of (display dialog "What would you like to search Wikipedia for?" default answer "")
(* "create" the url *)
set theURL to "http://en.wikipedia.org/wiki/Special:Search?search=" & theSearchText & "&go=Go"

-- try to open the url
try
	tell application "Safari"
		activate
		-- the URL belongs to the document, not the window
		set URL of front document to theURL
	end tell
on error (err)
	display dialog "An error was encountered:
" & err
end try

Hello.

I am a little bit sore about this, because I had to use Safari, since I didn’t find a way to use curl.
This script works under the assumption that you always get some output when you search Wikipedia.
It really should be strengthened with a test for a network connection.

So please: if anyone has the right curl incantation for a search string that works, post it.
By the way, I’m on Snow Leopard (SL); is there anything I have to do, or anything I should check?


set AppleScript's text item delimiters to ""
set theSearchText to text returned of (display dialog "What would you like to search Wikipedia for?" default answer "")
(* "create" the url *)
set theURL to "http://en.wikipedia.org/wiki/Special:Search?search=" & theSearchText & "&go=Go"

try
	tell application "Safari"
		activate
		make new document at front
		set URL of its document 1 to ""
		set URL of its document 1 to theURL
		set mySource to ""
		repeat until mySource is not ""
			delay 0.5 -- pause between polls instead of busy-waiting while the page loads
			set mySource to the source of document 1 as string
		end repeat
		set mySource to text item 1 of my extractAllBetween(mySource, "<p>", "</p>")
		if mySource is {} then
			my inform("Not found or connection down")
			tell its window 1 to close
			return
		else if mySource is "" then -- too many hits
			return
		end if
	end tell
on error (err)
	display dialog "An error was encountered:\n" & err
end try

set mySource to mySource as string

set res to (do shell script "echo " & quoted form of mySource & " |textutil -convert txt -format html  -stdin -stdout ")
if (offset of "may refer to" in res) ≠ 0 then
	tell application "Safari" to activate
else
	tell application "Safari" to close its window 1
	inform(res)
end if

to extractAllBetween(SearchText, startText, endText) -- Yvan Koenig
	local tid, liste
	set tid to AppleScript's text item delimiters -- save them for later.
	set AppleScript's text item delimiters to startText -- find the first one.
	set liste to text items of SearchText
	set AppleScript's text item delimiters to endText -- find the end one.
	-- causes "copy text item 1" to only copy text before the end text.
	set extracts to {}
	repeat with subText in liste
		if subText contains endText then
			copy text item 1 of subText to end of extracts
		end if
	end repeat
	set AppleScript's text item delimiters to tid -- back to original values.
	return extracts
end extractAllBetween

on inform(theMessage)
	tell me
		activate
		display alert theMessage
	end tell
end inform
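
The network test mentioned above could be sketched roughly like this. It is only a sketch: `isReachable` is a hypothetical handler name, and the five second timeout is an arbitrary assumption. It relies on curl exiting with a non-zero status when it cannot connect, which `do shell script` reports as an error.

```applescript
-- Sketch only: isReachable is a hypothetical handler, not part of the script above.
on isReachable(theURL)
	try
		-- -I asks for headers only; --max-time 5 gives up after five seconds
		do shell script "curl -s -I --max-time 5 " & quoted form of theURL
		return true
	on error
		-- curl could not connect (or timed out), so do shell script raised an error
		return false
	end try
end isReachable

if not isReachable("https://en.wikipedia.org/") then
	display alert "Wikipedia could not be reached"
end if
```

Calling it once before touching Safari would let the script bail out early instead of polling an empty page.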

Edit
You may want to incorporate this to get rid of Safari altogether, but you have to figure out the curl incantation for yourself.
Please post an update if you make it work.


script curl -- Dylan Weber -- http://www.macscripter.net/viewtopic.php?id=33209
	on retrieve(x)
		return (do shell script "curl -s -S " & quoted form of x) as string -- quote the URL so the shell does not interpret & or ?
	end retrieve
	on retrievepro(x, i)
		return (do shell script "curl -s -S " & i & space & quoted form of x) as string
	end retrievepro
	on post(x, i)
		return (do shell script "curl -s -S -d " & quoted form of i & space & quoted form of x) as string
	end post
end script
my curl's retrieve("http://www.google.com/")
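
For the search string itself, one possible curl incantation is sketched below. The handler name `fetchSearchPage` is hypothetical, and the sketch assumes that Wikipedia's `index.php` accepts a `search` parameter and redirects to the article on an exact title match; treat that as an assumption to verify, not a guarantee. The flags are standard curl: `-G` turns the data into a query string, `--data-urlencode` percent-encodes the search text (spaces, ampersands and so on), and `-L` follows redirects.

```applescript
-- Sketch only: fetchSearchPage is a hypothetical handler name.
on fetchSearchPage(searchText, wikiLanguage)
	set baseURL to "https://" & wikiLanguage & ".wikipedia.org/w/index.php"
	-- -G sends the --data-urlencode pair as a query string; -L follows redirects
	return do shell script "curl -s -S -L -G --data-urlencode " & quoted form of ("search=" & searchText) & space & quoted form of baseURL
end fetchSearchPage

set pageSource to fetchSearchPage("apple pie", "en")
```

The returned HTML can then be fed to a handler such as `extractAllBetween` in place of the source read from Safari.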

Hi,

there are too many different results depending on the search term.
As a starting point, this creates the URL and reads the plain printable text from Safari:


property wikiLanguage : "en"

set theSearchText to text returned of (display dialog "What would you like to search Wikipedia for?" default answer "")
set theURL to "http://" & wikiLanguage & ".wikipedia.org/w/index.php?title=" & theSearchText & "&printable=yes"
tell application "Safari" to set URL of document 1 to theURL
if page_loaded(10, theURL) then
	tell application "Safari" to set theText to text of document 1
	theText
end if


on page_loaded(timeout_value, theURL)
	delay 2
	repeat with x from 1 to 3
		repeat with i from 1 to the timeout_value
			tell application "Safari"
				if (do JavaScript "document.readyState" in document 1) is "complete" then
					if name of window 1 is "Untitled" or name of window 1 contains "Failed" then
						set URL of document 1 to theURL
						delay 5
						exit repeat
					end if
					return true
				else if i is the timeout_value then
					return false
				else
					delay 1
				end if
			end tell
		end repeat
	end repeat
	return false
end page_loaded


Hello.

I didn’t think about the chance of getting more than one result.

I updated the code in the post above to just leave you at the Wikipedia search results page if there was more than one hit. You get a message if no results were obtained, and you are shown a dialog box with the result, the definition, if there was one hit. I’ll be off looting some of Stefan’s code :D.

This type of task is useless. Here’s what you have to do. You create the search url and then use curl to download the source code of the returned results. Try this…

property wikiLanguage : "en"

set theSearchText to text returned of (display dialog "What would you like to search Wikipedia for?" default answer "")
set theURL to "http://" & wikiLanguage & ".wikipedia.org/w/index.php?title=" & theSearchText & "&printable=yes"
do shell script "curl " & quoted form of theURL

Look at the results from that script. You then need to parse that and extract out the result url you want… then use curl to get the source code from that url, and finally extract out the line of code you want.

The problem with all of this is that even if you could figure that out, next week Wikipedia might change their website such that your parsing techniques no longer work. So then you have to adjust your script… and then the following week they change it again. You can’t win this battle.

FYI: an easier way to open the URL is to use the “open location” command… rather than all the “Safari” code.

open location theURL

Hey, thanks a lot for all of the help !! I feel sorta dumb compared to you guys… how do I extract the first sentence of a paragraph though?

Hello.

Please run the script I created for you. If you look at the event log of the AppleScript Editor, you can see how it is done. Try it and enter “apple” in the search field, and you should get back the definition of an apple.

The line below extracts the first paragraph of the retrieved document if you got a page back for your search term.

If you got no hits at all, then you should receive nothing: {}.

If you got more than one hit, that is, a page with alternatives, then the call below should return “”.


 set mySource to text item 1 of my extractAllBetween(mySource, "<p>", "</p>")
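
A more defensive variant of that line checks the list before taking its first element, so that an empty result does not raise an error such as "Can’t get text item 1 of {}". This is only a sketch; `firstParagraphOf` is a hypothetical helper name and is not part of the script above.

```applescript
-- Sketch only: firstParagraphOf is a hypothetical helper.
on firstParagraphOf(paragraphList)
	-- an empty list means no <p>…</p> block was found at all
	if paragraphList is {} then return missing value
	-- otherwise hand back the first extracted paragraph (which may itself be "")
	return item 1 of paragraphList
end firstParagraphOf

firstParagraphOf({"First paragraph.", "Second paragraph."})
```

The caller can then test for `missing value` (no hits) and `""` (a page without a leading paragraph) separately.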

Please try my code; it works and is what you asked for. Use just the first script as it is, which works with Safari. If you get more than one hit for your search, you are left with the results window; if you get just one hit, you receive the definition of the term in a dialog box; and if there are no hits, you get a message saying that none were found.

For your question about how to extract the first sentence :


set mp to "A horrible paragraph. Which continues over several sentences. Even more than two."
set ms to text 1 thru (offset of "." in mp) of mp --> "A horrible paragraph."

Mcuser, your script works like 3/4 of the time but for example if I type “television” it goes to the right page and all, but then has an error - “An error was encountered:
Can’t get text item 1 of {}.” and then displays another alert dialog that is blank.

Also, if there are multiple items which the word may refer to, it doesn’t seem to list them quite properly eg. when I typed in “sofa”, there were multiple results but the dialog simply said “Sofa may refer to:”

But just so you know, your work is really appreciated as I know I (almost definitely) wouldn’t have been able to do this on my own.

Hello.

I’ll try with your search words and see if I can improve it. :) It worked for apple, and for a word which gave several hits, so I guessed the layout was such that it would work all of the time. I’ll be back.

Hello.

I have updated the code in the post #2 above.

Television almost works for me; there is an encoding issue left, however. I believe the page to be encoded with ISO-8859-1. I’ll try to fix that. As for “sofa”, I have now implemented a test for the phrase “may refer to” in the first paragraph, and if you meet such an “ambiguous” page, then you are left with it.

I can’t think of any other reason that television didn’t work for you than that you may have some issue with your, or Apple’s, text item delimiters. :) Try to run this simple one-liner, and then the script several times after. If it works every time, then you should consider pasting that one line into the very top of the script.


set AppleScript's text item delimiters to ""

That is awesome, thanks a lot :D !! I added that one line

and I think it works 100% of the time now!

EDIT: I think I’m learning something… test things properly, and don’t make assumptions.

Unfortunately, for several words, e.g. help, pepper, duck, fork, it opens the web page and all but shows a dialog saying “An error was encountered:
Can’t get text item 1 of {}.” and then a blank dialog shortly after.

It turns out that when I run that one line separately in another script, on its own, and then your Wikipedia script, it works fine! However, inserted into the top of your Wikipedia script, it doesn’t seem to be working. sigh

Hello.

I have made some minor modifications. The fault, at least on my machine, was that another homepage was loaded before the URL was entered, and its source was then taken to be the source of the document.

I think I have fixed that issue now, and it may be why you saw the correct output and still received an error message. If it is not fixed, then we’ll use Stefan’s eminent handler! :)

Greek characters suck! They are literally encoded as Unicode characters in UTF-16 in the television web page, while the rest of the page is UTF-8; there is really nothing I can do about it.
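
One thing that might be worth trying for the encoding issue is to tell textutil explicitly which encoding the HTML fragment uses, instead of letting it guess. This is a sketch under the assumption that the fragment really arrives as UTF-8 (which is how `do shell script` passes text to the shell); `htmlToText` is a hypothetical handler name, and whether this cures the Greek characters on that particular page is untested.

```applescript
-- Sketch only: htmlToText is a hypothetical handler name.
on htmlToText(htmlFragment)
	-- -inputencoding tells textutil how to read the HTML; -encoding sets the text output encoding
	return do shell script "echo " & quoted form of htmlFragment & " | textutil -convert txt -format html -inputencoding UTF-8 -encoding UTF-8 -stdin -stdout"
end htmlToText

htmlToText("<p>Television, from Greek <i>τῆλε</i></p>")
```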

Whew! That finally works properly. Thanks a lot!! By the way, I have a webpage that’s loaded first too: my homepage (Google). You said

…So am I. Works. :)

Also, quick question: is there a way to get Safari to run in the background? E.g. not show the windows in the foreground, in front of the currently open (front) application.

Of course: remove all activate commands in the Safari tell blocks.
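
As a minimal sketch of what that looks like, here is a Safari tell block without `activate`. The article URL is only an example, and the window is still created; it is just not brought in front of the current application.

```applescript
tell application "Safari"
	-- no activate here, so Safari is not brought to the front
	make new document with properties {URL:"https://en.wikipedia.org/wiki/Apple"}
end tell
```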

Aha. Ok