curl and Apple Technotes.

Hello

Here is the link to an Apple Technical Note: http://support.apple.com/kb/TS5482
I’m able to get its contents through Safari, but I can’t find the correct syntax to get it using curl.

When I run this script :

do shell script "curl " & "http://macscripter.net/viewtopic.php?id=43271"

I get the contents of the Web page.

When I run :

set theURL to "http://support.apple.com/kb/TS5482"

do shell script "curl " & theURL

I just get an empty string.

Does anyone know the correct incantation?

Thanks in advance.

No need to hurry, I’m tired and will switch off right after sending this message.

Yvan KOENIG (VALLAURIS, France) mercredi 5 novembre 2014 21:35:18

Up

Yvan KOENIG (VALLAURIS, France) jeudi 6 novembre 2014 18:32:39

Hi Yvan,

It may be a redirect issue. Try:


set theURL to "http://support.apple.com/kb/TS5482"

do shell script "curl -L " & theURL

Thanks Stefan

It works well.
So now I have a draft of the code which I need :


set theURL to "http://support.apple.com/kb/TS5482?viewlocale=fr_FR"

set rawXML to do shell script "curl -L " & theURL
if rawXML does not contain "s.pageName=" & quote & "acs::kb::404 error::we're sorry." then
	if rawXML contains "<div class=" & quote & "mod-date" & quote & ">" then
		# It's not a page "Welcome to Apple Support"
		set lineWithDate to first paragraph of item 2 of my decoupe(rawXML, "<div class=" & quote & "mod-date" & quote & ">")
		set theModDate to item 2 of my decoupe(lineWithDate, quote & space)
		set lineWithNumNote to first paragraph of item 2 of my decoupe(rawXML, "var documentId = '")
		set theNumNote to item 1 of my decoupe(lineWithNumNote, "';")
		
		set lineWithTitle to first paragraph of item 2 of my decoupe(rawXML, "s.pageName=")
		set theRawTitle to item -1 of my decoupe(lineWithTitle, "::")
		if theRawTitle contains "&#" then
			tell application "ASObjC Runner"
				set theRawTitle to modify string theRawTitle so it is unencoded for XML
			end tell
		end if
		if text -10 thru -1 of theRawTitle is in {" (fr_fr)" & quote & ";", " (en_us)" & quote & ";"} then set theRawTitle to text 1 thru -11 of theRawTitle
	end if
end if

#=====

on decoupe(t, d)
	local oTIDs, l
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d}
	set l to text items of t
	set AppleScript's text item delimiters to oTIDs
	return l
end decoupe

#=====

I will search through the pieces of code already posted by Shane Stanley to see whether there is a better way to unencode strings like:

“iPhoto doesn&# 39;t show an option to share to Weibo in OS X Yosemite (fr_fr)";”
I inserted a space before 39 so that the embedded entity isn’t deciphered by the message parser.

Yvan KOENIG (VALLAURIS, France) jeudi 6 novembre 2014 20:36:19

Hello

I’m definitely an ass with Objective-C.
I’m unable to find the equivalent of this piece of code:


if unTitre contains "&#" then
	tell application "ASObjC Runner"
		set unTitre to modify string unTitre so it is unencoded for XML
	end tell
end if

when unTitre is this kind of string :

“iPhoto doesn&# 39;t show an option to share to Weibo in OS X Yosemite (fr_fr)";”

I had a look in the Xcode help but found only a list of codes describing numerous encodings, and nothing related to XML encoding/unencoding.

Yvan KOENIG (VALLAURIS, France) vendredi 7 novembre 2014 17:35:40

Hello Yvan.

Maybe the parsing would be easier for you with the XML Suite of System Events? At first I figured plutil would be OK if you just wanted to extract some keys you already knew of. But alas, many XML files seem to be broken after they added JSON support to it (plutil)…

Another problem.

Running :

do shell script "curl -L http://support.apple.com/kb/HT1014?viewlocale=fr_FR&locale=en_US"

never finishes.

I tried to use an auxiliary parameter :

do shell script "curl --max-time 20 -L http://support.apple.com/kb/HT1014?viewlocale=fr_FR&locale=en_US"

but it never ends either.

Is there a way to work around that?

It’s puzzling because Safari is able to open the referenced web page.

Yvan KOENIG (VALLAURIS, France) vendredi 7 novembre 2014 19:02:14

As far as I know, there is no single method to decode HTML entities.
However there are a few NSString categories like: http://www.cocoanetics.com/2011/01/html-entities/

Hello Stefan

ASObjC Runner does the job.
I’m just looking for a way to do that by calling Objective-C directly, because Shane wrote that ASObjC Runner is more or less obsolete with Yosemite.

Yvan KOENIG (VALLAURIS, France) vendredi 7 novembre 2014 19:38:24

Shane means that in Yosemite you’re able to run AppleScriptObjC code in scripts without using libraries, but there are a lot of complex functions in ASObjC Runner that still make the tool very useful.

Maybe I misunderstood Shane.

Yvan KOENIG (VALLAURIS, France) vendredi 7 novembre 2014 21:00:26

No, you understood correctly. But this is one of those things that is not directly accessible using ASObjC. The solution is to use the ASObjCExtras framework, a new version (1.1.0) of which I released the other day. (The previous version could do it too, but there’s an AppleScript bug that complicated things.)

So the equivalent of the ASObjC Runner command would be:

set unTitre to current application's SMSFord's stringFrom:unTitre makingIt:"UnecodedForXML"

And the script would need the line:

use framework "ASObjCExtras"

The framework can be in ~/Library/Frameworks, /Library/Frameworks, or the /Contents/Frameworks folder of a bundled applet or .scptd file.

Stefan,

Most of Runner’s functions that aren’t accessible directly from vanilla ASObjC have been transferred to the ASObjCExtras framework. That means many of the list and text commands, along with the trigonometry stuff. ASObjCExtras was written to be the successor of Runner.

ASObjC Runner still runs OK, and may keep doing so for years – but it also may fail. I’d rather people start moving away from it now, than when it happens.

ASObjCExtras also covers two of the current shortcomings of ASObjC simply: the ability to convert dates, and the fact that coercion of numbers to reals loses precision.

In case anyone missed it:

www.macosxautomation.com/applescript/apps/ASObjCExtras.html

Same price as ASObjC Runner, too :wink:

And if anyone can think of any Runner functions that haven’t been transferred, I’d like to know…

Hey Yvan,

It’s generally a good idea to quote your URLs in curl to prevent any shell-expansion voodoo.

This works for me.


do shell script "curl -Ls -A 'Opera/9.70 (Linux ppc64 ; U; en) Presto/2.2.1' 'http://support.apple.com/kb/HT1014?viewlocale=fr_FR&locale=en_US'"

I also like to use a user-agent string to make the relevant server think curl is a browser.

Thanks.
In fact, simply quoting the link solved the problem.

do shell script "curl -L 'http://support.apple.com/kb/HT1014?viewlocale=fr_FR&locale=en_US'"

I don’t understand why that is needed, because I see nothing in the link that requires quoting.
In practice, I will play safe and use your code.

Yvan KOENIG (VALLAURIS, France) samedi 8 novembre 2014 12:54:40

Hey Yvan,

Remember that you’re in the shell rather than AppleScript. It has a different set of reserved characters.

Hello.

Sometimes it is useful to have a user-agent string. Here are two ways to get one: have Safari generate it, or save something as an HTML file, load it into your browser of choice, and copy the user-agent string to the clipboard from there. (Stolen from w3schools.com.)

tell application "Safari" to tell current tab of window 1 to set the clipboard to (do JavaScript "navigator.userAgent;")

Yep. See my post above.

I always use a user-agent string with curl or wget, unless I’m just testing something. In general it’s a good practice to keep the host from knowing you’re not a browser.

The JavaScript is handy. Thanks.

Note that you can change the user-agent in Safari using the Develop menu and get a few of the common ones.

A more comprehensive list is available here.

The ampersand in the URL is a special character in Bash: it ends the command and runs it in the background, so the rest of the URL (here locale=en_US) is parsed as a separate command, in this case a variable assignment. Therefore it’s better to quote URLs with single quotes, or even better, use quoted form of.
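
To make the effect of the ampersand visible without touching the network, here is a minimal sketch in plain sh (do shell script hands its string to /bin/sh). It is only an illustration: printf stands in for curl so the argument the command actually receives can be inspected, and the variable names url, unquoted and quoted are made up for this example.

```shell
#!/bin/sh
# The URL from the thread, kept intact inside single quotes.
url='http://support.apple.com/kb/HT1014?viewlocale=fr_FR&locale=en_US'

# Unquoted: the inner shell parses the line again, stops the command at "&",
# runs printf in the background, and treats "locale=en_US" as a separate
# variable assignment. printf (our stand-in for curl) never sees that part.
unquoted=$(sh -c "printf '[%s]\n' $url")

# Single-quoted: the whole URL reaches the command as one argument.
quoted=$(sh -c "printf '[%s]\n' '$url'")

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

The unquoted run should show the URL cut off just before the ampersand, while the quoted run shows it whole. On the AppleScript side, quoted form of theURL produces the single-quoted version for you.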