UTF-8 and open location

You’re again referring to the Unicode code point and not the byte values. In that documentation W3.org was speaking at the user level, not the technical level. At the technical level a string is just an array of bytes, and in UTF-8 an é is two byte values. They are encoded not because they are Unicode but because both byte values are higher than 127. That is also why they say Unicode characters are not allowed: in UTF-8, every non-ASCII character is encoded with bytes higher than 127.
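To see the two bytes behind an é, you can ask the shell for a hex dump of its UTF-8 form. This is a small sketch, assuming the stock /usr/bin/printf and /usr/bin/xxd tools and relying on do shell script passing its command text to the shell as UTF-8:

```applescript
-- "é" as a single code point (U+00E9)
set eAcute to character id 233
-- printf %s writes the text without a trailing newline; xxd -p prints plain hex
set hexBytes to do shell script "/usr/bin/printf %s " & quoted form of eAcute & " | /usr/bin/xxd -p"
-- hexBytes should be "c3a9": two bytes, 0xC3 (195) and 0xA9 (169), both above 127
```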

In another topic I asked you to run a piece of code, which you didn’t try. You should: try opening a folder whose name contains a special character. Without Unicode (UTF-8) encoding it won’t even open; with my example, which supports Unicode characters, the file opens without problems.

Actually, I may have worded it wrong. What I meant (without going back to find it in the standard) is that extended characters are not permitted in URLs. That is what I took from the w3.org standard for URLs.

Now, we both know that when you have folders or files on your disk that you want to use in hyperlinks, this scheme is totally impractical, and kind of fascistic!

Still, I do wonder which RFC acknowledges extended ASCII characters in URLs.

Opening with the open command was never a problem. The real problem is that Safari has claimed the file URL scheme, so it handles it internally. That is how I have understood it: no matter the characters, it won’t open the folder. I have tried with plain characters (below 127), and nothing really happened.

Now, to really put it to the test, I’ll encode a file path with your handler, make an anchor tag out of it, and see if that opens. I know the result up front, but it’s just to reach a conclusion! :slight_smile:

As for your code, if you go back and read, you’ll see that I indeed tried it.

Opening a URL that needs no encoding actually worked, so I guess the encoding I used has messed it up!

Not only those, Shane, but the SIMBL plug-ins too. ASObjC Runner is a nice exception, in that it quits after a minute of idleness.

Whether those osaxen do something or not, there are still pages to be administered, and that eats cycles and battery. That is my opinion, and I think I will stick with it!

Thanks for your code, consider it snagged; that is really the easy way out!

Hello!

Your routine performed equally badly on the folder I had problems with, Bazzie Wazzie. Looking at the properties of that folder, it was shared! When folders have the normal permissions, I guess everything will work all right.

The folder I had problems with had the same permissions for group and everybody. I think that is the obstacle, as I can see Safari doing something, but not enough!

In the future I’ll use something other than the old routines from the guidebook!

Thanks for your help, both of you!

Not my idea originally!


set chars to "#$%##&/???????!#$%!!!!!!%&**@@**@@@@^^^"
set str to ""
repeat with i from 1 to (get random number from 5 to 9)
	set str to str & some character of chars
end repeat
tell application "SystemUIServer"
	activate
	display dialog str
	-- your code goes here...
end tell

I know what you mean, but it’s not correct. I’ve dug into w3.org, and the only thing they say is that a valid URL must conform to RFC 3986. PHP follows the same standard and is therefore in line with w3.org. So w3.org itself says nothing more about URLs than that they should conform to RFC 3986. RFC 3986, which I read a few years ago, doesn’t allow Unicode characters directly in a URL; section 2.5 recommends encoding text as UTF-8 first and then percent-encoding every byte outside the unreserved set. It’s important that you only use 7-bit values, so an é should be %C3%A9; that is completely supported nowadays, and it’s still UTF-8, even in this form.

You can stop wondering, because once the URL is encoded it contains only 7-bit ASCII characters. When we decode it, it’s a UTF-8 string again. And it’s not only inside Mac OS X: my own web servers, Google, Bing and MacScripter, for instance, all use UTF-8 encoded URLs.
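The round trip described above can be seen with Foundation’s percent-encoding methods, called from AppleScriptObjC (OS X 10.10 Yosemite or later). This is a sketch, not the routine used by any of the servers mentioned; the handler names are my own:

```applescript
use AppleScript version "2.4" -- OS X 10.10 Yosemite or later
use framework "Foundation"
use scripting additions

-- Percent-encode the UTF-8 bytes of every character outside the RFC 3986 unreserved set.
on percentEncode(str)
	set unreserved to current application's NSCharacterSet's characterSetWithCharactersInString:"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
	set nsStr to current application's NSString's stringWithString:str
	return (nsStr's stringByAddingPercentEncodingWithAllowedCharacters:unreserved) as text
end percentEncode

-- Turn every %XX sequence back into its byte and read the result as UTF-8.
on percentDecode(str)
	set nsStr to current application's NSString's stringWithString:str
	return (nsStr's stringByRemovingPercentEncoding()) as text
end percentDecode

set eAcute to character id 233 -- "é", U+00E9
set encoded to percentEncode(eAcute)
-- encoded is "%C3%A9": nothing but 7-bit ASCII
set decoded to percentDecode(encoded)
-- decoded is "é" again
```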

It is true that every character of an encoded URL fits in 7 bits :slight_smile:

Here’s a vanilla encoding handler which seems to work OK:

on URIEncode from str given |encoding reserveds|:encodingReserveds
	set unreservedChars to "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
	set reservedChars to ":/?#[]@!$&'()*+,;=%"
	
	set chars to str's characters
	
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to {"«data rdat", "»"}
	
	considering case
		repeat with i from 1 to (count chars)
			set thisChar to item i of chars
			if not ((thisChar is in unreservedChars) or ((not encodingReserveds) and (thisChar is in reservedChars))) then
				-- This character needs to be encoded.
				-- Write it to a temporary file as UTF-8 and read it back as data.
				set fref to (open for access file ((path to temporary items as text) & "utf8.txt") with write permission)
				try
					set eof fref to 0
					write thisChar as «class utf8» to fref
					set d to (read fref from 1 as data)
				on error errMsg
					display dialog errMsg buttons {"OK"} default button 1
				end try
				close access fref
				
				-- Use the deliberate-error hack to get text containing a representation of the data object and extract the hex digits from that.
				try
					d as text
				on error errMsg
					set hex to text item 2 of errMsg
				end try
				--  Make up a list of texts consisting of hex-digit pairs prefaced with "%".
				set percentCodes to {}
				repeat with j from 1 to (count hex) by 2
					set end of percentCodes to "%" & text j thru (j + 1) of hex
				end repeat
				-- Replace the original character with the list of "%" codes.
				set item i of chars to percentCodes
			end if
		end repeat
	end considering
	
	-- Coerce the character list back to text.
	set AppleScript's text item delimiters to ""
	set encodedURL to chars as text
	set AppleScript's text item delimiters to astid
	
	return encodedURL
end URIEncode

-- Rubbish URL for testing.
URIEncode from ("http://www.アsp.net/Às pqrs" & (character id 63743) & ".html") without |encoding reserveds| -- character id 63743 is U+F8FF, the Apple logo
--> "http://www.%E3%82%A2sp.net/%C3%80s%20pqrs%EF%A3%BF.html"

-- The use of '. with |encoding reserveds|' should be obvious if you need it.

I knew someone would come up with the cumbersome data coercion :D… But, hey, at least you can say it’s vanilla.

EDIT: Nigel, it still doesn’t conform to RFC 3986.

Hello! :slight_smile:

This should give the correct encoding according to RFC 3986

I looked into writing correct encoding routines in AppleScript yesterday; it seems just too hard, with too many bytes floating around. Not worth the effort! :slight_smile:

But surely this must count as a vanilla solution?


set fileName to "アsp.net/Às pqrs" & (character id 63743) & ".html" -- character id 63743 is U+F8FF, the Apple logo
tell application "Finder"
	set a to make new file at folder (path to desktop folder as text) with properties {name:fileName}
	get URL of a
	log result
	--> %E3%82%A2sp.net:A%CC%80s%20pqrs%EF%A3%BF.html (just the interesting part, without the path to the desktop)
end tell

Edit
The Finder’s encoding does in fact deviate from all of the routines found via Google that I cared to investigate! (The reason seems to be that the Finder encodes the HFS path of the file.) :slight_smile:

I have now tested several routines for encoding and decoding URLs, and my conclusion is that this is a very flaky business indeed! It was hard to find two sets that delivered the same results. Having heard a lot about RFC 3986 by now, I guess Bazzie Wazzie’s PHP routines fit the bill, and I will seek no further :wink:


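-- DJ Bazzie Wazzie's encode/decode handlers
on rawURLEncode(str)
	-- stream_get_contents reads all of stdin, so multi-line text is encoded completely (fgets stops at the first newline)
	return do shell script "/bin/echo -n " & quoted form of str & " | php -r 'echo rawurlencode(stream_get_contents(STDIN));'"
end rawURLEncode

on rawURLDecode(str)
	-- "without altering line endings" keeps decoded linefeeds as linefeeds instead of returns
	return do shell script "/bin/echo -n " & quoted form of str & " | php -r 'echo rawurldecode(stream_get_contents(STDIN));'" without altering line endings
end rawURLDecode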
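Current macOS releases (Monterey and later) no longer ship PHP, so the handlers above fail there. A possible stand-in, sketched with AppleScriptObjC (OS X 10.10 or later) and keeping the same handler names; it assumes, as PHP’s rawurlencode does, that only A-Z, a-z, 0-9 and - . _ ~ stay literal:

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Foundation-based stand-in for PHP's rawurlencode (assumption: unreserved characters only stay literal).
on rawURLEncode(str)
	set allowed to current application's NSCharacterSet's characterSetWithCharactersInString:"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
	set nsStr to current application's NSString's stringWithString:str
	return (nsStr's stringByAddingPercentEncodingWithAllowedCharacters:allowed) as text
end rawURLEncode

-- Foundation-based stand-in for PHP's rawurldecode.
on rawURLDecode(str)
	set nsStr to current application's NSString's stringWithString:str
	-- Foundation returns missing value for malformed input (e.g. a stray "%"), so raise an error instead.
	set decoded to nsStr's stringByRemovingPercentEncoding()
	if decoded is missing value then error "Malformed percent-encoding in: " & str
	return decoded as text
end rawURLDecode
```

Because Foundation works on the whole string, multi-line text and line endings pass through unchanged, with no shell in between.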


Well, I only read section 2. :slight_smile: My test results seem to conform to that, whereas the A-grave in McUsr’s Finder URL does not. Presumably the Finder is encoding its own (or the file system’s) private representation of accented characters.
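One likely explanation for the Finder’s A%CC%80: HFS+ stores file names in Unicode decomposed form (NFD), where À is a plain A followed by U+0300 COMBINING GRAVE ACCENT, whereas text typed into a script is usually precomposed (NFC). Both are valid UTF-8; they just percent-encode differently. A sketch with AppleScriptObjC (OS X 10.10 or later), where percentEncode is a hypothetical helper:

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical helper: unreserved-only percent-encoding, like PHP's rawurlencode.
on percentEncode(str)
	set unreserved to current application's NSCharacterSet's characterSetWithCharactersInString:"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
	set nsStr to current application's NSString's stringWithString:str
	return (nsStr's stringByAddingPercentEncodingWithAllowedCharacters:unreserved) as text
end percentEncode

set aGraveText to character id 192 -- "À" (U+00C0)
set aGrave to current application's NSString's stringWithString:aGraveText

-- NFC: one code point, as in a typed script literal
set composed to percentEncode(aGrave's precomposedStringWithCanonicalMapping())
-- composed is "%C3%80"

-- NFD: "A" followed by U+0300, as stored by HFS+
set decomposed to percentEncode(aGrave's decomposedStringWithCanonicalMapping())
-- decomposed is "A%CC%80"
```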

Edit: Satimage OSAX’s ‘escapeURL’ command gives the same result as my script with my URL. However, its ‘unescapeURL’ command successfully turns McUsr’s Finder result into a readable form and ‘escapeURL’ turns that result back into the Finder original! It’s obviously too pointlessly complex and boring a subject for a vanilla solution. :stuck_out_tongue:

Oh well, so we’re gambling that Safari conforms to the RFC, aren’t we?

Can of worms. I’ll stick with Bazzie Wazzie’s routines, and if that scheme ever breaks, I’ll just file a bug report against Safari!

I am not in the mood for more testing of this at the moment. :smiley:

Safari’s refusal to open folders with spaces in their names via file URLs, escaped or unescaped, has set me back a little… :o

Here is the new version of my library. As you can see, there are superfluous parameters kept for backwards compatibility, which you may remove should you choose to use it.

I want to add that when it comes to encoding items on my own disk, I’ll use the Finder’s URL for the item. I’ll use this for links that I open locally on my machine, not on a server, until it breaks in Safari, which I doubt it will, as the file we were testing with opened with the open command. I know that doesn’t give any guarantee, though, as the open command can open folders with spaces in their names, but at least the file opened flawlessly in Safari!


-- URL LIB

script URLLib
	
	on isAvalidHtmlFileUrl(theUrl)
		local ok, astid
		set astid to AppleScript's text item delimiters
		set AppleScript's text item delimiters to ":"
		if not text item 1 of theUrl is "file" then
			set AppleScript's text item delimiters to astid
			return false
		end if
		set AppleScript's text item delimiters to "."
		if not text item -1 of theUrl is "html" then
			set AppleScript's text item delimiters to astid
			return false
		end if
		
		set AppleScript's text item delimiters to astid
		return true
	end isAvalidHtmlFileUrl
	
	
	on decodefurl(anUrlFromABrowser)
		-- 27/08/12 Tested!
		-- Converts escaped characters back to normal characters
		-- and removes "file://" and "localhost".
		-- If present, "localhost" comes at the very beginning.
		local tmpUrl
		
		set tmpUrl to my rawURLDecode(anUrlFromABrowser)
		
		set tmpUrl to my privateHandlers's str_replace({substringToReplace:"file://", replacementString:"", OriginalString:tmpUrl})
		
		if (offset of "localhost" in tmpUrl) is 1 then set tmpUrl to text 10 thru -1 of tmpUrl
		
		return tmpUrl
	end decodefurl
	
	--  DJ Bazzie Wazzie http://macscripter.net/edit.php?id=154949
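	on rawURLEncode(str)
		-- stream_get_contents reads all of stdin, so multi-line text is encoded completely (fgets stops at the first newline)
		return do shell script "/bin/echo -n " & quoted form of str & " | php -r 'echo rawurlencode(stream_get_contents(STDIN));'"
	end rawURLEncode
	
	on rawURLDecode(str)
		-- "without altering line endings" keeps decoded linefeeds as linefeeds instead of returns
		return do shell script "/bin/echo -n " & quoted form of str & " | php -r 'echo rawurldecode(stream_get_contents(STDIN));'" without altering line endings
	end rawURLDecode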
	
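	on filepath_to_URL(this_file, encode_URL_A, encode_URL_B)
		-- encode_URL_A and encode_URL_B are unused; they are kept only for backwards compatibility.
		-- The result is an encoded path ("Macintosh%20HD/Users/..."), not a complete file:// URL.
		local astid, path_segments
		set this_file to this_file as text
		set astid to AppleScript's text item delimiters
		set AppleScript's text item delimiters to ":"
		set path_segments to every text item of this_file
		repeat with i from 1 to the count of path_segments
			set item i of path_segments to my rawURLEncode(item i of path_segments)
		end repeat
		set AppleScript's text item delimiters to "/"
		set this_file to path_segments as text
		-- restore the caller's delimiters instead of forcing them to ""
		set AppleScript's text item delimiters to astid
		return this_file
	end filepath_to_URL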
	
	-->	set b to getIP from "https://127.0.0.1/path/to/file/"
	-->"127.0.0.1"
	to getIP from anUrl
		local a, b
		set a to offset of "//" in anUrl
		set b to offset of "/" in (text (a + 2) thru -1 of anUrl)
		set ipAddr to text (a + 2) thru (a + b) of anUrl
		return ipAddr
	end getIP
	
	
	
	script privateHandlers
		on str_replace(R) -- Returns modified string
			-- R {substringToReplace: _ent, replacementString: _rent,OriginalString: _str}
			local _tids
			set _tids to AppleScript's text item delimiters
			set AppleScript's text item delimiters to R's substringToReplace
			set _res to text items of R's OriginalString as list
			set AppleScript's text item delimiters to R's replacementString
			set _res to items of _res as string
			set AppleScript's text item delimiters to _tids
			return _res
		end str_replace
		
	end script
	
end script
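filepath_to_URL and decodefurl above build and pick apart file URLs by hand. As a hedged alternative, Foundation’s NSURL can do both directions from AppleScriptObjC (OS X 10.10 or later). The handler names below are hypothetical, and the sketch works with POSIX paths rather than HFS paths:

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- POSIX path -> file URL string; spaces and non-ASCII characters are percent-encoded.
-- Pass isDirectory:true for folders to get the trailing "/".
on posixPathToFileURL(posixPath)
	set theURL to current application's NSURL's fileURLWithPath:posixPath isDirectory:false
	return (theURL's absoluteString()) as text
end posixPathToFileURL

-- File URL string -> POSIX path; escapes are decoded, and "file://" plus any "localhost" host are dropped.
on fileURLToPOSIXPath(urlString)
	set theURL to current application's NSURL's URLWithString:urlString
	if theURL is missing value then error "Not a valid URL: " & urlString
	return (theURL's |path|()) as text
end fileURLToPOSIXPath

set u to posixPathToFileURL("/tmp/a b.html")
-- u is "file:///tmp/a%20b.html"
set p to fileURLToPOSIXPath("file://localhost/tmp/a%20b.html")
-- p is "/tmp/a b.html"
```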

Well, maybe it’s nitpicking, but what I mean is that a URL consists of a scheme, authority, path, query and fragment. The query and the path, for instance, should be encoded differently: I use PHP’s urlencode function to encode the query part of a URL and rawurlencode to encode the path component. The scheme has its own rules as well. So, like any proper URL library, you should split the URL string into its scheme, authority, path, query and fragment components and encode each one separately according to its own encoding rules.

‘Hello World’ should become ‘Hello+World’ in a query component (following the HTML form-encoding convention) and ‘Hello%20World’ in a path component.
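The difference can be sketched with AppleScriptObjC (OS X 10.10 or later). The handler names are hypothetical: the path version keeps only RFC 3986 unreserved characters literal, like PHP’s rawurlencode, and the query version then turns %20 into "+", like PHP’s urlencode (which, unlike this sketch, also encodes "~"):

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

property unreservedChars : "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"

-- Path segment: every byte outside the unreserved set becomes %XX.
on encodePathSegment(str)
	set allowed to current application's NSCharacterSet's characterSetWithCharactersInString:unreservedChars
	set nsStr to current application's NSString's stringWithString:str
	return (nsStr's stringByAddingPercentEncodingWithAllowedCharacters:allowed) as text
end encodePathSegment

-- Form-style query value: same encoding, then an encoded space (%20) becomes "+".
-- A literal "+" was already turned into %2B above, so it cannot be confused with a space.
on encodeQueryValue(str)
	set encoded to current application's NSString's stringWithString:(my encodePathSegment(str))
	return (encoded's stringByReplacingOccurrencesOfString:"%20" withString:"+") as text
end encodeQueryValue

set pathPart to encodePathSegment("Hello World") -- "Hello%20World"
set queryPart to encodeQueryValue("Hello World") -- "Hello+World"
```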

This topic started as URL encoding of string data, but now we’re trying to create a complete solution, which is why I’m getting picky.

I’m out of this! And Kudos to Nigel!

The Dark Lord of Warwickshire
Helps them all out of their quagmire
In space that is quite economical.
The best I've seen,
Solutions fast and clean,
He helps them out when need is dire

I really guess it holds for encoding the path component of URLs, then, and that is really all I care about!

I read the W3.org standard over the weekend, and I am not touching any RFC at the moment.

Yes, I have been wading through it today as well to find that clause about regular characters, but maybe it was just a misconception on my part.

I find PHP superior to Perl here, in that fewer people tinker with it. I am far more pessimistic when it comes to Perl, having had some encoding trouble with it.

It’s not that hard to encode a non-encoded URL, but you have to choose the proper language to do it in. Apple keeps telling Objective-C developers who use NSAppleScript objects not to use URLs there, because there is no support, and I think that applies to AppleScripters as well. Python and Ruby are good programming languages with URL support, and they will save you hundreds of lines of AppleScript code you would otherwise have to write yourself. To encode an HTTP URL, for instance, I choose PHP. Since 10.6, however, the following code (which uses the http_build_url function) can’t run out of the box, because Mac OS X (including Server) no longer includes PHP’s pecl_http extension by default.

set theURL to "http://www.mywebserver.com/path/to/text script.php?string=größe maße&page=2#overview"

do shell script "/bin/echo -n " & quoted form of theURL & " | php -r '$c=parse_url(trim(fgets(STDIN))); 
parse_str($c[\"query\"], $c[\"query\"]);
$c[\"query\"] = http_build_query($c[\"query\"]);
$c[\"path\"] = implode(\"/\", array_map(\"rawurlencode\", explode(\"/\", $c[\"path\"])));
echo http_build_url($c);'"

For people who don’t have pecl installed, here is some code I put together quickly:

set theURL to "http://djbw:password@www.mywebserver.com/path/to/text script.php?string=größe maße&page=2#overview"

--first parse the url with help from PHP
set rawUrlComponents to do shell script "/bin/echo -n " & quoted form of theURL & " | php -r '$c=parse_url(trim(fgets(STDIN))); 
parse_str($c[\"query\"], $c[\"query\"]);
$c[\"query\"] = http_build_query($c[\"query\"]);
$c[\"path\"] = implode(\"/\", array_map(\"rawurlencode\", explode(\"/\", $c[\"path\"])));
foreach($c as $key => $value){printf(\"%s=%s\\n\", $key, $value);}'"

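--now build the URL string again according to section 5.3 of RFC 3986
-- slots: 1 scheme, 2 ":", 3 "//", 4 user, 5 ":", 6 pass, 7 "@", 8 host, 9 path, 10 "?", 11 query, 12 "#", 13 fragment
set urlString to {"", "", "", "", "", "", "", "", "", "", "", "", ""}
set portNumber to ""
repeat with cmp in paragraphs of rawUrlComponents
	if cmp begins with "scheme=" then
		set item 1 of urlString to text 8 thru -1 of cmp
		set item 2 of urlString to ":"
	else if cmp begins with "host=" then
		set item 8 of urlString to text 6 thru -1 of cmp
		set item 3 of urlString to "//"
	else if cmp begins with "port=" then
		-- parse_url also returns the port; keep it instead of silently dropping it
		set portNumber to text 6 thru -1 of cmp
	else if cmp begins with "user=" then
		set item 4 of urlString to text 6 thru -1 of cmp
		set item 7 of urlString to "@"
	else if cmp begins with "pass=" then
		set item 6 of urlString to text 6 thru -1 of cmp
		set item 5 of urlString to ":"
	else if cmp begins with "path=" then
		set item 9 of urlString to text 6 thru -1 of cmp
	else if cmp begins with "query=" then
		set item 11 of urlString to text 7 thru -1 of cmp
		set item 10 of urlString to "?"
	else if cmp begins with "fragment=" then
		set item -1 of urlString to text 10 thru -1 of cmp
		set item -2 of urlString to "#"
	end if
end repeat
-- the port (if any) goes straight after the host
if portNumber is not "" then set item 8 of urlString to (item 8 of urlString) & ":" & portNumber

-- join the slots with empty delimiters, restoring the previous delimiters afterwards
set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to ""
set urlString to urlString as string
set AppleScript's text item delimiters to astid
return urlString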

Also, I don’t encode the authority, because normally you won’t accept a username, password or hostname with special characters: too much software worldwide doesn’t support that. For example, Internet Explorer’s built-in FTP client is buggy when reserved characters are used in the authority part.
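If PHP and pecl are not an option, Foundation’s NSURLComponents follows the same split-then-encode approach: you set the scheme, host, path, query items and fragment separately, and each is percent-encoded with its own rules. A sketch with AppleScriptObjC (OS X 10.10 or later) using the example URL above; note that it encodes a space in the query as %20 rather than "+":

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Each component is set separately and encoded by NSURLComponents with its own rules.
set comps to current application's NSURLComponents's new()
comps's setScheme:"http"
comps's setHost:"www.mywebserver.com"
comps's setPath:"/path/to/text script.php" -- must start with "/" when a host is set
set firstItem to current application's NSURLQueryItem's queryItemWithName:"string" value:"größe maße"
set secondItem to current application's NSURLQueryItem's queryItemWithName:"page" value:"2"
comps's setQueryItems:{firstItem, secondItem}
comps's setFragment:"overview"

-- "string" is an AppleScript class name, so the method name needs bars
set urlString to (comps's |string|()) as text
-- expected: http://www.mywebserver.com/path/to/text%20script.php?string=gr%C3%B6%C3%9Fe%20ma%C3%9Fe&page=2#overview
```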

Hi!

Thanks for your work and explanations so far! :slight_smile:

Actually, there are two more things :slight_smile:

While you are at it (and PHP seems like the ideal language for doing this): how would you decode a POST/GET from an HTML form with AppleScript, if AppleScript were the recipient?

And how would you encode an AppleScript to be put on a page with the applescript:// protocol?

Assuming UTF-8, of course!

Thanks Bazzie! :slight_smile:

The pleasure’s mine; it’s something I have been wrestling with over the years. I needed all this information to write a proper, universal SOAP and XML-RPC server, so I had to dig into HTTP and all the related documentation needed to write a proper server. All that effort resulted in a general XML-RPC/SOAP/JSON/JSON-RPC server that can be used from every programming language I’ve worked with so far, including AppleScript.

First of all, POST and GET (GET = URL query), but also PUT and DELETE (for CRUD and REST), are HTTP methods and aren’t really URL-related. POST and GET are almost the same, except that with GET the query is in the URL and with POST it is sent in the HTTP body (not the HTML body).
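On the AppleScript side, decoding a GET query string (without the leading "?") or an application/x-www-form-urlencoded POST body comes down to splitting on "&" and "=", turning "+" into a space and then percent-decoding the UTF-8 bytes. A minimal sketch with AppleScriptObjC (OS X 10.10 or later); the handler names are hypothetical, and multipart/form-data bodies (file uploads) are not handled:

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Decode one name or value: "+" means space, then %XX sequences are UTF-8 bytes.
on formDecode(str)
	set nsStr to current application's NSString's stringWithString:str
	-- Replace "+" first, so an encoded plus (%2B) survives as a literal "+".
	set nsStr to nsStr's stringByReplacingOccurrencesOfString:"+" withString:" "
	set decoded to nsStr's stringByRemovingPercentEncoding()
	if decoded is missing value then error "Malformed percent-encoding: " & str
	return decoded as text
end formDecode

-- Split "a=1&b=2" into {{"a", "1"}, {"b", "2"}}, decoding every name and value.
on decodeFormData(formData)
	set pairs to {}
	set astid to AppleScript's text item delimiters
	try
		set AppleScript's text item delimiters to "&"
		set fields to text items of formData
		set AppleScript's text item delimiters to "="
		repeat with aField in fields
			set fieldText to contents of aField
			if fieldText is not "" then
				set parts to text items of fieldText
				set fieldName to my formDecode(item 1 of parts)
				if (count parts) > 1 then
					-- Rejoin with "=" in case the value itself contained an unencoded "=".
					set fieldValue to my formDecode((items 2 thru -1 of parts) as text)
				else
					set fieldValue to ""
				end if
				set end of pairs to {fieldName, fieldValue}
			end if
		end repeat
	on error errMsg number errNum
		set AppleScript's text item delimiters to astid
		error errMsg number errNum
	end try
	set AppleScript's text item delimiters to astid
	return pairs
end decodeFormData

set decodedPairs to decodeFormData("name=J%C3%B8rn+Doe&formula=1%2B1%3D2&page=2")
-- {{"name", "Jørn Doe"}, {"formula", "1+1=2"}, {"page", "2"}}
```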

But you’re totally right: when it comes to general URL encoding, the query is encoded differently by different protocols. In the example code in my previous post (encoding an HTTP URL) I used a different way of encoding the query than you need for AppleScript. For AppleScript the URL is completely RFC 3986, while HTTP isn’t.

An AppleScript can be encoded like this:

set theScript to "tell application \"Finder\"
display dialog \"Hello, i'm created with an URL.\"
end tell"

set encodedScript to rawurlencode(theScript)

open location "applescript://com.apple.scripteditor?action=new&script=" & encodedScript

on rawurlencode(str)
	return do shell script "/bin/echo -n " & quoted form of str & " | php -r ' while($line = fgets(STDIN)){echo rawurlencode($line);} '"
end rawurlencode

Hi. :slight_smile:

Thank you very much; I think I found the rest of what I need for future use (the query string) higher up in the thread. :slight_smile: