How can I get the URL stored in a .webloc file?

Hi.
No matter what I try, I can’t get the URL that is stored in a .webloc file created with Safari.
.webloc files created with OmniWeb are indeed XML files, and by doing a “read file” I can easily find the URL in them.

But if I “read” a .webloc file that was created by dragging a link from Safari (or text dragged from TextEdit to the Finder), there is nothing in it.

So how can I get the URL?

Thanks, Spock

Many ways. Drop it onto a TextEdit document window.

or
onto Terminal.

or
in Terminal (I renamed the file first; type the command in, don’t drop the file):
cat foo.webloc/rsrc

or
cat MacScripter.webloc/rsrc | tr -d "\000" | sed 's/.*\(http:.*\).*http:.*/\1/'

It’s actually the resource fork that holds the info for Safari’s .webloc files.

read this
http://www.macosxhints.com/article.php?story=20040728185233128

Oops, Mark :rolleyes:

I had already played with your linked forum and had altered the webloc I was testing. I’ve deleted my suggestion.

Hi,

Also, you can get the url with the Finder.

set f to choose file
tell application "Finder"
set web_loc to location of f
end tell

gl,

They’re not all stored as resource forks. In Tiger, new webloc files are just XML data, and I think this might extend back to Panther–but I’m not sure about that one.

Thanks for all the answers.
The simplest would be the “location of” command, as mentioned by KEL,
but I cannot use this for some other reason.

So all the other answers brought me to this code, which works for me:

set theItem to "/Users/spock/Desktop/Test/Safari.webloc"

set filePath to POSIX path of theItem
set fileContent to (do shell script "cat " & filePath & "/rsrc")

set thePos to offset of "http" in fileContent
set theSubstring to characters (thePos) through (length of fileContent) of fileContent as string

set thePos to offset of (ASCII character of 0) in theSubstring
set theURL to (characters 1 through (thePos - 1) of theSubstring) as string
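
For comparison, the same idea (find "http", then cut at the first null character) can be sketched in Python on raw bytes. This is only an illustration: the function name and the sample bytes are made up, and a real resource fork will contain other data around the URL.

```python
def url_from_fork_bytes(raw: bytes) -> str:
    """Return the text from the first b"http" up to the next NUL byte."""
    start = raw.find(b"http")
    if start == -1:
        raise ValueError("no URL found in data")
    end = raw.find(b"\x00", start)
    if end == -1:
        # No terminating NUL: take everything to the end of the data
        end = len(raw)
    return raw[start:end].decode("ascii", errors="replace")


# Hypothetical sample bytes, not a real resource fork dump
sample = b"\x00\x00\x01\x00url \x00\x16http://example.com/page\x00\x00TEXT"
print(url_from_fork_bytes(sample))  # http://example.com/page
```

Working on bytes rather than decoded text avoids surprises with the null characters that the AppleScript version has to step around.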

This way, using text item delimiters may be faster than counting characters (works with any webloc, too):

property head : "http://"
property n : -5
set p to "/Users/bellac/Desktop/Sec.webloc" -- an https:// url
--set p to "/Users/bellac/Desktop/Plain.webloc" -- an http:// url
set W to (do shell script "cat " & p & "/rsrc")
if W contains "https://" then set {head, n} to {"https://", -6}
set {tid, text item delimiters} to {text item delimiters, head}
set u to head & text 1 thru n of text item 2 of W
set text item delimiters to tid
u

Hi guys. Perhaps I could add just one or two points to this.

When passing variables to the shell, it’s generally safer to use quoted form of. (To demonstrate why, try the above ‘cat’ routine on a path that contains spaces.)
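
The effect of a space in an unquoted path can be seen without touching any files. The following Python sketch uses `shlex.quote` as a rough counterpart of AppleScript's `quoted form of`, and `shlex.split` to show how a POSIX shell would break the command into words. The path is hypothetical.

```python
import shlex

# Hypothetical path containing a space
path = "/Users/spock/Desktop/My Links/Safari.webloc"

unquoted = "cat " + path
quoted = "cat " + shlex.quote(path)

# shlex.split mimics the word splitting a POSIX shell would perform
print(shlex.split(unquoted))  # the path is split into two arguments
print(shlex.split(quoted))    # the path stays a single argument
```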

While I realise that Adam’s tid-based suggestion is just a test routine, it may be worth noting that a working version, in which the file path is derived from external input (such as dropped files or a ‘choose file’ dialog), may not work consistently. After parsing a URL starting with “https://”, the values of the ‘head’ and ‘n’ properties will remain set to “https://” and -6, respectively - thus causing a problem with any further URLs that start with “http://”.

Generally, I prefer the simplicity of using Finder (as kel suggested) to extract an internet location file’s location property. However, if this was not possible for some reason, there are a number of alternatives (which may not work pre-Tiger).

Here’s another way to parse ‘cat’ output using text item delimiters:

to get_url from f
	set t to do shell script "cat " & quoted form of POSIX path of f & "/rsrc"
	set d to text item delimiters
	set text item delimiters to "url "
	set t to t's text from text item 2 to -1
	set text item delimiters to t's text 13 thru 16
	set t to t's text 17 thru text item 2
	set text item delimiters to d
	t
end get_url

get_url from choose file of type {"ilht"} without invisibles

Or, using offset:

to get_url from f
	set t to do shell script "cat " & quoted form of POSIX path of f & "/rsrc"
	set p to (offset of "url " in t) + 20
	t's text p thru ((offset of t's text (p - 4) thru (p - 1) in t's text p thru -1) + p - 2)
end get_url

get_url from choose file of type {"ilht"} without invisibles

However, if we’re going to parse text extracted from a file, we could do it more directly:

set t to read (choose file of type {"ilht"} without invisibles)
t's text ((offset of "<string>" in t) + 8) thru ((offset of "</string>" in t) - 1)

We can now even call on System Events to parse such files:

set f to choose file of type {"ilht"} without invisibles
tell application "System Events" to value of property list item "URL" of property list file (POSIX path of f)

This is interesting, Kai, because it shows up the real differences between weblocs produced by Safari and those from other browsers. Safari weblocs are not type “ilht” - their type property is blank, so the typing has to be removed from the choice to “see” them in your solutions (that was the OP’s original request). Even after that removal, however, your versions (except for the Finder one, of course) don’t work on a Safari webloc, but do work on weblocs from Camino and other browsers. Do they work on your Safari weblocs, or were you running another browser? I’m running Safari 2.0.3 in OS X 10.4.6 for these tests. Is that the difference?

I’ll fiddle with my version to see if I can get it to accept a “choose file”. Thanks for pointing out that it couldn’t - I only tested it, as you noted, with hard-coded weblocs. Back in a bit…

This seems to work for Camino, Safari, and Opera weblocs. In each case, I tested by telling the appropriate application to open that location with the result.

set head to "http://"
set n to -5
set p to choose file with prompt "Choose a webloc file" without invisibles
set W to (do shell script "cat " & quoted form of POSIX path of p & "/rsrc")
if W contains "https://" then set {head, n} to {"https://", -6}
set {tid, text item delimiters} to {text item delimiters, head}
set u to head & text 1 thru n of text item 2 of W
set text item delimiters to tid
u

As a postscript: is there a way to choose files with a particular extension like “webloc”? I couldn’t find one.

This compiles, but doesn’t work:

tell application "Finder" to (choose file whose name extension is "webloc")

The only way to go seems to be like this:

set theFolder to path to desktop folder
tell application "System Events" to set theFiles to name of every file of theFolder whose name extension is "webloc"
set myFile to (theFolder & (choose from list theFiles)) as string
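
The same "list the folder, filter by extension, then pick one" workaround can be sketched outside AppleScript as well. This Python version is only an illustration; the function name is made up, and the Desktop folder is just an example location.

```python
from pathlib import Path


def webloc_files(folder: Path) -> list:
    """Return the .webloc files directly inside folder, sorted by path."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".webloc"
    )


if __name__ == "__main__":
    desktop = Path.home() / "Desktop"  # example location only
    if desktop.is_dir():
        for p in webloc_files(desktop):
            print(p.name)
```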

The webloc files I used for testing were derived from Safari, Camino and some URLs in TextEdit - all of which automatically produce files of type “ilht” here.

While on the subject, and to answer your subsequent question, there currently seems to be no Uniform Type Identifier that’s specific enough to sort the wheat from the chaff. AFAICT, the closest so far is “public.data”. That’s way too general to be useful for this kind of filtering - including, as it does, stuff like “public.text”, “com.apple.ink.inktext”, “com.apple.applescript.script”, “public.object-code”, “com.apple.mach-o-binary”, “com.apple.pef-binary”, “com.microsoft.windows-executable”, yadda yadda yadda… (And no, I’m afraid it’s not possible to filter by file extension using the ‘choose file’ command.)

The ‘typing’ was incidental to my suggestion. I thought (and still think) that Spock’s problem relates more to reading a Safari webloc file - which I interpreted as extracting the URL from it (and which seems to have been the main thrust of discussion so far).

Could be. I’m running Safari version 2.0.3 (417.9.2) in Mac OS X 10.4.5. (I usually wait a while before upgrading - at least until I’m fairly confident that any fresh breakages aren’t too serious…) :wink:

I was going to post two examples of get info for Safari weblocs for the same version of Safari under 10.4.6, but discovered they wouldn’t post, breaking at file type " presumably followed by some unprintable Unicode. Great - now you can’t get file types reliably.

A little skulduggery (of your design) gets this:

"(«data utxt006E0061006D0065003A0022004D006100630053006300720069007000740065007200200042004200530020007C0020004100700070006C006500730063007200690070007400200046006F00720075006D0073002E007700650062006C006F00630022002C0020006300720065006100740069006F006E002000640061» as unicode text) & («data utxt00740065003A006400610074006500200022004D006F006E006400610079002C00200041007000720069006C002000310030002C00200032003000300036002000310032003A00350031003A0033003600200041004D0022002C0020006D006F00640069006600690063006100740069006F006E00200064006100740065» as unicode text) & («data utxt003A006400610074006500200022004D006F006E006400610079002C00200041007000720069006C002000310030002C00200032003000300036002000310032003A00350031003A0033003600200041004D0022002C002000690063006F006E00200070006F0073006900740069006F006E003A007B0030002C00200030» as unicode text) & («data utxt007D002C002000730069007A0065003A003600360034002E0030002C00200066006F006C006400650072003A00660061006C00730065002C00200061006C006900610073003A00660061006C00730065002C0020007000610063006B00610067006500200066006F006C006400650072003A00660061006C00730065002C» as unicode text) & («data utxt002000760069007300690062006C0065003A0074007200750065002C00200065007800740065006E00730069006F006E002000680069006400640065006E003A00660061006C00730065002C0020006E0061006D006500200065007800740065006E00730069006F006E003A0022007700650062006C006F00630022002C» as unicode text) & («data utxt00200064006900730070006C00610079006500640020006E0061006D0065003A0022004D006100630053006300720069007000740065007200200042004200530020007C0020004100700070006C006500730063007200690070007400200046006F00720075006D0073002E007700650062006C006F00630022002C0020» as unicode text) & («data utxt00640065006600610075006C00740020006100700070006C00690063006100740069006F006E003A0061006C00690061007300200022004100430042002D00470035005F0031003A00530079007300740065006D003A004C006900620072006100720079003A0043006F0072006500530065007200760069006300650073» as unicode text) & («data 
utxt003A00460069006E006400650072002E006100700070003A0022002C0020006B0069006E0064003A002200570065006200200049006E007400650072006E006500740020004C006F0063006100740069006F006E0022002C002000660069006C006500200074007900700065003A002200000000000000000022002C0020» as unicode text) & («data utxt00660069006C0065002000630072006500610074006F0072003A002200000000000000000022002C002000740079007000650020006900640065006E007400690066006900650072003A002200640079006E002E006100670065003800310073003300700063007200760031003000670022002C0020006C006F0063006B» as unicode text) & («data utxt00650064003A00660061006C00730065002C002000620075007300790020007300740061007400750073003A00660061006C00730065002C002000730068006F00720074002000760065007200730069006F006E003A00220022002C0020006C006F006E0067002000760065007200730069006F006E003A00220022» as unicode text)"

It’s often assumed that a file type that looks like “” is an empty string. However, since a file type should be a 4-character code, an unspecified type is usually denoted by a string consisting of null characters (ASCII character 0). These often cause problems when copying and pasting.

To allow access to the ‘broken’ webloc files from the ‘choose file’ command, you should be able to use the null string as an additional filter. This will, of course, show any file having a null file type - so the size of the problem will depend on how many matching non-webloc files you have. (The issue should be largely resolved if and when Apple gets around to introducing a Uniform Type Identifier to cover this particular type of file.)

tell (ASCII character 0) to set nullType to it & it & it & it
choose file of type {nullType, "ilht"} without invisibles
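
To make the distinction concrete, here is a tiny Python sketch (purely illustrative) showing that a type code of four null characters is not the same thing as an empty string, even though both tend to display as nothing at all.

```python
null_type = "\x00" * 4  # four NUL characters: an "unspecified" 4-character code

print(repr(""))          # ''
print(repr(null_type))   # '\x00\x00\x00\x00'
print(len(null_type))    # 4
print(null_type == "")   # False
```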

What interests me even more is how such webloc files are configured internally - since the ideal solution should parse files in both Mac OS 10.4.5 and 10.4.6. (If you feel like sending me one or two files, I’d be happy to compare them.)

(Incidentally - you might like to note that, shortly after posting it, I modified the Unicode encoding routine slightly. The results are essentially no different - although, when several chunks are necessary, only the first should now include an explicit coercion to Unicode text.)

Adam kindly sent me a couple of his files to play with, so that I could check out the differences between mine and his. I soon began to suspect that, rather than different OS versions causing the problem, the variations were being produced by the same version of Safari (2.0.3).

After some further trials, I drew the following conclusions (which Adam has since confirmed by trying the same tests on his machine).

Safari produces two different forms of web location (webloc) file, depending on where a URL is dragged from. When dragging from the address bar’s favicon, the resulting file appears to be incomplete, or of an older format. When dragging from any other position in Safari, the file’s structure is as expected.

To reproduce the results:

There should now be two new web location files on the desktop for comparison. In theory (apart from some variation in name), one might reasonably expect them to be virtually identical. However, there are some significant differences between them.

While they’re still on the desktop, you can compare the files by running this test script:

tell application "Finder"
	set l to name of files whose name extension is "webloc"
	set f to choose from list l with prompt "Choose a webloc file to check:"
	if f is false then error number -128
	set f to (first file whose name is in f) as alias
	set {name:a, file type:t, file creator:c, type identifier:i} to info for f
	set m to "file type:" & tab & tab & t & return & "file creator:" & tab & c & return & ¬
		"type identifier:" & tab & i & return & "file length:" & tab & (get eof f)
	display alert a message m
end tell

The file generated by dragging the selected text contains both XML data and a resource fork. In addition, it has the following properties:

(Cross-testing with a control browser, Camino, produces similar results to the above in every case, regardless of where the URL is dragged from.)

The file produced by dragging the favicon has a resource fork - but no XML data. Its corresponding properties are:

So Apple’s current move away from resource forks to XML files is reflected in the current webloc format - which is a hybrid, containing XML data and a resource fork. That is, unless it’s created from Safari’s address bar favicon. In which case the XML data is missing (note the file length of 0.0 bytes) - along with the “ilht” file type, “MACS” file creator and “dyn.agk80w5dksu” type identifier (returned instead as “dyn.age81s3pcrv10g”).

My guess is that the Safari files are not broken - just an older format (probably because an update, implemented everywhere else, was simply overlooked when it came to the favicon).

A sans-XML file still works when double-clicked, because of the duplicate data stored in the “TEXT” resource of its resource fork. This would also explain why resource-aware extraction methods (e.g: Finder/cat) can see the data - while text/XML-based techniques (e.g: read file/property list file) can’t.

That being the case, something like this should work for either file ‘version’:

set t to paragraph 1 of (do shell script "strings -8 " & quoted form of POSIX path of (choose file) & "/rsrc")
t's text (offset of "http" in t) thru -1
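
For anyone curious what the `strings -8` step is doing, here is a rough Python sketch of the same technique: collect runs of at least 8 printable characters, take the first run, and keep everything from "http" onward. The function names and the sample bytes are hypothetical, and the real `strings` tool may treat a few characters differently.

```python
import re


def printable_runs(raw: bytes, min_len: int = 8) -> list:
    """Return runs of at least min_len printable ASCII bytes (roughly what strings does)."""
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [m.decode("ascii") for m in pattern.findall(raw)]


def url_from_runs(raw: bytes) -> str:
    """Take the first printable run and return it from 'http' to the end."""
    runs = printable_runs(raw)
    if not runs or "http" not in runs[0]:
        raise ValueError("no URL found in the first printable run")
    first = runs[0]
    return first[first.index("http"):]


# Hypothetical sample bytes, not a real resource fork dump
sample = b"\x00\x00\x01\x00\x00\x00\x16http://www.example.com/\x00\x00\x00TEXT\x00url "
print(url_from_runs(sample))  # http://www.example.com/
```

The minimum length of 8 is what filters out short fragments such as "TEXT" and "url ", which is presumably why the shell version passes `-8`.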

However, this won’t resolve the other issues. So my advice would be to avoid using Safari’s address bar favicon altogether - until it gets fixed. (I filed a bug report with Apple today.)

I’m trying to get this to work in 10.8, but it doesn’t.

set head to "http://"
set n to -5
set p to choose file with prompt "Choose a webloc file" without invisibles
set W to (do shell script "cat " & quoted form of POSIX path of p & "/rsrc")
if W contains "https://" then set {head, n} to {"https://", -6}
set {tid, text item delimiters} to {text item delimiters, head}
set u to head & text 1 thru n of text item 2 of W
set text item delimiters to tid

ERROR

cat: /Users/<username>/Desktop/<file>.webloc/rsrc: Not a directory

Any help would be appreciated.

Hi,

Nowadays the content of a webloc file is just a hidden property list with a single URL key.
To reveal the real content, it’s necessary to rename the file with the appropriate extension.
To keep the original webloc, the script copies the file to the temporary items folder,
reads the value of the URL key and deletes the temporary file.


set weblocFile to POSIX path of (choose file of type "webloc")
set temporaryFile to POSIX path of (path to temporary items) & (do shell script "date +%y%m%d%H%M%S") & ".plist"
do shell script "cp " & quoted form of weblocFile & space & quoted form of temporaryFile
set weblocURL to do shell script "defaults read  " & quoted form of temporaryFile & " URL; rm " & quoted form of temporaryFile

or a simpler and more applescript version


set weblocFile to (choose file of type "webloc")
set weblocData to read weblocFile
tell application "System Events"
	set weblocDoc to make new property list item with properties {text:weblocData}
	set weblocURL to value of property list item "URL" of weblocDoc
end tell
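
Outside AppleScript, the same lookup can be sketched in Python with the standard `plistlib` module, which detects XML and binary property lists by itself, so no renaming or temporary copy is needed. This is a minimal sketch, assuming the webloc actually has property list data in its data fork (so not the favicon-style files discussed earlier); the function name and the demo URL are made up.

```python
import os
import plistlib
import tempfile


def webloc_url(path: str) -> str:
    """Read the URL key from a .webloc property list (XML or binary)."""
    with open(path, "rb") as fh:
        data = plistlib.load(fh)  # the plist format is detected automatically
    return data["URL"]


if __name__ == "__main__":
    # Build a throwaway webloc in each format to show both are handled alike
    for fmt in (plistlib.FMT_XML, plistlib.FMT_BINARY):
        with tempfile.NamedTemporaryFile(suffix=".webloc", delete=False) as tmp:
            tmp.write(plistlib.dumps({"URL": "http://example.com/"}, fmt=fmt))
        print(webloc_url(tmp.name))
        os.remove(tmp.name)
```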


thanks that worked.

Old topic, but close to what I’m trying to accomplish, so let’s see if I can revive some (hopefully not so dead) corpses … :wink:

What would a script need to look like to display the web address (“URL”) of a .webloc file when hovering over it in Finder? Either (preferred) in a little pop-up at the mouse pointer position (like when hovering over an abbreviated file name, its full name is expanded in such a pop-up), or otherwise in Finder’s status bar?