Catching source URLs

I tried to get the source URL of downloaded files.
xattr shows me only “”. (?)

tell application "Finder"
	set nm to name of item 1 of (get the selection)
	set trg to POSIX path of (get target of window 1 as text)
end tell

#do shell script "mdfind -onlyin '" & trg & "' -name '" & nm & "' \"kMDItemWhereFroms\""
#do shell script "xattr '" & (trg & nm as text) & "'"

What exactly is your question? When a file is downloaded, an extended attribute is added to the file named where the first number indicates the state of the file (if you set the first number back to 0001, the user will be asked for permission again). While the file is in the quarantine state, it can’t execute any executable code unless the user has confirmed the risks. This attribute doesn’t contain a URL to its source; it only indicates which process downloaded the file. The information can be retrieved with the -p option:

do shell script "xattr -p " & quoted form of POSIX path of (choose file) & " || true"

I used xattr to find out whether URLs are also written to metadata attributes.
However, my goal is to get the source URL of the downloaded file. I ask because if I type “html” in my Finder window search bar and confirm it as a tag, I get files downloaded from a particular source URL.
kMDItemWhereFroms should retrieve URLs, but I’m not sure.
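One way to check this is to look at what `mdls -raw -name kMDItemWhereFroms` prints for a file. The sketch below only parses such output; it assumes that mdls prints an array as a parenthesised, comma-separated list of quoted strings and prints `(null)` when the key is absent. The function name and the example.com URLs are hypothetical.

```python
import re
from typing import List


def parse_where_froms(mdls_output: str) -> List[str]:
    """Extract the quoted strings from mdls -raw output (assumed format)."""
    text = mdls_output.strip()
    # Assumption: a missing key is reported as "(null)"
    if text == "(null)" or not text.startswith("("):
        return []
    # Collect every double-quoted item, allowing backslash escapes inside
    return re.findall(r'"((?:[^"\\]|\\.)*)"', text)


# Hypothetical sample of what mdls might print for a downloaded file
sample = '(\n    "https://example.com/files/report.html",\n    "https://example.com/"\n)'
urls = parse_where_froms(sample)
print(urls)
```

An empty list then means the key was not set, which is the case discussed in the replies.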

You understand that kMDItemWhereFroms is not required metadata for a quarantine-aware application? In other words when is set, it doesn’t mean that kMDItemWhereFroms is set, nor that an entry has been added to the ~/Library/Preferences/ sqlite database. Firefox is a quarantine-aware application but doesn’t save this kind of extra information the way Safari does.

Well, I also found out that Safari doesn’t always set kMDItemWhereFroms; sometimes it only inserts information under the LSQuarantineEventIdentifier. I also convert the download date into an AppleScript date.

set theFile to POSIX path of (choose file)
set xattr to do shell script "xattr -p " & quoted form of theFile & " || true"

if xattr = "" then return {missing value, missing value, {}}--stop, there is no reason to continue

set oldTID to AppleScript's text item delimiters
set AppleScript's text item delimiters to ";"
if (count of text items of xattr) is not 4 then
	set AppleScript's text item delimiters to oldTID
	return {missing value, missing value, {}} --wrong format
end if
set {theState, theDate, theApp, theUUID} to every text item of xattr
set AppleScript's text item delimiters to oldTID

-- convert the hex epoch date to an AppleScript date
set dateWords to every word of (do shell script "printf \"%d\" 0x" & theDate & " | xargs -J @ date -r @ \"+%Y %m %d %H %M %S\" ")
set theDate to current date
set day of theDate to 1 --avoid the month rolling over when today is the 29th, 30th or 31st
--for better readability it's not merged into a single line setting two lists
set year of theDate to item 1 of dateWords as integer
set month of theDate to item 2 of dateWords as integer
set day of theDate to item 3 of dateWords as integer
set hours of theDate to item 4 of dateWords as integer
set minutes of theDate to item 5 of dateWords as integer
set seconds of theDate to item 6 of dateWords as integer

--now try to get the URL where this file is downloaded from
set whereFrom to do shell script "mdls -raw -name kMDItemWhereFroms " & quoted form of theFile
if (count of paragraphs of whereFrom) < 3 then
	set whereFrom to {}
	--we still could try to get it from the quarantine database
	set sqlResults to do shell script "sqlite3 -line \"$HOME/Library/Preferences/\" \"SELECT LSQuarantineOriginURLString, LSQuarantineDataURLString FROM LSQuarantineEvent WHERE LSQuarantineEventIdentifier = '" & theUUID & "'\""
	if sqlResults is not "" then
		set end of whereFrom to text 31 thru -1 of paragraph 1 of sqlResults
		set end of whereFrom to text 31 thru -1 of paragraph 2 of sqlResults
	end if
else
	--mdls returned a list: turn its quoted lines into an AppleScript list
	set whereFrom to run script "{" & (paragraphs 2 thru -2 of whereFrom as string) & "}"
end if

return {theApp, theDate, whereFrom}
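The splitting and date conversion in the script above can also be checked outside AppleScript. This is a minimal sketch, assuming the attribute value has the same four semicolon-separated fields the script expects (state, hex epoch seconds, application, event identifier). The names `QuarantineInfo` and `parse_quarantine` and the sample value are hypothetical, and the date is returned in UTC, whereas `date -r` in the script prints local time.

```python
from datetime import datetime, timezone
from typing import NamedTuple, Optional


class QuarantineInfo(NamedTuple):
    state: str
    downloaded_at: datetime
    app: str
    event_id: str


def parse_quarantine(value: str) -> Optional[QuarantineInfo]:
    """Split 'state;hex_epoch;app;uuid' into fields, or return None."""
    parts = value.strip().split(";")
    if len(parts) != 4:
        return None  # same "wrong format" bail-out as the script
    state, hex_epoch, app, event_id = parts
    try:
        seconds = int(hex_epoch, 16)  # the date field is hexadecimal epoch seconds
    except ValueError:
        return None
    when = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return QuarantineInfo(state, when, app, event_id)


# Hypothetical attribute value, for illustration only
info = parse_quarantine("0083;5f5e1000;Safari;HYPOTHETICAL-UUID")
print(info)
```

Returning `None` for malformed input plays the same role as the `missing value` results in the AppleScript version.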

And thanks for your awesome script, DJ Bazzie Wazzie. I used your script a lot, but the source URL is missing most of the time, so another approach is needed.
A JavaScript should do the trick: retrieving the source URL from the currently loaded page and adding the result to the attributes after the document has been saved to the desktop or another location. Again, I have no experience with JavaScript and don’t know if the idea really works.

After some trials, I was able to build a script with another approach. Thank you nevertheless for the inspiration, DJ.

If you’re interested in the results: (redirects to Code exchange)
URLs to Finder comment

I still have two open questions:

  1. Do browsers always use the website’s title to set the name of .htm and .html documents?
  2. For now, I have to use the Finder’s comments to store the source URL info, but I would like to use xattr instead, to make things more useful for Spotlight searches and more permanent. The Finder’s comments are awkward.

You’re welcome. Anyway, it may be useful for another user in the future.

In general, yes. For years every browser has set the name of the window or tab to the title of the HTML document.

The problem is that a URL is not saved as an extended attribute. But the question would be how Spotlight can index a file and know its where-from key. Well, for this you can use xattr. Define directly followed by a colon and directly followed by its meta key name. The next time the file is indexed by Spotlight, it will use the xattr value instead. It is important to know that Spotlight is an asynchronous, indexed database search. So once the proper xattr is set, there can be a delay between setting the xattr and the moment the Spotlight spider re-indexes the file and copies the xattr into its database.

I’m on an iPad now; if you want, I could give you an example tomorrow.

Yes, that sounds fantastic! Great!

Here is an example of how you can add the kMDItemWhereFroms metatag to your file.

set theFile to POSIX path of (choose file)
set sourceURL to ""
set plist to "<!DOCTYPE plist PUBLIC \"-//Apple//DTD PLIST 1.0//EN\" \"\">
<plist version=\"1.0\"><array><string>" & sourceURL & "</string></array></plist>"

set xattrValue to do shell script "echo " & quoted form of plist & " | plutil -convert binary1 - -o - | xxd -ps"

do shell script "xattr -wx '' " & xattrValue & space & quoted form of theFile
--probably not needed but re-index the file anyway. 
do shell script "mdimport " & quoted form of theFile
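The value that the script pipes through `plutil` and `xxd -ps` is a binary property list containing an array of strings, written out as hex. For anyone who wants to inspect such a value without writing it to a file, here is a minimal sketch using Python's standard `plistlib`; the function name and the example.com URL are hypothetical, and the sketch only builds and decodes the hex string, it does not call xattr.

```python
import plistlib
from typing import List


def where_froms_hex(urls: List[str]) -> str:
    """Encode a list of URLs as a binary plist and return it as a hex string."""
    binary = plistlib.dumps(urls, fmt=plistlib.FMT_BINARY)
    return binary.hex()


# Hypothetical source URL, for illustration only
hex_value = where_froms_hex(["https://example.com/download/file.zip"])
print(hex_value)

# Decoding the hex string again gives back the original list
decoded = plistlib.loads(bytes.fromhex(hex_value))
print(decoded)
```

Decoding the hex string the same way is also a quick check of what an existing attribute value contains, once it has been read as hex.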

DJ Bazzie Wazzie, everything works like a charm.
No real issues, but after some testing I noticed:

• Whenever I create zip archives (.zip) from downloaded files, the zipped ones lose their where-from metatag, which is a bit of a pity. Other actions on a downloaded file (copy, move, save) do not affect the where-from metatag.

• Tagging with xattr seems really specific and requires different approaches. Assigning simple string tags (“tag1”, “tag2”) follows different rules than writing the very specific where-from attribute. (I asked about xattr with a double interest, because using the tags assigned by the OpenMeta tag shell from Ironic Software makes tags not portable.) Maybe this is the wrong place to extend my initial question?