Native AppleScript decoder for dynamic Uniform Type Identifiers

NOTE: An excellent resource for decoding and interpreting dynamic Uniform Type Identifiers has been previously published, along with Objective-C and Swift decoders based on that resource. The current post describes a vanilla AppleScript version of the decoder, decodeDynUTI, that uses an alternative approach to the bitwise operations of the other decoders, demonstrates satisfactory execution speed (< 0.01 seconds per conversion), and is readily incorporated into native AppleScript and ASObjC code. (Tested in macOS 10.13.3)

Apple’s Uniform Type Identifier (UTI) system provides an elegant way of typing file system items. A straightforward way to get an item’s UTI in AppleScript is to query the type identifier property of the item with the System Events application. For example:


tell application "System Events" to return type identifier of alias "...HFS path to jpeg image file..." --> "public.jpeg"

For common data types such as this one, the text of the UTI immediately identifies the item’s type: for example, “public.jpeg” for a JPEG image file, or “com.adobe.pdf” for an Adobe PDF file.

The operating system recognizes a large number of UTIs for common data types. However, as stated in Apple’s documentation, one may occasionally encounter a file system item without an assigned UTI. This might be the case, for instance, with a file of a new or obscure type that the operating system does not recognize. In these cases, the operating system dynamically assigns a UTI starting with the domain name “dyn.”, followed (always, it seems) by the character “a”, followed by a series of unintelligible characters (described as “opaque” in Apple’s documentation).

Beneath the “opaque” character sequence, however, lies encoded information. It turns out that the characters following “dyn.a” are encoded with a custom base-32 encoding scheme in which the characters abcdefghkmnpqrstuvwxyz0123456789 represent the decimal values 0 through 31. The dynamic UTI can therefore be decoded in two steps: first, convert each character following “dyn.a” into its left-zero-padded 5-bit string according to the custom base-32 scheme and concatenate the results; then, convert each successive 8-bit group, in left-to-right order, into its Unicode character equivalent. For example, in a dynamic UTI beginning “dyn.ah6”, the characters “h” and “6” have the values 7 and 28, which give the bits 00111 11100; the first 8 bits, 00111111, equal decimal 63, which is the character “?”.
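The two decoding steps can also be carried out with integer arithmetic instead of bit strings. The following is only a minimal sketch of the scheme described above, not the handler presented later in this post: the handler name decodeDynUTISketch and its variable names are hypothetical, it assumes the input starts with “dyn.a”, and it silently ignores any leftover bits at the end of the input.

```applescript
-- decodeDynUTISketch is a hypothetical name; this is an illustrative sketch, not the post's handler
on decodeDynUTISketch(dynamicUTI)
	set customBase32Chars to "abcdefghkmnpqrstuvwxyz0123456789"
	if dynamicUTI does not start with "dyn.a" then error "Not a dynamic UTI of the form \"dyn.a[...]\"."
	if (length of dynamicUTI) = 5 then return ""
	set decodedString to ""
	set bitBuffer to 0 -- integer value of the bits not yet converted to a character
	set bitCount to 0 -- number of bits currently held in bitBuffer
	considering case
		repeat with currCharRef in (characters 6 thru -1 of dynamicUTI)
			set currChar to contents of currCharRef
			-- A character's value is its zero-based position in the custom alphabet
			set charValue to (offset of currChar in customBase32Chars) - 1
			if charValue < 0 then error "Invalid character in the dynamic UTI: " & currChar
			-- Append the character's 5 bits to the right of the pending bits
			set bitBuffer to bitBuffer * 32 + charValue
			set bitCount to bitCount + 5
			if bitCount ≥ 8 then
				-- Peel off the leftmost 8 bits as one character and keep the remainder
				set bitCount to bitCount - 8
				set divisor to (2 ^ bitCount) as integer
				set decodedString to decodedString & (character id (bitBuffer div divisor))
				set bitBuffer to bitBuffer mod divisor
			end if
		end repeat
	end considering
	-- Any leftover bits (fewer than 8) are ignored in this sketch
	return decodedString
end decodeDynUTISketch

decodeDynUTISketch("dyn.ah62d4r34gq81k3p2su1zuppgsm10esvvhzxhe55c")
--> "?0=7:3=text/X-frob:1=frob"
```

Because at most 12 bits are pending at any time, the buffer stays a small integer and no bitwise operators are needed.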

Taking examples from the aforementioned resource, here is the output of the decoder run on the dynamic UTI “dyn.ah62d4r34gq81k3p2su1zuppgsm10esvvhzxhe55c”:


decodeDynUTI("dyn.ah62d4r34gq81k3p2su1zuppgsm10esvvhzxhe55c")
--> "?0=7:3=text/X-frob:1=frob"

And here is the output for the dynamic UTI “dyn.ah62d4r3qkk7dgtpyqz6hkp42fzxhe55cfvy042phqy1zuppgsm10esvvhzxhe55c”:

decodeDynUTI("dyn.ah62d4r3qkk7dgtpyqz6hkp42fzxhe55cfvy042phqy1zuppgsm10esvvhzxhe55c")
--> "?0=7,B:3=text/X-frob,image/X-frob:1=frob"

What do the decoded text strings mean? Although undocumented by Apple, the interpretation of the decoded strings is described in the aforementioned resource. Here are a few important highlights from that resource:

  1. The decoded string consists of colon-delimited expressions of the form: [UTI]=[value]
  2. If an expression has multiple values (e.g., multiple UTIs to which the item conforms), the values are separated by commas:
    [UTI]=[value1,value2,…]
  3. The hexadecimal digits 0 through F are used as abbreviations for the following common UTIs:
    ?0: UTTypeConformsTo (the purpose of the “?” prefix is unexplained; perhaps it signifies that the value, UTTypeConformsTo, is not a UTI)
    1: public.filename-extension
    2: com.apple.ostype
    3: public.mime-type
    4: com.apple.nspboard-type
    5: public.url-scheme
    6: public.data
    7: public.text
    8: public.plain-text
    9: public.utf16-plain-text
    A: com.apple.traditional-mac-plain-text
    B: public.image
    C: public.video
    D: public.audio
    E: public.directory
    F: public.folder
  4. The following special characters must be escaped with a backslash if they appear as literal characters in a UTI or value:
    , : = \ NUL
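
Because of the escaping rule in item 4, a decoded string cannot always be split safely with plain text item delimiters. The following is a hedged sketch of one way to split on unescaped delimiters only; the handler names splitUnescaped and removeEscapes are hypothetical and do not appear in the resource or in the decoder below. The splitter deliberately leaves the backslashes in place so that a field can be split again on “=” or “,” before the escapes are finally removed.

```applescript
-- splitUnescaped and removeEscapes are hypothetical helper names used for illustration only
on splitUnescaped(sourceText, delimiterChar)
	-- Splits sourceText at each delimiterChar that is not preceded by a backslash
	set fieldList to {}
	set currField to ""
	set isEscaped to false
	repeat with currCharRef in (characters of sourceText)
		set currChar to contents of currCharRef
		if isEscaped then
			-- The previous character was a backslash, so this one is literal
			set currField to currField & currChar
			set isEscaped to false
		else if currChar is "\\" then
			-- Keep the backslash so that later passes still see the escape
			set currField to currField & currChar
			set isEscaped to true
		else if currChar is delimiterChar then
			set end of fieldList to currField
			set currField to ""
		else
			set currField to currField & currChar
		end if
	end repeat
	set end of fieldList to currField
	return fieldList
end splitUnescaped

on removeEscapes(sourceText)
	-- Drops each escaping backslash and keeps the character that follows it
	set resultText to ""
	set isEscaped to false
	repeat with currCharRef in (characters of sourceText)
		set currChar to contents of currCharRef
		if isEscaped or (currChar is not "\\") then
			set resultText to resultText & currChar
			set isEscaped to false
		else
			set isEscaped to true
		end if
	end repeat
	return resultText
end removeEscapes

splitUnescaped("?0=7:3=text/X-frob:1=frob", ":")
--> {"?0=7", "3=text/X-frob", "1=frob"}
```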

With this information at hand, we can now interpret the decoded results from the examples.

?0=7:3=text/X-frob:1=frob means:

?0=7  ->  UTTypeConformsTo=public.text  ->  the item conforms to the UTI public.text
3=text/X-frob  ->  public.mime-type=text/X-frob  ->  the item's MIME type is text/X-frob
1=frob  ->  public.filename-extension=frob  ->  the item's filename extension is frob

?0=7,B:3=text/X-frob,image/X-frob:1=frob means:

?0=7,B  ->  UTTypeConformsTo=public.text,public.image  ->  the item conforms to both the UTI public.text and the UTI public.image
3=text/X-frob,image/X-frob  ->  public.mime-type=text/X-frob,image/X-frob  ->  the item's MIME types are text/X-frob and image/X-frob (note: the system only recognizes the first MIME type, as discussed in the resource at https://alastairs-place.net/blog/2012/06/06/utis-are-better-than-you-think-and-heres-why/; this odd combination of MIME types for a single item reflects the fact that the author used a contrived example simply for demonstration purposes)
1=frob  ->  public.filename-extension=frob  ->  the item's filename extension is frob
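
The interpretation above can be automated. The following is a rough sketch rather than a complete parser: the handler names expandAbbreviation and describeDecodedUTI are hypothetical, escaped delimiters are assumed to be absent, and, as a further assumption drawn only from the two examples, abbreviations in values are expanded only for the ?0 (UTTypeConformsTo) expression.

```applescript
-- expandAbbreviation and describeDecodedUTI are hypothetical names used for illustration only
on expandAbbreviation(abbreviation)
	-- Full names for the abbreviations 1 through F, in order
	set utiNames to {"public.filename-extension", "com.apple.ostype", "public.mime-type", "com.apple.nspboard-type", "public.url-scheme", "public.data", "public.text", "public.plain-text", "public.utf16-plain-text", "com.apple.traditional-mac-plain-text", "public.image", "public.video", "public.audio", "public.directory", "public.folder"}
	if abbreviation is "?0" then return "UTTypeConformsTo"
	-- Anything other than a single character is returned unchanged
	if (length of abbreviation) is not 1 then return abbreviation
	considering case
		set hexIndex to offset of abbreviation in "123456789ABCDEF"
	end considering
	if hexIndex = 0 then return abbreviation
	return item hexIndex of utiNames
end expandAbbreviation

on describeDecodedUTI(decodedString)
	-- Assumes the decoded string contains no escaped delimiters
	set tid to AppleScript's text item delimiters
	set expandedExpressions to {}
	try
		set AppleScript's text item delimiters to ":"
		set expressionList to text items of decodedString
		repeat with currExpression in expressionList
			set AppleScript's text item delimiters to "="
			set {currKey, currValue} to text items of (contents of currExpression)
			if currKey is "?0" then
				-- Assumption: only the values of ?0 are abbreviated UTIs
				set AppleScript's text item delimiters to ","
				set valueList to text items of currValue
				repeat with i from 1 to (count valueList)
					set item i of valueList to expandAbbreviation(item i of valueList)
				end repeat
				set currValue to valueList as text
			end if
			set end of expandedExpressions to expandAbbreviation(currKey) & "=" & currValue
		end repeat
		set AppleScript's text item delimiters to ":"
		set expandedString to expandedExpressions as text
	on error errMsg number errNum
		-- Restore the delimiters before passing the error on
		set AppleScript's text item delimiters to tid
		error errMsg number errNum
	end try
	set AppleScript's text item delimiters to tid
	return expandedString
end describeDecodedUTI

describeDecodedUTI("?0=7,B:3=text/X-frob,image/X-frob:1=frob")
--> "UTTypeConformsTo=public.text,public.image:public.mime-type=text/X-frob,image/X-frob:public.filename-extension=frob"
```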

Handler:


on decodeDynUTI(dynamicUTI)
	-- Decodes a custom base-32-encoded dynamic Uniform Type Identifier of the form "dyn.a...", and returns the decoded text string
	-- Note: The handler only recognizes dynamic UTIs whose first letter following the domain name "dyn." is "a"
	script util
		-- Custom base-32 encoding scheme characters and their 5-bit equivalent bitstring values
		property customBase32Chars : "abcdefghkmnpqrstuvwxyz0123456789"
		property bitstringValues : {"00000", "00001", "00010", "00011", "00100", "00101", "00110", "00111", "01000", "01001", "01010", "01011", "01100", "01101", "01110", "01111", "10000", "10001", "10010", "10011", "10100", "10101", "10110", "10111", "11000", "11001", "11010", "11011", "11100", "11101", "11110", "11111"}
		-- Main handler
		on run
			tell dynamicUTI
				-- Perform a preliminary validation of the input argument
				if (its class ≠ text) or (it does not start with "dyn.a") then error "The input argument is not a dynamic Uniform Type Identifier of the form \"dyn.a[...]\"."
				-- Handle the special case of a dynamic UTI without content
				if length = 5 then return ""
				-- Convert the relevant portion of the dynamic UTI to its decoded bitstring equivalent with the recursive handler dynUTIToBitstring
				set currBitstring to my dynUTIToBitstring(text 6 thru -1)
			end tell
			-- Convert the bitstring to a Unicode character string with the recursive handler bitstringToUnicodeString
			set decodedString to my bitstringToUnicodeString(currBitstring)
			-- Return the decoded string
			return decodedString
		end run
		-- Utility handlers
		on dynUTIToBitstring(currString)
			-- Converts a custom base-32-encoded dynamic UTI string to its equivalent bitstring by replacing each input character with a corresponding 5-bit substring
			tell currString
				-- Handle the special case of an empty input string
				if length = 0 then return ""
				-- Get the index position in the custom base-32 and bitstring lists of the input string's first character
				set tid to AppleScript's text item delimiters
				try
					set AppleScript's text item delimiters to text 1
					set currIndex to (my customBase32Chars's first text item's length) + 1
				end try
				set AppleScript's text item delimiters to tid
				-- Throw an error if the input string's first character doesn't match any entry in the custom base-32 list
				if currIndex > my customBase32Chars's length then error "The following character in the dynamic UTI is invalid: " & return & return & tab & (text 1)
				-- Get the 5-character bitstring equivalent of the first input character
				set currBitstring to my bitstringValues's item currIndex
				-- If the dynamic UTI consists of only one character, return its 5-bit equivalent value
				if length = 1 then return currBitstring
				-- Otherwise, return the first character's 5-bit equivalent value concatenated with the 5-bit equivalent values of the remaining characters obtained recursively through the current handler
				return currBitstring & my dynUTIToBitstring(text 2 thru -1)
			end tell
		end dynUTIToBitstring
		on bitstringToUnicodeString(currBitstring)
			-- Converts a bitstring to its equivalent Unicode string by replacing 8-bit substrings in left-to-right order with corresponding Unicode characters
			tell currBitstring
				-- If the input bitstring is empty (i.e., the dynamic UTI has been fully processed) or is < 8 bits long and consists only of extraneous "0"'s (as is sometimes encountered in valid dynamic UTIs), return the empty string
				if (length = 0) or (it is in "0000000") then return ""
				-- If the input bitstring is < 8 bits long and has non-zero content, throw an "extraneous bits" error
				if length < 8 then error "The input argument is invalid because " & ({"there is 1 leftover trailing non-zero bit", "there are " & length & " leftover trailing non-zero bits"}'s item (1 + ((length > 1) as integer))) & " after processing all 8-bit substrings."
				-- Get the Unicode character equivalent of the first 8 bits
				set currUnicodeChar to character id ((128 * (text 1) + 64 * (text 2) + 32 * (text 3) + 16 * (text 4) + 8 * (text 5) + 4 * (text 6) + 2 * (text 7) + (text 8)))
				-- If the input bitstring is only 8 bits long, return its Unicode character equivalent
				if length = 8 then return currUnicodeChar
				-- Otherwise, return the Unicode character equivalent of the first 8 bits concatenated with the Unicode character equivalents of the remaining 8-bit substrings obtained recursively through the current handler
				return currUnicodeChar & my bitstringToUnicodeString(text 9 thru -1)
			end tell
		end bitstringToUnicodeString
	end script
	-- Decode the input text string, and return the decoded string
	return (run util) as text
end decodeDynUTI

Note: A minor edit was made to the dynUTIToBitstring handler code without any material functional changes since the original submission.