Anyway, as to the overall purpose: not sure if you know, but the Illustrator native file format (.ai) is largely based on the PDF language specification. Because of this, it may be hard to determine the correct file type from the hex dump, as you’ve seen.
Before anyone misconstrues…we had to choose a product that had a DAM and a project management piece…systems that do both are incredibly rare, at least that meet our well-developed criteria. Interwoven’s was the best fit by a long shot.
There are plenty of stand-alone DAMs, many quite a bit better at handling Mac files, even at a corporation the size of ours. But we needed project management integration, and that part of the software market has been slow to mature outside specific vertical markets (and Creative Services isn’t one of them, strangely enough).
But we didn’t catch the resource fork issue in testing…nuts. Turns out MediaBin’s support of Mac files is “odd” at best. Worse, Adobe stopped development of their server-based Adobe Graphics Server (AGS), which MediaBin relies on. That makes my life tricky, though it gives me plenty to do, as you can see.
Yeah, I knew that part, but up until CS1 there were still differences I could work with. Even MediaBin could do basic determinations with PDF vs. CS1. But with CS2 it looks like they took the “Illustrator is a PDF” idea to its final stage. I’m spending today doing a detailed breakout of the metadata/XML exposed by the hexdump to see if I can find any difference, however slight, before I throw in the towel.
I’ve also posted to the Adobe forums, but I’m not holding my breath. Rarely does anyone from Adobe seem to step in, and this kind of question is likely too esoteric for the normal Adobe forum user base.
On the upside for anyone following along at home…this is all related to the “File Repair” software I posted in Code Exchange many months back:
I’ve been asking related questions in other threads, refining the script. I am hoping to refine it enough for ScriptBuilders, mostly because on my end I need to refine it to the point where it will not only work internally, but “in the wild.”
Ahh, it all makes more sense to my funny little mind now =)
Anyway, if you’re releasing it in the wild, you should make sure that when processing files you’re not trying to do a hex dump on a folder that was dropped. Or, if one is dropped, process the files and folders within it as well =)
I got bored (love my job :D) and decided to update it to accommodate the possibility of folders, possibly nested, being dropped alongside files.
Please pardon the lack of commenting and the reorganization of the handlers; we all have our own habits.
on open fileList
    repeat with this_item in fileList
        process(this_item)
    end repeat
end open

on process(this_item)
    if (folder of (info for this_item)) then
        tell application "Finder" to set subItems to items of folder this_item
        repeat with anItem in subItems
            process(anItem as alias)
        end repeat
    else
        set AppleScript's text item delimiters to {""}
        set this_item_posix to quoted form of POSIX path of (this_item as string)
        set doc_name to name of (info for this_item)
        set hex_dump to (do shell script "hexdump -C " & this_item_posix & " | awk 'BEGIN{FS=\"|\"}{print$2}'")
        set hex_dump to searchNreplace(hex_dump, return, "")
        tell application "TextEdit"
            make new document
            set text of front document to hex_dump
            set name of front window to doc_name
        end tell
    end if
end process

on searchNreplace(parse_me, find_me, replace_with_me)
    set {ATID, AppleScript's text item delimiters} to {"", find_me}
    set being_parsed to text items of parse_me
    set AppleScript's text item delimiters to {replace_with_me}
    set parse_me to being_parsed as string
    set AppleScript's text item delimiters to ATID
    return parse_me
end searchNreplace
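For anyone who wants to try the same "files or folders, possibly nested" idea from a terminal, here is a rough shell sketch of the walk that the recursive process handler above performs. It is not part of the droplet; the function name list_files is made up for illustration, and it only lists the regular files that would be dumped.

```shell
# list_files PATH...
# Print every regular file found under the given paths. Folders are
# descended into (nested ones included) rather than being hex dumped.
# The name list_files is hypothetical, chosen for this sketch only.
list_files() {
    for p in "$@"; do
        if [ -d "$p" ]; then
            # A folder was dropped: collect the files inside it, at any depth.
            find "$p" -type f
        elif [ -f "$p" ]; then
            # A plain file was dropped: pass it through unchanged.
            printf '%s\n' "$p"
        fi
    done
}
```

Each path it prints could then be fed to the hexdump pipeline one at a time.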
set hex_dump to (do shell script "hexdump -C " & this_item_posix & " | awk 'BEGIN{FS=\"|\"}{print$2}'")
I’ve been thinking about this all week. The above will produce incorrect output (characters will be omitted) if the pipe character occurs within the text. I’m a shell scripting noob, so it took me a while to figure this out, but my suggestion follows. Maybe one of the experienced shell scripters here can give it some polish.
set hex_dump to (do shell script "hexdump -C " & this_item_posix & " | awk 'BEGIN{FS=\" \"}{print(substr($0,62,16))}'")
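To make the failure concrete, here is a small shell sketch (not taken from the droplet) that runs both awk variants over one hand-built line in the hexdump -C layout. The sample line and the function names by_field and by_column are invented for illustration; the sample's ASCII column deliberately contains two pipe characters.

```shell
# One line laid out the way hexdump -C prints it: offset, 16 hex pairs,
# then the ASCII column between two border pipes. This sample is hand-made.
line='00000000  63 6f 73 74 7c 70 72 69  63 65 7c 74 6f 74 61 6c  |cost|price|total|'

# Original approach: split on "|" and keep field 2.
# Everything after the first pipe inside the data is lost.
by_field() { printf '%s\n' "$1" | awk 'BEGIN{FS="|"}{print$2}'; }

# Suggested approach: ignore fields and cut 16 characters from column 62.
by_column() { printf '%s\n' "$1" | awk 'BEGIN{FS=" "}{print(substr($0,62,16))}'; }

by_field "$line"     # prints: cost
by_column "$line"    # prints: cost|price|total
```

The field-based version stops at the first pipe it finds in the data, which would explain the missing characters in files that happen to contain pipes.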
The pipes are simply a visual aid, not part of the actual code (or not a useful part anyway, you can tell by reading the data within). I’ve yet to see the pipe character within the two “border” pipe characters, so it seems safe enough.
Interesting! I actually did a bit of testing and never encountered a pipe in between the border pipes. Just curious: what kind of files were you dealing with?
Also, sorry if the question seems stupid, but are you sure it wasn’t a lower case L? In my terminal font an l and a | look a hell of a lot alike.
The files in question were Excel files. I verified that it was the pipe character and not the lower case ‘L’ by searching for the pipe character in TextEdit.
Can someone verify that Sauron’s replacement grep string works correctly? I’m not enough of a grep expert to tell either.
I don’t need to detect Excel files, but no harm in fixing this little issue in case some day in the future I borrow this handler for a different purpose.
Essentially the operation isn’t much different than what we were doing before, but there are a few small changes… First, though, there is one part you can omit from the equation, so then it becomes
set hex_dump to (do shell script "hexdump -C " & this_item_posix & " | awk '{print(substr($0,62,16))}'")
At least in various tests this works… anyway, as to what’s happening…
awk is now taking the entire input line, which is what $0 represents (the whole record, regardless of how the field separator would split it), and then extracting a substring from it.
The format of substring is this:
substr(s,p,l) – The substring of s starting at p and continuing for l characters
So the source is an entire line, and then it’s just a matter of counting how far into the line the human-readable string starts and how many characters it continues for.
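The counting can be checked with a short shell sketch. It assumes the usual hexdump -C layout: 60 columns of offset and hex pairs, space padded, then the ASCII column between two pipes. The helper names make_line, ascii_fixed and ascii_trimmed are made up for this example, and the lines are built by hand rather than read from a real file.

```shell
# Build a line in the assumed hexdump -C layout: the offset and hex pairs
# padded out to 60 columns, then the ASCII text wrapped in border pipes.
make_line() { printf '%-60s|%s|\n' "$1" "$2"; }

full=$(make_line '00000000  48 65 6c 6c 6f 2c 20 77  6f 72 6c 64 21 20 3a 29' 'Hello, world! :)')
short=$(make_line '00000010  62 79 65 0a' 'bye.')

# The opening pipe sits in column 61, so the text starts at column 62.
printf '%s\n' "$full" | awk '{print index($0,"|")}'    # prints: 61

# Fixed-width cut, as in the thread: 16 characters from column 62.
ascii_fixed() { awk '{print substr($0,62,16)}'; }

# Variant that stops just before the closing pipe, whatever the line length.
ascii_trimmed() { awk '{print substr($0,62,length($0)-62)}'; }

printf '%s\n' "$full"  | ascii_fixed      # prints: Hello, world! :)
printf '%s\n' "$short" | ascii_fixed      # prints: bye.|
printf '%s\n' "$short" | ascii_trimmed    # prints: bye.
```

If the layout assumption holds, the fixed-width cut picks up the closing pipe on a short final line, so the length-based variant may be worth considering when the last few characters of the dump matter.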
--
-- Get Hexdump Info v4
-- by Kevin Quosig, 3/28/07
--
-- Used to drag-n-drop files to examine their contents/headers.
--
-- Most code segments courtesy of James Nierodzik of MacScripter
-- http://bbs.applescript.net/profile.php?id=8727
--
--
-- UTILITY HANDLER
--
-- Search and Replace routine using AppleScript Text Item Delimiters "trick"
--
on searchNreplace(parse_me, find_me, replace_with_me)
    --save incoming TID state, set new TIDs
    set {ATID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, find_me}
    --using the specified character as a break point to strip the delimiter out and break the string into items
    set being_parsed to text items of parse_me
    --switch the TIDs again (replace string)
    set AppleScript's text item delimiters to {replace_with_me}
    --coerce it back to a string with new delimiters
    set parse_me to being_parsed as string
    --restore incoming TID state
    set AppleScript's text item delimiters to ATID
    --return results
    return parse_me
end searchNreplace
--
-- MAIN HANDLER
--
on open fileList
    -- parse through files dropped onto droplet
    repeat with i from 1 to number of items in fileList
        set AppleScript's text item delimiters to {""} --reset delimiters
        set this_item to item i of fileList as string --pick item to work with
        set this_item_posix to quoted form of POSIX path of this_item --need POSIX path for shell scripts
        set doc_name to name of (info for alias this_item) --used for renaming the TextEdit window
        --Improved hexdump script line by TheMouthofSauron at MacScripter
        --http://bbs.applescript.net/viewtopic.php?pid=77811#p77811
        --
        --hexdump with the -C parameter formats the hexdump as columns of hex pairs
        --and then a column with a human-readable "ASCII translation" delimited by a pipe
        --character at the beginning and end of the ASCII column
        --
        --"awk" takes the entire -C formatted hexdump line ($0 = the whole line)
        --and filters out the hex pairs and the delimiting pipe characters
        --(return only 16 characters starting at position 62)
        --
        set hex_dump to (do shell script "hexdump -C " & this_item_posix & " | awk '{print(substr($0,62,16))}'")
        --remove carriage returns so output is one giant paragraph
        --(allows for TextEdit searching for strings and manual scanning)
        set hex_dump to searchNreplace(hex_dump, return, "")
        --write to TextEdit window and rename window to file name to keep things straight
        tell application "TextEdit"
            make new document
            set text of front document to hex_dump
            set name of front window to doc_name
        end tell
    end repeat
end open