Cleaner Hexdump?

James_Nierodzik · March 20, 2007, 3:28pm

Ahh, makes sense. What DAM system do you use?

Anyways, as to the overall purpose: not sure if you know, but the Illustrator native file format (.ai) is largely based on the PDF language specification. Because of this it may be hard to determine the correct file type from the hex dump, as you’ve seen.

CalvinFold · March 20, 2007, 3:59pm

We’re using Interwoven’s MediaBin.

Before anyone misconstrues…we had to choose a product that had a DAM and a project management piece…systems that do both are incredibly rare, at least that meet our well-developed criteria. Interwoven’s was the best fit by a long shot.

There are plenty of stand-alone DAMs, many quite a bit better at handling Mac files, even at a corporation the size of ours. But we needed project management integration, and that part of the software market has been slow to mature outside specific vertical markets (and Creative Services isn’t one of them, strangely enough).

But we didn’t catch the resource fork issue in testing…nuts. Turns out MediaBin’s support of Mac files is “odd” at best. Worse, Adobe stopped development of their server-based Adobe Graphics Server (AGS), which MediaBin relies on. Making my life tricky, though giving me plenty to do as you can see.

Yeah I knew that part, but up until CS1 there were still differences I could work with. Even MediaBin could do basic determinations with PDF vs. CS1. But CS2 looks like they took the “Illustrator is a PDF” to the final stage. I’m spending today doing a detailed breakout of the metadata/XML exposed by the hexdump to see if I can find any difference, however slight, before I throw in the towel.

I’ve also posted to the Adobe forums, but I’m not holding my breath, rarely does anyone from Adobe seem to step in, and this kind of question is likely too esoteric for the normal Adobe forum user base.

On the upside for anyone following along at home…this is all related to the “File Repair” software I posted in Code Exchange many months back:

http://bbs.applescript.net/viewtopic.php?id=19095

I’ve been asking related questions in other threads, refining the script. I am hoping to refine it enough for ScriptBuilders, mostly because on my end I need to refine it to the point where it will not only work internally, but “in the wild.”

James_Nierodzik · March 20, 2007, 4:24pm

Ahh, it all makes more sense to my funny little mind now =)

Anyways, if you’re releasing in the wild you should make sure you’re not trying to do a hex dump on a folder that gets dropped. Or, if one is dropped, process the files and folders within it as well =)

CalvinFold · March 20, 2007, 4:52pm

Oh I wasn’t planning for the hexdump part to go to the wild, I just use it locally for utility purposes.

Figured I’d put it in Code Exchange though so others can benefit from the results of this thread.

James_Nierodzik · March 20, 2007, 5:08pm

I got bored, love my job :D, and decided to update it to accommodate the possibility of folders, possibly nested, being dropped alongside files.

Please pardon the lack of commenting and the reorganization of the handlers, we all have our own habits

on open fileList
	repeat with this_item in fileList
		process(this_item)
	end repeat
end open

on process(this_item)
	if (folder of (info for this_item)) then
		tell application "Finder" to set subItems to items of folder this_item
		repeat with anItem in subItems
			process(anItem as alias)
		end repeat
	else
		set AppleScript's text item delimiters to {""}
		set this_item_posix to quoted form of POSIX path of (this_item as string)
		set doc_name to name of (info for this_item)
		set hex_dump to (do shell script "hexdump -C " & this_item_posix & " | awk  'BEGIN{FS=\"|\"}{print$2}'")
		set hex_dump to searchNreplace(hex_dump, return, "")
		tell application "TextEdit"
			make new document
			set text of front document to hex_dump
			set name of front window to doc_name
		end tell
	end if
end process

on searchNreplace(parse_me, find_me, replace_with_me)
	set {ATID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, find_me}
	set being_parsed to text items of parse_me
	set AppleScript's text item delimiters to {replace_with_me}
	set parse_me to being_parsed as string
	set AppleScript's text item delimiters to ATID
	return parse_me
end searchNreplace
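
For comparison, here is a rough sketch of the same "files or folders, possibly nested" walk in plain shell rather than AppleScript. The list_files function and the demo file names are made up for illustration and are not part of the droplet; file names containing newlines are not handled.

```shell
# list_files is a hypothetical helper: print every regular file under the given paths.
list_files() {
    for item in "$@"; do
        if [ -d "$item" ]; then
            # find descends into nested folders for us
            find "$item" -type f
        elif [ -f "$item" ]; then
            printf '%s\n' "$item"
        fi
    done
}

# Demo on a throwaway tree: one loose file plus a folder holding a nested file.
demo=$(mktemp -d)
mkdir -p "$demo/folder/nested"
printf 'a' > "$demo/loose.txt"
printf 'b' > "$demo/folder/top.txt"
printf 'c' > "$demo/folder/nested/deep.txt"

found=$(list_files "$demo/loose.txt" "$demo/folder" | sort)
echo "$found"
```

Each path printed could then be handed to the hexdump line one at a time, which is what the recursive process handler does.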

TheMouthOfSauron · March 24, 2007, 1:00am

set hex_dump to (do shell script "hexdump -C " & this_item_posix & " | awk 'BEGIN{FS=\"|\"}{print$2}'")

I’ve been thinking about this all week. The above will produce incorrect output (characters will be omitted) if the pipe character occurs within the text. I’m a shell scripting noob, so it took me a while to figure this out, but my suggestion follows. Maybe one of the experienced shell scripters here can give it some polish.

set hex_dump to (do shell script "hexdump -C " & this_item_posix & " | awk  'BEGIN{FS=\" \"}{print(substr($0,62,16))}'")
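
A small way to see the difference without needing a real file: the sketch below feeds awk one hand-typed line in the hexdump -C layout whose data happens to contain pipe characters (0x7c). The sample line and the variable names are invented for the demonstration.

```shell
# A sample line in `hexdump -C` layout; bytes 2 and 4 of the data are pipes (0x7c).
line='00000000  61 7c 62 7c 63 20 64 65  66 67 68 69 6a 6b 6c 6d  |a|b|c defghijklm|'

# Splitting on "|" treats the pipes inside the data as separators too,
# so field 2 ends at the first pipe found in the data.
by_field=$(printf '%s\n' "$line" | awk 'BEGIN{FS="|"}{print $2}')

# Cutting by column position ignores what the characters are.
by_column=$(printf '%s\n' "$line" | awk '{print substr($0,62,16)}')

echo "FS split:   $by_field"
echo "substr cut: $by_column"
```

The field-splitting version keeps only the text before the first pipe in the data, while the column cut returns all 16 characters.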

CalvinFold · March 24, 2007, 1:10am

TheMouthOfSauron:

set hex_dump to (do shell script "hexdump -C " & this_item_posix & " | awk 'BEGIN{FS=\"|\"}{print$2}'")
The above will produce incorrect output (characters will be omitted) if the pipe character occurs

hexdump -C outputs like this:

00000300  30 20 52 2f 41 49 50 72  69 76 61 74 65 44 61 74  |0 R/AIPrivateDat|

The pipes are simply a visual aid, not part of the actual code (or not a useful part anyway, you can tell by reading the data within). I’ve yet to see the pipe character within the two “border” pipe characters, so it seems safe enough.

Or are you referring to something else?

TheMouthOfSauron · March 24, 2007, 1:14am

Sorry, a web browser snafu. See my edited post. In some of the files I tested the pipe character did occur between the “border” pipes.

James_Nierodzik · March 24, 2007, 2:25pm

Interesting! I actually did a bit of testing and never encountered a pipe in between the border pipes. Just curious what kind of files were you dealing with?

Also sorry if the question seems stupid, but are you sure it wasn’t a lower case L? In my terminal font an l and a | look a hell of a lot alike

TheMouthOfSauron · March 28, 2007, 1:26am

The files in question were Excel files. I verified that it was the pipe character and not the lower case ‘L’ by searching for the pipe character in TextEdit.

CalvinFold · March 28, 2007, 1:35am

Can someone verify that Sauron’s replacement grep string works correctly? I’m not enough of a grep expert to tell either.

I don’t need to detect Excel files, but no harm in fixing this little issue in case some day in the future I borrow this handler for a different purpose.

James_Nierodzik · March 28, 2007, 1:59am

Nice work Sauron… your version is in fact correct. Compared your output to mine and mine did strip a few legitimate pipe characters.

CalvinFold · March 28, 2007, 2:55pm

Since it sounds like this works, can you explain what everything starting with awk is doing?

James_Nierodzik · March 28, 2007, 3:18pm

CalvinFold:

TheMouthOfSauron:
set hex_dump to (do shell script "hexdump -C " & this_item_posix & " | awk  'BEGIN{FS=\" \"}{print(substr($0,62,16))}'")
Since it sounds like this works, can you explain what everything starting with awk is doing?

Essentially the operation isn’t much different than what we were doing before, but there are a few small changes… First though there is one part you can omit from the equation, so then it becomes

set hex_dump to (do shell script "hexdump -C " & this_item_posix & " | awk  '{print(substr($0,62,16))}'")

At least in various tests this works… anyways as to what’s happening…

awk is now taking the entire input line, which is what $0 represents no matter what the field separator is (that’s why the BEGIN{FS=…} part can go), and then extracting a substring from it.

The format of substring is this:

substr(s,p,l) – The substring of s starting at p and continuing for l characters

so the source is an entire line and then it’s just a matter of counting how far in the Human Readable string starts and how many characters it continues for.
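
To make the counting concrete, here is a sketch that works on one line in the hexdump -C layout (the same data as the sample line quoted earlier in the thread, with the original column spacing). The layout is 8 offset characters, 2 spaces, 8 hex pairs, an extra space, 8 more hex pairs, a space, and then the opening pipe, which puts the pipe at column 61 and the text at column 62. The variable names are only for illustration.

```shell
line='00000300  30 20 52 2f 41 49 50 72  69 76 61 74 65 44 61 74  |0 R/AIPrivateDat|'

# index() reports the column of the first "|", i.e. the opening border pipe.
first_pipe=$(printf '%s\n' "$line" | awk '{print index($0,"|")}')

# The readable text starts one column later and is at most 16 characters wide.
text=$(printf '%s\n' "$line" | awk '{print substr($0,62,16)}')

echo "opening pipe at column $first_pipe"
echo "text: $text"
```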

CalvinFold · March 28, 2007, 3:45pm

Is “$0” a “beginning of line” designator?

James_Nierodzik · March 28, 2007, 4:08pm

$0 is the whole line, i.e. all arguments… the line is split into arguments by the Field Separator, and individual arguments are accessible by $# (where # is the argument’s number). So for example take this.

set x to "arg1 arg2 arg3 arg4"

do shell script "echo " & x & " | awk '{print($0)}'" -- returns "arg1 arg2 arg3 arg4"
do shell script "echo " & x & " | awk '{print($1)}'" -- returns "arg1"
do shell script "echo " & x & " | awk '{print($2)}'" -- returns "arg2"
do shell script "echo " & x & " | awk '{print($3)}'" -- returns "arg3"
do shell script "echo " & x & " | awk '{print($4)}'" -- returns "arg4"
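
The same idea directly in the shell, as a sketch, with one extra step: changing the field separator (here to the letter g, picked arbitrarily) changes how the line is cut into $1, $2 and so on, but $0 is still the untouched line. That is why the separator doesn’t matter for the substr($0,…) approach.

```shell
x="arg1 arg2 arg3 arg4"

whole=$(echo "$x" | awk '{print $0}')    # the whole line
second=$(echo "$x" | awk '{print $2}')   # second field, split on whitespace by default
count=$(echo "$x" | awk '{print NF}')    # NF holds the number of fields

# Same input, but split on the letter "g" instead of whitespace.
whole_alt=$(echo "$x" | awk -F'g' '{print $0}')
second_alt=$(echo "$x" | awk -F'g' '{print $2}')

echo "$whole / $second / $count"
echo "$whole_alt / $second_alt"
```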

CalvinFold · March 28, 2007, 4:12pm

Ah, gotcha, thanks!

CalvinFold · March 28, 2007, 4:25pm

New and improved code. Any comments?

--
-- Get Hexdump Info v4
-- by Kevin Quosig, 3/28/07
--
-- Used to drag-n-drop files to examine their contents/headers.
--
-- Most code segments courtesy of James Nierodzik of MacScripter
-- http://bbs.applescript.net/profile.php?id=8727
--


--
-- UTILITY HANDLER
--

-- Search and Replace routine using AppleScript Text Item Delimiters "trick"
--
on searchNreplace(parse_me, find_me, replace_with_me)
	
	--save incoming TID state (ATID), set new TIDs
	set {ATID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, find_me}
	
	--using the specified character as a break point to strip the delimiter out and break the string into items
	set being_parsed to text items of parse_me
	
	--switch the TIDs again (replace string)
	set AppleScript's text item delimiters to {replace_with_me}
	
	--coerce it back to a string with new delimiters
	set parse_me to being_parsed as string
	
	--restore incoming TID state
	set AppleScript's text item delimiters to ATID
	
	--return results
	return parse_me
	
end searchNreplace


--
-- MAIN HANDLER
--
on open fileList
	
	-- parse through files dropped onto droplet
	repeat with i from 1 to number of items in fileList
		
		set AppleScript's text item delimiters to {""} --reset delimiters
		set this_item to item i of fileList as string ---pick item to work with
		set this_item_posix to quoted form of POSIX path of this_item --need POSIX path for shell scripts
		set doc_name to name of (info for alias this_item) --used for renaming the TextEdit window
		
		--Improved hexdump script line by TheMouthofSauron at MacScripter
		--http://bbs.applescript.net/viewtopic.php?pid=77811#p77811
		--
		--hexdump with the -C parameter formats the hexdump as columns of hex pairs
		--and then a column with a human-readable "ASCII translation" delimited by a pipe
		--character at the beginning and end of the ASCII column
		--
		--"awk" takes the entire -C formatted hexdump line ($0 = the whole line)
		--and filters out the hex pairs and the delimiting pipe characters
		--(return only 16 characters starting at position 62)
		--
		set hex_dump to (do shell script "hexdump -C " & this_item_posix & " | awk '{print(substr($0,62,16))}'")
		
		--remove carriage returns so output is one giant paragraph
		--(allows for TextEdit searching for strings and manual scanning)
		set hex_dump to searchNreplace(hex_dump, return, "")
		
		--write to TextEdit window and rename window to file name to keep things straight
		tell application "TextEdit"
			make new document
			set text of front document to hex_dump
			set name of front window to doc_name
		end tell
	end repeat
end open
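
One caveat with the fixed 16-character cut, sketched in plain shell: when the last data line of a dump holds fewer than 16 bytes, the closing border pipe falls inside the 16-character window and ends up in the output. A possible variation, offered as an untested-in-the-droplet assumption rather than a drop-in replacement, is to take everything from column 62 up to but not including the last character of the line. The sample lines are built by hand in the hexdump -C layout.

```shell
# A short final data line (6 bytes) and the closing offset-only line of a dump.
short=$(printf '%-60s|%s|' '00000000  48 65 6c 6c 6f 0a' 'Hello.')
tail_line='00000006'

# Fixed-width cut: the closing pipe is caught in the 16-character window.
fixed=$(printf '%s\n%s\n' "$short" "$tail_line" | awk '{print substr($0,62,16)}')

# Length-based cut: stop one character before the end of the line.
# On the offset-only line the length goes negative and awk returns an empty string.
trimmed=$(printf '%s\n%s\n' "$short" "$tail_line" | awk '{print substr($0,62,length($0)-62)}')

echo "fixed:   $fixed"
echo "trimmed: $trimmed"
```

For full 16-byte lines both cuts give the same result, since the closing pipe is always the last character on the line.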

TheMouthOfSauron · March 29, 2007, 2:52am

James Nierodzik:

the operation isn’t much different than what we were doing before, but there are a few small changes… First though there is one part you can omit from the equation, so then it becomes
set hex_dump to (do shell script "hexdump -C " & this_item_posix & " | awk  '{print(substr($0,62,16))}'")

Thanks James, I knew there was still room for improvement.

kel · March 29, 2007, 7:20am

I haven’t been following this, but there is a hex editor:

Don’t know if this would pertain to this post, but you can compare files with this.

Edited: oops, posted the wrong hexedit. This is the mac one:

http://hexedit.sourceforge.net/

gl,