Cleaner Hexdump?

I need to really stare at the guts of some files that are incredibly similar. I’ve been using this script to open the hexdump for examination:


on open fileList
	repeat with i from 1 to number of items in fileList
		tell application "Finder"
			set this_item to item i of fileList as string
			set this_itemz to POSIX path of this_item
			
			set docname to name of (info for alias this_item)
		end tell
		set displaytype to do shell script "hexdump -C " & quoted form of this_itemz & " | open -f"
		tell application "TextEdit"
			set name of window 1 to docname
		end tell
	end repeat
end open

It outputs a column of hex on the left side, then the “human readable” on the right. Unfortunately, in this state you can’t search effectively with Find, because the line breaks split strings across rows, like this:

00000300 30 20 52 2f 41 49 50 72 69 76 61 74 65 44 61 74 |0 R/AIPrivateDat|
00000310 61 34 20 31 32 20 30 20 52 2f 41 49 50 72 69 76 |a4 12 0 R/AIPriv|
00000320 61 74 65 44 61 74 61 35 20 31 33 20 30 20 52 2f |ateData5 13 0 R/|
00000330 41 49 50 72 69 76 61 74 65 44 61 74 61 36 20 31 |AIPrivateData6 1|
00000340 34 20 30 20 52 2f 41 49 50 72 69 76 61 74 65 44 |4 0 R/AIPrivateD|

Is there any way to just get the human-readable in nice, searchable form, more like this…

0 R/AIPrivateData4 12 0 R/AIPrivateData5 13 0 R/AIPrivateData6 14 0 R/AIPrivateD

…so I can use find tools or better visual scanning?

Thanks in advance,

This should get you started

set theFile to quoted form of POSIX path of (choose file)
set hexDump to (do shell script "hexdump -C " & theFile & " | awk '{print$18}'")

I keep getting an error:

NSReceiverEvaluationScriptError: 4

Here’s how I introduced it:


on open fileList
	repeat with i from 1 to number of items in fileList
		tell application "Finder"
			set this_item to item i of fileList as string
			set this_itemz to POSIX path of this_item
			
			set docname to name of (info for alias this_item)
		end tell
		set displaytype to (do shell script "hexdump -C " & (quoted form of this_itemz) & " | awk '{print$18}'")
		--set displaytype to do shell script "hexdump -C " & quoted form of this_itemz & " | open -f"
		tell application "TextEdit"
			set name of window 1 to docname
		end tell
	end repeat
end open

I suspect that your TextEdit lines are to blame… if you remove them, does the script run?

Is this part of a larger script?

Well I commented-out the TextEdit portion, and there was no error…but no TextEdit window opened either so the script now essentially does nothing. :frowning:

This is a stand-alone droplet.

It’s supposed to open a TextEdit window with the text gathered by the hexdump. It allows multiple files to be dropped at once so I can take a quick look at the hexdumps for several files. I believe I got the code I originally posted from here at MacScripter a while back.

Newest code per your suggestion:


on open fileList
	repeat with i from 1 to number of items in fileList
		tell application "Finder"
			set this_item to item i of fileList as string
			set this_itemz to quoted form of POSIX path of this_item
			set docname to name of (info for alias this_item)
		end tell
		set displaytype to (do shell script "hexdump -C " & this_itemz & " | awk '{print$18}'")		
	end repeat
end open

This works for me, shouldn’t be too much trouble to throw it in a loop

set theFile to quoted form of POSIX path of (choose file)

tell application "TextEdit"
	make new document
	set text of front document to (do shell script "hexdump -C " & theFile & " | awk '{print$18}'")
end tell

Worked like a charm, but it is still leaving the pipe characters at the beginning and end of each line, as well as the end-of-line character at the end of every line. Basically I want the output to be one big honking paragraph. :wink:

Is this doable? Or am I better off handing this off to BBEdit to massage the output?

Thanks again!

How about like this then?

set theFile to quoted form of POSIX path of (choose file)
set hexDump to (do shell script "hexdump -C " & theFile & " | awk '{print$18}'")

set hexDump to srchRep(hexDump, return, "")
set hexDump to srchRep(hexDump, "|", "")

tell application "TextEdit"
	make new document
	set text of front document to hexDump
end tell

on srchRep(theStr, fndTxt, repTxt)
	set {ATID, AppleScript's text item delimiters} to {"", fndTxt}
	set tmpStr to text items of theStr
	set AppleScript's text item delimiters to {repTxt}
	set theStr to tmpStr as string
	set AppleScript's text item delimiters to ATID
	return theStr
end srchRep

Your output is what I had in mind, but something odd is going on.

This is the first few lines of my original hexdump (Illustrator CS file):

00000000 25 50 44 46 2d 31 2e 34 0d 25 e2 e3 cf d3 0d 0a |%PDF-1.4.%…|
00000010 31 20 30 20 6f 62 6a 3c 3c 2f 50 61 67 65 73 20 |1 0 obj<</Pages |
00000020 32 20 30 20 52 2f 54 79 70 65 2f 43 61 74 61 6c |2 0 R/Type/Catal|

(note the space characters, “Pages” and “Type”)

The version previous outputs:

|%PDF-1.4.%…|
|1
|2
|og/Metadata

And now the last one:

%PDF-1.4.%…12og/Metadata

So now I’m confused…my GREP output is different from yours…weird that it not only gives different data, but seems to remove all space characters. It also looks like I’m losing a bunch of info in yours, completely skipped over.

Eeek. I wish I understood the shell GREP more.

Is that what you wanted? One long string? I simply removed the return breaks and the pipe character… Or maybe I’m missing what you are getting at.

Compare the last two outputs above with the output of my original script before we started this thread:

00000000 25 50 44 46 2d 31 2e 34 0d 25 e2 e3 cf d3 0d 0a |%PDF-1.4.%…|
00000010 31 20 30 20 6f 62 6a 3c 3c 2f 50 61 67 65 73 20 |1 0 obj<</Pages |
00000020 32 20 30 20 52 2f 54 79 70 65 2f 43 61 74 61 6c |2 0 R/Type/Catal|

If your script was passing all the same data mine was, the result should be:

%PDF-1.4.%…1 0 obj<</Pages 2 0 R/Type/Catal

But instead yours is returning:

%PDF-1.4.%…12og/Metadata

Do you see the difference? I didn’t notice it before, when you first gave me your script, that our two scripts are returning different data. So while your latest script indeed produces a single paragraph and strips the pipes and returns, I just realized it’s been doing a lot more than that: it looks like it’s GREPing differently somehow and removing whole chunks of data.

Yours is correct through the %PDF-1.4.%…1 but then they diverge. I tried skimming my own GREP output and can’t seem to easily find where yours is getting 2og/Metadata but I suspect it’s farther down the data. Yours is skipping or stripping the 0 obj<</Pages 2 0 R/Type/Catal that mine is returning.

I don’t understand GREP well enough, so I can only assume that the | awk '{print$18}' argument is actually changing what GREP “sees” or “returns” and is thus skipping over a bunch of stuff before returning data to AppleScript. Poor conjecture at best, but I’m kinda in the dark.

::ponders::

Got an FTP somewhere you can upload the file to, so I can do some testing on my end?

I sent you a PM with my e-mail. Supply an e-mail address and I’ll send along the test file and my original GREP script. They are really small.

Can’t give-out the company FTP unfortunately, unless it’s for client business, due to NDA stuff.

Okay think I found it. Try using this

set AppleScript's text item delimiters to {""}
set theFile to quoted form of POSIX path of (choose file)
set hexDump to (do shell script "hexdump -C " & theFile & " | awk  'BEGIN{FS=\"|\"}{print$2}'")

set hexDump to srchRep(hexDump, return, "")

tell application "TextEdit"
	make new document
	set text of front document to hexDump
end tell

on srchRep(theStr, fndTxt, repTxt)
	set {ATID, AppleScript's text item delimiters} to {"", fndTxt}
	set tmpStr to text items of theStr
	set AppleScript's text item delimiters to {repTxt}
	set theStr to tmpStr as string
	set AppleScript's text item delimiters to ATID
	return theStr
end srchRep

HUZZAH!

Worked like a charm, as expected, even after I popped it into the shell of my droplet.

Now in the spirit of “never having to ask this darned question again,” can you go over what exactly this script is doing, cause I’m in the dark.

It took me a few minutes, but I realized srchRep is a very handy little Search-n-Replace routine…nice. It’s a good example of the Text Item Delimiters trick I keep seeing for doing this kind of work. Probably gonna recycle that handler in other scripts. :wink:

I just realized that the hexdump didn’t change, and that “|” in shell must mean “next command”…yes? I found some info on awk; it looks like it shares syntax with GREP, but is line-by-line based (near as I can tell). But I can’t quite discern what you’re up to with awk (the 'BEGIN{FS="|"}{print$2}' part). Maybe you can step me through it quick?

I’ll post-back my finished, commented code when we’re done. Using comments to “take notes” inside my script. :wink:

THANKS!

I don’t mind explaining at all :smiley:

First the “|” character. The pipe takes output from one command and feeds it to another command. This example isn’t perfect but think of the following AppleScript statement…

display dialog (5 + 1)

If pipes worked in AS the equiv would be something like this

5 + 1 | display dialog

The first command is processed and its output sent to the next for handling.
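The same idea can be tried directly in a shell. This is only a small sketch of the analogy above, using expr and awk as stand-ins for "5 + 1" and "display dialog"; the variable name result is just for illustration.

```shell
# The left-hand command computes a value and writes it to standard output.
# The pipe hands that output to the right-hand command as its standard input.
result=$(expr 5 + 1 | awk '{print "The answer is " $1}')
echo "$result"
```

Neither command knows about the other; the shell simply connects the output of the first to the input of the second.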

So taking that, we apply it to our hexdump command. hexdump kicks out the dump one line at a time, and the pipe feeds each of those lines to awk.

Awk is an amazingly powerful text-processing language, very little of which we are using here. The print statement {print$x} basically says print item (field) number x of the current line. By default awk splits each line on whitespace (spaces and tabs), which is why we ran into problems the first time through.

|%PDF-1.4.%…| worked fine because it had no spaces… so counting the items it was item 18.

|1 0 obj<</Pages | though has spaces inside it so |1 is item 18 but 0 is number 19.
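The field counting can be checked without the original file by feeding two of the dump lines from earlier in the thread to awk as plain text. This is a sketch only: the variable names are made up for the example, and the non-printable bytes in the first line are written out as dots, which is how hexdump -C displays them.

```shell
# Two sample lines of `hexdump -C` output, taken from the dump shown above.
no_spaces='00000000 25 50 44 46 2d 31 2e 34 0d 25 e2 e3 cf d3 0d 0a |%PDF-1.4.%......|'
with_spaces='00000010 31 20 30 20 6f 62 6a 3c 3c 2f 50 61 67 65 73 20 |1 0 obj<</Pages |'

# With the default separator, awk splits each line on runs of whitespace.
# Field 1 is the offset, fields 2 to 17 are the sixteen hex bytes.
field18_a=$(printf '%s\n' "$no_spaces" | awk '{print $18}')
field18_b=$(printf '%s\n' "$with_spaces" | awk '{print $18}')

# NF is awk's count of fields on the current line.
count_b=$(printf '%s\n' "$with_spaces" | awk '{print NF}')

echo "$field18_a"   # the whole text column, since it contains no spaces
echo "$field18_b"   # only the piece before the first space
echo "$count_b"     # more than 18, because the text column was split up
```

The second line ends up with 21 fields instead of 18, so printing field 18 alone drops everything after the first space in the text column.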

To get around this we tell awk to change its own delimiter (or field separator) with BEGIN{FS=\"|\"}. (The \ characters are only there to escape the quotes inside the AppleScript string.) Now that the FS is the pipe character, when we parse a line everything before the first | is item 1, the next chunk (what we want) is item 2, so that is the item we print.

All these collected prints are returned back to our AppleScript variable and there we process further to remove the returns.
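Here is the same pipeline run on two of the sample dump lines from the thread instead of a real file, so the effect of the new field separator is easy to see. It is a sketch: the dump is pasted in as text, and tr stands in for the AppleScript step that removes the returns.

```shell
# Sample `hexdump -C` output, copied from the dump earlier in the thread.
dump='00000010 31 20 30 20 6f 62 6a 3c 3c 2f 50 61 67 65 73 20 |1 0 obj<</Pages |
00000020 32 20 30 20 52 2f 54 79 70 65 2f 43 61 74 61 6c |2 0 R/Type/Catal|'

# FS="|" makes awk split at pipes: field 1 is the offset and hex bytes,
# field 2 is the text between the pipes. tr then deletes the line breaks.
text=$(printf '%s\n' "$dump" | awk 'BEGIN{FS="|"}{print $2}' | tr -d '\n')
echo "$text"
```

The spaces inside the text column survive this time, because whitespace is no longer what awk splits on.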


So I’m not the best at commenting or explaining, but I hope that helps :smiley:

All I can say is…wow. And that I now know why the “|” is called the “pipe” character. :wink:

Here’s my commented-to-death code (most…comments…ever…). Let me know if I’ve got it right…

--
-- Get Hexdump Info v3
-- Used to drag-n-drop files to examine their contents/headers.
--
-- Most code segments courtesy of James Nierodzik of MacScripter
-- http://bbs.applescript.net/profile.php?id=8727
--


--
-- UTILITY HANDLER
--

-- Search and Replace routine using AppleScript Text Item Delimiters "trick"
--
on searchNreplace(parse_me, find_me, replace_with_me)
	
	--save incoming TID state, set new TIDs
	set {ATID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, find_me}
	
	--convert from string to separate text items using new TIDs (find string), effectively "tagging" them for replacement
	set being_parsed to text items of parse_me
	
	--switch the TIDs again (replace string)
	set AppleScript's text item delimiters to {replace_with_me}
	
	--when "undelimited" back to a string the replacement text pops into place
	set parse_me to being_parsed as string
	
	--restore incoming TID state
	set AppleScript's text item delimiters to ATID
	
	--return results
	return parse_me
	
end searchNreplace


--
-- MAIN HANDLER
--
on open fileList
	
	-- parse through files dropped onto droplet
	repeat with i from 1 to number of items in fileList
		tell application "Finder"
			set AppleScript's text item delimiters to {""} --reset delimiters
			set this_item to item i of fileList as string ---pick item to work with
			set this_item_posix to quoted form of POSIX path of this_item --need POSIX path for shell scripts
			set doc_name to name of (info for alias this_item) --used for renaming the TextEdit window
		end tell
		
		-- Get hex_dump and format it (the -C option produces the "hex plus piped human-readable" layout),
		-- then pipe to awk, one line at a time (hexdump does a line, then awk works with it further).
		-- Then set awk's field separator (FS) to the pipe character (|), using escaped quotes (\"),
		-- so awk splits each line at the pipes: field 1 is the offset and hex, field 2 is the text between the pipes.
		-- {print$2} tells awk to return the second field only (the human-readable part of the hexdump)
		set hex_dump to (do shell script "hexdump -C " & this_item_posix & " | awk  'BEGIN{FS=\"|\"}{print$2}'")
		
		--remove carriage returns so output is one giant paragraph
		--(allows for TextEdit searching for strings and manual scanning)
		set hex_dump to searchNreplace(hex_dump, return, "")
		
		--write to TextEdit window and rename window to file name to keep things straight
		tell application "TextEdit"
			make new document
			set text of front document to hex_dump
			set name of front window to doc_name
		end tell
	end repeat
end open

(Yes I know, LOTS of comments…let’s just say as often as I get interrupted and the huge lags between working on scripts, it helps when I have to recall how I pulled-off a certain handler. This is especially true when folks give me shell stuff. I don’t have to comment regular AppleScript stuff quite as much, it’s a bit more intuitive to me.)

And again…THANKS!

Looks pretty good, and I even see my name in there =)

Just a few thoughts though…

It’s not so much tagging as it is using the specified character as a break point to strip the delimiter out and break the string into items. Then when we coerce it back to a string with new delimiters it’s using our new character, in this case “”, as the separator. I’m probably just being anal, but saying it was tagging them just didn’t sound right to me.

I don’t think the lines that build the path and file name need to be inside a tell Finder block (nothing in them is a Finder command), and removing tells, especially inside a loop, is always a good thing.

Other than that, it looks pretty good! So I know what the script is doing, but I’m curious what you’re actually using the data to check?

We have a digital asset management system that we found out too late strips the resource fork from Mac files. So when Mac users download things like Illustrator or Photoshop files, they get all confused because either the Mac OS makes bad guesses as to the file type, or the asset manager sometimes adds extensions further confusing the matter (making Illustrator files look like PDFs, for example).

So my script uses known information like extensions and headers info to fix the file type and creator type…for most cases.

I use the routine we’ve been working on to manually dump the file guts so I can look for unique strings or other information I can use to figure out what kind of file it is, then use a GREP routine in my script to scan for those unique strings.

Been working fine until now, since scanning the -C formatted hexdump wasn’t too bad, differences were glaring enough. But right now I’m trying to figure out how to tell the difference between a real PDF and an Illustrator file that has had “.pdf” added to the end of it erroneously by the asset manager (it adds extensions when there aren’t any…which is annoying because it guesses wrong so often).

Unfortunately, Illustrator CS2 and Acrobat files look identical in a hexdump. I wanted to make it more human readable so I could scan it in a very detailed fashion. Still haven’t found a difference…native Illustrator CS2 files appear to be identical to a regular PDF file. :frowning:
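For what it’s worth, the first dump posted in this thread contains /AIPrivateData keys. A scan for that string might look roughly like the sketch below. It rests on two assumptions that the thread does not confirm: that the dump came from an Illustrator-saved file, and that plain Acrobat PDFs never contain that string. The function name guess_kind is hypothetical.

```shell
# Hypothetical helper: guess what kind of file is hiding behind a ".pdf" name.
# ASSUMPTION (unverified): "AIPrivateData" appears only in Illustrator-saved files.
guess_kind() {
    file=$1
    # PDF-compatible files start with the five bytes "%PDF-".
    if [ "$(head -c 5 "$file")" != "%PDF-" ]; then
        echo "not-pdf"
    # -a treats binary data as text, -F matches a fixed string,
    # -q prints nothing and only sets the exit status.
    elif LC_ALL=C grep -a -q -F 'AIPrivateData' "$file"; then
        echo "illustrator"
    else
        echo "pdf"
    fi
}
```

Calling guess_kind with a path prints one of the three labels, which a droplet could then use to decide on the file type and creator.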

Anyway, that’s what I use it for…just as a utility for scanning file innards. :wink:

Better?

After all this work I decided to refine it enough for Code Exchange. :wink:

--
-- UTILITY HANDLER
--

-- Search and Replace routine using AppleScript Text Item Delimiters "trick"
--
on searchNreplace(parse_me, find_me, replace_with_me)
	
	--save incoming TID state, set new TIDs
	set {ATID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, find_me}
	
	--using the specified character as a break point to strip the delimiter out and break the string into items
	set being_parsed to text items of parse_me
	
	--switch the TIDs again (replace string)
	set AppleScript's text item delimiters to {replace_with_me}
	
	--coerce it back to a string with new delimiters
	set parse_me to being_parsed as string
	
	--restore incoming TID state
	set AppleScript's text item delimiters to ATID
	
	--return results
	return parse_me
	
end searchNreplace


--
-- MAIN HANDLER
--
on open fileList
	
	-- parse through files dropped onto droplet
	repeat with i from 1 to number of items in fileList
		
		set AppleScript's text item delimiters to {""} --reset delimiters
		set this_item to item i of fileList as string ---pick item to work with
		set this_item_posix to quoted form of POSIX path of this_item --need POSIX path for shell scripts
		set doc_name to name of (info for alias this_item) --used for renaming the TextEdit window
		
		-- Get hex_dump and format it (the -C option produces the "hex plus piped human-readable" layout),
		-- then pipe to awk, one line at a time (hexdump does a line, then awk works with it further).
		-- Then set awk's field separator (FS) to the pipe character (|), using escaped quotes (\"),
		-- so awk splits each line at the pipes: field 1 is the offset and hex, field 2 is the text between the pipes.
		-- {print$2} tells awk to return the second field only (the human-readable part of the hexdump)
		set hex_dump to (do shell script "hexdump -C " & this_item_posix & " | awk  'BEGIN{FS=\"|\"}{print$2}'")
		
		--remove carriage returns so output is one giant paragraph
		--(allows for TextEdit searching for strings and manual scanning)
		set hex_dump to searchNreplace(hex_dump, return, "")
		
		--write to TextEdit window and rename window to file name to keep things straight
		tell application "TextEdit"
			make new document
			set text of front document to hex_dump
			set name of front window to doc_name
		end tell
	end repeat
end open
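One caveat with splitting on the pipe character: if the file data itself contains a | byte, awk’s field 2 stops at that byte and the rest of the line is lost. hexdump also collapses runs of identical lines into a single * unless the -v option is given. The sketch below swaps awk for sed to get around the first problem; readable_text is a made-up name, and the sample is hand-written hexdump -C output for the six bytes "a|b c" plus a newline.

```shell
# Reads `hexdump -C` output on stdin and prints only the text column
# as one unbroken line.
readable_text() {
    # ^[^|]*|    everything up to and including the first pipe (offset and hex)
    # \(.*\)|$   the text column, up to the final pipe on the line
    # -n with p  prints only lines that matched, so the "*" marker and the
    #            closing offset line (which have no pipes) are skipped
    sed -n 's/^[^|]*|\(.*\)|$/\1/p' | tr -d '\n'
}

sample='00000000  61 7c 62 20 63 0a                                 |a|b c.|
00000006'

with_awk=$(printf '%s\n' "$sample" | awk 'BEGIN{FS="|"}{print $2}' | tr -d '\n')
with_sed=$(printf '%s\n' "$sample" | readable_text)

echo "$with_awk"   # cut short at the pipe that is part of the data
echo "$with_sed"   # keeps the whole text column

# Against a real file it would be used like this (path is a placeholder):
#   hexdump -v -C "$some_file" | readable_text
```

In the droplet, the same sed and tr pair could replace the awk part of the do shell script string, which would also make the separate return-stripping step unnecessary.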