I need to really stare at the guts of some files that are incredibly similar. I’ve been using this script to open the hexdump for examination:
on open fileList
repeat with i from 1 to number of items in fileList
tell application "Finder"
set this_item to item i of fileList as string
set this_itemz to POSIX path of this_item
set docname to name of (info for alias this_item)
end tell
set displaytype to do shell script "hexdump -C " & quoted form of this_itemz & " | open -f"
tell application "TextEdit"
set name of window 1 to docname
end tell
end repeat
end open
It outputs a column of hex on the left side, then the “human readable” on the right. Unfortunately in this state you can’t do effective searches with Find due to the word wraps, like this:
on open fileList
repeat with i from 1 to number of items in fileList
tell application "Finder"
set this_item to item i of fileList as string
set this_itemz to POSIX path of this_item
set docname to name of (info for alias this_item)
end tell
set displaytype to (do shell script "hexdump -C " & (quoted form of this_itemz) & " | awk '{print$18}'")
--set displaytype to do shell script "hexdump -C " & quoted form of this_itemz & " | open -f"
tell application "TextEdit"
set name of window 1 to docname
end tell
end repeat
end open
Well I commented-out the TextEdit portion, and there was no error…but no TextEdit window opened either so the script now essentially does nothing.
This is a stand-alone droplet.
It’s supposed to open a TextEdit window with the text gathered by the hexdump. It allows multiple files to be dropped at once so I can take a quick look at the hexdumps for several files. I believe I got the code I originally posted from here at MacScripter, a while back.
Newest code per your suggestion:
on open fileList
repeat with i from 1 to number of items in fileList
tell application "Finder"
set this_item to item i of fileList as string
set this_itemz to quoted form of POSIX path of this_item
set docname to name of (info for alias this_item)
end tell
set displaytype to (do shell script "hexdump -C " & this_itemz & " | awk '{print$18}'")
end repeat
end open
This works for me, shouldn’t be too much trouble to throw it in a loop
set theFile to quoted form of POSIX path of (choose file)
tell application "TextEdit"
make new document
set text of front document to (do shell script "hexdump -C " & theFile & " | awk '{print$18}'")
end tell
Worked like a charm, but it is still leaving the pipe characters at the beginning and end of each line, as well as the end-of-line character at the end of every line. Basically I want the output to be one big honking paragraph.
Is this doable? Or am I better off handing this off to BBEdit to massage the output?
set theFile to quoted form of POSIX path of (choose file)
set hexDump to (do shell script "hexdump -C " & theFile & " | awk '{print$18}'")
set hexDump to srchRep(hexDump, return, "")
set hexDump to srchRep(hexDump, "|", "")
tell application "TextEdit"
make new document
set text of front document to hexDump
end tell
on srchRep(theStr, fndTxt, repTxt)
set {ATID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, fndTxt}
set tmpStr to text items of theStr
set AppleScript's text item delimiters to {repTxt}
set theStr to tmpStr as string
set AppleScript's text item delimiters to ATID
return theStr
end srchRep
So now I’m confused…my GREP output is different from yours…weird that it not only gives different data, but seems to remove all space characters. It also looks like I’m losing a bunch of info in yours, completely skipped over.
If your script was passing all the same data mine was, the result should be:
%PDF-1.4.%…1 0 obj<</Pages 2 0 R/Type/Catal
But instead yours is returning:
%PDF-1.4.%…12og/Metadata
Do you see the difference? I didn’t notice it before, when you first gave me your script, that our two scripts are returning different data. So while your latest script indeed does a single paragraph and strips the pipes and returns, I just realized it’s been doing a lot more than that–it looks like it’s GREPing differently somehow and removing whole chunks of data.
Yours is correct through the %PDF-1.4.%…1 but then they diverge. I tried skimming my own GREP output and can’t seem to easily find where yours is getting 2og/Metadata but I suspect it’s farther down the data. Yours is skipping or stripping the 0 obj<</Pages 2 0 R/Type/Catal that mine is returning.
I don’t understand GREP well enough, so I can only assume that the | awk '{print$18}' argument is actually changing what GREP “sees” or “returns” and is thus skipping over a bunch of stuff before returning data to AppleScript. Poor conjecture at best, but I’m kinda in the dark.
set AppleScript's text item delimiters to {""}
set theFile to quoted form of POSIX path of (choose file)
set hexDump to (do shell script "hexdump -C " & theFile & " | awk 'BEGIN{FS=\"|\"}{print$2}'")
set hexDump to srchRep(hexDump, return, "")
tell application "TextEdit"
make new document
set text of front document to hexDump
end tell
on srchRep(theStr, fndTxt, repTxt)
set {ATID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, fndTxt}
set tmpStr to text items of theStr
set AppleScript's text item delimiters to {repTxt}
set theStr to tmpStr as string
set AppleScript's text item delimiters to ATID
return theStr
end srchRep
Worked like a charm, as expected, even after I popped it into the shell of my droplet.
Now in the spirit of “never having to ask this darned question again,” can you go over what exactly this script is doing, cause I’m in the dark.
It took me a few minutes, but I realized srchRep is a very handy little Search-n-Replace routine…nice. It’s a good example of the Text Item Delimiters trick I keep seeing for doing this kind of work. Probably gonna recycle that handler in other scripts.
I just realized that the hexdump didn’t change, and that “|” in shell must mean “next command”…yes? I found some info on awk, it looks like it shares syntax with GREP, but is line-by-line based (near as I can tell). But I can’t quite discern what you’re up to with awk (the 'BEGIN{FS=\"|\"}{print$2}' part). Maybe you can step me through it quick?
I’ll post-back my finished, commented code when we’re done. Using comments to “take notes” inside my script.
First the “|” character. The pipe takes output from one command and feeds it to another command. This example isn’t perfect but think of the following AppleScript statement…
display dialog (5 + 1)
If pipes worked in AS the equiv would be something like this
5 + 1 | display dialog
The first command is processed and its output sent to the next for handling.
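For comparison, here is roughly what that looks like in a real shell. This is only a small sketch; the wording of the message and the variable name `sum` are made up for illustration.

```shell
# The shell works out 5 + 1, echo prints the result, and the pipe hands
# that output to sed, which puts some text in front of it.
sum=$(echo $((5 + 1)) | sed 's/^/The answer is /')
echo "$sum"
```

Each command only sees what the command before it printed, which is the same thing that happens when hexdump hands its lines to awk.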
So we apply that to our hexdump command. hexdump kicks out the dump one line at a time, and the pipe feeds each of those lines to awk.
Awk is a processing language that is amazingly powerful, very little of which we are using here. The print statement {print$x} basically says print the xth item of data on the line. By default awk splits each line into items wherever it finds whitespace, which is why we ran into problems the first time through.
|%PDF-1.4.%…| worked fine because it had no spaces… so counting the items it was item 18.
|1 0 obj<</Pages | though has spaces inside it so |1 is item 18 but 0 is number 19.
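You can see the counting problem without a real file. The sketch below feeds awk two lines shaped like `hexdump -C` output; the sample bytes are made up to resemble the start of a PDF and are an assumption, not taken from the actual file in this thread.

```shell
# Two sample lines in the hexdump -C layout (illustrative data only).
sample='00000000  25 50 44 46 2d 31 2e 34  0d 25 e2 e3 cf d3 0d 0a  |%PDF-1.4.%......|
00000010  31 20 30 20 6f 62 6a 3c  3c 2f 50 61 67 65 73 20  |1 0 obj<</Pages |'

# Default splitting: any run of whitespace separates items.
field18=$(printf '%s\n' "$sample" | awk '{print $18}')
printf '%s\n' "$field18"

# NF is the number of items awk found on each line.
counts=$(printf '%s\n' "$sample" | awk '{print NF}')
printf '%s\n' "$counts"
```

The first line has 18 items, so item 18 is the whole text column. The second line has 21, because the spaces inside the text column create extra items, and item 18 is only `|1`.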
To get around this we tell awk to change its own delimiter (or field separator) with BEGIN{FS="|"}. (In the script the quotes are written as \" because the \ character escapes them inside the AppleScript string.) Now that the FS is the pipe character, when we parse a line everything before the first | is item 1, the next junk (what we want) is item 2, so it is that item we print.
All these collected prints are returned back to our AppleScript variable and there we process further to remove the returns.
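Here is the same idea run on its own in a shell, as a rough sketch. The sample lines are made-up data in the `hexdump -C` layout, and `tr -d '\n'` stands in for the return-stripping that the AppleScript handler does afterwards.

```shell
# Two sample lines in the hexdump -C layout (illustrative data only).
sample='00000000  25 50 44 46 2d 31 2e 34  0d 25 e2 e3 cf d3 0d 0a  |%PDF-1.4.%......|
00000010  31 20 30 20 6f 62 6a 3c  3c 2f 50 61 67 65 73 20  |1 0 obj<</Pages |'

# With FS set to the pipe, item 2 is everything between the two pipes,
# spaces included.
text=$(printf '%s\n' "$sample" | awk 'BEGIN { FS = "|" } { print $2 }')
printf '%s\n' "$text"

# Deleting the newlines joins the lines into one long paragraph.
joined=$(printf '%s\n' "$sample" | awk 'BEGIN { FS = "|" } { print $2 }' | tr -d '\n')
printf '%s\n' "$joined"
```

Two caveats worth knowing, as far as I can tell: if the file data itself contains a `|` character, that line's text gets cut short at it, and without the `-v` option hexdump replaces runs of identical lines with a single `*`, so repeated data would silently drop out of the result.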
So I’m not the best at commenting or explaining, but I hope that helps
All I can say is…wow. And that I now know why the “|” is called the “pipe” character.
Here’s my commented-to-death code (most…comments…ever…). Let me know if I’ve got it right…
--
-- Get Hexdump Info v3
-- Used to drag-n-drop files to examine their contents/headers.
--
-- Most code segments courtesy of James Nierodzik of MacScripter
-- http://bbs.applescript.net/profile.php?id=8727
--
--
-- UTILITY HANDLER
--
-- Search and Replace routine using AppleScript Text Item Delimiters "trick"
--
on searchNreplace(parse_me, find_me, replace_with_me)
--save incoming TID state, set new TIDs
set {ATID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, find_me}
--convert from string to separate text items using new TIDs (find string), effectively "tagging" them for replacement
set being_parsed to text items of parse_me
--switch the TIDs again (replace string)
set AppleScript's text item delimiters to {replace_with_me}
--when "undelimited" back to a string the replacement text pops into place
set parse_me to being_parsed as string
--restore incoming TID state
set AppleScript's text item delimiters to ATID
--return results
return parse_me
end searchNreplace
--
-- MAIN HANDLER
--
on open fileList
-- parse through files dropped onto droplet
repeat with i from 1 to number of items in fileList
tell application "Finder"
set AppleScript's text item delimiters to {""} --reset delimiters
set this_item to item i of fileList as string ---pick item to work with
set this_item_posix to quoted form of POSIX path of this_item --need POSIX path for shell scripts
set doc_name to name of (info for alias this_item) --used for renaming the TextEdit window
end tell
-- Get hex_dump and format it (the -C parameter produces the "hex plus piped human-readable" layout),
-- then pipe it to awk, which processes the output one line at a time.
-- Set awk's field separator (FS) to the pipe character (|); the \" sequences escape the quotes for AppleScript.
-- awk then splits each line at the pipes: item 1 is the offset and hex, item 2 is the text between the pipes.
-- {print$2} tells awk to return the second item only (the human-readable part of the hexdump)
set hex_dump to (do shell script "hexdump -C " & this_item_posix & " | awk 'BEGIN{FS=\"|\"}{print$2}'")
--remove carriage returns so output is one giant paragraph
--(allows for TextEdit searching for strings and manual scanning)
set hex_dump to searchNreplace(hex_dump, return, "")
--write to TextEdit window and rename window to file name to keep things straight
tell application "TextEdit"
make new document
set text of front document to hex_dump
set name of front window to doc_name
end tell
end repeat
end open
(Yes I know, LOTS of comments…let’s just say as often as I get interrupted and the huge lags between working on scripts, it helps when I have to recall how I pulled-off a certain handler. This is especially true when folks give me shell stuff. I don’t have to comment regular AppleScript stuff quite as much, it’s a bit more intuitive to me.)
Looks pretty good, and I even see my name in there =)
Just a few thoughts though…
It’s not so much tagging as it is using the specified character as a break point to strip the delimiter out and break the string into items. Then when we coerce it back to a string with new delimiters it’s using our new character, in this case “”, as the separator. I’m probably just being anal, but saying it was tagging them just didn’t sound right to me.
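If it helps to see the break-apart-and-rejoin idea outside AppleScript, here is a loose shell analogy using awk's `split`. It is not how the handler itself works internally, and `search_replace` is a name made up for this sketch. It assumes a single-character search string, since awk treats longer ones as patterns, and it assumes no backslashes in the arguments.

```shell
# search_replace TEXT FIND REPLACE   (hypothetical helper, single-character FIND)
search_replace() {
    awk -v text="$1" -v find="$2" -v rep="$3" 'BEGIN {
        n = split(text, parts, find)      # break the text apart at every FIND
        out = parts[1]
        for (i = 2; i <= n; i++)
            out = out rep parts[i]        # glue the pieces back together with REPLACE
        print out
    }'
}

search_replace 'a|b|c' '|' ''
search_replace 'a|b|c' '|' '-'
```

The delimiter never gets "tagged"; it just disappears when the string is broken into pieces, and the replacement is whatever goes between the pieces when they are joined again.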
I don’t think that needs to be in a tell Finder block, so removing tells, especially inside a loop, is always a good thing.
Other than that it looks pretty good! So I know what the script is doing, but I’m curious what you’re actually checking the data for?
We have a digital asset management system that we found out too late strips the resource fork from Mac files. So when Mac users download things like Illustrator or Photoshop files, they get all confused because either the Mac OS makes bad guesses as to the file type, or the asset manager sometimes adds extensions further confusing the matter (making Illustrator files look like PDFs, for example).
So my script uses known information like extensions and headers info to fix the file type and creator type…for most cases.
I use the routine we’ve been working on to manually dump the file guts so I can look for unique strings or other information I can use to figure out what kind of file it is, then use a GREP routine in my script to scan for those unique strings.
Been working fine until now, since scanning the -C formatted hexdump wasn’t too bad, differences were glaring enough. But right now I’m trying to figure out how to tell the difference between a real PDF and an Illustrator file that has had “.pdf” added to the end of it erroneously by the asset manager (it adds extensions when there aren’t any…which is annoying because it guesses wrong so often).
Unfortunately, Illustrator CS2 and Acrobat files look identical in a hexdump. I wanted to make it more human readable so I could scan it in a very detailed fashion. Still haven’t found a difference…native Illustrator CS2 files appear to be identical to a regular PDF file.
Anyway, that’s what I use it for…just as a utility for scanning file innards.
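For anyone curious, the string-scanning side can be sketched in shell along these lines. This is not my actual routine: `has_marker` is a made-up helper name, the test file is a stand-in built on the spot, and `%PDF-` is just the obvious marker from the dumps above. As noted, it does not tell a real PDF from an Illustrator CS2 file.

```shell
# has_marker FILE STRING: succeed if FILE contains STRING as literal bytes.
# -a treats binary data as text, -F matches a fixed string, -q stays silent.
# (hypothetical helper; it matches anywhere in the file, not only the header)
has_marker() {
    LC_ALL=C grep -a -q -F -e "$2" -- "$1"
}

# Build a small stand-in file so the sketch is self-contained.
tmpfile=$(mktemp)
printf '%%PDF-1.4\r%%\342\343\317\323\r\n1 0 obj\n' > "$tmpfile"

if has_marker "$tmpfile" '%PDF-'; then
    verdict="PDF-style header found"
else
    verdict="no PDF header"
fi
echo "$verdict"
rm -f "$tmpfile"
```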
After all this work I decided to refine it enough for Code Exchange.
--
-- UTILITY HANDLER
--
-- Search and Replace routine using AppleScript Text Item Delimiters "trick"
--
on searchNreplace(parse_me, find_me, replace_with_me)
--save incoming TID state, set new TIDs
set {ATID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, find_me}
--using the specified character as a break point to strip the delimiter out and break the string into items
set being_parsed to text items of parse_me
--switch the TIDs again (replace string)
set AppleScript's text item delimiters to {replace_with_me}
--coerce it back to a string with new delimiters
set parse_me to being_parsed as string
--restore incoming TID state
set AppleScript's text item delimiters to ATID
--return results
return parse_me
end searchNreplace
--
-- MAIN HANDLER
--
on open fileList
-- parse through files dropped onto droplet
repeat with i from 1 to number of items in fileList
set AppleScript's text item delimiters to {""} --reset delimiters
set this_item to item i of fileList as string ---pick item to work with
set this_item_posix to quoted form of POSIX path of this_item --need POSIX path for shell scripts
set doc_name to name of (info for alias this_item) --used for renaming the TextEdit window
-- Get hex_dump and format it (the -C parameter produces the "hex plus piped human-readable" layout),
-- then pipe it to awk, which processes the output one line at a time.
-- Set awk's field separator (FS) to the pipe character (|); the \" sequences escape the quotes for AppleScript.
-- awk then splits each line at the pipes: item 1 is the offset and hex, item 2 is the text between the pipes.
-- {print$2} tells awk to return the second item only (the human-readable part of the hexdump)
set hex_dump to (do shell script "hexdump -C " & this_item_posix & " | awk 'BEGIN{FS=\"|\"}{print$2}'")
--remove carriage returns so output is one giant paragraph
--(allows for TextEdit searching for strings and manual scanning)
set hex_dump to searchNreplace(hex_dump, return, "")
--write to TextEdit window and rename window to file name to keep things straight
tell application "TextEdit"
make new document
set text of front document to hex_dump
set name of front window to doc_name
end tell
end repeat
end open