Parsing a large data file

fryboyslim · February 15, 2011, 4:17pm

Hi all,

I’m writing a slightly bigger script than I normally attempt and would appreciate advice on the best route to take.

I’ve been using a PC app for many years (as there wasn’t a Mac version of it), and recently someone launched an app with similar features for the Mac that is also AppleScript-able. I’m writing a script that takes a data file from the PC app and generates the equivalent within the Mac app. I’ve worked out the latter half of the script and I’m now working on the section that analyses the PC program’s data file. I’ve largely worked out the data structure and which hex codes map to the various properties of the program. The data seems to be delimited by 00 00 00, and there is an awful lot of it (i.e. data files tend to be >200 KB) in quite a complex arrangement.

So I have to write a routine to parse this data into something meaningful for the rest of my AppleScript, and I have a few questions:

What is the best approach to read and parse the data given the large data sizes? This is what I’ve considered so far and would appreciate feedback from anyone who has been down this road already!

Breaking the data down into small lists:
First break up the data into two-digit chunks (47726F7500000047 → 47,72,6F,75,00,00,00,47)
Set the TID to “00,00,00” to break up the data into lists (“47,72,6F,75”,“47”)
Parse the lists using a lookup table to generate meaning
I think my raw data is just too big to read into a list in the first place though - what is an alternate way to do this?
Using Database Events to create a Database with multiple fields seems a good way to handle the large amount of data, but then what is the best way to read in the data to it?
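For comparison, the list-based steps above can be sketched outside AppleScript. This is only a rough illustration in Python, assuming the data really is hex text and that 00 00 00 really is a delimiter; the `LOOKUP` table and its single entry are invented for the example and do not come from the actual file format.

```python
# Hypothetical lookup table: the code and its meaning are made up for illustration.
LOOKUP = {
    b"Grou": "group marker",
}

def split_records(hex_text: str) -> list[bytes]:
    """Decode hex text to raw bytes and split on the assumed 00 00 00 delimiter."""
    raw = bytes.fromhex(hex_text)
    # Drop empty pieces that appear when two delimiters are adjacent.
    return [piece for piece in raw.split(b"\x00\x00\x00") if piece]

def describe(records: list[bytes], lookup: dict[bytes, str]) -> list[str]:
    """Map each record to a name from the lookup table, or to its hex form if unknown."""
    return [lookup.get(record, record.hex().upper()) for record in records]

records = split_records("47726F7500000047")
labels = describe(records, LOOKUP)
```

Decoding to bytes first avoids building a list of thousands of two-character strings, and a file of a few hundred kilobytes fits comfortably in memory when handled this way.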

regards,
g

Adam_Bell · February 15, 2011, 5:00pm

Your best bet is to read the file containing the raw data in successive chunks: read a chunk, parse it, write the result to an output file, read in another chunk, and so on. There’s a good tutorial on this in unScripted, but unfortunately it doesn’t seem to be available.
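As a minimal sketch of that read-parse-repeat loop, here is one way it might look in Python rather than AppleScript. It assumes a binary file and the 00 00 00 delimiter mentioned in the question; `iter_records` and `chunk_size` are names chosen for this example. The part that matters is carrying the unfinished tail of each chunk over to the next read, so that a record or delimiter straddling a chunk boundary is not lost.

```python
import io
from typing import BinaryIO, Iterator

DELIMITER = b"\x00\x00\x00"  # assumption taken from the original question

def iter_records(stream: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield delimiter-separated records while reading the stream chunk by chunk."""
    buffer = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Everything before the last delimiter is complete; the remainder
        # may be a partial record, so keep it for the next chunk.
        *complete, buffer = buffer.split(DELIMITER)
        for record in complete:
            if record:
                yield record
    if buffer:
        yield buffer  # whatever is left after the final delimiter

# A tiny chunk size forces the delimiter to straddle chunk boundaries.
sample = io.BytesIO(bytes.fromhex("47726F7500000047"))
found = list(iter_records(sample, chunk_size=3))
```

With a real file, the stream would come from `open(path, "rb")`, and each record could be parsed and written to the output file as soon as it is yielded.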

fryboyslim · February 16, 2011, 2:41am

Thanks for that. In the absence of unScripted, would you be able to elaborate?

Many thanks
g

Nigel_Garvey · February 16, 2011, 12:04pm

You need to parse the data as “blocks of meaning” rather than just breaking it up into arbitrary chunks. Since none of us knows the structure of the data, we can’t really give you any sensible suggestions for doing that. All I’ve been able to infer from the information given is that the data are text representing hexadecimal byte values.

You’re unlikely to have a three-byte delimiter value. In an old-style Mac data file, a run of three zero bytes might be padding after a previous item, or it might be part of a four-byte integer value (or parts of two adjacent integer values). An integer value might be an item in its own right or an indicator of how many of the following bytes represent the next item.
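To illustrate the length-prefix possibility, here is a small Python sketch. It assumes each item is preceded by a four-byte big-endian unsigned integer giving its length; the layout, the byte order and the sample bytes are all assumptions for the example and would need checking against the real file (a file written by a PC program may well be little-endian, in which case the format string would be `"<I"`).

```python
import struct

def read_length_prefixed(data: bytes) -> list[bytes]:
    """Parse items stored as a 4-byte big-endian length followed by that many bytes."""
    items = []
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError("truncated length field")
        (length,) = struct.unpack_from(">I", data, offset)
        offset += 4
        if offset + length > len(data):
            raise ValueError("item runs past the end of the data")
        items.append(data[offset:offset + length])
        offset += length
    return items

# Invented sample: length 4, "Grou", then length 1, "G".
sample = bytes.fromhex("00000004" "47726F75" "00000001" "47")
items = read_length_prefixed(sample)

# Read as one big-endian integer, 00 00 00 47 is the number 71, not a delimiter plus a byte.
value = struct.unpack(">I", bytes.fromhex("00000047"))[0]
```

Under this reading, the 00 00 00 47 at the end of the example in the question would be the integer 71 rather than a delimiter followed by a lone byte.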

toc-rox · February 18, 2011, 7:43pm

Think about combining AppleScript with a command-line tool for your task.
For example, Perl could be a good choice because it was designed for parsing large text files.

Adam_Bell · February 18, 2011, 8:15pm

As Nigel suggests, give us more information about the data you want to read and parse. In the meantime, you can read an article he wrote, The Ins & Outs of File Read/Write in AppleScript. I fixed the links in the unScripted Quick Guide.