Hi all,
I’m new to this forum, so I’m not sure if this is the best place to ask this. If not, please let me know.
***Background
I have a dataset that’s 1.2 GB in size! The text file is delimited by the tilde character (“~”). The first row contains the field (column) names, and every subsequent row contains data. Below is an example of what it looks like; the actual dataset has more columns and rows.
Group_number~Price~Group_description
1~34.2~Base_Customer
2~23~Medium_Customer
3~89~High_Customer
1~90.21~Base_Customer
2~100.55~Medium_Customer
3~200.11~High_Customer
***Problem
Now, I usually read this kind of file with Stata (a statistical package - www.stata.com) and then run all sorts of analyses in the same package. The problem is that Stata holds the data you load entirely in memory, and the text file I’m trying to read is 1.2 GB! It has way, way more rows than the example I gave above, which is how it reaches that size.
I’m no expert in computer hardware, but my MacBook Pro (2.16 GHz Intel Core Duo) has only 1 GB of RAM, so I’m pretty sure the file won’t load. Furthermore, I’ve tried loading large files with Stata, and the most memory the OS is willing to give it is about 900 MB.
***Request for help
So, I figured that what I may be able to do is open the text file as a stream and split it into, say, 3 separate files, one for each Group_number in my example above. That would hopefully yield 3 separate text files of under 900 MB each, which I could then manage easily. The problem is, I don’t know how to do this on OS X. I’ve seen/heard of people doing this in C# on Windows, but I’m wondering (and quietly hoping) whether an AppleScript could do the job. If so, I would love some pointers if anyone’s got any. I have very limited experience with AppleScript, but I do have experience with procedural and (basic-level) object-oriented programming, so I’m hoping I’ll be able to work through any scripts people suggest.
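For what it’s worth, here is a rough sketch of the streaming split I have in mind, in Python (which comes preinstalled on OS X). The file names (`data.txt`, `part_*.txt`) and the assumption that Group_number is the first column are mine; this just illustrates the idea of reading one line at a time so the whole 1.2 GB file is never held in memory.

```python
# Build a tiny sample file matching the example above, just so the
# sketch is self-contained. In practice this would be the real dataset.
sample = (
    "Group_number~Price~Group_description\n"
    "1~34.2~Base_Customer\n"
    "2~23~Medium_Customer\n"
    "3~89~High_Customer\n"
    "1~90.21~Base_Customer\n"
    "2~100.55~Medium_Customer\n"
    "3~200.11~High_Customer\n"
)
with open("data.txt", "w") as f:
    f.write(sample)

def split_by_group(in_path, out_prefix):
    handles = {}  # group value -> open output file handle
    with open(in_path) as src:
        header = src.readline()            # first row: column names
        for line in src:                   # streams line by line; never loads it all
            group = line.split("~", 1)[0]  # assumes Group_number is column 1
            out = handles.get(group)
            if out is None:
                out = open("%s_%s.txt" % (out_prefix, group), "w")
                out.write(header)          # repeat the header in every part
                handles[group] = out
            out.write(line)
    for out in handles.values():
        out.close()

split_by_group("data.txt", "part")
# Produces part_1.txt, part_2.txt and part_3.txt, each containing the
# header row plus only that group's data rows.
```

Each output file could then be loaded into Stata separately, since each part should fit under the ~900 MB ceiling.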
Thanks so much in advance for your help!!!
Model: MacBook Pro
AppleScript: 2.2
Browser: Safari 525.20
Operating System: Mac OS X (10.5)