Convert datafile to list

How would I convert my database into the list format so I can easily populate my Studio table? The first 6 records of my 7000-record database file look like this:

AFL AFL 7 AFL 44
AMTRAK AMTRAK 7
Abel act 272 A 44
Alabama Alabama 235 A 44
Alaska Alaska 236 A 44
America America 191 er 201

AS Studio provides a simple method for populating tables:

“set content of tableView to {{"Red", "Green", "Blue"}, {"Black", "White", "Gray"}} -- This would produce a table view with two rows and three columns.”

I’ve already tried an AppleScript loop subroutine that formats each line as it is read from the file, but this takes too long. My intuition is to employ a clever UNIX AWK function to do the translation. It’s easy enough to use awk '{print "{\""$1"\", \""$2"\", \""$3"\", \""$4"\", \""$5"\", \""$6"\", \""$7"\"}"}' to format each data record and produce the following outcome:

{"AFL","AFL","7","AFL","44"}
{"AMTRAK","AMTRAK","7","",""}
{"Abel","act","272","A","44"}

I’m stumped on how to handle the database end-of-lines to generate the desired form:

{{"AFL","AFL","7","AFL","44"}, {"AMTRAK","AMTRAK","7","",""}, {"Abel","act","272","A","44"}} …

I am guessing that the solution to my problem is fairly well known to those who work with AppleScript Studio. I would appreciate any advice on this.

Try this:

awk '{print "{\""$1"\", \""$2"\", \""$3"\", \""$4"\", \""$5"\", \""$6"\", \""$7"\"}"}' /Users/awnispel/Library/Scripts/SpellAwareAuxilaryScripts/SpellDictWordBitsCopy.txt | tr '\n' '|' | sed 's/|/, /g'

Note that you cannot simply use UNIX sed to remove the newlines in a file, because sed reads its input one line at a time and strips the newline before applying its commands. The workaround employs the UNIX utility ‘tr’. Unfortunately, tr can only replace one character with one character, which won’t do for placing both a comma and a space between list records. Therefore I have tr convert each newline to a temporary pipe (|) character, which sed then replaces.
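
As a sketch of the tr-then-sed workaround described above: the hypothetical helper join_records below takes the one-record-per-line output of the awk step on standard input, drops the trailing separator, and wraps the result in the outer braces. It assumes no field contains a pipe character; the helper name is illustrative only.

```shell
# Hypothetical helper: joins one-record-per-line input into a single line,
# using "|" as a temporary stand-in for the newline.
# Assumption: no field in the data contains a "|" character.
join_records() {
    tr '\n' '|' |
        sed -e 's/|$//' \
            -e 's/|/, /g' \
            -e 's/^/{/' \
            -e 's/$/}/'
    # 1st expression: drop the stand-in left by the final newline
    # 2nd expression: turn the remaining stand-ins into ", "
    # 3rd and 4th:    add the outer list braces
}
```

It would be used at the end of the pipeline in place of the tr and sed stages, for example: awk '...' datafile | join_records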

There should be a more elegant method using only AWK, however I would have to re-study the AWK manual for a while. Perhaps some genius out there knows of it?
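
One possible awk-only approach, offered as a sketch rather than a definitive answer: print the separator before every record except the first, so the newlines never need to be removed afterwards. The helper name to_as_list and its column-count argument are hypothetical, and the sketch assumes tab-delimited input whose fields contain no double quotes or backslashes.

```shell
# Hypothetical helper: reads tab-separated records on stdin and prints a
# single-line AppleScript-style list of lists.
# $1 = number of columns to emit per record (defaults to 5).
# Assumption: fields contain no double quotes or backslashes.
to_as_list() {
    awk -F '\t' -v ncols="${1:-5}" '
        BEGIN { printf "{" }                  # open the outer list
        {
            if (NR > 1) printf ", "           # separator goes before records 2..n
            printf "{"
            for (i = 1; i <= ncols; i++) {
                # missing trailing fields come out as empty strings
                printf "%s\"%s\"", (i > 1 ? ", " : ""), $i
            }
            printf "}"
        }
        END { print "}" }                     # close the outer list
    '
}
```

For example: to_as_list 5 < SpellDictWordBitsCopy.txt. Because printf never emits a newline between records, there is nothing left for tr or sed to clean up.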

Of course the solution above will require some fussing over to work as an AppleScript shell script call due to the finicky nature of the AppleScript browser.

So you’re saying that something like this is too slow?

set x to "AFL	AFL	7	AFL	44
AMTRAK	AMTRAK	7		
Abel	act	272	A	44
Alabama	Alabama	235	A	44
Alaska	Alaska	236	A	44
America	America	191	er	201"
set tid to AppleScript's text item delimiters
set AppleScript's text item delimiters to "	" -- that's a backslash followed by t: \t
set mylist to {}
set myparagraphs to every paragraph of x
repeat with eachparagraph in myparagraphs
	set mysublist to every text item of eachparagraph as list
	set end of mylist to mysublist
end repeat
set AppleScript's text item delimiters to tid
mylist

Edit: Missed a couple of tabs in the AMTRAK sublist.

You could probably rework your script using perl in place of sed:

If called from an AppleScript using do shell script: perl -pe 's/[|\\n]/, /g'
If called from a shell: perl -pe 's/[|\n]/, /g'

I haven’t tested the above!

Nice suggestions, cwtnospam. Your AppleScript method for generating lists by appending to a defined list has got to be the definitive way to create lists from data sets of fewer than, say, 500 entries. I overlooked how easy it is to get a list format by such a simple process. Unfortunately, this is still not practical for large sets of data (in my case, 7000 entries). I tried your method on it and it takes a couple of minutes to generate the desired output. The bottleneck is the pokey AppleScript repeat loop. I recall there is a method to speed up this loop using a pointer (‘reference to’ in AppleScript). A fellow at Apple showed it to me during a job interview I had there some years ago, but I forgot just how it goes. So it looks like UNIX AWK is the only workable solution I can summon. I tried to use your Perl formula, but this boils down to the same thing as the sed function, where it’s impossible to ferret out the newline character for the reason cited in my previous entry: I believe sed relies on the newline character of a text file for its automatic input, thus making its elimination impossible.

Now I have to figure out how to properly parse my unwieldy AWK formula to work in an AppleScript. This is very unpleasant work. I wish there were a tool out there that would translate UNIX scripts into kosher AppleScript ‘do shell’ scripts!

Here’s my AWK solution translated into AppleScript:

 set myResult to "{{" & (characters 1 thru -4 of (do shell script "awk '{print \"\\\"\"$1\"\\\", \\\"\"$2\"\\\", \\\"\"$3\"\\\", \\\"\"$4\"\\\", \\\"\"$5\"\\\", \\\"\"$6\"\\\", \\\"\"$7\"\\\"\"}' " & spellRuleFilePath & " | tail -5 |  tr '" & (ASCII character 10) & "' '|' | sed 's/|/}, {/g'") as text) & "}"

Yech! Nevertheless, this sort of solution is a practical method for quickly loading large amounts of delimited text file data into an AS Studio table. Unfortunately, I find that this does not produce an actual list (see comments below) and will not populate a table as I hoped. (revised 12-1)

Hi.

I can’t comment on the shell script, but as far as the AppleScript’s concerned:

‘characters 1 thru -4 of (blah blah blah) as text’ should be ‘text 1 thru -4 of (blah blah blah)’. Otherwise you’re creating a list containing a hell of a lot of individual characters and then coercing it to text using the current value of AppleScript’s text item delimiters.

‘ASCII character’ and ‘ASCII number’ are deprecated as from Leopard. Your original use of “\n” in the shell script text would probably be better.

The result is a text representation of an AppleScript list. If you want an actual list, you’ll need to use ‘run script myResult’.

Thanks for taking a critical look at the grammar of my function:

The most important point you raised was the one about my script generating a ‘text representation’. I’m glad you spotted this issue, because I could not get my function to populate my AS Studio table and could not understand why. It didn’t occur to me that I had not yet generated a working list. Your solution to use ‘run script’ was not fruitful, however. It required yet another meta-level of quotations and soon drove me nuts. Instead, I revisited cwtnospam’s AppleScript loop method while experimenting with the use of pointers (‘reference to’) to get the speed up. This was an interesting exploration, although I ran into other issues with this strategy. (See my other post at http://bbs.macscripter.net/viewtopic.php?pid=107573#p107573)

Your other criticism was: “‘characters 1 thru -4 of (blah blah blah) as text’ should be ‘text 1 thru -4 of (blah blah blah)’. Otherwise you’re creating a list containing a hell of a lot of individual characters and then coercing it to text using the current value of AppleScript’s text item delimiters.” This seems counter-intuitive to me, but I was able to quickly test the validity of your method.

Thanks for the tip about learning to avoid the use of ‘ASCII character’ and ‘ASCII number’ with the adoption of new norms. I see that “\t”, for example, will work for tabs.

This is beginning to bug me, because PHP rips through this in about two seconds (maybe less!):

<?php
$x = "AFL	AFL	7	AFL	44
AMTRAK	AMTRAK	7		
Abel	act	272	A	44
Alabama	Alabama	235	A	44
Alaska	Alaska	236	A	44
America	America	191	er	201
";
$y = "AFL	AFL	7	AFL	44
AMTRAK	AMTRAK	7		
Abel	act	272	A	44
Alabama	Alabama	235	A	44
Alaska	Alaska	236	A	44
America	America	191	er	201";
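// Double $x ten times (6 lines become 6144), then append the final six.
for ($i = 1; $i < 11; $i++) {
	$x .= $x;
}
$x .= $y;
$thearray = array();
$ax = explode("\n", $x); // one element per record
$n = count($ax);
echo $n . "<br>";
// Split each record on tabs to build the nested array.
for ($i = 0; $i < $n; $i++) {
	$thearray[$i] = explode("\t", $ax[$i]);
}
var_dump($thearray);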
?>

cwtnospam: thanks for your PHP workout. I’m thinking that will be my next scripting language. For now, however, I will need to squeeze all I can from AppleScript. In that spirit, I thought you might be interested in seeing how by using AS pointers–or references, as they are called–you can process those 7000 records in a second. See the discussion at:

http://bbs.macscripter.net/viewtopic.php?id=27715


Interesting AppleScript quirk. Of course, PHP has plenty of its own. No language is perfect!