AppleScript challenge

Hi all !

I have been asked to help find a solution for a project I’m working on. My objective is to read a number of text files containing rows of text with specific info in them.

e.g. “name,address,telephone number”. You get the idea.

There are a lot of these rows in some of the text files. Over 500 000 rows per text file and just over 5000 text files to be exact. This number will grow next week. Anyway, my plan was to simply read these files and turn the data into a list.

My question is this:

Is it possible to find the first item (in this case "name") quickly without having to run a repeat loop like:

repeat with i in theList
  if item 1 of i equals "whatever" then
    set newList to i
  end if
end repeat

The reason is that we will occasionally need to find specific data regarding the “name” entry, almost like in a database. The text files are logs that are generated automatically by a third party and we need to streamline our workflow to deal with this old system.

The order of the items in the text file will always be the same.

Is there a better way? Perhaps a Cocoa method? I’m still in the process of learning AppleScript and I am not familiar with Cocoa or how to implement it, since I have to do this in my free time. I have written a small AppleScript Studio app before but it’s relatively basic.

Thanks

Hi EMTools,

what about using a command line tool like grep? I don’t believe its speed can be beaten by any AppleScript, and if you need to, you can easily call it from a script using ‘do shell script’.

An example:

given a list called ‘test.list’:

paul smith,street,209348293-234234
frank lloyd, other street, 234234234-234234
josephine smith, next street , 2342342342-23423
albert hall, onemore street, 234234-234234
steven smith, some boulevard, 23423423-23423
ringo starr, smith street, 234234-234234
george smith, somewhere, 123123-23234
john doe, nowhere, 2342342-234234

where you want to filter out all occurrences of smith, it would simply be

grep smith test.list > filteredtest.list

and if you don’t want to have ringo in your filtered list (who lives in smith street), you could do something like this instead:

grep -e '^[^,]*smith' test.list > filteredtest.list

D.

Hi Dominik

Thanks for your reply. I believe I was unclear as to what it is I need to do. I don’t need to filter out anything. I simply need to find the required text in item 1 of myList. This is basically like a unique id.

Then I need to get the rest of that particular entry’s info like the “address” and so on. The text files basically work like a database for the info and item 1 is the entry id.

I hope this makes sense. Can I do this with the grep command and if so how do I implement it? I am having difficulty understanding how to use “do shell script” and command line tools. At this time it’s a bit advanced for me since I don’t have the time to experiment. We all know work pressure… :stuck_out_tongue:

Thanks

EM

EM,

I am not 100% sure now, but I still think I understood your problem. Try it yourself:

Using my test ‘database’ from above:

paul smith,street,209348293-234234
frank lloyd, other street, 234234234-234234
josephine smith, next street , 2342342342-23423
albert hall, onemore street, 234234-234234
steven smith, some boulevard, 23423423-23423
ringo starr, smith street, 234234-234234
george smith, somewhere, 123123-23234
john doe, nowhere, 2342342-234234

and running this script:

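set databaseFile to quoted form of (POSIX path of (choose file))

set nameToSearch to text returned of (display dialog "name?" default answer "smith")

-- quoted form keeps the shell from interpreting special characters in the pattern.
-- Note: do shell script raises an error when grep finds no match (exit status 1).
set resultOfSearch to paragraphs of (do shell script "grep -e " & quoted form of ("^[^,]*" & nameToSearch) & " " & databaseFile)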

you get as resultOfSearch this list:

{"paul smith,street,209348293-234234", "josephine smith, next street , 2342342342-23423", "steven smith, some boulevard, 23423423-23423", "george smith, somewhere, 123123-23234"}

Since I thought you didn’t want to do further processing, I suggested redirecting the output of grep directly to another text file in my command line example above (> filteredtest.list).

If you want to turn the result into a list of records, you might add these lines at the end of my script:



set listOfRecords to {}
set {tid, text item delimiters} to {text item delimiters, ","}
repeat with thisEntry in resultOfSearch
	set thisRecord to {nam:text item 1 of thisEntry, street:text item 2 of thisEntry, phone:text item 3 of thisEntry}
	copy thisRecord to the end of listOfRecords
end repeat
set text item delimiters to tid
get listOfRecords

and the result now looks like this:

{{nam:"paul smith", street:"street", phone:"209348293-234234"}, {nam:"josephine smith", street:" next street ", phone:" 2342342342-23423"}, {nam:"steven smith", street:" some boulevard", phone:" 23423423-23423"}, {nam:"george smith", street:" somewhere", phone:" 123123-23234"}}

Hope that helps?

Regards,

Dominik

Note: I just realized that my example may have been misleading. Of course it is also possible to find whole-name matches (instead of surname matches) using grep …
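A minimal shell sketch of the whole-name (unique id) lookup mentioned in the note above, for anyone who wants to try it in Terminal. It swaps grep for awk so the id is compared as a plain string against the first comma-separated field and needs no regex escaping. The function name lookup_entry and the temporary sample file are made up for illustration; they are not part of the original scripts.

```shell
#!/bin/sh
# Hypothetical sample file; in practice this would be one of the real log files.
db=$(mktemp)
cat > "$db" <<'EOF'
paul smith,street,209348293-234234
frank lloyd, other street, 234234234-234234
ringo starr, smith street, 234234-234234
john doe, nowhere, 2342342-234234
EOF

# lookup_entry ID FILE (hypothetical helper): print the first row whose first
# comma-separated field equals ID exactly, then stop reading the file.
# Appending "" to both sides forces a string comparison rather than a numeric one.
lookup_entry() {
    awk -F, -v id="$1" '($1 "") == (id "") { print; exit }' "$2"
}

row=$(lookup_entry "ringo starr" "$db")

# Split the matching row into its three fields on commas.
# Spaces after the commas are kept, exactly as they appear in the file.
IFS=, read -r name street phone <<EOF
$row
EOF

echo "name=$name street=$street phone=$phone"
```

Because the comparison is against the whole first field, searching for "smith" alone returns nothing here, and ringo is found by name rather than by the street he lives in.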

Ok I get it now, :smiley: I’m being stupid:P

Thanks, I’ll let you know. Going to a meeting now…

EM

Ok I get the following error:

I believe that the problem is that the text is written as an AppleScript list.

I tried to open the test text file with TextEdit and got a lot of text that looks like this:

"list "

damn I can’t post the text correctly. Tried 3 times now

If I read the file as list I get a proper AppleScript list.

Any ideas?

The error is grep not finding any matches. If you want to write something coherent to the text file so that it’s in the same format that you started with, you need to turn your list into text somehow before writing it to the file. Here’s my way of doing it:

set stuff to {{1, "a", 2}, {3, "b", 4}, {5, "c", 6}}
set res to ""
-- join the items of each sublist with ", " when it is coerced to text
set AppleScript's text item delimiters to ", "
repeat with i from 1 to count stuff
	set res to res & (item i of stuff as text)
	set res to res & return
end repeat
set AppleScript's text item delimiters to ""
return res

If grep has a problem with this, try replacing “return” with “\n”. I’ve had troubles with that sometimes.
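The reason behind this: AppleScript's return is a carriage return, while grep splits its input on linefeeds, so a file written with return looks like one very long line to grep. If the file already exists, the line endings can also be converted afterwards. A small sketch, assuming a file that uses bare carriage returns; the file contents and variable names here are invented for the demo.

```shell
#!/bin/sh
# Hypothetical file written with carriage returns as line endings.
cr_file=$(mktemp)
printf 'paul smith,street,1\rjohn doe,nowhere,2\r' > "$cr_file"

# Translate every carriage return into a linefeed.
lf_file=$(mktemp)
tr '\r' '\n' < "$cr_file" > "$lf_file"

# grep now sees two separate lines and prints only the matching one.
match=$(grep -e '^john doe,' "$lf_file")
echo "$match"
```

Run against the unconverted file, the same grep would find nothing, because the line it sees starts with "paul smith" and contains everything else after it.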

OK … Unless I’m missing something, I will have to convert the lists to a standard text file first, then use grep to extract the information. But that will then leave me with 2 sets of files, won’t it?

Thanks for the suggestion. I’m not sure how well this will work though. I have over 5000 files to process and AppleScript is taking its time chomping through just one, because my test file has over 500 000 entries. And apparently some of the files have even more. :frowning: Mmm… looks like I’ll need to distribute the process over 2 or 3 machines and save the processed files in a central folder, then dedicate another machine to the grep operation.

This would appear to be a much larger undertaking than originally anticipated.

Thanks for your input.

EM

[edit] The files are supplied to us by a third party. Apparently they use a custom set of AppleScripts to generate the logs, and they prefer this method because it works for them…

By the way, I’m just trying to help the guys who actually need to work with these files, since I know a little AppleScript.

OK, here’s a slightly faster version. I don’t know if the script object trick is going to help much with res, but it can’t hurt, right?

script s
	property res : {}
	property stuff : {}
end script

set s's stuff to {{1, "a", 2}, {3, "b", 4}, {5, "c", 6}}
set s's res to {}
-- join the items of each sublist with ", " when it is coerced to text
set AppleScript's text item delimiters to ", "
repeat with i from 1 to count s's stuff
	set end of s's res to item i of s's stuff as text
	set end of s's res to return
end repeat
set AppleScript's text item delimiters to ""
-- res is a list of strings; coerce it to one block of text before returning
return s's res as text

If you want it really fast though, you’ll have to switch to another language. The last time I had to do something that read files, I wrote a Java tool to do it and then called it from AS using do shell script. The Java file was stored in the AS app’s package.
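For the many-files side of the problem, here is a rough shell sketch of searching a whole folder in one go rather than file by file from AppleScript. It assumes the logs have already been converted to plain comma-separated text with linefeed line endings and carry a .txt extension; the folder, the file names and the search_logs function are invented for the demo. No speed claim is made for it.

```shell
#!/bin/sh
# Hypothetical folder holding two small stand-in log files.
logs=$(mktemp -d)
printf 'paul smith,street,1\nfrank lloyd,other street,2\n' > "$logs/a.txt"
printf 'john doe,nowhere,3\nringo starr,smith street,4\n' > "$logs/b.txt"

# search_logs ID DIR (hypothetical helper): print every row in DIR's .txt files
# whose first comma-separated field equals ID exactly.
# "find ... -exec ... {} +" hands many files to each awk call instead of one at a time.
search_logs() {
    find "$2" -type f -name '*.txt' -exec \
        awk -F, -v id="$1" '($1 "") == (id "")' {} +
}

found=$(search_logs "john doe" "$logs")
echo "$found"
```

The same command line could be passed to do shell script, with the id and folder path wrapped in quoted form of.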