Please forgive me if this is too easy for you, but I can’t find a solution for that problem.
With this part of a script I open a file containing tab-separated data exported from a database.
set vmnum to "6221393"
set adbaselist to choose file with prompt "Please select file"
set adbaselist to adbaselist as text
set AppleScript's text item delimiters to ""
set every_row to paragraphs of (read file adbaselist)
This file contains about 100,000 paragraphs (records).
Now I would like to get the entire record containing a specific piece of information. Is this possible with AppleScript?
Too many records for AS, I think. I’d better use “grep”:
set searchTerm to "blah"
set inputFile to alias "path:to:file.txt"
set matchingRecords to (do shell script "grep -e " & quoted form of searchTerm & space & quoted form of POSIX path of inputFile)
It could be done in a different fashion in plain AS, though (and also very quick) using offsets, but I’d be so lazy to do it while it can be done with “grep” :rolleyes:
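One thing to watch with the grep version: grep exits with status 1 when nothing matches, and do shell script turns any non-zero exit status into an AppleScript error. Here is a minimal sketch that traps this. It assumes the search term should be matched literally (hence the -F flag, which is my addition) and that an empty list is an acceptable "no match" result:

```applescript
set searchTerm to "blah"
set inputFile to choose file with prompt "Please select file"
try
	-- -F treats the search term as a fixed string rather than a regular expression
	set grepOutput to do shell script "grep -F -e " & quoted form of searchTerm & space & quoted form of POSIX path of inputFile
	set matchingRecords to paragraphs of grepOutput
on error errMsg number errNum
	if errNum is 1 then
		-- exit status 1 from grep means "no lines matched"
		set matchingRecords to {}
	else
		-- anything else (e.g. unreadable file) is a real error, so pass it on
		error errMsg number errNum
	end if
end try
matchingRecords
```

Splitting the output into paragraphs gives one list item per matching record, which is usually easier to work with than a single block of text.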
Also, if you want to do with offset, then something like this:
set vmnum to "6221393"
set adbaselist to choose file with prompt "Please select file"
set db_text to (read adbaselist)
set the_offset to (offset of vmnum in db_text)
set the_record to (read adbaselist from the_offset before (ASCII character 10) using delimiter tab)
-- ascii character 10 is linefeed
-- or use return (ascii character 13)
-- I wrote my test text file with TextEdit
6221393 needs to be a unique field value. If it isn't, you can do other things depending on how the record fields are ordered etc. Here I assume that the vmnum is in the first field of the record.
Thank you for the “offset” solution. Works fine too and it doesn’t matter how the source text file is formatted. Working with the grep-version requires a source file with UNIX linebreaks.
On very large files, offset may be relatively slow - while grep, though speedy, will always carry the overhead of making an external call to the shell. Another option, which should be faster and equally effective, is to use AppleScript’s text item delimiters - perhaps something like this (to return the first paragraph containing a given search string):
to searchText of currText for searchString
set tid to text item delimiters
set text item delimiters to searchString
considering case
tell currText to if (count text items) is 1 then
set searchResult to "ERROR: no match found for \"" & ({""} & "\".")
else
set searchResult to text item 1's paragraph -1 & ({""} & text item 2's paragraph 1)
end if
end considering
set text item delimiters to tid
searchResult
end searchText
set vmnum to "6221393"
searchText of (read (choose file)) for vmnum
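In case the ({""} & ...) expressions in the handler look odd: they rely on list-to-text coercion. When text is concatenated with a list, the list is coerced to text and its items are joined with the current text item delimiters, which at that point in the handler hold the search string. A tiny illustration, using a made-up delimiter "XX":

```applescript
set tid to text item delimiters
set text item delimiters to "XX"
-- {""} & "b" is the list {"", "b"}; concatenating it onto text coerces it
-- to text, with the current delimiter inserted between the two items
set demo to "a" & ({""} & "b")
set text item delimiters to tid
demo
```

The delimiter ends up between "a" and "b", which is how the handler puts the search string back between the two halves of the matching paragraph.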
Baschi, note that you need to error check your script. One issue that might arise is there might be numbers “6221393” and “62213934”. Stuff like that might cause problems, but if the ad number has just seven digits, then it should be ok.
EDIT: wait I made a mistake and there’s a double read. If you replace:
(read (choose file))
with the variable t then there’s almost no difference. Disregard.
I forgot to mention that I tested it on a file with 100001 paragraphs (records). Here’s the script to write the file:
set desk_path to (path to desktop) as string
-- build a plain path string; "file specification" is not supported in current AppleScript
set file_path to desk_path & "ads"
set ref_num to (open for access file file_path with write permission)
try
repeat with i from 1 to 100000
set t to ((i as string) & tab & "hello" & return)
write t to ref_num
end repeat
set t to ("6221393" & tab & "bye")
write t to ref_num
close access ref_num
on error
close access ref_num
beep 2
return
end try
Never mind the double read, kel - that ‘choose file’ was a sure-fire way of stopping the tids method in its tracks!
Out of curiosity, I’ve just carried out a few tests myself on some files containing between 10 and 100,000 paragraphs (each consisting of just over 10 items/words).
Offset certainly seems faster than when I last compared it. (I’d swear they’ve tweaked it!) However, using a finer timing method, tids still appear to have a slight edge. As I suggested earlier, the shell overhead puts grep at somewhat of a disadvantage when parsing smaller files - but it’s certainly worth considering for large files, as you’ll see from these results (to which the usual qualifications about timings apply):
All times in milliseconds:
file size (paragraphs)    offset    tids    grep
10                        5.1       2.6     94
100                       5.6       3.8     92
1,000                     20        11      95
10,000                    146       101     111
100,000                   1283      906     200
Since performance is only one aspect of such comparisons, it might be worth reiterating one or two other points to consider generally. Baschi has already observed that grep requires a file to be LF (ASCII character 10) delimited and, as you pointed out, the offset method may need to be adjusted according to a file’s paragraph delimiters. The tid-based approach should handle UNIX, DOS or Mac line-breaks without modification.
It also appears that, to return a complete record/paragraph, the offset suggestion relies on the record starting with the search string. (No biggie in this case, since it apparently works for Baschi - and that’s the main thing!)
Good point, kel.
I’m not aware of the precise structure of the file in question, but one way around this might be to “top’n’tail” the search string with the appropriate separators (“\r”, “\n”, “\t”, space, whatever). That way, part of a longer string shouldn’t be confused with a shorter one.
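A rough sketch of the "top'n'tail" idea, applied to the offset approach. It assumes (and these are only assumptions about the file) that records are linefeed-delimited and that the number is the first field of its record, followed by a tab:

```applescript
set vmnum to "6221393"
set db_text to read (choose file with prompt "Please select file")
set lf to ASCII character 10

-- wrap the number in its separators so that "62213934" cannot match
set wrapped_term to lf & vmnum & tab

-- prefixing a linefeed to the text lets the very first record match as well;
-- a hit at position n in the prefixed text means the number itself
-- starts at position n in db_text
set hit to offset of wrapped_term in (lf & db_text)

if hit is 0 then
	set the_record to missing value -- no record starts with this number
else
	set the_record to paragraph 1 of (text hit thru -1 of db_text)
end if
the_record
```

For return-delimited files, lf would need to be swapped for return (ASCII character 13), and a different field position would need different separators on either side.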
I’ve used this as the basis for a MIF-file parsing algorithm and it works great; my problem (as posted elsewhere) is that I now need to convert it to a droplet, which will allow for multiple files to be processed unattended.
When I add a simple droplet shell and drop two files onto it:
on open filelist
repeat with CurrentDocument in filelist
set delimiterString to "<ParaLine"
searchText of CurrentDocument for delimiterString
end repeat
end open
to searchText of currText for searchString
set tid to text item delimiters
set text item delimiters to searchString
considering case
tell currText to if (count text items) is 1 then
set searchResult to "ERROR: no match found for \"" & ({""} & "\".")
else
set searchResult to text item 2's paragraph 3 & ({""} & text item 2's paragraph 1)
end if
end considering
set text item delimiters to tid
searchResult
end searchText
I get an error that “item 1 of (alias “Macintosh HD:filename1.mif”, alias “Macintosh HD:filename .mif”) doesn’t understand the count message.”
Any chance it’s a simple fix to get the droplet working?
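A likely cause is that the open handler receives a list of aliases, so the handler is being asked to count the text items of an alias rather than of the file's text. Here is a sketch of one possible fix, assuming the handler is meant to work on the file contents: read each dropped file first, then pass the text along. The foundList variable is my own addition, only there to keep the results.

```applescript
on open filelist
	set delimiterString to "<ParaLine"
	set foundList to {}
	repeat with currentDocument in filelist
		-- read the file's contents; the handler works on text, not on an alias
		set fileText to read (contents of currentDocument)
		set end of foundList to (searchText of fileText for delimiterString)
	end repeat
	-- foundList now holds one result per dropped file; process it here
end open

-- handler as posted above, unchanged; note that "paragraph 3" will raise
-- an error if the text after the first match has fewer than 3 paragraphs
to searchText of currText for searchString
	set tid to text item delimiters
	set text item delimiters to searchString
	considering case
		tell currText to if (count text items) is 1 then
			set searchResult to "ERROR: no match found for \"" & ({""} & "\".")
		else
			set searchResult to text item 2's paragraph 3 & ({""} & text item 2's paragraph 1)
		end if
	end considering
	set text item delimiters to tid
	searchResult
end searchText
```

Saved as an application, this should accept any number of dropped files and process them one after another without further prompting.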