Handling a large text file

Hello,

I’ve got a tab-delimited text file with about 5000 lines and want to process it as shown below to produce an InDesign tagged-text file. It works, but it’s very slow …
Is there a faster way to do it with a combination of shell scripts? (… or with good AppleScript :wink: )


set tid to AppleScript's text item delimiters
set mytext to "butcher	name1	name2	name3	strasse	ort
butcher	name1	name2	name3	strasse	ort
butcher	name1	name2	name3	strasse	ort
butcher	name1	name2	name3	strasse	ort
Gärtner	name1	name2	name3	strasse	ort
Gärtner	name1	name2	name3	strasse	ort
Gärtner	name1	name2	name3	strasse	ort
Gärtner	name1	name2	name3	strasse	ort
maurer	name1	name2	name3	strasse	ort
maurer	name1	name2	name3	strasse	ort
maurer	name1	name2	name3	strasse	ort"

set AppleScript's text item delimiters to "\n"
set mylist to every text item of mytext
set branche to "<pstyle:branche>"
set firm to "<pstyle:rest>"

set newlist to {}
repeat with i from 1 to count of mylist
	set myrow to item i of mylist
	set AppleScript's text item delimiters to "\t" --tabulator
	set rowlist to every text item of myrow
	set AppleScript's text item delimiters to " "
	set myname to firm & (items 2 thru 4 of rowlist) as text
	set myeditedrowlist to {}
	copy branche & item 1 of rowlist to end of myeditedrowlist
	copy myname to end of myeditedrowlist
	copy firm & item 5 of rowlist to end of myeditedrowlist
	copy firm & item 6 of rowlist & return & return to end of myeditedrowlist
	copy myeditedrowlist to end of newlist
end repeat

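-- Drop the branche line from every row whose branche matches the previous row's
set finallist to {}
repeat with j from 1 to count of newlist
	set mybranche to item j of newlist
	try
		-- item 0 does not exist, so the first row errors here and is kept whole
		set myreferrer to item (j - 1) of newlist
		if item 1 of myreferrer contains branche then set branchereferrer to item 1 of myreferrer
		-- rest of: the list without its first item (plain AppleScript)
		if item 1 of mybranche is branchereferrer then set mybranche to rest of mybranche
	end try
	copy mybranche to end of finallist
end repeat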
set AppleScript's text item delimiters to return

set thestring to "<ANSI-MAC>" & return & "<vsn:6>" & return & finallist as string
set AppleScript's text item delimiters to tid
(* result
<ANSI-MAC>
<vsn:6>
<pstyle:branche>butcher
<pstyle:rest>name1 name2 name3
<pstyle:rest>strasse
<pstyle:rest>ort


<pstyle:rest>name1 name2 name3
<pstyle:rest>strasse
<pstyle:rest>ort


<pstyle:rest>name1 name2 name3
<pstyle:rest>strasse
<pstyle:rest>ort


<pstyle:rest>name1 name2 name3
<pstyle:rest>strasse
<pstyle:rest>ort


<pstyle:branche>Gärtner
<pstyle:rest>name1 name2 name3
<pstyle:rest>strasse
<pstyle:rest>ort


<pstyle:rest>name1 name2 name3
<pstyle:rest>strasse
<pstyle:rest>ort
 and so on ...
*)

A 5000-line file parsed in 0.07 seconds.

Call from AppleScript like this:


set converted_text to do shell script "ruby /path/to/ruby_file.rb /path/to/tab_file.txt"

Save this with a .rb extension.
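
For readers who want something to start from, here is a minimal sketch of such a Ruby script. It is not the script originally posted in this thread, only one way the same conversion could be written. It assumes six tab-separated columns per row as in the sample data above; the method name `to_tagged_text` and the two tag constants are made up for this illustration. The file is read as raw bytes so that the script does not have to guess the text encoding.

```ruby
#!/usr/bin/env ruby
# Hypothetical sketch: turn tab-delimited rows into InDesign tagged text.
# Assumes six columns per row: branche, name1, name2, name3, strasse, ort.

BRANCHE_TAG = "<pstyle:branche>"
REST_TAG    = "<pstyle:rest>"

def to_tagged_text(text)
  out = ["<ANSI-MAC>", "<vsn:6>"]
  previous_branche = nil

  # Accept LF, CR or CRLF line endings.
  text.split(/\r\n|\r|\n/).each do |line|
    fields = line.split("\t")
    next if fields.length < 6 # skip blank or incomplete rows

    branche, name1, name2, name3, strasse, ort = fields

    # Emit the branche heading only when it differs from the previous row.
    out << "#{BRANCHE_TAG}#{branche}" unless branche == previous_branche
    previous_branche = branche

    out << "#{REST_TAG}#{[name1, name2, name3].join(' ')}"
    out << "#{REST_TAG}#{strasse}"
    out << "#{REST_TAG}#{ort}"
    out << "" << "" # two empty lines after each entry
  end

  out.join("\n")
end

# Run as: ruby ruby_file.rb /path/to/tab_file.txt
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  # binread keeps the bytes untouched, whatever the file's encoding is
  print to_tagged_text(File.binread(ARGV[0]))
end
```

The whole input is walked exactly once and each row only appends to one array, so the work grows linearly with the number of lines.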

I was going to do this with Ruby as well, but you beat me to it. :cool:

It has truly become my favorite language. It is a joy to program with.

Hello Craig,

Wow, this is really Ruby-fast :wink: and it’ll give me a pretty good start to my Monday morning.

Thanks a lot.

Hans-Gerd Classen

You’re welcome.