delimiter fun!

patrick99e99 · October 25, 2005, 6:12pm

Hi everyone,

I have a text file that I want to convert to a list, splitting it wherever ASCII character 10 (LF) is not followed by ASCII character 32 (space). In other words, the file contains some LF characters that are followed by spaces and some that are not; each LF without a following space is where I want one list item to end and the next to begin.

The text file is rather long, so I am wondering whether there is an easy way to do this without scanning every single character of the file, as I am doing here:


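set thefile to (path to desktop as string) & "addressbook"
set fileid to open for access file thefile
set contentlist to read fileid
close access fileid

set thechars to characters of contentlist
set charCount to count of thechars
set lfChar to ASCII character 10

-- join character runs with no separator; the old value is restored at the end
set oldTIDs to AppleScript's text item delimiters
set AppleScript's text item delimiters to ""

set thelist to {}
set y to 1 -- index of the first character of the item being collected
repeat with x from 1 to charCount
	-- an LF ends an item unless the next character is a space
	if item x of thechars = lfChar and (x = charCount or item (x + 1) of thechars ≠ space) then
		if x > y then
			set end of thelist to (items y thru (x - 1) of thechars) as string
		else
			set end of thelist to "" -- two breaks in a row give an empty item
		end if
		set y to x + 1
	end if
end repeat
-- the text after the last break is the final item
if y ≤ charCount then set end of thelist to (items y thru charCount of thechars) as string

set AppleScript's text item delimiters to oldTIDs
return thelist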

As you can imagine, this script takes forever to run.

Thank you for your time.

julifos · October 25, 2005, 7:55pm

Depending on your target, I would just replace every occurrence of LF+space with a dummy character (e.g., ASCII 255), then get “paragraphs of (read file x)”, and then replace the dummy character with LF+space again where needed. There are various fast search/replace handlers in the Code Exchange forum…
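A minimal sketch of that dummy-character approach, assuming the dummy (ASCII 255, as suggested) never occurs in the file. The handler names `searchReplace` and `splitRecords` are hypothetical, and the text-item-delimiter swap inside `searchReplace` is just one common way to do the search/replace step, not necessarily one of the Code Exchange handlers.

```applescript
property _LF : ASCII character 10
property _dummy : ASCII character 255 -- assumption: never present in the input

-- Replace every occurrence of findText with newText using text item delimiters.
on searchReplace(txt, findText, newText)
	set oldTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to findText
	set parts to text items of txt
	set AppleScript's text item delimiters to newText
	set txt to parts as text
	set AppleScript's text item delimiters to oldTIDs
	return txt
end searchReplace

on splitRecords(txt)
	-- 1. Hide the LF+space pairs that should NOT start a new item.
	set protectedText to searchReplace(txt, _LF & space, _dummy)
	-- 2. Every remaining line break now marks an item boundary.
	set recs to paragraphs of protectedText
	-- 3. Put the LF+space pairs back inside each item.
	set res to {}
	repeat with recRef in recs
		set end of res to searchReplace(recRef's contents, _dummy, _LF & space)
	end repeat
	return res
end splitRecords

-- Usage with the file from the question:
-- splitRecords(read file ((path to desktop as text) & "addressbook"))
```

The file is only split once by `paragraphs of`, so no AppleScript loop walks individual characters; the only loop runs once per resulting item to undo the substitution.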

hhas · October 25, 2005, 9:26pm

Should be good enough:

property _LF : ASCII character 10

on parse(txt)
	script k -- list access speed kludge
		property lst : rest of txt's paragraphs
		property res : {txt's paragraph 1}
	end script
	repeat with paraRef in k's lst
		if paraRef starts with space then
			set last item of k's res to last item of k's res & _LF & paraRef
		else
			set end of k's res to paraRef's contents
		end if
	end repeat
	return k's res
end parse

-- TEST
set txt to "foo
    bar
    zib
fub
bling
    dob

nib"
parse(txt) --> {"foo\n    bar\n    zib", "fub", "bling\n    dob", "", "nib"}