Delimiter rant

Hi MS’s,

I’m sorry to bother you with this, because I know that you have helped lots of other people with (almost) the same problem I have right now. I always thought that I had progressed far enough not to run into these basic AS problems anymore… Apparently not. So, will you please help me with my problem?

So, I wrote a script that currently does all kinds of interesting things, but all the code hinges on this simple thing:

I need to extract a specific sentence from a text file. The text file will always follow the same ‘rule’:

The text file currently contains about 50 ‘entries’, each of which should be put in a variable by my script. The problem is that whatever I do, it always extracts the second sentence from my file.

The code to read the file is already done; after that, a subroutine is called with a parameter (Var1, Var2 and so on).


on dd_(x) -- where x would be the parameter
log "Input: " & x
set AppleScript's text item delimiters to (x & " : " as text)
set TXTLT to TXTLT as text -- Where TXTLT would be the content of the file.. in text
set step1 to text item 2 of TXTLT
log "Output: " & step1
return step1
end

So when I call dd_():

dd_("Var3")

it should reply “Var3 : It’s over 9000!”. Instead, no matter what parameter I enter, I get “Var2 : And it was 11 o’clock, which was time for a little something”.

Does anyone know what I am doing wrong? Everything I do seems to return Var2 as a result…

I’m confused >.<

Model: Macbook Pro9,1
AppleScript: 10.8
Browser: Safari 537.36
Operating System: Mac OS X (10.8)

Hi,

The error occurs if TXTLT is a list of paragraphs.
The line


set TXTLT to TXTLT as text

coerces that list to text by putting the current text item delimiter (which the handler has just set to x & " : ") between the paragraphs, so you will always get the second line.
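
The effect can be reproduced in isolation. The following is only a minimal sketch with made-up sample text (theText, theLines, joined and wrongResult are illustrative names, not from the original script); it assumes the file was read as a list of paragraphs and looks up the key "Var3":

```applescript
-- Illustrative sample data, not the poster's real file
set theText to "Var1 : one" & return & "Var2 : two" & return & "Var3 : three"
set theLines to paragraphs of theText -- a list of three strings

set oldTID to AppleScript's text item delimiters
set AppleScript's text item delimiters to "Var3 : "
-- Coercing the list to text joins its items with the current delimiter
set joined to theLines as text
-- joined is now "Var1 : oneVar3 : Var2 : twoVar3 : Var3 : three"
set wrongResult to text item 2 of joined
set AppleScript's text item delimiters to oldTID

wrongResult -- "Var2 : two", not the Var3 line
```

Coercing the list to text before the delimiters are changed (or reading the file as text in the first place) avoids the stray keys.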

Hello.

If you have ever seen code here that goes like this:

set myText to return & every paragraph of it as text

Then that was to work around the problem that StefanK described above.

Hello again Xpresso :)

I think the easiest/safest way is setting the text item delimiters to the key, including the return and the colon.


set theText to "Var1 : Lorem ipsum dolor sit amet, consectetur adipisicing elit" & return & "Var2 : And it was 11 o'clock, which was time for a little something" & return & "Var3 : It's over 9000!"

dd(theText, "var1")

on dd(s, k)
	set s to return & s
	set {oldTID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, return & k & " : "}
	set x to every text item of s
	set AppleScript's text item delimiters to oldTID
	return first paragraph of item 2 of x
end dd
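
One thing a handler of this shape does not cover is a key that is missing from the text: there is then no item 2 to read. A possible variant is sketched below; dd_safe is a hypothetical name, and returning missing value for an unknown key is just one choice. Like the handler above, it assumes the lines are separated by return characters.

```applescript
-- Hypothetical variant of the handler above; dd_safe is an illustrative name
on dd_safe(s, k)
	set s to return & s
	set {oldTID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, return & k & " : "}
	set x to every text item of s
	set AppleScript's text item delimiters to oldTID
	-- A single text item means the delimiter (the key) never occurred
	if (count of x) < 2 then return missing value
	-- The key was the very last thing in the text, so its value is empty
	if item 2 of x is "" then return ""
	return first paragraph of item 2 of x
end dd_safe

set sample to "Var1 : one" & return & "Var2 : two"
{dd_safe(sample, "Var2"), dd_safe(sample, "Nope")}
-- {"two", missing value}
```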

Hello

Nifty little handler. But it has a rather undescriptive name. Why not call it first_match? :D (I’m pondering putting in an extra parameter for the paragraph separator.)

By the way, here is the Greatest Post about UTI’s Ever!

Browser: Safari 534.57.2
Operating System: Mac OS X (10.6)

It’s really easy to get turned around when using TIDs. :)

I use them all the time for a number of things - even though they’re a bit awkward - they’re very, very fast.

However for parsing text I nearly always prefer regular expressions, as they are much more versatile. I’ve had some sort of regex osax since at least MacOS 8.1, and for a decade or so I’ve been using the Satimage.osax.

The inputs it can handle are: text, list, and file.

I have handlers for find, find/replace, find w/capture, and find-boolean. Appended is my simple find handler.

It is set to find only the first occurrence of a line starting with “Var3 :” with case-sensitive on and string-result on.


set _text to "Var1 : Lorem ipsum dolor sit amet, consectetur adipisicing elit" & return & "Var2 : And it was 11 o'clock, which was time for a little something" & return & "Var3 : It's over 9000!"

set _match to fnd("^Var3 : .+", _text, true, false, true) of me

-------------------------------------------------------------------------------------------
on fnd(_find, _data, _case, _all, strRslt) # Last 3 are all bool
	try
		find text _find in _data case sensitive _case all occurrences _all string result strRslt with regexp
	on error
		return false
	end try
end fnd
-------------------------------------------------------------------------------------------

Hi,

Using what everybody wrote, here’s another version using ‘offset’.


set theText to "Var1 : Lorem ipsum dolor sit amet, consectetur adipisicing elit
Var2 : And it was 11 o'clock, which was time for a little something
Var3 : It's over 9000!
Var4 : Some variable text."

{dd_(theText, "Var1"), dd_(theText, "Var2"), dd_(theText, "Var3"), dd_(theText, "Var4")}

on dd_(t, the_var)
	considering case
		set var_offset to offset of the_var in t
	end considering
	set temp_text to (text var_offset thru -1 of t)
	set the_paragraph to first paragraph of temp_text
	return the_paragraph
end dd_

Needs error checking if the “variable” is not found.

Edited: First this site went down. Then my internet went down, so I couldn’t post earlier. Added error checking and DJ Bazzie Wazzie’s linefeed in the search string and text:


set theText to "Var1 : Lorem ipsum dolor sit amet, consectetur adipisicing elit
Var2 : And it was 11 o'clock, which was time for a little something
Var3 : It's over 9000!
Var4 : Some Var5 text.
Var5 : More variables."
--
{dd_(theText, "Var1"), dd_(theText, "Var2"), dd_(theText, "Var3"), dd_(theText, "Var4"), dd_(theText, "Var5")}
--
on dd_(t, the_var)
	considering case
		set var_offset to offset of (linefeed & the_var) in (linefeed & t)
	end considering
	if var_offset is 0 then
		set the_paragraph to ""
	else
		set temp_text to (text var_offset thru -1 of t)
		set the_paragraph to first paragraph of temp_text
	end if
	return the_paragraph
end dd_

Don’t know why I needed linefeeds instead of returns.

gl,
kel

In this case you don’t want to work around text item delimiters: they don’t need Apple Events, so using them is the fastest way to go here. Not that I don’t like your code, but I prefer code that can run on an out-of-the-box Mac.

When you hit the return key in a text view, such as in a script editor, a linefeed gets inserted. When you compile a script, the text gets sent to AppleScript, which then returns a styled string that uses returns between its lines, and that gets put in the text view. But linefeeds in quoted strings are untouched in compiling.

This can be a real gotcha in AppleScript. You’ll think WTH? I know this code works. Have I lost my mind? 'Cause it ain’t working now…

You can get mixed CR/LF EOLs in a text variable, and you can get mixed EOLs in a do shell script. Both of these have had me pulling my hair out on a few occasions.

A useful feature of Script Debugger is to show invisibles, but you can save as .applescript and open the file in a hex-editor or Editra to visualize them.
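
A rough check can also be done in plain AppleScript. This is only a sketch: lineEndingReport and normalizeToLF are hypothetical helper names, and normalizing to linefeeds (rather than returns) is an arbitrary choice.

```applescript
-- Hypothetical helpers; the names are illustrative only

-- Report which line-ending characters a string contains
on lineEndingReport(t)
	return {hasCR:(t contains return), hasLF:(t contains linefeed), lineCount:(count of paragraphs of t)}
end lineEndingReport

-- Rebuild the text with linefeeds only; "paragraphs" splits on CR, LF and CRLF
on normalizeToLF(t)
	set {oldTID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, linefeed}
	set normalized to (paragraphs of t) as text
	set AppleScript's text item delimiters to oldTID
	return normalized
end normalizeToLF

set mixed to "a" & return & "b" & linefeed & "c"
{lineEndingReport(mixed), lineEndingReport(normalizeToLF(mixed))}
-- {{hasCR:true, hasLF:true, lineCount:3}, {hasCR:false, hasLF:true, lineCount:3}}
```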

A lot of people feel that way, but I like having some nitrous in the tank. It helps me get a lot of work done expeditiously.

Hi Shane,

That’s why. I think I copied Xpresso’s 3 lines of text. I wonder if AppleScript knows that all the line endings should be the same. I added two lines. I know the last line was a linefeed (shift return), but not sure about the 4th line.

Thanks,

… Yep, this looks like a delimiter rant!

////////

Yes, that is what I expected, but I thought that

would fix that. In fact, that is what happened. I moved that line from the dd handler to the handler that actually reads the file. Now it works exactly the way I meant it to. Beautifully, but I do not understand why… lol

////////

I tried to use that line instead of TXTLT as text, but it returned this:

////////

Hello again to you as well!

I did consider passing the variable to the handler that way. But since I want to use dd very often, I want to keep it as short as possible; that’s why I went for “global TXTLT”.
In your code you write this:

set {oldTID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, return & k & " : "}

I partially see what you are doing there, but I do not understand why you set the TIDs to the TIDs?
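
For reference, that one-liner is a list assignment: AppleScript builds the right-hand list first and then assigns item by item, so oldTID receives the delimiters that were in effect before, and the delimiters themselves receive the new search key. A sketch of the long form follows; k and its value are illustrative only.

```applescript
set k to "Var3" -- illustrative key

-- Long form of the one-line list assignment quoted above
set oldTID to AppleScript's text item delimiters -- remember the current delimiters
set AppleScript's text item delimiters to return & k & " : " -- switch to the search key
set activeTID to AppleScript's text item delimiters -- only kept to show what is active now
-- ... split the text here ...
set AppleScript's text item delimiters to oldTID -- put the old value back
```

Saving the old value matters because the delimiters are a global setting: any later list-to-text coercion in the same script would otherwise be joined with the search key.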

////////

Your code is probably much more effective than vanilla AS. The problem is, if I were to change Macs again (or install it on other Macs), it wouldn’t work anymore. That’s why I don’t like to use Myriad Helpers: yes, it’s easy, but I do so little with it that I completely forgot I had installed it on my previous Mac… I must agree with DJ BW.

////////


set theText to "Var1 : Lorem ipsum dolor sit amet, consectetur adipisicing elit
Var2 : And it was 11 o'clock, which was time for a little something
Var3 : It's over 9000!
Var4 : Some Var5 text.
Var5 : More variables."
--
{dd_(theText, "Var1"), dd_(theText, "Var2"), dd_(theText, "Var3"), dd_(theText, "Var4"), dd_(theText, "Var5")}
--
on dd_(t, the_var)
   considering case
       set var_offset to offset of (linefeed & the_var) in (linefeed & t)
   end considering
   if var_offset is 0 then
       set the_paragraph to ""
   else
       set temp_text to (text var_offset thru -1 of t)
       set the_paragraph to first paragraph of temp_text
   end if
   return the_paragraph
end dd_

That’s an interesting way of doing this. I like it. Though, how fast is it? My actual script will have to look through a few hundred sentences.

////////

Meanwhile the handler still works. I added a few lines so that it removes "Var2 : " from the result and checks whether the variable even exists before trying to delimit it. It now also displays the result in a dialog:


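on dd_(x)
	-- TXTLT is the global that holds the file's contents as text
	if x is in TXTLT then
		log "Input: " & x
		set oldTID to AppleScript's text item delimiters
		set AppleScript's text item delimiters to (x & " : " as text)
		set step1 to text item 2 of TXTLT
		set AppleScript's text item delimiters to return
		set step2 to text item 1 of step1
		set AppleScript's text item delimiters to oldTID -- restore the previous delimiters
		log "Output: " & step2
		display dialog step2
	else
		display dialog "This is not the text you are looking for." buttons {"Ok"} with title "TxTL Error"
	end if
end dd_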

Thank you all for replying to me so quickly!

Hi Xpresso,

I was just playing around with this because I haven’t used offset in a while. I think Apple fixed it; I think it used to search for only one character (not sure). Anyway, you asked for timing? :D This runs 7000 searches over 101 lines, I think:


set theText to "Var0 : Added this with a return.
Var1 : Lorem ipsum dolor sit amet, consectetur adipisicing elit
Var2 : And it was 11 o'clock, which was time for a little something
Var3 : It's over 9000!
Var4 : Some Var5 text.
Var5 : More variables."
repeat with i from 6 to 100
	set theText to theText & (linefeed & "Var" & i & " : abc123.")
end repeat

run script (do shell script "python -c 'import time; print time.time()'") --dummy
set t1 to run script (do shell script "python -c 'import time; print time.time()'")
set t2 to run script (do shell script "python -c 'import time; print time.time()'")
set time_calib to t2 - t1
set t1 to run script (do shell script "python -c 'import time; print time.time()'")
--
repeat 1000 times
	set the_data to {dd_(theText, "Var0"), dd_(theText, "Var1"), dd_(theText, "Var2"), dd_(theText, "Var3"), dd_(theText, "Var4"), dd_(theText, "Var5"), dd_(theText, "Var100")}
end repeat
--
set t2 to run script (do shell script "python -c 'import time; print time.time()'")
set time_diff to t2 - t1 - time_calib
{time_diff, the_data}

on dd_(t, the_var)
	considering case
		set var_offset to offset of (linefeed & the_var) in (linefeed & t)
	end considering
	if var_offset is 0 then
		set the_paragraph to ""
	else
		set temp_text to (text var_offset thru -1 of t)
		set the_paragraph to first paragraph of temp_text
	end if
	return the_paragraph
end dd_

→ {0.660000085831, {“Var0 : Added this with a return.”, “Var1 : Lorem ipsum dolor sit amet, consectetur adipisicing elit”, “Var2 : And it was 11 o’clock, which was time for a little something”, “Var3 : It’s over 9000!”, “Var4 : Some Var5 text.”, “Var5 : More variables.”, “Var100 : abc123.”}}

Still under a second. I think DJ Bazzie Wazzie is right, though, that TIDs are faster.

gl,
kel

Hi kel,

Awesome timing script! I’ve got to remember that one!

Anyway, I replaced the offset way with the TID way; TIDs are about 30% faster on my second try.

Wow, log slows everything down by a factor of ~50… 24 seconds.
Anyway, it seems it’s fast enough. If I were to experience any slowdowns, it won’t be because of my way or yours :)
It’s times like these that make you realise how fast and awesome AS really is!

Yeah, I think it’s the Script Editor doing that thing on the first run. But when I ‘run script’ the script, the timing goes way down on the first run:


set f to choose file
run script f

→ {0.519999980927, {“Var0 : Added this with a return.”, “Var1 : Lorem ipsum dolor sit amet, consectetur adipisicing elit”, “Var2 : And it was 11 o’clock, which was time for a little something”, “Var3 : It’s over 9000!”, “Var4 : Some Var5 text.”, “Var5 : More variables.”, “Var100 : abc123.”}}

Something is slowing down things on the first run.

Hello.

The first time you run it, things are loaded into virtual memory, and then into a framebuffer or similar on a 64-bit machine; this slows things down during the first run.

So, this could be fixed this way?:

repeat 2 times
-- Insert script
end repeat

It’s because of the cost of an Apple Event, as with Chris’s (ccstone’s) code. Offset, like Chris’s Satimage command, needs to send an Apple Event to the Apple Event Manager and back into the process. As some of you know from older machines (the Mac OS era), Apple Events are expensive and, as Shane mentioned this week, used to run at around 60 per minute. Today it’s normal to do a thousand events per second, but they are still overhead compared to a non-Apple-Event solution. Therefore using text item delimiters is the way to go.

@Xpresso: I understand that the code needs to be as efficient as possible. But did you actually time my code? It can do 20,000 dd’s each second on my old i7 MBP. Like I said, Apple Events are your enemy, so avoid them.

Ahh, that’s why. So if you ‘run script’ it’s dynamic and you don’t need to load those things in the buffer.

Edited: and DJ Bazzie Wazzie’s TIDs are vanilla. I got it now.

Thanks,