Simple List Reduction?

I have a string of unseparated characters, “abcdefg”, and I’d like to test it for an embedded pair, say “cd”; if I find “cd”, I want to remove it to leave “abefg”.

One way to go is to find the offset of “cd” if it occurs and then rebuild the string as shown below.

I know this fails if the getit character (or group) isn’t found, if the getit character (or group) starts at the beginning or includes the end of the original string, and I could fix that with some tests, but it seems that the whole shebang involves too much testing for what seems like a simple task. Is there a better way to approach this?

set myString to "abcdef"
set len to length of myString
set getit to "cd"
set getnum to length of getit
set n to offset of getit in myString
set newString to ((characters 1 thru (n - 1) of myString) & (characters (n + getnum) through len of myString)) as string
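For reference, the edge cases listed above (target not found, target at the very start, target at the very end) can be covered with only a couple of tests. The following is a minimal sketch, not a definitive implementation; the handler name removeFirst is hypothetical and does not appear elsewhere in this thread. It relies on 'offset' returning 0 when the target is absent.

```applescript
-- removeFirst is a hypothetical helper name, used here for illustration only
on removeFirst(getit, myString)
	set n to offset of getit in myString
	-- offset returns 0 when getit does not occur: nothing to remove
	if n is 0 then return myString
	-- index of the first character after the matched text
	set lastKept to n + (length of getit)
	set newString to ""
	-- keep the part before the match, unless the match starts the string
	if n > 1 then set newString to text 1 thru (n - 1) of myString
	-- keep the part after the match, unless the match ends the string
	if lastKept <= (length of myString) then set newString to newString & text lastKept thru -1 of myString
	return newString
end removeFirst

removeFirst("cd", "abcdef")
--> "abef"
```

Using 'text … thru …' rather than 'characters … thru …' returns a string directly, so no coercion from a list of characters is needed.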

I think this does it:

set myString to "abcdef"
set targetString to "cd"
set newString to do shell script "echo " & myString & "| sed 's/" & targetString & "//'"

  • Dan
    –As I sed, the stream editor is invaluable.

set myString to "cdabcdefcdghcdijklmcd"
set getit to "cd"
set AppleScript's text item delimiters to getit
set keptChars to text items of myString
set AppleScript's text item delimiters to ""
set newString to keptChars as string

Thank you both for these different views. In this instance, at least, the “sed” version is slightly faster, but the text item delimiters method is more transparent.

That comparison looked a bit curious to me, since external calls (to an application, scripting addition or the shell) usually involve a hit of some kind - whereas text item delimiters are native to AppleScript. So I carried out a number of tests here, based on slightly simplified versions of each suggestion.

The results, though varying slightly from one run to another, consistently showed the TID method well ahead - at least 140 times faster than the shell version. (Of course, if huge files were being processed, the results might well be different again.) While in practical terms, the impact of these differences may be relatively small, I thought it might be worth mentioning…

One other difference possibly worth pointing out: I’m not sure whether you wanted to remove all cases of the search string or just the first one. If it matters, you may like to note that the methods (as posted) don’t produce quite the same results. :slight_smile:

NovaScotian, if you need to remove all cases instead of just the first one, then add a “g” to the command dant posted.

set newString to do shell script "echo " & myString & "| sed 's/" & targetString & "//g'"
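One caveat with building the command by plain concatenation: a string containing spaces, quotes or other shell metacharacters can break the command or be interpreted by the shell. A hedged variant, assuming the same variable names as in Dan's post, uses AppleScript's 'quoted form of' to protect both the input and the sed expression:

```applescript
set myString to "abcdef"
set targetString to "cd"
-- quoted form wraps each piece in single quotes so the shell passes it through literally
set sedExpression to quoted form of ("s/" & targetString & "//")
set newString to do shell script "echo " & quoted form of myString & " | sed " & sedExpression
--> "abef"
```

Note that sed still treats targetString as a regular expression, so a target containing characters such as "/", "." or "*" would need escaping first; this sketch does not attempt that.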

As soon as I saw dkmarsh’s post I knew that was a better solution, and that’s what I would have suggested had I thought of it. However, I believe mine has a significant advantage in that I could use the “As I sed” pun whereas no such advantage occurs with the TID solution. :stuck_out_tongue:

  • Dan

No, I don’t want to remove all cases; I want to remove them one at a time because I have to account for them. I must confess that I used the sed version more because I wanted to learn about sed than because the AppleScript text item delimiters method had anything wrong with it.

I had tried to get to dkmarsh’s solution but got the syntax slightly wrong, so I couldn’t get my version to work properly. I was missing this clean phrase in dkm’s script (chasing characters or words instead of “text items”): “set keptChars to text items of myString”.

An auxiliary question: How do you get accurate timing for a script, Kai?

There is an OSAX to get milliseconds since system startup. See http://osaxen.com/files/getmillisec1.0.1.html

Best wishes

John M

Very good point, Dan. While I wantid to come up with an equally witty pun, the challenge proved impossible… :wink:

Fair enough, NovaScotian - in which case the TID version of Dan’s suggestion might look something like:

to cutFirstCase of s from t
	set d to text item delimiters
	set text item delimiters to s
	tell t to if (count text items) > 1 then set t to text item 1 & text from text item 2 to -1
	set text item delimiters to d
	t
end cutFirstCase

cutFirstCase of "cd" from "cdabcdefcdghcdijklmcd"
--> "abcdefcdghcdijklmcd"

In case they might help, here are a couple of additional variations:

to countAndCutCases of s from t
	set d to text item delimiters
	set text item delimiters to s
	set t to t's text items
	set c to (count t) - 1
	set text item delimiters to ""
	set t to t as string
	set text item delimiters to d
	{c, t}
end countAndCutCases

countAndCutCases of "cd" from "cdabcdefcdghcdijklmcd"
--> {5, "abefghijklm"}

on indexList of s at t
	set l to {}
	set d to text item delimiters
	set text item delimiters to s
	repeat with n from 1 to (count t's text items) - 1
		set l's end to (count t's text 1 thru text item n) + 1
	end repeat
	set text item delimiters to d
	l
end indexList

indexList of "cd" at "cdabcdefcdghcdijklmcd"
--> {1, 5, 9, 13, 20}

Funnily enough, I considered including a brief description of the test script used - but wondered if that might perhaps be introducing too much noise. However, since you ask…

Some folks apparently use ‘current date’ to time scripts - although, since that gives results to the nearest second, it’s really a bit of a blunt instrument. For greater accuracy and convenience, consider using something more precise, such as Jon’s commands[1], GetMilliSec[2], Precision Timing Osax[3] or Smile’s ‘chrono’ [4].

[1] http://osaxen.com/files/jonscommands2.1.2.html
[2] http://osaxen.com/files/getmillisec1.0.1.html
[3] http://osaxen.com/files/precisiontiming1.0.html
[4] http://www.satimage.fr/software/en/index.html

To compare the performance of very fast routines, I usually place them inside a loop that repeats several (hundred/thousand) times (enough to achieve a clear and consistent difference). Since a run can sometimes throw up spurious results, I also run each test a number of times to establish a distinct pattern.

In addition, a certain amount of latency can occur during testing. This may, for example, extend the timing slightly on the first script run. One way around this is to reverse the order in which the scripts are run, and then to average out all the results.

To help focus on the essential differences between routines, it’s also a good idea to place any statements common to both (such as those initialising shared variables) outside the timed loops.

Having said all that, it may not be worth getting too hung up about minor timing differences - especially where a routine may be used only once within a script. (On the other hand, a script that iterates through hundreds or thousands of repeated operations may well benefit from some optimisation.) Remember, too, that performance can vary substantially from one machine to another - so it’s prudent to use caution when quoting any comparisons.

One final word of caution: Some third party scripting additions can enable certain operations and coercions that are not possible on a ‘vanilla’ system. To avoid confusion, it’s not a bad idea, after testing, to disable those that you don’t use regularly.

Anyway - here’s an example, based on earlier suggestions in this thread, using ‘the ticks’ from Jon’s commands:

set n to 100 (* number of repeats *)

set t to "cdabcdefcdghcdijklmcd"
set s to "cd"

set t1 to the ticks
repeat n times
	
	set text item delimiters to s
	set r to t's text items
	set text item delimiters to ""
	r as string
	
end repeat
set t2 to the ticks
repeat n times
	
	do shell script "echo " & t & "| sed 's/" & s & "//g'"
	
end repeat
set t3 to the ticks
{tids:t2 - t1, shell:t3 - t2}
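The suggestion about reversing the run order and averaging the results could be sketched roughly as follows. This assumes Jon's Commands is installed (for 'the ticks'); the handler names timeTids and timeShell are hypothetical, chosen only for this illustration.

```applescript
-- assumes Jon's Commands is installed, which provides 'the ticks'
set n to 100 (* number of repeats *)
set t to "cdabcdefcdghcdijklmcd"
set s to "cd"

-- first pass: TIDs, then shell
set a1 to timeTids(n, s, t)
set b1 to timeShell(n, s, t)
-- second pass: reversed order, to even out first-run latency
set b2 to timeShell(n, s, t)
set a2 to timeTids(n, s, t)

{tids:(a1 + a2) / 2, shell:(b1 + b2) / 2}

-- timeTids and timeShell are hypothetical helper names
on timeTids(n, s, t)
	set d to text item delimiters
	set t1 to the ticks
	repeat n times
		set text item delimiters to s
		set r to t's text items
		set text item delimiters to ""
		r as string
	end repeat
	set t2 to the ticks
	set text item delimiters to d -- restore the previous delimiters
	return t2 - t1
end timeTids

on timeShell(n, s, t)
	set t1 to the ticks
	repeat n times
		do shell script "echo " & t & "| sed 's/" & s & "//g'"
	end repeat
	set t2 to the ticks
	return t2 - t1
end timeShell
```

Wrapping each routine in its own handler keeps the timed loops identical between passes, at the cost of a small, equal handler-call overhead on both sides.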

My apologies for this rather lengthy reply… :slight_smile:

On the contrary; thanks for the lengthy reply. An example of one remaining problem is shown below. Across a whole variety of combinations and repeated functions, Jon’s ticks and GetMilliSec give entirely different answers that do not seem to be multiples of one another. It’s interesting to use ticks as the “load” on GetMilliSec and vice versa, and then swap. The answers are about the same either way: tick: 14-15, gms: 239-240. Beats me what’s being measured here.

set n to 5000
set t1 to the ticks
repeat n times
	factorial(20)
end repeat
set t2 to the ticks

set g1 to GetMilliSec
repeat n times
	factorial(20)
end repeat
set g2 to GetMilliSec

{tick:t2 - t1, gms:g2 - g1}

on factorial(n)
	if n > 0 then
		return n * (factorial(n - 1))
	else
		return 1
	end if
end factorial

-- {tick:43, gms:690.0}

A second comprises 60 ticks or (more obviously) 1000 milliseconds. Given some variation in the results, they convert reasonably well:

on MsToTicks(n)
	n * 0.06 div 1
end MsToTicks

on ticksToMs(n)
	n div 0.06
end ticksToMs

set r to {tick:43, gms:690.0}
{ticksToMs:ticksToMs(r's tick), MsToTicks:MsToTicks(r's gms)}
--> {ticksToMs:716, MsToTicks:41}
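The same arithmetic can be applied to the other figures quoted earlier in the thread (tick: 14-15, gms: 239-240). As a small worked check, using only the 60-ticks-per-second ratio stated above:

```applescript
-- milliseconds spanned by 14 and by 15 ticks, at 60 ticks per second
{14 * 1000 div 60, 15 * 1000 div 60}
--> {233, 250}
```

A reading of 239-240 milliseconds falls inside that 233 to 250 window, so the two timers appear to be measuring the same interval at different resolutions: one tick is roughly 16.7 milliseconds wide.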