Here’s a handler that returns the offset of the nth instance of a substring in a string. If an nth instance doesn’t exist, it returns 0. AppleScript’s text item delimiters can be used to find anywhere in the text that the substring occurs.
on nthOffset(myText, subString, n)
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to subString
	-- There will be one more text item than there are instances of the substring in the string, so:
	if (count myText's text items) > n then
		-- There are at least n instances of the substring in the text.
		-- The first character of the nth instance comes immediately after the last
		-- character of the nth text item.
		set o to (count text from text item 1 to text item n of myText) + 1
	else
		-- There isn't an nth instance of the substring in this text.
		set o to 0
	end if
	set AppleScript's text item delimiters to astid
	return o
end nthOffset
-- Demo:
set myText to "Here is an occurrence of =filename.pls in this text. And here is another occurrence: =filename.pls."
set pos to nthOffset(myText, "=filename.pls", 2)
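The handler's comments rely on there being one more text item than there are instances of the substring. Here is a minimal sketch of that counting on its own, using a made-up sample string ("a=b=c" is not from the thread, and the variable names are only for illustration):

```applescript
set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to "="
-- Hypothetical sample text containing two instances of "=".
set sample to "a=b=c"
-- Two instances of the delimiter give three text items: {"a", "b", "c"}.
set itemCount to (count sample's text items)
-- Text items 1 to 2 span "a=b" (3 characters), so the 2nd "=" is at offset 4.
set secondOffset to (count text from text item 1 to text item 2 of sample) + 1
set AppleScript's text item delimiters to astid
return {itemCount, secondOffset}
```

The second calculation is the same expression the handler uses, with n fixed at 2.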
Thank you very much, Nigel. I was hoping for a faster solution (I need to process almost 10 MB of data), but your suggestion is very elegant and works fine.
With an 11.5MB text file on my fairly modest 400 MHz G3 machine, the action’s virtually instantaneous. Even more instantaneous is this version:
on nthOffset2(myText, subString, n)
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to subString
	try
		-- There will be one more text item than there are instances of the substring in the string.
		-- The first character of the nth instance comes immediately after the last character of
		-- the nth text item. This line will error if there are fewer than n text items.
		set o to (count text from text item 1 to text item n of myText) + 1
		-- This line will error if there are only n text items, i.e. (n - 1) instances.
		get character o of myText
	on error
		-- There isn't an nth instance of the substring in this text.
		set o to 0
	end try
	set AppleScript's text item delimiters to astid
	return o
end nthOffset2
However, perhaps you mean to work through every instance of the substring in the text. In that case, it might be better to get all the offsets at once:
on allOffsets(str, subStr)
	script o
		property offsets : {0}
	end script
	considering case
		if str contains subStr then
			set astid to AppleScript's text item delimiters
			set AppleScript's text item delimiters to subStr
			-- Get all the text items except the last one, which isn't needed.
			set textItems to str's text items 1 thru -2
			set AppleScript's text item delimiters to astid
			-- Each text item in the list will be replaced by the offset of the substring that follows it.
			-- This works even when the text begins with the substring.
			-- For speed, use a reference to the text item list.
			set o's offsets to textItems
			set subStrLen to (count subStr)
			-- Initialise an offset marker, to which will be added the length of the substring
			-- and the length of the current text item.
			set thisOffset to 1 - subStrLen
			repeat with i from 1 to (count textItems)
				set thisOffset to thisOffset + subStrLen + (count item i of o's offsets)
				set item i of o's offsets to thisOffset
			end repeat
		end if
	end considering
	return o's offsets
end allOffsets
-- Demo:
local myText
set myText to read file ((path to desktop as string) & "11.5MB text file")
set allPositions to allOffsets(myText, "=filename.pls")
This has only been tested as written above. Like any code that uses text item delimiters, you’ll get a stack error if there are more than about 4000 instances of the substring in the text, in which case you’ll need a more complex handler.
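For texts with more instances than that limit allows, one possible workaround is to avoid building a list of text items at all and to scan with the Standard Additions offset command instead. This is a different technique from the handler above, not the "more complex handler" itself. What follows is only a sketch: the handler name allOffsetsByScanning is hypothetical, it assumes a version of AppleScript in which offset respects considering case (AppleScript 2.0 or later, as far as I know), and it returns an empty list rather than {0} when nothing is found. It extracts the remaining text again on every pass, so it's likely to be far slower than the text item approach on multi-megabyte texts.

```applescript
on allOffsetsByScanning(str, subStr)
	set offsetList to {}
	set subStrLen to (count subStr)
	set strLen to (count str)
	-- Position in str at which the next search begins.
	set searchStart to 1
	considering case
		-- Stop once too little text remains to hold another instance.
		repeat while (searchStart + subStrLen - 1) <= strLen
			-- offset returns a position relative to the text it is given, or 0 if not found.
			set relativeOffset to offset of subStr in (text searchStart thru -1 of str)
			if relativeOffset is 0 then exit repeat
			-- Convert to a position in the full text.
			set absoluteOffset to searchStart + relativeOffset - 1
			set end of offsetList to absoluteOffset
			-- Resume just after this instance, so instances don't overlap.
			set searchStart to absoluteOffset + subStrLen
		end repeat
	end considering
	return offsetList
end allOffsetsByScanning

-- Demo (same sample idea as the earlier demos):
set allPositions to allOffsetsByScanning("x=filename.pls y=filename.pls", "=filename.pls")
```

Only one offset is fetched per statement, so no single statement has to produce thousands of list items.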
Nigel’s solution seemed lightning fast on my machine.
OK, Mr. Pre-Panther.
If you have system 10.3 or later, the string “chunking” limit has been fixed. Hm… I’ve never known what to call this bug. “Chunking” isn’t quite right. How about the “string element extraction into a list as the result of one statement” bug? As a matter of fact, that would make for a really cool scripting addition command:
set myList to string element extraction into a list as the result of one statement using myString as taught in school.