How do you find the 2nd recurrence of a text?

Hi everybody.

I am new to this forum.

I need to find the 2nd (or the nth) recurrence of a string inside a text.

I know I can use the offset command, but it only returns the first recurrence:

set position to offset of "=filename.pls" in mytext

How can I make it return the 2nd recurrence of “=filename.pls”?

Any help would be much appreciated.

Thanks.

Hi, Mikii.

Here’s a handler that returns the offset of the nth instance of a substring in a string. If there is no nth instance, it returns 0. It works by setting AppleScript’s text item delimiters to the substring, which splits the text at every place where the substring occurs.

on nthOffset(myText, subString, n)
  set astid to AppleScript's text item delimiters
  set AppleScript's text item delimiters to subString
  -- There will be one more text item than there are instances of the substring in the string, so:
  if (count myText's text items) > n then
    -- There are at least n instances of the substring in the text.
    -- The first character of the nth instance comes immediately after the last
    -- character of the nth text item.
    set o to (count text from text item 1 to text item n of myText) + 1
  else
    -- There isn't an nth instance of the substring in this text.
    set o to 0
  end if
  set AppleScript's text item delimiters to astid
  return o
end nthOffset

-- Demo:
set myText to "Here is an occurrence of =filename.pls in this text. And here is another occurrence: =filename.pls."
set pos to nthOffset(myText, "=filename.pls", 2)

Thank you very much, Nigel. I was hoping for a faster solution (I need to process almost 10 MB of data), but your suggestion is very elegant and works fine.

ciao,

Michele

Hi, Michele.

With an 11.5MB text file on my fairly modest 400 MHz G3 machine, the action’s virtually instantaneous. Even more instantaneous is this version:

on nthOffset2(myText, subString, n)
  set astid to AppleScript's text item delimiters
  set AppleScript's text item delimiters to subString
  try
    -- There will be one more text item than there are instances of the substring in the string.
    -- The first character of the nth instance comes immediately after the last character of
    -- the nth text item. The next line will error if there are fewer than n text items.
    set o to (count text from text item 1 to text item n of myText) + 1
    -- This line will error if there are only n text items, i.e. (n - 1) instances.
    get character o of myText
  on error
    -- There isn't an nth instance of the substring in this text.
    set o to 0
  end try
  set AppleScript's text item delimiters to astid
  return o
end nthOffset2

However, perhaps you mean to work through every instance of the substring in the text. In that case, it might be better to get all the offsets at once:

on allOffsets(str, subStr)
  
  script o
    property offsets : {0}
  end script
  
  considering case
    if str contains subStr then
      set astid to AppleScript's text item delimiters
      set AppleScript's text item delimiters to subStr
      -- Get all the text items except the last one, which isn't needed.
      set textItems to str's text items 1 thru -2
      set AppleScript's text item delimiters to astid
      
      -- Each text item in the list will be replaced by the offset of the substring that follows it.
      -- This works even when the text begins with the substring.
      
      -- For speed, use a reference to the text item list.
      set o's offsets to textItems
      set subStrLen to (count subStr)
      -- Initialise an offset marker, to which will be added the length of the substring
      -- and the length of the current text item.
      set thisOffset to 1 - subStrLen
      repeat with i from 1 to (count textItems)
        set thisOffset to thisOffset + subStrLen + (count item i of o's offsets)
        set item i of o's offsets to thisOffset
      end repeat
    end if
  end considering
  
  return o's offsets
end allOffsets


-- Demo:
local myText

set myText to read file ((path to desktop as string) & "11.5MB text file")
set allPositions to allOffsets(myText, "=filename.pls")

This has only been tested as written above. Like any code that uses text item delimiters, you’ll get a stack error if there are more than about 4000 instances of the substring in the text, in which case you’ll need a more complex handler.
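For what it's worth, here is a rough sketch of one way round that limit for the single-offset case. It is not the "more complex handler" itself, just a simple alternative that never asks for a list of text items: it calls the offset command repeatedly on whatever part of the text hasn't been searched yet. The handler name nthOffsetByScanning and its variable names are made up for this illustration. I'd expect it to be noticeably slower than the delimiter versions on big texts, because every pass copies the remainder of the text.

```applescript
-- Hypothetical helper: finds the nth instance of subString in myText without
-- building a list of text items. Returns 0 if there is no nth instance.
on nthOffsetByScanning(myText, subString, n)
	set textLen to (count myText)
	set subLen to (count subString)
	set scanStart to 1 -- first character of the part of the text not yet searched
	set foundAt to 0
	repeat n times
		-- Nothing left to search, so there's no further instance.
		if scanStart > textLen then return 0
		-- Offset of the next instance, relative to the unsearched part.
		set rel to offset of subString in (text scanStart thru -1 of myText)
		if rel is 0 then return 0
		-- Convert to an offset in the whole text, then skip past this instance.
		set foundAt to scanStart + rel - 1
		set scanStart to foundAt + subLen
	end repeat
	return foundAt
end nthOffsetByScanning

-- Demo:
set pos to nthOffsetByScanning("a=b=c", "=", 2)
--> 4
```

Like the delimiter handlers, this counts non-overlapping instances from left to right, so it should agree with them on the same input.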

Nigel’s solution seemed lightning fast on my machine.

OK, Mr. Pre-Panther. :wink:

If you’re running Mac OS X 10.3 (Panther) or later, the string “chunking” limit has been fixed. Hm… I’ve never known what to call this bug. “Chunking” isn’t quite right. How about the “string element extraction into a list as the result of one statement” bug? As a matter of fact, that would make for a really cool scripting addition command:


    set myList to string element extraction into a list as the result of one statement using myString as taught in school.

Thanks for the information, Admiral. I’d also like to apologise for my atrocious English in that quote. :oops:

Yuk! All those name-space conflicts! :wink: