Key value coding compliance error

hi,
seems easier with shell:

set theString to "This is a test line with a Test word"
set thePattern to "(?i)test"
set matchingData to getMatches(theString, thePattern)

on getMatches(str, pattern)
set shellcmd to "echo " & qt(str) & "| grep -bEo " & qt(pattern)
set res to (do shell script shellcmd)
set off to offset of ":" in res
return {text 1 thru (off - 1) of res as integer, count of (characters (off + 1) thru -1 of res)}
end getMatches
on qt(str)
return """ & str & """
end qt

Hallenstal, thanks for responding to my thread. I tested your script, but it did not return the expected result, which was {10, 27}. I am not knowledgeable about grep, so perhaps I’m doing something wrong. BTW, your script threw an error as written, and I had to escape the quote in the qt handler.

set theString to "This is a test line with a Test word"
set thePattern to "(?i)test"
set matchingData to getMatches(theString, thePattern) --> {10, 12}

on getMatches(str, pattern)
	set shellcmd to "echo " & qt(str) & "| grep -bEo " & qt(pattern)
	set res to (do shell script shellcmd)
	set off to offset of ":" in res
	return {text 1 thru (off - 1) of res as integer, count of (characters (off + 1) thru -1 of res)}
end getMatches

on qt(str)
	return "\"" & str & "\""
end qt

@peavine

Try this:

-- Tested on Monterey 12.6.3
use framework "Foundation"
use scripting additions

set theString to "This is a test line with a Test word"
set thePattern to "(?i)test"
set matchingData to getMatches(theString, thePattern)

on getMatches(theString, thePattern)
	set theString to current application's NSString's stringWithString:theString
	set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
	set regexResults to theRegex's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
	set theRanges to (regexResults's valueForKey:"range") as list
	set theArray to current application's NSArray's arrayWithArray:theRanges
	set theLocations to (theArray's valueForKey:"location")
	return theLocations as list --> {10, 27}
end getMatches

Your script does not work with this string: "This is a test line with a Test word other test"

@Hallenstal shell is working. It’s just the parsing method that is wrong.

set theString to "This is a test line with a Test word other test"
set thePattern to "(?i)test"
set theLocations to getLocations(theString, thePattern) --> {"10", "27", "43"}

on getLocations(str, pattern)
	set res to do shell script ("echo " & quoted form of (str) & "| grep -bEo " & quoted form of (pattern))
	return paragraphs of (do shell script ("echo " & quoted form of (res) & "| grep -Eo " & quoted form of ("\\d+")))
end getLocations
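The parsing point can be illustrated outside AppleScript too. Below is a minimal Python sketch, assuming the grep output has one `offset:match` pair per line; the function name `parse_grep_offsets` is hypothetical and not part of any script in this thread. It also suggests why the earlier handler returned {10, 12}: it looked only at the first colon, so everything after it, including the second line, was counted as the match length.

```python
def parse_grep_offsets(output):
    """Return the numeric offset from every 'offset:match' line.

    Hypothetical helper for illustration; it assumes grep -b -o style
    output and ignores any line that does not start with digits.
    """
    offsets = []
    # splitlines() accepts both \n and the \r line endings that
    # do shell script produces by default.
    for line in output.splitlines():
        prefix, sep, _match = line.partition(":")
        if sep and prefix.isdigit():
            offsets.append(int(prefix))
    return offsets


sample = "10:test\n27:Test\n43:test"
print(parse_grep_offsets(sample))  # [10, 27, 43]
```

Reading only the prefix before the first colon of each line also avoids picking up digits that happen to appear inside the matched text.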

@ionah. Many thanks for your suggestions, both of which work great.

It was mentioned above that my original request arose from another thread (see link below), which had to do with timing results in finding 12288 instances of a substring in a string. I reran these tests with @ionah’s two suggestions and the results were as follows (all times are in milliseconds):

ionah’s Grep script - 76
robertfern’s AppleScript script - 116
ionah’s ASObjC script - 160
peavine’s ASObjC script - 262

Although not of much (or any) significance, there are a few differences in the results returned by the script suggestions:

  • The substring locations in the grep and ionah’s ASObjC script are zero-based.
  • The grep script returns numbers as text; the other suggestions return integers.
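Both differences are easy to normalize after the fact. A minimal Python sketch, assuming the caller wants one-based integer positions (as AppleScript's own text indexing uses); `normalize_offsets` is a hypothetical name used only for illustration.

```python
def normalize_offsets(raw_offsets, one_based=True):
    """Convert zero-based offsets given as text into integers.

    Hypothetical helper: set one_based=False to keep zero-based values.
    """
    shift = 1 if one_based else 0
    return [int(text) + shift for text in raw_offsets]


print(normalize_offsets(["10", "27", "43"]))  # [11, 28, 44]
```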

The thread mentioned above can be found here

@peavine

I don’t know what method you’re using to get those results.
Here are the ones I get with Script Geek:

MacPro6.1, macOS Version 12.6.3 (21G419), 100 iterations

                  First Run   Total Time   Average
AppleScriptObjC   0.408       0.078        0.001
Shell grep        0.016       1.263        0.013

Ratio (excluding first run): 1:16.27

@peavine
The issue with do shell script returning text arises when paragraphs is used. It is easy to convert item 1 of something to an integer, but harder to convert a list of paragraphs to integers without another repeat loop or an AppleScript text item delimiters hack. I find the AppleScriptObjC version simpler, and it makes it far easier to convert any value to an integer, float, or string.


@ionah. I don’t use Script Geek in this particular instance because of the difficulty in getting a very large string without including the time it takes to get that string in the total timing results. My grep timing script:

use framework "Foundation"
use scripting additions

-- untimed code
set theString to "My Rob is a cool Robert! His name is Robert... "
repeat 12 times
	set theString to theString & theString
end repeat

-- start time
set startTime to current application's CACurrentMediaTime()

-- timed code
set thePattern to "(?i)Rob"
set theOffsets to getLocations(theString, thePattern)
on getLocations(str, pattern)
	set res to do shell script ("echo " & quoted form of (str) & "| grep -bEo " & quoted form of (pattern))
	return paragraphs of (do shell script ("echo " & quoted form of (res) & "| grep -Eo " & quoted form of ("\\d+")))
end getLocations

-- elapsed time
set elapsedTime to (current application's CACurrentMediaTime()) - startTime
set numberFormatter to current application's NSNumberFormatter's new()
if elapsedTime > 1 then
	numberFormatter's setFormat:"0.000"
	set elapsedTime to ((numberFormatter's stringFromNumber:elapsedTime) as text) & " seconds"
else
	(numberFormatter's setFormat:"0")
	set elapsedTime to ((numberFormatter's stringFromNumber:(elapsedTime * 1000)) as text) & " milliseconds"
end if

-- result
elapsedTime --> 76 milliseconds
# count theOffsets --> 12288
# theOffsets

My ASObjC timing script:

use framework "Foundation"
use scripting additions

-- untimed code
set theString to "My Rob is a cool Robert! His name is Robert... "
repeat 12 times
	set theString to theString & theString
end repeat

-- start time
set startTime to current application's CACurrentMediaTime()

-- timed code
set thePattern to "(?i)Rob"
set theOffsets to getMatches(theString, thePattern)
on getMatches(theString, thePattern)
	set theString to current application's NSString's stringWithString:theString
	set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
	set regexResults to theRegex's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
	set theRanges to (regexResults's valueForKey:"range") as list
	set theArray to current application's NSArray's arrayWithArray:theRanges
	set theLocations to (theArray's valueForKey:"location")
	return theLocations as list
end getMatches

-- elapsed time
set elapsedTime to (current application's CACurrentMediaTime()) - startTime
set numberFormatter to current application's NSNumberFormatter's new()
if elapsedTime > 1 then
	numberFormatter's setFormat:"0.000"
	set elapsedTime to ((numberFormatter's stringFromNumber:elapsedTime) as text) & " seconds"
else
	(numberFormatter's setFormat:"0")
	set elapsedTime to ((numberFormatter's stringFromNumber:(elapsedTime * 1000)) as text) & " milliseconds"
end if

-- result
elapsedTime --> 160 milliseconds
# count theOffsets --> 12288
# theOffsets

To avoid this, you can build a large string in any text editor and copy it as a global variable in your script. This way the string will be loaded at compile time and will not be included in the time calculation.

As you can see, using a loop or not does not make a significant difference.
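The idea of keeping string construction out of the measured interval can be sketched in Python as well. This is only an illustration under stated assumptions: it reuses the seed string and the `(?i)Rob` pattern from the timing scripts above, but the timings it prints are not comparable with the AppleScript figures in this thread.

```python
import re
import time

# Untimed setup: same seed string as the timing scripts, doubled 12 times.
the_string = "My Rob is a cool Robert! His name is Robert... "
for _ in range(12):
    the_string += the_string

# Compiling is also kept outside the timed region.
pattern = re.compile(r"(?i)Rob")

# Timed code: only the search itself is measured.
start = time.perf_counter()
offsets = [m.start() for m in pattern.finditer(the_string)]
elapsed_ms = (time.perf_counter() - start) * 1000

print(len(offsets))  # 12288
print(f"{elapsed_ms:.0f} milliseconds")
```

The count confirms the figure used in the tests: 3 matches per seed string times 4096 copies.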


@peavine
It’s simple to convert an array of strings to an array of integers…

use framework "Foundation"

set arrayOfStrings to current application's NSArray's arrayWithArray:{"1", "2", "3"}
set integerValues to (arrayOfStrings's valueForKey:"intValue") as list

@ionah
Peavine’s script uses AppleScriptObjC inside the repeat loop. The script you tested was mine. I didn’t use AppleScriptObjC calls in the repeat loop, and that’s why it was faster. In my test there was a big difference. Your version… why did I not think of that :).

Hi,
grep -E supports POSIX ERE regular expressions. You need to adapt the pattern accordingly.

BR

Modified version.
Note that with the -E option grep only supports ERE; to ignore case, use the -i option.
If you only want to ignore the case of the first letter, use a bracket expression, e.g. "[Tt]est".
See, for example, the Wikipedia article on POSIX ERE.
BR
regexp.applescript (824 Bytes)
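The two alternatives to an inline (?i) can be compared side by side. A minimal Python sketch, assuming Python's re module as a stand-in for grep: the IGNORECASE flag plays the role of grep's -i option, and the bracket expression is the same one suggested above. The variable names are illustrative only.

```python
import re

s = "This is a test line with a Test word other test"

# Case-insensitive flag, analogous to grep -i.
with_flag = [m.start() for m in re.finditer(r"test", s, flags=re.IGNORECASE)]

# Bracket expression: only the first letter may vary in case.
with_brackets = [m.start() for m in re.finditer(r"[Tt]est", s)]

print(with_flag)      # [10, 27, 43]
print(with_brackets)  # [10, 27, 43]
```

The two agree on this test string, but they are not equivalent in general: the flag would also match "TEST", while "[Tt]est" would not.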

You can probably get even higher speed with just one pipe meaning only one call to the shell as below:

set theString to "This is a test line with a Test word other test"
set thePattern to "(?i)test"
set theLocations to getLocations(theString, thePattern) --> {"10", "27", "43"}

on getLocations(str, pattern)
	set res to do shell script "echo " & quoted form of (str) & "| grep -bEo " & quoted form of (pattern) & "| grep -Eo " & quoted form of ("\\d+")
	return paragraphs of res
end getLocations

BR

Apparently, the grep that comes on a mac —or at least v2.5.1— has a bug such that -b always returns 0. I should note that I haven’t seen anything official to that effect but after doing some searches for grep byte offset, I found several comments making this allegation.

On a whim, I used macports to install gnu grep (ggrep v3.8 which dates back to 2019) and it is returning offsets of 10, 27, 43 on the longer test string. I’m running Sierra so perhaps there are other versions available but at least the --byte-offset option now works.

This is a minor variation on @ionah’s script. While it’s possible to make ggrep the default grep and put it on the path, I’m holding off on that so I had to include its path in the shell command. Additionally, the (?i) option is a PCRE feature so instead of using -E, it requires the -P option. The last leg of the command limits the response to digits and I used text delimiters to remove the returns. I don’t have any tools to test its speed with but it gives the impression of being fairly quick.

set theString to "This is a test line with a Test word other test"
set thePattern to "(?i)test"
set matchingData to getMatches(theString, thePattern)

set AppleScript's text item delimiters to return
text items of (matchingData)

on getMatches(str, pattern)
	set shellcmd to "echo " & qt(str) & " | /opt/local/bin/ggrep -Pbo " & qt(pattern) & " | /opt/local/bin/ggrep -o '[[:digit:]]*'"
	set res to (do shell script shellcmd)
	return res
end getMatches

on qt(str)
	return "'" & str & "'"
end qt
--> {"10", "27", "43"}

This is the shell command that is being run:

echo 'This is a test line with a Test word other test' | /opt/local/bin/ggrep -Pbo '(?i)test' | /opt/local/bin/ggrep -o '[[:digit:]]*'
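One caveat worth keeping in mind: --byte-offset reports bytes, not characters, so the numbers only line up with character positions while the text before the match is plain ASCII. A minimal Python sketch of the conversion, assuming UTF-8 input and an offset that falls on a character boundary (which a match start does); `byte_to_char_offset` is a hypothetical helper, not something grep provides.

```python
def byte_to_char_offset(text, byte_offset, encoding="utf-8"):
    """Convert a zero-based byte offset into a zero-based character offset.

    Hypothetical helper: assumes byte_offset lies on a character boundary
    in the encoded form of text.
    """
    prefix = text.encode(encoding)[:byte_offset]
    return len(prefix.decode(encoding))


# ASCII only: byte and character offsets coincide.
print(byte_to_char_offset("This is a test", 10))  # 10

# "é" occupies two bytes in UTF-8, so the byte offset runs one ahead.
print(byte_to_char_offset("Héllo test", 7))  # 6
```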

There are many code generators online, so I ran @peavine’s input string and pattern through one.

Reference: regex101: build, test, and debug regex

A very useful web app could be built with nativefier (GitHub - nativefier/nativefier: Make any web page a desktop application).
Point it at https://regex101.com and you have a Regex101 desktop app for any regex use case.

This is Swift

import Foundation

let pattern = #"(?i)test"#
let regex = try! NSRegularExpression(pattern: pattern, options: .anchorsMatchLines)
let testString = #"This is a test line with a Test word"#
let stringRange = NSRange(location: 0, length: testString.utf16.count)
let matches = regex.matches(in: testString, range: stringRange)
var result: [[String]] = []
for match in matches {
    var groups: [String] = []
    for rangeIndex in 1 ..< match.numberOfRanges {
        let nsRange = match.range(at: rangeIndex)
        guard !NSEqualRanges(nsRange, NSMakeRange(NSNotFound, 0)) else { continue }
        let string = (testString as NSString).substring(with: nsRange)
        groups.append(string)
    }
    if !groups.isEmpty {
        result.append(groups)
    }
}
print(result)

Compare the Swift version with Rust; I kind of like that better.

// include the latest version of the regex crate in your Cargo.toml
extern crate regex;

use regex::Regex;

fn main() {
  let regex = Regex::new(r"(?m)(?i)test").unwrap();
  let string = "This is a test line with a Test word";
  
  // result is an iterator over the Captures for each match in the string
  let result = regex.captures_iter(string);
  
  for mat in result {
    println!("{:?}", mat);
  }
}

No, this is Swift:

import Foundation // needed for NSRange

let pattern = /(?i)test/
let testString = "This is a test line with a Test word"
let matches = testString.matches(of: pattern)
let result = matches.map{NSRange($0.range, in: testString).location}
print(result) // [10, 27]

and if you want the substrings

let result = matches.map{String(testString[$0.range])}


@StefanK
I like your version. The Swift code I uploaded was generated by the website https://regex101.com,
in its code generator tab (I never touched it). I didn’t test it… but I did have the feeling I didn’t like it. :wink: It’s kind of cool that a website can have a code generator.

Thanks.

Here is a python version.

set theResult to do shell script "/usr/bin/python3 <<EOF
import re
s = 'This is a test line with a Test word'
r = re.finditer(r'(?i)test', s)
for match in r:
    print(match.start())
EOF
"
return paragraphs of theResult

And if you’d like to know the difference between Python 3.9 and 3.10:
Python.org has claimed that Python 3.10 is faster than previous versions.
The first is Python 3.9 and the second is 3.10. In other words, the difference between the Unix command grep and Python 3.10 is almost none.

That said… I have now read this: Python is About to Become 64% Faster - Python 3.10 vs. Python 3.11 Benchmark
I ran a test of 3.10 vs. 3.11 and the difference was very small, but there was a big difference between versions 3.9 and 3.10.