Key value coding compliance error

Hi
grep -E supports POSIX ERE regular expressions, so you need to adapt the pattern accordingly.

BR

Modified version.
Note that grep only supports ERE with the -E option; to ignore case, use the -i option.
If you only want to ignore the case of the first letter, use a bracket expression, e.g. “[Tt]est”.
See, for example, the Wikipedia article on POSIX ERE.
BR
regexp.applescript (824 Bytes)
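
To illustrate the bracket idea, here is a minimal Python sketch that turns a literal word into a pattern built only from bracket expressions, so it needs no (?i) flag. The helper name case_insensitive_ere is made up for this example, it only handles plain ASCII letters sensibly, and non-letters are escaped with Python's re.escape, whose escaping rules may not match ERE in every case.

```python
import re

def case_insensitive_ere(word):
    """Build a pattern such as [Tt][Ee][Ss][Tt] from a literal word."""
    parts = []
    for ch in word:
        if ch.isalpha():
            # One bracket expression per letter: upper and lower case.
            parts.append("[" + ch.upper() + ch.lower() + "]")
        else:
            # Anything else is escaped so it is matched literally.
            parts.append(re.escape(ch))
    return "".join(parts)

text = "This is a test line with a Test word other test"
pattern = case_insensitive_ere("test")
print(pattern)  # [Tt][Ee][Ss][Tt]
print([m.start() for m in re.finditer(pattern, text)])  # [10, 27, 43]
```

The generated pattern can then be handed to grep -E in place of "(?i)test"; with grep itself, the -i option is the simpler route.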

You can probably get even higher speed with a single pipeline, meaning only one call to the shell, as below:

set theString to "This is a test line with a Test word other test"
set thePattern to "(?i)test"
set theLocations to getLocations(theString, thePattern) --> {"10", "27", "43"}

on getLocations(str, pattern)
	set res to do shell script "echo " & quoted form of (str) & "| grep -bEo " & quoted form of (pattern) & "| grep -Eo " & quoted form of ("\\d+")
	return paragraphs of res
end getLocations

BR

Apparently, the grep that comes on a mac —or at least v2.5.1— has a bug such that -b always returns 0. I should note that I haven’t seen anything official to that effect but after doing some searches for grep byte offset, I found several comments making this allegation.

On a whim, I used macports to install gnu grep (ggrep v3.8 which dates back to 2019) and it is returning offsets of 10, 27, 43 on the longer test string. I’m running Sierra so perhaps there are other versions available but at least the --byte-offset option now works.

This is a minor variation on @ionah’s script. While it’s possible to make ggrep the default grep and put it on the path, I’m holding off on that so I had to include its path in the shell command. Additionally, the (?i) option is a PCRE feature so instead of using -E, it requires the -P option. The last leg of the command limits the response to digits and I used text delimiters to remove the returns. I don’t have any tools to test its speed with but it gives the impression of being fairly quick.

set theString to "This is a test line with a Test word other test"
set thePattern to "(?i)test"
set matchingData to getMatches(theString, thePattern)

set AppleScript's text item delimiters to return
text items of (matchingData)

on getMatches(str, pattern)
	set shellcmd to "echo " & qt(str) & " | /opt/local/bin/ggrep -Pbo " & qt(pattern) & " | /opt/local/bin/ggrep -o '[[:digit:]]*'"
	set res to (do shell script shellcmd)
	return res
end getMatches

on qt(str)
	-- quoted form of also escapes any single quotes inside str
	return quoted form of str
end qt
--> {"10", "27", "43"}

This is the shell command that is being run:

echo 'This is a test line with a Test word other test' | /opt/local/bin/ggrep -Pbo '(?i)test' | /opt/local/bin/ggrep -o '[[:digit:]]*'
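
One thing to keep in mind: the -b option reports byte offsets, not character offsets. For plain ASCII text they are the same, but they drift apart as soon as the string contains multi-byte characters. Here is a rough Python sketch of the difference. The function name grep_byte_offsets is made up for this example, it assumes UTF-8 input, and Python's re module is only an approximation of what ggrep -P does.

```python
import re

def grep_byte_offsets(text, pattern):
    """Approximate the offsets printed by `grep -bo`: UTF-8 byte positions."""
    offsets = []
    for m in re.finditer(pattern, text):
        # Re-encode everything before the match to count its bytes.
        offsets.append(len(text[:m.start()].encode("utf-8")))
    return offsets

ascii_text = "This is a test line with a Test word other test"
print(grep_byte_offsets(ascii_text, "(?i)test"))  # [10, 27, 43]

accented = "Caf\u00e9 test"  # the accented e takes two bytes in UTF-8
print([m.start() for m in re.finditer("test", accented)])  # [5]
print(grep_byte_offsets(accented, "test"))                 # [6]
```

So if the offsets are going to be used to index into an AppleScript string, the byte offsets from grep are only safe to use directly when the text is ASCII.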

There are many code generators online, so I ran @peavine’s inputString and pattern through one.

Reference: regex101: build, test, and debug regex

A very useful desktop app can be built with GitHub - nativefier/nativefier: Make any web page a desktop application
Point it at https://regex101.com and you have a Regex101 app for any regex use case.

This is Swift

import Foundation

let pattern = #"(?i)test"#
let regex = try! NSRegularExpression(pattern: pattern, options: .anchorsMatchLines)
let testString = #"This is a test line with a Test word"#
let stringRange = NSRange(location: 0, length: testString.utf16.count)
let matches = regex.matches(in: testString, range: stringRange)
var result: [[String]] = []
for match in matches {
    var groups: [String] = []
    // Start at 0 so the whole match is included; this pattern has no capture groups.
    for rangeIndex in 0 ..< match.numberOfRanges {
        let nsRange = match.range(at: rangeIndex)
        guard !NSEqualRanges(nsRange, NSMakeRange(NSNotFound, 0)) else { continue }
        let string = (testString as NSString).substring(with: nsRange)
        groups.append(string)
    }
    if !groups.isEmpty {
        result.append(groups)
    }
}
print(result)

Compare the Swift version with the Rust one; I kind of like that better.

// include the latest version of the regex crate in your Cargo.toml
extern crate regex;

use regex::Regex;

fn main() {
  let regex = Regex::new(r"(?m)(?i)test").unwrap();
  let string = "This is a test line with a Test word";
  
  // result is an iterator over Captures values, one per match in the string
  let result = regex.captures_iter(string);
  
  for mat in result {
    println!("{:?}", mat);
  }
}

No, this is Swift:

let pattern = /(?i)test/
let testString = "This is a test line with a Test word"
let matches = testString.matches(of: pattern)
let result = matches.map{NSRange($0.range, in: testString).location}
print(result) // [10, 27]

and if you want the substrings

let result = matches.map{String(testString[$0.range])}


@StefanK
I like your version. The Swift code I uploaded came from the website https://regex101.com
It is from the code generator tab (a tab I never normally touch). I didn’t test it… but I did have a feeling that I didn’t like it. :wink: Still, it’s kind of cool that the website has a code generator.

Thanks.

Here is a Python version.

set theResult to do shell script "/usr/bin/python3 <<EOF
import re
s = 'This is a test line with a Test word'
r = re.finditer(r'(?i)test', s)
for match in r:
    print(match.start())
EOF
"
return paragraphs of theResult

And in case you’d like to know the difference between Python 3.9 and 3.10:
Python.org has claimed that Python 3.10 is faster than previous versions.
The first is Python 3.9 and the second is 3.10. In other words, the difference between the Unix command grep and Python 3.10 is almost none.

That said, I have now read this: Python is About to Become 64% Faster - Python 3.10 vs. Python 3.11 Benchmark
I ran a test of 3.10 vs 3.11 and the difference was very small, but there was a big difference between versions 3.9 and 3.10.

@Mockman

I’m running grep (BSD grep, GNU compatible) 2.6.0-FreeBSD, which ships with macOS Ventura 13.2.1, and it works. It is even stranger that (?i) works, since only ERE should be supported. Anyway, you can use ERE for most patterns, except those that need lookahead.

BR

For the heck of it, I wrote the same thing (I hope) in JavaScript:

(() => {
  const str = "My Rob is a cool Robert! His name is Robert... ".repeat(13);

  /* start time */
  const startTime = new Date().getTime();
  for (let i = 0; i < 100000; i++) {
    const regEx = /Rob/ig;
    const matches = str.matchAll(regEx);
    const locations = [...matches].map(m => m.index);
  }
  const elapsedTime = new Date().getTime() - startTime;
  console.log(elapsedTime);
})()

The result is 3205 ms for 100,000 iterations, so about 0.03 ms per iteration when run in Script Editor. On the command line (osascript ...), it runs in a little less time: 2800 ms, i.e. 0.028 ms per iteration. I don’t know what units the Script Geek timings are in.
I tried the code with a considerably longer string, containing 4096 matches. Then it took about 34 ms per iteration (in Script Editor). The test string above contains 39 matches, so with roughly 105 times as many matches, linear scaling would predict only about 3 ms per iteration. So, the run-time behavior is > O(n).
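
If anyone wants to check the scaling themselves, the idea is to time the same search on strings of increasing length and see whether the time per iteration grows in step with the number of matches. Below is a rough sketch in Python rather than JavaScript, so the absolute numbers will not be comparable to the ones above. The function names and the repeat counts are my own choices for the example.

```python
import re
import time

REGEX = re.compile("Rob", re.IGNORECASE)

def locations(text):
    """Start offsets of every case-insensitive match of 'Rob'."""
    return [m.start() for m in REGEX.finditer(text)]

def ms_per_iteration(text, iterations=200):
    """Average wall-clock time in milliseconds for one search of text."""
    start = time.perf_counter()
    for _ in range(iterations):
        locations(text)
    return (time.perf_counter() - start) * 1000 / iterations

base = "My Rob is a cool Robert! His name is Robert... "
for repeats in (13, 130, 1300):
    text = base * repeats
    # Prints: repeat count, number of matches, milliseconds per iteration.
    print(repeats, len(locations(text)), round(ms_per_iteration(text), 4))
```

If the last column grows by much more than a factor of ten from one row to the next, the behavior is worse than linear for that implementation.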

Here is a proof of concept showing that XML-RPC calls to Python are very fast.
I turned my Python example into an XML-RPC server, and Script Geek returned this result:

The first script is an XML-RPC call from AppleScript; the second script is Python 3.11 with do shell script.

So I no longer have to convince myself that XML-RPC calls are very powerful in AppleScript.
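
For anyone who wants to try something similar, here is a minimal sketch of what such a server could look like using Python's standard xmlrpc.server module. This is not the exact server used for the timings above. The method name find_locations, the host and the port 8000 are all assumptions made for this example.

```python
import re
from xmlrpc.server import SimpleXMLRPCServer

def find_locations(pattern, text):
    """Return the start offset of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

def make_server(host="localhost", port=8000):
    # Host and port are placeholders; pick whatever suits your setup.
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(find_locations, "find_locations")
    return server

def serve():
    """Run the server until it is interrupted."""
    make_server().serve_forever()

# Quick local check without starting the server:
print(find_locations("(?i)test", "This is a test line with a Test word"))  # [10, 27]
```

Calling serve() keeps the Python process running, which is presumably where the speed comes from: the interpreter is started once instead of on every do shell script call. From AppleScript, something like tell application "http://localhost:8000/RPC2" to call xmlrpc {method name:"find_locations", parameters:{thePattern, theString}} should reach it, though I have not tested that exact line.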