set theString to "This is a test line with a Test word"
set thePattern to "(?i)test"
set matchingData to getMatches(theString, thePattern)
on getMatches(str, pattern)
set shellcmd to "echo " & qt(str) & "| grep -bEo " & qt(pattern)
set res to (do shell script shellcmd)
set off to offset of ":" in res
return {text 1 thru (off - 1) of res as integer, count of (characters (off + 1) thru -1 of res)}
end getMatches
on qt(str)
return "\"" & str & "\""
end qt
@Hallenstal. Thanks for responding to my thread. I tested your script, but it did not return the expected results, which were {10, 27}. I'm not knowledgeable about grep, so perhaps I'm doing something wrong. BTW, your script threw an error as written, and I had to escape the quote in the qt handler.
set theString to "This is a test line with a Test word"
set thePattern to "(?i)test"
set matchingData to getMatches(theString, thePattern) --> {10, 12}
on getMatches(str, pattern)
set shellcmd to "echo " & qt(str) & "| grep -bEo " & qt(pattern)
set res to (do shell script shellcmd)
set off to offset of ":" in res
return {text 1 thru (off - 1) of res as integer, count of (characters (off + 1) thru -1 of res)}
end getMatches
on qt(str)
return "\"" & str & "\""
end qt
-- Tested on Monterey 12.6.3
use framework "Foundation"
use scripting additions
set theString to "This is a test line with a Test word"
set thePattern to "(?i)test"
set matchingData to getMatches(theString, thePattern)
on getMatches(theString, thePattern)
set theString to current application's NSString's stringWithString:theString
set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
set regexResults to theRegex's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
set theRanges to (regexResults's valueForKey:"range") as list
set theArray to current application's NSArray's arrayWithArray:theRanges
set theLocations to (theArray's valueForKey:"location")
return theLocations as list --> {10, 27}
end getMatches
@Hallenstal's shell command is working. It's just the parsing method that is wrong.
set theString to "This is a test line with a Test word other test"
set thePattern to "(?i)test"
set theLocations to getLocations(theString, thePattern) --> {"10", "27", "43"}
on getLocations(str, pattern)
set res to do shell script ("echo " & quoted form of (str) & "| grep -bEo " & quoted form of (pattern))
return paragraphs of (do shell script ("echo " & quoted form of (res) & "| grep -Eo " & quoted form of ("\\d+")))
end getLocations
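To see why the first handler returned {10, 12}: grep -bo prints one offset:match line per match (here 10:test and 27:Test), and the original handler split only at the first colon, so it counted everything after it, second line included, as characters. A minimal Python sketch of the intended parsing, assuming the grep output has already been captured as text (the sample output below is hypothetical, modeled on the results in this thread), reads the number before the colon on every line and converts it to an integer in the same pass:

```python
def parse_grep_offsets(output):
    """Parse `grep -bo` style output ("offset:match" per line) into integer offsets."""
    offsets = []
    for line in output.splitlines():
        if not line:
            continue  # skip blank lines, e.g. left by a trailing newline
        # Split only at the first colon; the matched text itself may contain colons.
        offset, _, _matched = line.partition(":")
        offsets.append(int(offset))
    return offsets


# Hypothetical sample in the format grep -bo produces for the longer test string.
sample = "10:test\n27:Test\n43:test\n"
print(parse_grep_offsets(sample))  # [10, 27, 43]
```

Returning integers directly also avoids the separate string-to-integer conversion discussed further down the thread.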
@ionah. Many thanks for your suggestions, both of which work great.
It was mentioned above that my original request arose from another thread (see link below), which had to do with timing results in finding 12,288 instances of a substring in a string. I reran these tests with @ionah's two suggestions, and the results were as follows (all times are milliseconds):
@peavine
The issue with do shell script returning strings comes up when paragraphs are used. It would be easy to convert item 1 of the result to an integer, but it is harder to convert all the paragraphs to integers without another repeat loop, or the AppleScript text item delimiters hack. I find the AppleScriptObjC version simpler, and with it it's far easier to convert to any type: integer, float or string.
@ionah. I don’t use Script Geek in this particular instance because of the difficulty in getting a very large string without including the time it takes to get that string in the total timing results. My grep timing script:
use framework "Foundation"
use scripting additions
-- untimed code
set theString to "My Rob is a cool Robert! His name is Robert... "
repeat 12 times
set theString to theString & theString
end repeat
-- start time
set startTime to current application's CACurrentMediaTime()
-- timed code
set thePattern to "(?i)Rob"
set theOffsets to getLocations(theString, thePattern)
on getLocations(str, pattern)
set res to do shell script ("echo " & quoted form of (str) & "| grep -bEo " & quoted form of (pattern))
return paragraphs of (do shell script ("echo " & quoted form of (res) & "| grep -Eo " & quoted form of ("\\d+")))
end getLocations
-- elapsed time
set elapsedTime to (current application's CACurrentMediaTime()) - startTime
set numberFormatter to current application's NSNumberFormatter's new()
if elapsedTime > 1 then
numberFormatter's setFormat:"0.000"
set elapsedTime to ((numberFormatter's stringFromNumber:elapsedTime) as text) & " seconds"
else
(numberFormatter's setFormat:"0")
set elapsedTime to ((numberFormatter's stringFromNumber:(elapsedTime * 1000)) as text) & " milliseconds"
end if
-- result
elapsedTime --> 76 milliseconds
# count theOffsets --> 12288
# theOffsets
My ASObjC timing script:
use framework "Foundation"
use scripting additions
-- untimed code
set theString to "My Rob is a cool Robert! His name is Robert... "
repeat 12 times
set theString to theString & theString
end repeat
-- start time
set startTime to current application's CACurrentMediaTime()
-- timed code
set thePattern to "(?i)Rob"
set theOffsets to getMatches(theString, thePattern)
on getMatches(theString, thePattern)
set theString to current application's NSString's stringWithString:theString
set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
set regexResults to theRegex's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
set theRanges to (regexResults's valueForKey:"range") as list
set theArray to current application's NSArray's arrayWithArray:theRanges
set theLocations to (theArray's valueForKey:"location")
return theLocations as list
end getMatches
-- elapsed time
set elapsedTime to (current application's CACurrentMediaTime()) - startTime
set numberFormatter to current application's NSNumberFormatter's new()
if elapsedTime > 1 then
numberFormatter's setFormat:"0.000"
set elapsedTime to ((numberFormatter's stringFromNumber:elapsedTime) as text) & " seconds"
else
(numberFormatter's setFormat:"0")
set elapsedTime to ((numberFormatter's stringFromNumber:(elapsedTime * 1000)) as text) & " milliseconds"
end if
-- result
elapsedTime --> 160 milliseconds
# count theOffsets --> 12288
# theOffsets
To avoid this, you can build a large string in any text editor and paste it into your script as a global variable. That way the string is loaded at compile time and is not included in the timing.
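The same principle, keeping the construction of the test string outside the timed region, can be sketched in Python with time.perf_counter. This only illustrates the measuring pattern and does not reproduce the thread's numbers; the base string and the 12 doublings are taken from the timing scripts above, and Python's re module stands in for grep and NSRegularExpression.

```python
import re
import time

# Untimed setup: double the base string 12 times, giving 4096 copies.
the_string = "My Rob is a cool Robert! His name is Robert... "
for _ in range(12):
    the_string += the_string

# Timed region: only the search itself is measured.
start = time.perf_counter()
offsets = [m.start() for m in re.finditer(r"(?i)Rob", the_string)]
elapsed = time.perf_counter() - start

print(len(offsets))  # 12288 (3 matches per copy)
print(f"{elapsed * 1000:.0f} milliseconds")
```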
@peavine
It's simple to convert an array of strings to an array of integers:
use framework "Foundation"
set arrayOfStrings to current application's NSArray's arrayWithArray:{"1", "2", "3"}
set integerValues to (arrayOfStrings's valueForKey:"intValue") as list
@ionah
Peavine's script uses AppleScriptObjC inside the repeat loop. The script you tested was mine. I didn't use AppleScriptObjC calls in the repeat loop, and that's why it was faster. In my test the difference was big. As for your version... why didn't I think of that :).
Modified version.
Note that grep supports ERE only with the -E option; to ignore case, use the -i option.
If you want to ignore case only in the first letter, use a bracket expression, e.g. "[Tt]est".
See, for example, the Wikipedia article on POSIX ERE.
BR
Attachment: regexp.applescript (824 bytes)
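The difference between a bracket expression and fully case-insensitive matching only shows up when letters other than the first change case. A small sketch using Python's re module for illustration (both patterns mean the same thing in POSIX ERE; the sample string is hypothetical):

```python
import re

text = "test Test TEST"

# Bracket expression: only the first letter may be either case.
bracket = [m.start() for m in re.finditer(r"[Tt]est", text)]

# Case-insensitive matching (what grep -i does): every letter may vary.
insensitive = [m.start() for m in re.finditer(r"test", text, re.IGNORECASE)]

print(bracket)      # [0, 5]
print(insensitive)  # [0, 5, 10]
```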
You can probably get even higher speed with a single pipeline, meaning only one call to the shell, as below:
set theString to "This is a test line with a Test word other test"
set thePattern to "(?i)test"
set theLocations to getLocations(theString, thePattern) --> {"10", "27", "43"}
on getLocations(str, pattern)
set res to do shell script "echo " & quoted form of (str) & "| grep -bEo " & quoted form of (pattern) & "| grep -Eo " & quoted form of ("\\d+")
return paragraphs of res
end getLocations
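Taking the single-call idea one step further, here is a Python sketch that runs grep once and extracts the digits in-process instead of piping to a second grep. It assumes a grep that supports -b, -o and -i together (GNU grep does; a later post in this thread reports that older macOS grep may misreport -b offsets), and it uses -i instead of the (?i) prefix, since (?i) is not part of POSIX ERE. The function name grep_offsets is hypothetical.

```python
import subprocess


def grep_offsets(text, pattern):
    """Return byte offsets of case-insensitive matches using a single grep call."""
    # -b: print the byte offset, -o: print only the matched part, -i: ignore case.
    result = subprocess.run(
        ["grep", "-boi", "--", pattern],
        input=text + "\n",
        capture_output=True,
        text=True,
    )
    # grep exits with status 1 when nothing matches; anything else is an error.
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr)
    # Each output line looks like "offset:match"; keep the part before the colon.
    return [int(line.split(":", 1)[0]) for line in result.stdout.splitlines() if line]


print(grep_offsets("This is a test line with a Test word other test", "test"))
# [10, 27, 43]
```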
Apparently the grep that ships with macOS (or at least v2.5.1) has a bug such that -b always returns 0. I should note that I haven't seen anything official to that effect, but after searching for "grep byte offset", I found several comments making this claim.
On a whim, I used MacPorts to install GNU grep (ggrep v3.8, released in 2022), and it returns offsets of 10, 27, 43 on the longer test string. I'm running Sierra, so perhaps other versions are available, but at least the --byte-offset option now works.
This is a minor variation on @ionah's script. While it's possible to make ggrep the default grep and put it on the PATH, I'm holding off on that, so I had to include its full path in the shell command. Additionally, the (?i) option is a PCRE feature, so instead of -E it requires the -P option. The last stage of the pipeline limits the output to digits, and I used text item delimiters to split the returned lines into a list. I don't have any tools to test its speed, but it gives the impression of being fairly quick.
set theString to "This is a test line with a Test word other test"
set thePattern to "(?i)test"
set matchingData to getMatches(theString, thePattern)
set AppleScript's text item delimiters to return
text items of (matchingData)
on getMatches(str, pattern)
set shellcmd to "echo " & qt(str) & " | /opt/local/bin/ggrep -Pbo " & qt(pattern) & " | /opt/local/bin/ggrep -o '[[:digit:]]*'"
set res to (do shell script shellcmd)
return res
end getMatches
on qt(str)
return "'" & str & "'"
end qt
--> {"10", "27", "43"}
This is the shell command that is being run:
echo 'This is a test line with a Test word other test' | /opt/local/bin/ggrep -Pbo '(?i)test' | /opt/local/bin/ggrep -o '[[:digit:]]*'
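One caveat when comparing these results with the ASObjC version: grep's -b option reports byte offsets, while NSRegularExpression reports NSRange locations counted in UTF-16 code units. For plain ASCII text the two agree, which is why every approach here returns 10, 27 and 43, but they diverge once the string contains non-ASCII characters. A minimal Python sketch that reports each match offset in characters, UTF-8 bytes and UTF-16 code units (the function name and the sample strings are hypothetical):

```python
import re


def offsets_in_units(text, pattern):
    """For each match, return (character, UTF-8 byte, UTF-16 code unit) offsets."""
    results = []
    for m in re.finditer(pattern, text):
        prefix = text[: m.start()]
        utf8_bytes = len(prefix.encode("utf-8"))
        # utf-16-le adds no byte-order mark; each code unit is 2 bytes.
        utf16_units = len(prefix.encode("utf-16-le")) // 2
        results.append((m.start(), utf8_bytes, utf16_units))
    return results


print(offsets_in_units("caf\u00e9 test", r"(?i)test"))     # [(5, 6, 5)]
print(offsets_in_units("\U0001F642 test", r"(?i)test"))  # [(2, 5, 3)]
```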
import Foundation
let pattern = #"(?i)test"#
let regex = try! NSRegularExpression(pattern: pattern, options: .anchorsMatchLines)
let testString = #"This is a test line with a Test word"#
let stringRange = NSRange(location: 0, length: testString.utf16.count)
let matches = regex.matches(in: testString, range: stringRange)
var result: [[String]] = []
for match in matches {
var groups: [String] = []
// Range 0 is the whole match; ranges 1... are capture groups (this pattern has none)
for rangeIndex in 0 ..< match.numberOfRanges {
let nsRange = match.range(at: rangeIndex)
// Skip optional groups that did not take part in the match
guard !NSEqualRanges(nsRange, NSMakeRange(NSNotFound, 0)) else { continue }
let string = (testString as NSString).substring(with: nsRange)
groups.append(string)
}
if !groups.isEmpty {
result.append(groups)
}
}
print(result) // [["test"], ["Test"]]
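The generated Swift loop collects capture-group ranges. In NSRegularExpression, range 0 is the whole match and ranges 1 and up are capture groups, so a loop starting at index 1 finds nothing for a pattern without groups, such as (?i)test. Python's re module draws the same distinction, which makes it easy to check (a sketch for illustration only):

```python
import re

text = "This is a test line with a Test word"
matches = list(re.finditer(r"(?i)test", text))

# group(0) is the whole match, like NSRegularExpression's range 0.
whole = [m.group(0) for m in matches]
# groups() holds only capture groups 1..n; this pattern defines none.
groups = [m.groups() for m in matches]

print(whole)   # ['test', 'Test']
print(groups)  # [(), ()]
```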
Compare the Swift version with Rust; I kind of like the Rust one better.
// include the latest version of the regex crate in your Cargo.toml
use regex::Regex;
fn main() {
let regex = Regex::new(r"(?m)(?i)test").unwrap();
let string = "This is a test line with a Test word";
// find_iter yields one Match per hit; start() and end() are byte indices into the string
for mat in regex.find_iter(string) {
println!("{:?}", (mat.start(), mat.end()));
}
// prints (10, 14) then (27, 31)
}
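The Rust comment talks about start and end indices. The regex crate reports these as byte offsets into the UTF-8 string, whereas Python's Match.span() reports character offsets; for an ASCII string like this one they coincide. A Python sketch of the same (start, end) pairs, for comparison:

```python
import re

text = "This is a test line with a Test word"

# span() returns (start, end); end is exclusive, one past the last matched character.
spans = [m.span() for m in re.finditer(r"(?i)test", text)]
print(spans)  # [(10, 14), (27, 31)]

# Slicing with the same indices recovers the matched substrings.
print([text[start:end] for start, end in spans])  # ['test', 'Test']
```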
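import Foundation // needed for NSRange(_:in:)
// Regex literals and matches(of:) require Swift 5.7 or later
let pattern = /(?i)test/
let testString = "This is a test line with a Test word"
let matches = testString.matches(of: pattern)
let result = matches.map{NSRange($0.range, in: testString).location}
print(result) // [10, 27]
And if you want the substrings instead, replace the result line with:
let result = matches.map{String(testString[$0.range])} // ["test", "Test"]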
@StefanK
Your version I like. The Swift code I uploaded was produced by the website https://regex101.com, in its code generator tab (which I never normally touch). I didn't test it, but I did have a feeling I didn't like it. It's kind of cool that the website has a code generator.
set theResult to do shell script "/usr/bin/python3 <<EOF
import re
s = 'This is a test line with a Test word'
r = re.finditer(r'(?i)test', s)
for match in r:
print(match.start())
EOF
"
return paragraphs of theResult
And in case you'd like to know the difference between Python 3.9 and 3.10: Python.org has claimed that Python 3.10 is faster than previous versions.
The first result is Python 3.9 and the second is 3.10. In other words, the difference between the Unix grep command and Python 3.10 is almost none.