Based on a grep question in another recent thread, I began looking into how to do lookahead/behind in the shell; I know how to do this using TextWrangler, not standalone. Reading the man page is an exercise in frustration, so I’d appreciate any hints/direction. The result of the following is 42. How might I arrive at {42, 52, 37, 47}? TIA.
set theText to "Males: 14-18 g/100mL 42%-52%
Females: 12-16 g/100mL 37%-47%"
--lookahead
do shell script "ruby -e " & ("puts /" & "(\\d)+(?=%)" & "/.match" & theText's quoted form)'s quoted form
One additional question… how could I target a file, rather than supplying the text? When I tried this using read file, I got a plethora of errors.
I don’t know ruby, but I suspect the problem is in your ruby syntax, not the grep pattern. The pattern you have should work as-is, but you’re only getting the first match.
The lookbehind pattern for, say, a space or hyphen would be “(?<=( |-))”.
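For reference, the two constructs differ only in direction: (?=...) checks what follows the match, (?<=...) checks what precedes it. A minimal Ruby sketch on the sample text from the first post (the variable names are just for illustration, and the lookbehind uses a character class in place of the alternation):

```ruby
text = "Males: 14-18 g/100mL 42%-52%\nFemales: 12-16 g/100mL 37%-47%"

# Lookahead (?=%): runs of digits that are followed by a percent sign.
percents = text.scan(/\d+(?=%)/)

# Lookbehind (?<=[ -]): runs of digits preceded by a space or a hyphen.
# "100" is skipped because it is preceded by "/".
after_sep = text.scan(/(?<=[ -])\d+/)

puts percents.inspect   # ["42", "52", "37", "47"]
puts after_sep.inspect  # ["14", "18", "42", "52", "12", "16", "37", "47"]
```

Neither assertion consumes characters, so the "%" or the separator never ends up in the result.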
(And boy, using ruby like that seems to add a lot of overhead…)
Hi, Shane. Thanks for looking at the problem. I should have stated that I knew that my issue was with the Ruby, with which I am also unfamiliar. I don’t really mind if it’s horribly inefficient, as this is mostly for my own education, since, I believe, egrep doesn’t handle lookahead/behind. I continued to hunt for answers and stumbled upon what I needed here:
http://www.regular-expressions.info/ruby.html
The main issue was using match, where scan was needed.
(do shell script "ruby -e 'puts \"Males: 14-18 g/100mL 42%-52%
Females: 12-16 g/100mL 37%-47%\".scan /\\d+(?=%)/'")'s paragraphs
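To make the match/scan difference concrete, here is a small standalone Ruby sketch, outside of do shell script. The variable names are made up for the example; match and scan are the standard Regexp and String methods.

```ruby
text = "Males: 14-18 g/100mL 42%-52%\nFemales: 12-16 g/100mL 37%-47%"
pattern = /\d+(?=%)/

# match stops at the first hit and returns a single MatchData object.
first_only = pattern.match(text)

# scan walks the whole string and returns every non-overlapping hit.
all_hits = text.scan(pattern)

puts first_only[0]     # 42
puts all_hits.inspect  # ["42", "52", "37", "47"]
```

Calling puts on the array itself prints one element per line, which is why the AppleScript version can split the result with paragraphs.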
You can also use lookahead/behind with ASObjC:
on findPattern:thePattern inString:theString
set theRegEx to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
set theFinds to theRegEx's matchesInString:theString options:0 |range|:{location:0, |length|:length of theString}
set theFinds to theFinds as list -- so we can loop through
set theResult to {} -- we will add to this
set theNSString to current application's NSString's stringWithString:theString
repeat with i from 1 to count of items of theFinds
set theRange to (item i of theFinds)'s |range|()
set end of theResult to (theNSString's substringWithRange:theRange) as string
end repeat
return theResult
end findPattern:inString:
On a very long string with lots of finds, the repeat loop might slow it down a tad, but on the simple example you posted it takes about 2ms, compared to the ruby version’s 43ms.
And just for interest, even egrep and sed, which can’t do lookahead, are both a little over twice as fast as the ruby version (on my machine):
(do shell script "egrep -o '\\d+%' <<<\"Males: 14-18 g/100mL 42%-52%
Females: 12-16 g/100mL 37%-47%\" | egrep -o '\\d+'")'s paragraphs
(do shell script "sed -En '/([0-9]+)%[^0-9]*/ { s//\\'$'\\n''\\1/g ; s/[[:print:][:blank:]]*\\n([0-9]+)/\\1/p ; }' <<<\"Males: 14-18 g/100mL 42%-52%
Females: 12-16 g/100mL 37%-47%\"")'s paragraphs
Hey Marc,
No real point in using positive lookahead when a simple capture will do.
I would use the Satimage.osax for this job, because it makes the job very simple. It will handle a string OR a file with aplomb.
set _text to "Males: 14-18 g/100mL 42%-52%
Females: 12-16 g/100mL 37%-47%"
set _list to find text "(\\d+)%" in _text using "\\1" with regexp, all occurrences and string result
--> {"42", "52", "37", "47"}
Or to be more precise we can find each percent range and then break it up:
set _list to find text "(\\d+)%-(\\d+)%" in _text using "\\1 \\2" with regexp, all occurrences and string result
set _list to join _list using " "
set _list to splittext _list using " "
--> {"42", "52", "37", "47"}
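The capture-instead-of-lookahead idea carries over to the earlier Ruby approach as well. This is only a sketch of the equivalent, not anything Satimage-specific: when the pattern contains groups, scan returns the captured groups rather than the whole match.

```ruby
text = "Males: 14-18 g/100mL 42%-52%\nFemales: 12-16 g/100mL 37%-47%"

# One capture group: scan returns one single-element array per hit, so flatten.
singles = text.scan(/(\d+)%/).flatten

# Two capture groups: each hit is a [low, high] pair for one percent range.
pairs = text.scan(/(\d+)%-(\d+)%/)

puts singles.inspect        # ["42", "52", "37", "47"]
puts pairs.inspect          # [["42", "52"], ["37", "47"]]
puts pairs.flatten.inspect  # ["42", "52", "37", "47"]
```

Keeping the pairs unflattened preserves which low value belongs with which high value.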
Then of course you can always use Perl.
set _text to "Males: 14-18 g/100mL 42%-52%
Females: 12-16 g/100mL 37%-47%
"
set AppleScript's text item delimiters to ","
set _list to text items of (do shell script "perl -we '
my @matches;
while(<>){
chomp;
push @matches, m!(\\d+)%!g;
}
$, = \",\";
print @matches;
' <<< " & quoted form of _text)
--> {"42", "52", "37", "47"}
This one operates on text input, but it’s easy to use a file instead.
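On the earlier question about targeting a file rather than supplying the text, here is a minimal Ruby sketch of the same scan run against a file. The temporary file and its name are assumptions made only so the example runs on its own; in practice you would pass the path of a real file.

```ruby
require "tempfile"

# Hypothetical sample file, created here only to keep the sketch self-contained.
sample = Tempfile.new(["sample", ".txt"])
sample.write("Males: 14-18 g/100mL 42%-52%\nFemales: 12-16 g/100mL 37%-47%\n")
sample.close

# Read the whole file, then collect every run of digits followed by "%".
file_hits = File.read(sample.path).scan(/\d+(?=%)/)
puts file_hits.inspect  # ["42", "52", "37", "47"]

sample.unlink  # remove the temporary file
```

From AppleScript, something along these lines should work, though I haven't timed it: do shell script "ruby -e 'puts File.read(ARGV[0]).scan(/\\d+(?=%)/)' " & quoted form of POSIX path of theFile, where theFile is whatever alias or file reference you already have.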