How could I instantiate a lookahead or target a file (ruby/grep)?

Marc_Anthony · June 29, 2014, 3:18pm

Based on a grep question in another recent thread, I began looking into how to do lookahead/behind in the shell; I know how to do this using TextWrangler, not standalone. Reading the man page is an exercise in frustration, so I’d appreciate any hints/direction. The result of the following is 42. How might I arrive at {42, 52, 37, 47}? TIA.

set theText to "Males: 14-18 g/100mL	42%-52%
Females: 12-16 g/100mL	37%-47%"
--lookahead
do shell script "ruby -e " & ("puts /" & "(\\d)+(?=%)" & "/.match" & theText's quoted form)'s quoted form

One additional question… how could I target a file, rather than supplying the text? When I tried this using read file, I got a plethora of errors.

Shane_Stanley · June 29, 2014, 11:55pm

I don’t know ruby, but I suspect the problem is in your ruby syntax, not the grep pattern. The pattern you have should work as-is, but you’re only getting the first match.

The lookbehind pattern for, say, a space or hyphen would be “(?<=( |-))”.
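
For comparison, here is a small Ruby sketch of both assertion forms on the sample text: (?=...) checks what follows a match, and (?<=...) checks what precedes it. The constant names are illustrative only, and the inner capture group from the pattern above is dropped because Ruby's scan returns the groups rather than the whole match when a pattern contains any.

```ruby
# Sample text from the thread (a tab separates the units from the percentages).
SAMPLE = "Males: 14-18 g/100mL\t42%-52%\nFemales: 12-16 g/100mL\t37%-47%"

# Lookahead: digits that are immediately followed by a percent sign.
LOOKAHEAD = /\d+(?=%)/

# Lookbehind: digits that are immediately preceded by a space or a hyphen.
LOOKBEHIND = /(?<= |-)\d+/

p SAMPLE.scan(LOOKAHEAD)   # prints ["42", "52", "37", "47"]
p SAMPLE.scan(LOOKBEHIND)  # prints ["14", "18", "52", "12", "16", "47"]
```

Neither assertion consumes the neighbouring character, so the percent sign, space, or hyphen never appears in the results.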

(And boy, using ruby like that seems to add a lot of overhead…)

Marc_Anthony · July 1, 2014, 2:06pm

Hi, Shane. Thanks for looking at the problem. I should have stated that I knew that my issue was with the Ruby, with which I am also unfamiliar. I don’t really mind if it’s horribly inefficient, as this is mostly for my own education, since, I believe, egrep doesn’t handle lookahead/behind. I continued to hunt for answers and stumbled upon what I needed here:
http://www.regular-expressions.info/ruby.html

The main issue was using match, where scan was needed.

(do shell script "ruby -e 'puts \"Males: 14-18 g/100mL	42%-52%
Females: 12-16 g/100mL	37%-47%\".scan /\\d+(?=%)/'")'s paragraphs
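
The difference between the two calls is easier to see outside the shell quoting. This is a minimal standalone Ruby sketch, with constant names chosen only for illustration:

```ruby
SAMPLE = "Males: 14-18 g/100mL\t42%-52%\nFemales: 12-16 g/100mL\t37%-47%"
PERCENT = /\d+(?=%)/

# String#match stops at the first hit and returns a single MatchData object.
p SAMPLE.match(PERCENT)[0]  # prints "42"

# String#scan walks the whole string and returns every hit in an array.
p SAMPLE.scan(PERCENT)      # prints ["42", "52", "37", "47"]
```

Because puts prints each array element on its own line, the shell output splits cleanly into AppleScript paragraphs.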

Shane_Stanley · July 1, 2014, 11:48pm

You can also use lookahead/behind with ASObjC:

use framework "Foundation" -- needed for NSRegularExpression and NSString

on findPattern:thePattern inString:theString
	set theRegEx to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
	set theFinds to theRegEx's matchesInString:theString options:0 |range|:{location:0, |length|:length of theString}
	set theFinds to theFinds as list -- so we can loop through
	set theResult to {} -- we will add to this
	set theNSString to current application's NSString's stringWithString:theString
	repeat with i from 1 to count of items of theFinds
		set theRange to (item i of theFinds)'s |range|()
		set end of theResult to (theNSString's substringWithRange:theRange) as string
	end repeat
	return theResult
end findPattern:inString:

On a very long string with lots of finds, the repeat loop might slow it down a tad, but on the simple example you posted it takes about 2ms, compared to the ruby version’s 43ms.

Nigel_Garvey · July 2, 2014, 1:23pm

And just for interest, even egrep and sed, which can’t do lookahead, are both a little over twice as fast as the ruby version (on my machine):

(do shell script "egrep -o '\\d+%' <<<\"Males: 14-18 g/100mL	42%-52%
Females: 12-16 g/100mL	37%-47%\" | egrep -o '\\d+'")'s paragraphs

(do shell script "sed -En '/([0-9]+)%[^0-9]*/ { s//\\'$'\\n''\\1/g ; s/[[:print:][:blank:]]*\\n([0-9]+)/\\1/p ; }' <<<\"Males: 14-18 g/100mL	42%-52%
Females: 12-16 g/100mL	37%-47%\"")'s paragraphs

ccstone · July 9, 2014, 8:45am

Hey Marc,

No real point in using positive lookahead when a simple capture will do.

I would use the Satimage.osax for this job, because it makes the job very simple. It will handle a string OR a file with aplomb.


set _text to "Males: 14-18 g/100mL	42%-52%
Females: 12-16 g/100mL	37%-47%"

set _list to find text "(\\d+)%" in _text using "\\1" with regexp, all occurrences and string result

--> {"42", "52", "37", "47"}

Or to be more precise we can find each percent range and then break it up:


set _list to find text "(\\d+)%-(\\d+)%" in _text using "\\1 \\2" with regexp, all occurrences and string result
set _list to join _list using " "
set _list to splittext _list using " "

--> {"42", "52", "37", "47"}
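
The same capture-based idea can be sketched in Ruby, the language the thread started with. This is only an illustration with made-up constant names, not a replacement for the Satimage calls above:

```ruby
SAMPLE = "Males: 14-18 g/100mL\t42%-52%\nFemales: 12-16 g/100mL\t37%-47%"

# One capture group: scan yields a one-element array per hit, so flatten it.
SINGLE = /(\d+)%/
p SAMPLE.scan(SINGLE).flatten  # prints ["42", "52", "37", "47"]

# Two capture groups: each percent range comes back as a [low, high] pair.
RANGE = /(\d+)%-(\d+)%/
p SAMPLE.scan(RANGE)           # prints [["42", "52"], ["37", "47"]]
p SAMPLE.scan(RANGE).flatten   # prints ["42", "52", "37", "47"]
```

Keeping the pairs unflattened preserves which low value belongs to which high value.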

Then of course you can always use Perl.


set _text to "Males: 14-18 g/100mL	42%-52%
Females: 12-16 g/100mL	37%-47%
"
set AppleScript's text item delimiters to ","
set _list to text items of (do shell script "perl -we '
my @matches;
while(<>){
	chomp;
	push @matches, m!(\\d+)%!g;
}
$, = \",\";
print @matches;
' <<< " & quoted form of _text)

--> {"42", "52", "37", "47"}

This one operates on text input, but it’s easy to use a file instead.
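
As for the original question about targeting a file, one possible approach in Ruby is to read the file inside the script rather than passing its text through the shell. The sketch below uses a hypothetical helper, percentages_in_file, and a temporary file standing in for a real document:

```ruby
require "tempfile"

# Hypothetical helper: return every number that precedes a percent sign
# in the file at the given path.
def percentages_in_file(path)
  File.read(path).scan(/\d+(?=%)/)
end

# Demo: a temporary file stands in for a real document on disk.
sample = Tempfile.new("ranges")
sample.write("Males: 14-18 g/100mL\t42%-52%\nFemales: 12-16 g/100mL\t37%-47%\n")
sample.close

p percentages_in_file(sample.path)  # prints ["42", "52", "37", "47"]
sample.unlink
```

From AppleScript, the file's quoted POSIX path could presumably be appended after the ruby -e script and read there as ARGV[0], which avoids quoting the file's contents altogether.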