Grep and double Backslash in do shell script

Thanks, I forgot to put back the literal period.

Just for the fun of it: doing the same as grep -o, inline with sed and without tr.

[code]#!/usr/bin/sed -Enf
:start
# We jump back here as long as we aren't done with a line.

h
# We save a copy of the line (pattern space) we are about to process.

s/(.*[^a-zA-Z0-9._%+-])([[:<:]][a-zA-Z0-9._%+-]{1,}@{1}[a-zA-Z0-9.-]{1,}\.[a-zA-Z]{2,4}[[:>:]])([^a-zA-Z].*)/\2/p
# We find a valid e-mail address and print just the address (group 2).

t purge
# We did find one, so we'll purge it from the copy of the line kept in hold space.

n
# We didn't substitute anything, so we fall through to here, having grabbed the (n)ext line.

b start
# We branch back to the top.

:purge
g
# Purging: we get the pristine copy from hold space back into pattern space.

s/(.*[^a-zA-Z0-9._%+-])([[:<:]][a-zA-Z0-9._%+-]{1,}@{1}[a-zA-Z0-9.-]{1,}\.[a-zA-Z]{2,4}[[:>:]])([^a-zA-Z].*)/\1\3/
# We remove the address we just printed from the pattern space.

b start
# We jump back to start, looking for more mail addresses on the same line.[/code]

Due to the greediness of the regex (and laziness on my part), the mail addresses come out in reverse order when there is more than one on a line.
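A minimal sketch of why the order is reversed, using a simplified, portable form of the regex above (no BSD-only `[[:<:]]`/`[[:>:]]`; `last_address` is a hypothetical helper name). The greedy leading `.*` backs off only as far as it has to, so the capture group lands on the last address on the line.

```shell
# Hypothetical helper: same shape as the first substitution in the script
# above (greedy ".*", one non-address character, the address, the rest),
# simplified so it runs under both BSD and GNU sed -E.
last_address() {
  sed -E 's/.*[^[:alnum:]._%+-]([[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,4}).*/\1/'
}

# x@y.com comes first on the line, but the greedy ".*" makes
# z@w.org the one that gets captured (prints z@w.org).
printf '%s\n' 'a x@y.com b z@w.org c' | last_address
```

Removing that address and repeating, as the `:purge` loop does, therefore yields the addresses from last to first.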

The “grep” solution is the best for this problem, but like McUsr, I’ve been trying, just for the hell of it, to find an economical “sed” way to do it.
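For comparison, a sketch of the grep one-liner being imitated, assuming a grep with `-o` (BSD and GNU grep both have it, though POSIX does not) and the same simplified regex without word boundaries; `extract_addresses` is a hypothetical name. Because `grep -o` scans each line left to right, the addresses come out in their original order.

```shell
# Hypothetical helper: print every e-mail-like token, one per line, in order.
extract_addresses() {
  grep -Eo '[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,4}'
}

# Prints john@yahoo.com, then steve@apple.com.
printf '%s\n' 'hello <john@yahoo.com>, steve@apple.com' | extract_addresses
```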

An easy way would be to have two passes: the first inserting linefeeds immediately before and after each e-mail address, and the second printing only those lines of the result that consist entirely of an address:

set myText to "This is Fred's email address: <Fred.Jones@fibble.co.uk>.
applescript-users@lists.apple.com
rhubarb
hello <john@yahoo.com>, steve@apple.com " & tab & "- "

do shell script ("<<<" & quoted form of myText & " sed -En '/[[:<:]][[:alnum:]._%+-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,4}[[:>:]]/ { s//\\'$'\\n''&\\'$'\\n''/g ; p ; }' | sed -En '/^[[:alnum:]._%+-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,4}$/ p ;'")
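The same two-pass idea as a plain shell sketch, assuming `sed -E` (present in BSD and current GNU sed) and again dropping the BSD-only word boundaries; `re` and `two_pass` are hypothetical names. A backslash followed by a real newline in the replacement is the portable way to insert a linefeed, which is what the `\\'$'\\n''` sequence in the AppleScript produces once AppleScript and the shell have both unescaped it.

```shell
# Assumed regex: the AppleScript version minus [[:<:]] and [[:>:]].
re='[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,4}'

two_pass() {
  # Pass 1: on lines containing an address, reuse that regex (empty //)
  # to put a newline before and after every address, then print.
  # Pass 2: keep only the lines that are nothing but an address.
  sed -En "/$re/ {
s//\\
&\\
/g
p
}" | sed -En "/^$re\$/p"
}

printf 'x <a.b@c.co.uk>, d@e.com y\nrhubarb\n' | two_pass
```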

A one-pass method’s a bit harder, not least because there appears to be a bug in “sed” (as implemented in Snow Leopard) whereby if text is skipped with a “not linefeed” character class (“[^\n]*”), any “@” character in the way will be interpreted as a linefeed and sabotage the scan. The negative class “[^[:cntrl:]]” would be an acceptable substitute here, except that it trips on tabs and returns. The safest alternative I’ve found so far is “[[:print:]'$'\t\r'']”. This represents any printable character, plus the “control” characters tab and return, here provided as literal characters by the shell’s $'...' quoting.

set myText to "This is Fred's email address: <Fred.Jones@fibble.co.uk>.
applescript-users@lists.apple.com
rhubarb
hello <john@yahoo.com>, steve@apple.com " & tab & "- "

do shell script ("<<<" & quoted form of myText & " sed -En '/[[:<:]][[:alnum:]._%+-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,4}[[:>:]]/ { s//\\'$'\\n''&\\'$'\\n''/g ; s/[[:print:]'$'\\t\\r'']*\\n([[:print:]'$'\\t\\r'']+\\n)/\\1/g ; s/\\n[[:print:]'$'\\t\\r'']*$//p ; }'")

Or the same thing with comments:

set myText to "This is Fred's email address: <Fred.Jones@fibble.co.uk>.
applescript-users@lists.apple.com
rhubarb
hello <john@yahoo.com>, steve@apple.com" & tab & "- "

do shell script ("<<<" & quoted form of myText & " sed -En '/[[:<:]][[:alnum:]._%+-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,4}[[:>:]]/ {	# If a line contains one or more e-mail addresses...
	s//\\'$'\\n''&\\'$'\\n''/g ;	# ... put a linefeed at the beginning and end of each address...
	s/[[:print:]'$'\\t\\r'']*\\n([[:print:]'$'\\t\\r'']+\\n)/\\1/g ;	# ... delete everything before and between the addresses, leaving just the trailing linefeeds...
	s/\\n[[:print:]'$'\\t\\r'']*$//p ;	# ... delete everything after the last address and print what is left.
}'")
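A portable shell sketch of the one-pass version, under the same assumptions as before (`sed -E`, no word boundaries; `re`, `tc`, `any` and `one_pass` are hypothetical names). Instead of bash’s `$'\t\r'` quoting, the tab and return are produced with `printf`, which keeps it valid under a plain POSIX `sh`; like the original, it would still trip on any other control character between addresses.

```shell
re='[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,4}'
# Tab and carriage return as literal characters; command substitution
# strips only trailing newlines, so both survive.
tc=$(printf '\t\r')
# "Any printable character, tab or return", as in [[:print:]'$'\t\r''].
any="[[:print:]$tc]"

one_pass() {
  # 1. wrap each address in linefeeds (empty // reuses the address regex)
  # 2. delete everything before and between addresses, keeping each
  #    address followed by its trailing linefeed
  # 3. delete everything after the last address and print the result
  sed -En "/$re/ {
s//\\
&\\
/g
s/$any*\\n($any+\\n)/\\1/g
s/\\n$any*\$//p
}"
}

printf 'x <a.b@c.co.uk>, d@e.com y\nrhubarb\n' | one_pass
```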

Edits: Apostrophe removed from the last comment in the last script as it was causing an error! Wrong word corrected in the post narrative.

Neo can’t be much good! :wink:

:slight_smile:

Nice solution. I am not going to speculate about how you found the bug (the “not linefeed” character class).

I realized later that I could have adjusted my solution’s greediness, and sped it up somewhat by first looking for an e-mail address, as you have done.

It didn’t occur to me that I could wrap the e-mail addresses in linefeeds!

As a matter of fact, I didn’t believe it was possible to do it in one pass, so your solution is just amazing!