Using grep and returning a backreference to AppleScript?

Is this possible? I want to use grep to find some text that is between double quotes. I don’t quite have the regular expression worked out yet, but let’s say my file contains:

this line has some "quoted text"

and I have a backreference \1 to the text in between the quotes. How can I return \1 to AppleScript so I can assign it to a variable?

I do say
I like hay
Nay, I fear
I hear, "My Dear". Ah, I’ll steer
and leave here.

set search_text to "Dear"
set search_file to "/Users/$USER/Desktop/sample.txt"

--set search_command to "grep -e" & space & the quoted form of search_text & space & the quoted form of search_file & space & "| awk -F '\"' '{ print $2 }'"
set search_command to "grep -e" & space & the quoted form of search_text & space & search_file & space & "| awk -F '\"' '{ print $2 }'"
set resulting_text to do shell script search_command

--> "My Dear"

grep only finds the line the string occurs on; awk then splits that line on the double quote and returns whatever is between the 1st and 2nd quote. Note: this will probably not work when the quoted material is split across 2 lines. That can be fixed by adding some -num (context line) arguments to the grep command.
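For anyone who wants to try the pipeline directly in Terminal before wrapping it in do shell script, here is a minimal sketch. It assumes the file contains plain straight double quotes, and the temporary file is only a stand-in for the sample file on the Desktop.

```shell
# Recreate the sample text in a temporary file (stand-in for sample.txt).
sample_file="$(mktemp "${TMPDIR:-/tmp}/sample.XXXXXX")"
cat > "$sample_file" <<'EOF'
I do say
I like hay
Nay, I fear
I hear, "My Dear". Ah, I'll steer
and leave here.
EOF

# grep keeps only the lines containing the search text;
# awk splits each of those lines on '"' and prints the second field,
# i.e. the text between the first and second quote.
quoted_text="$(grep -e 'Dear' "$sample_file" | awk -F '"' '{ print $2 }')"
echo "$quoted_text"

rm -f "$sample_file"
```

Whatever the pipeline prints to standard output is what do shell script hands back to AppleScript, which is why the last stage should print only the wanted text.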

grep just matches lines; you’d need to use awk to extract the bits you want.

Alternatively, you could use a scripting-based solution like TextCommands: its search command includes excellent regex support and is unicode-aware too.
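As a small illustration of the "grep just matches lines" point (a sketch, not the only way to do it): plain grep prints the whole matching line, so a second step is needed to cut out the piece you want. The -o option used below is documented for both GNU and BSD grep, but check your local man page. The sample line is made up for illustration.

```shell
# A single made-up line with several quoted values.
line='employee="John Doe" id="12345" level="5"'

# Plain grep prints the entire line that contains a match.
whole_line="$(printf '%s\n' "$line" | grep -e 'employee')"

# With -o, grep prints only the part of the line that matched the pattern.
only_match="$(printf '%s\n' "$line" | grep -o 'employee="[^"]*"')"

# The quotes still have to be stripped in a second step, here with awk.
name="$(printf '%s\n' "$only_match" | awk -F '"' '{ print $2 }')"

echo "$whole_line"
echo "$only_match"
echo "$name"
```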

Thanks for the help. The quoted text will never be split across lines, so that’s not a worry. But there will be more than one set of quotes on a line. Basically, I’ll need to isolate the quoted text that corresponds to some tag, i.e.

employee="John Doe" id="12345" level="5"

I was hoping to use grep to match the tag, capturing the text in the quotes with a grouping (for example, something like employee *= *"([^"]*)" would get the name of the employee). Then I could use a backreference to return the grouping. Is there no way to do something like "echo \1" so that it will be returned to AppleScript? The reason I’ve steered clear of awk is that the tags are in no particular order, so I won’t know which field corresponds to which tag. Sorry if I’m totally clueless on this subject; I haven’t worked much with Unix tools.

employee="John Doe" id="12345" level="5"
employee="Mary Jane" id="43210" level="4"
employee="Baby Doe" id="0" level="Dead"

echo \1 just echoes the string "1", so I don’t know what you are trying to accomplish with that. Are you trying to get a variable to hold a string? That would be: name='Ollie'; echo "$name"
So: [code]Aline=$(grep Baby /Users/$USER/Desktop/employees.txt); Aname=$(echo "$Aline" | awk -F = '{print $2}' | awk -F '"' '{print $2}'); AnID=$(echo "$Aline" | awk -F = '{print $3}' | awk -F '"' '{print $2}'); ALevel=$(echo "$Aline" | awk -F = '{print $4}' | awk -F '"' '{print $2}'); echo "$Aname$AnID$ALevel"

→ Baby Doe0Dead[/code]

Thanks for your help! The only problem I see is that the employee/id/level tags may not be in that order so I just modified your code a little bit. I hadn’t realized that awk’s FS can be a regular expression! Since it can, I can do everything in awk:

awk -F 'employee *= *' '{print $2}' data.txt | awk -F '"' '{print $2}'

If data.txt contains

employee = "John Doe" id="12345" level="5"
id="43210" level="4" employee ="Mary Jane"
id="0" employee="Baby Doe" level="Dead"

it will print out:

John Doe
Mary Jane
Baby Doe

Now I can just modify the -F option to get the value of whatever tag I want. Thanks everyone, this board is great.
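To save editing the -F option by hand each time, the same idea could be wrapped in a small shell function. This is only a sketch: tag_value is a hypothetical name, the temporary file stands in for data.txt, and the tag is dropped into the regex unescaped, so it assumes tag names are plain words that do not occur inside other tags or values.

```shell
# Stand-in for data.txt, with the tags in varying order.
data_file="$(mktemp "${TMPDIR:-/tmp}/data.XXXXXX")"
cat > "$data_file" <<'EOF'
employee = "John Doe" id="12345" level="5"
id="43210" level="4" employee ="Mary Jane"
id="0" employee="Baby Doe" level="Dead"
EOF

# tag_value TAG FILE (hypothetical helper)
# The first awk splits each line on 'TAG *= *' and keeps what follows the tag;
# NF > 1 skips lines where the tag does not occur at all.
# The second awk splits on '"' and prints the text inside the first pair of quotes.
tag_value() {
    awk -F "$1 *= *" 'NF > 1 { print $2 }' "$2" | awk -F '"' '{ print $2 }'
}

names="$(tag_value employee "$data_file")"
ids="$(tag_value id "$data_file")"
levels="$(tag_value level "$data_file")"

echo "$names"
echo "$ids"
echo "$levels"

rm -f "$data_file"
```

From AppleScript, the tag name could be spliced into the command string the same way search_text was in the earlier script.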

edit: An alternate way could be:
awk '/employee/ {gsub(/ *= */,"="); sub(/.*employee="/,""); sub(/".*/,""); print}' data.txt

Now which one is faster?
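Before comparing speed, it is worth confirming that the two approaches really produce the same output. The sketch below runs both on the same data, with the regular expressions written out in full and straight quotes assumed. The temporary file is a stand-in for data.txt.

```shell
# Stand-in for data.txt.
data_file="$(mktemp "${TMPDIR:-/tmp}/data.XXXXXX")"
cat > "$data_file" <<'EOF'
employee = "John Doe" id="12345" level="5"
id="43210" level="4" employee ="Mary Jane"
id="0" employee="Baby Doe" level="Dead"
EOF

# Approach 1: two awk processes, splitting first on the tag, then on the quote.
out_two_awk="$(awk -F 'employee *= *' '{ print $2 }' "$data_file" | awk -F '"' '{ print $2 }')"

# Approach 2: one awk process that normalizes ' = ' to '=', then strips
# everything up to and including employee=" and everything from the next quote on.
out_one_awk="$(awk '/employee/ { gsub(/ *= */, "="); sub(/.*employee="/, ""); sub(/".*/, ""); print }' "$data_file")"

if [ "$out_two_awk" = "$out_one_awk" ]; then
    echo "same output"
else
    echo "outputs differ"
fi

rm -f "$data_file"
```

To actually compare speed, each command could be prefixed with the shell's time keyword and run against a much larger file; on three lines any difference will be lost in process start-up noise.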

jamdr raises an important issue here, I think, notwithstanding the awk workaround offered by anaxamander. He should be able to isolate and return text using the regular expression backreference feature. That’s where the part of the regular expression matched within parentheses can be returned using ‘\n’, where n is the nth set of parentheses. Of course, the grep engine used within OS X requires that the parentheses be escaped with a backslash (and an AppleScript string will require yet another backslash to escape each of those). So he should be able to enter something like the following in Terminal to show all matched instances:

employee="John Doe" id="12345" level="5"
employee="Mary Jane" id="43210" level="4"
employee="Baby Doe" id="0" level="Dead"

grep -e "\(.*\)"\1 /Users/yourHomeFolder/Desktop/targetFile.txt

Strangely, this won’t work, though the manual for grep suggests that it should.
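One likely explanation, offered as a sketch and not a definitive answer: inside a grep pattern, \1 means "the same text that group 1 matched, again", so it constrains what matches and does not select what gets printed. sed's s command does allow \1 in the replacement, which makes it possible to print just the captured group. The temporary file below is a stand-in for targetFile.txt, and straight double quotes are assumed.

```shell
# Stand-in for targetFile.txt.
data_file="$(mktemp "${TMPDIR:-/tmp}/target.XXXXXX")"
cat > "$data_file" <<'EOF'
employee="John Doe" id="12345" level="5"
employee="Mary Jane" id="43210" level="4"
employee="Baby Doe" id="0" level="Dead"
EOF

# In grep, \1 must match the same text as the first group again:
# 'abab' matches \(ab\)\1, 'abcd' does not. The whole line is still printed.
repeated="$(printf 'abab\nabcd\n' | grep -e '\(ab\)\1')"

# In sed, \1 in the replacement stands for the captured text.
# -n suppresses default output; the p flag prints only lines where the
# substitution succeeded, so the result is just the quoted employee value.
names="$(sed -n 's/.*employee *= *"\([^"]*\)".*/\1/p' "$data_file")"

echo "$repeated"
echo "$names"

rm -f "$data_file"
```

Passed to do shell script, the sed command's output would come back to AppleScript as the result string, with every backslash and double quote in the command escaped once more for the AppleScript string literal.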