Grep substitution from all matches in TextWrangler

wladdy · December 16, 2013, 11:02am

A text document in TextWrangler contains many instances such as:

The grapes of wrath
Author: John Steinbeck

I need to get a list of all the titles and another of the corresponding authors to eventually export into a database.

The following script gets me the title and the author of the first match.
Unfortunately, I can’t find a way to repeat the process with the subsequent matches. Any ideas would be greatly appreciated. Thanks in advance.


tell application "TextWrangler"
	set book_reference to find "\\r(.*)\\rAuthor:.(.*)\\r" searching in text document 1 ¬
		options {search mode:grep, starting at top:true, returning results:true, showing results:false}
	set book_title to grep substitution of "\\1"
	set book_author to grep substitution of "\\2"
end tell

Nigel_Garvey · December 16, 2013, 12:04pm

Hi.

Something like this?

set book_titles to {}
set book_authors to {}

tell application "TextWrangler"
	set book_reference to find "\\r[^\\r]+\\rAuthor:.[^\\r]+\\r" searching in text document 1 ¬
		options {search mode:grep, starting at top:true, returning results:true, showing results:false}
	if (book_reference's found) then
		repeat with this_match in book_reference's found matches
			set title_and_author to this_match's match_string
			-- Because of the leading and trailing returns in the regex,
			-- the matching text begins and ends with blank paragraphs.
			set end of book_titles to paragraph 2 of title_and_author
			set end of book_authors to text 9 thru -1 of paragraph 3 of title_and_author
		end repeat
	end if
end tell

return {book_titles, book_authors}

McUsrII · December 16, 2013, 12:12pm

Hello.

Truth be told, I wasn’t sure if it could return a list of results in a single document, so I came up with this. Of course, it needs well-formed text, since the offset for the author is hardcoded and it doesn’t allow for multiple spaces or tabs here and there.

tell application "BBEdit"
	set {ctr, parList, resList} to {0, (contents of its text document 1), {}}
	repeat with aPar in every paragraph of parList
		if contents of aPar is not "" then
			set ctr to ctr + 1
			if (ctr mod 2) = 1 then
				set thisLine to {}
				set end of thisLine to (contents of aPar as text)
			else
				set end of thisLine to text 9 thru -1 of (contents of aPar)
				set end of resList to thisLine
			end if
		end if
	end repeat
end tell
-- provisional code to show results
try
	text 0 of resList
on error e
	display dialog (text ((offset of "{" in e) + 1) thru -2 of e)
end try

TecNik · December 16, 2013, 1:07pm

Hi There,

Rather than trying to script it, can’t you just use ‘Process Lines Containing’ under the ‘Text’ menu?

Using a search of Author:(.*) (using grep) would find all the authors. You could then do the same search but check the ‘Delete matched lines’ option. This would then give you the titles.

This solution assumes you always have a title and an author for each book.

wladdy · December 16, 2013, 7:35pm

Thank you very much for your replies. Your workarounds, especially Nigel’s, made me realize that I could actually forgo grep altogether:

set book_titles to {}
set book_authors to {}

tell application "TextWrangler"
	set book_reference to find "Author: " searching in text document 1 ¬
		options {starting at top:true, returning results:true, showing results:false}
	repeat with current_match in (found matches of book_reference)
		set l to result_line of current_match
		set book_title to contents of line (l - 1) of text document 1
		set book_author to contents of line l of text document 1
		set book_author to (characters 9 thru -1 of book_author) as string
		set end of book_titles to book_title
		set end of book_authors to book_author
	end repeat
end tell

However, such a trick only works because in my case the text between the two search strings is of constant length. I can imagine (and will surely encounter in the near future) more complex searches absolutely requiring grep and returning many matches. Therefore my initial question remains: is there a way to get the grep substitutions for all matches of a grep search? Thanks again!

McUsrII · December 16, 2013, 7:43pm

Hello.

If there had been an easier way to do it in TextWrangler I’m sure Nigel would have posted that.

Maybe TextWrangler’s replace command (in the dictionary) can make life a little easier for you.

If you want something more compact than that, then I suggest you use sed or something else; after all, there is an abundance of Unix command-line tools that handle text substitutions.
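As a rough sketch of the sed route, assuming the document has been saved as plain text with Unix (LF) line endings and that every "Author: " line directly follows its title. The sample text (including the second book entry) and the function names `authors` and `titles` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample standing in for the TextWrangler document.
sample='The grapes of wrath
Author: John Steinbeck

Moby-Dick
Author: Herman Melville'

# Authors: on lines starting with "Author: ", keep only the captured
# remainder (\1) and print it; all other lines are suppressed by -n.
authors() { sed -n 's/^Author: \(.*\)$/\1/p'; }

# Titles: copy each line into the hold space (h). When an "Author: " line
# arrives, fetch the held previous line (g) and print it, except on line 1
# where there is no previous line.
titles() { sed -n '/^Author: /{g;1!p;};h'; }

printf '%s\n' "$sample" | titles
printf '%s\n' "$sample" | authors
```

Both functions read standard input, so the same commands could be fed a file, or be called from AppleScript through `do shell script`, with the two outputs collected as separate lists.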

You can also use an ASObjC handler that takes a template string and returns the found groups, if you are on Mavericks.

Edit

Conscience compels me to say that I believe TecNik’s approach is the best one when it comes to transforming some text. You’ll probably have some trial and error anyway. Doing the filtering directly in TextWrangler/BBEdit gives you live updates, and it is all undoable until you get your results. Then you might record your actions into a script, for all I know.

This approach is the most rational one, especially if it is just a “one-off” script, though it doesn’t give you any scripting experience.

Marc_Anthony · December 17, 2013, 7:44pm

Would someone please elaborate on what comprises a grep substitution and for what purpose it might be used? TextWrangler’s dictionary says it’s the computed replacement string based on a grep search (which is, itself, a pattern), but that just leaves me scratching my head. I ran the OP’s code and both of the “substitutions” appear to be empty texts. I ran Nigel’s code, which appears to only capture the second author/title in a series. I changed the grep pattern and put results into a record, although I’m not sure if this is what is sought.

set recordlist to {}

tell application "TextWrangler"
	set matchList to (find "(?(?=.+\\rAuthor).+|(?<=Author: ).+)" searching in text document 1 options {search mode:grep, starting at top:true, returning results:true, showing results:false})'s found matches
	repeat with Numbr from 1 to count matchList by 2
		set recordlist's end to {Book:(matchList's item Numbr's match_string), Author:(matchList's item (Numbr + 1)'s match_string)}
	end repeat
end tell

recordlist

kel1 · December 17, 2013, 7:51pm

That’s one good thing about doing it with something like sed or similar tools. In an application, you can’t modify it.

Marc_Anthony · December 18, 2013, 11:29pm

I don’t mean to importune, but I’m giving this thread a one-time bump to see if anyone can explain the grep substitution question I posed. I’ll delete the question, if nobody knows.

McUsrII · December 18, 2013, 11:33pm

Rome wasn’t built in a day.

I don’t know right now, nor do I have time to look into the manual, but my guess is that there will be some explanation, explicit or implicit, in the regexp section of TextWrangler’s manual.

Shane_Stanley · December 18, 2013, 11:38pm

From the BBEdit manual:

Scripting Single Replaces

To do a single find and replace via AppleScript, you can write:

tell application "BBEdit"
	set result to (find "BBEdit" searching in text window 1 ¬
		with selecting match)
	if (found of result) then
		set text of (found object of result) to "Replacement"
	end if
end tell

When performing a grep search, you cannot just replace the matched pattern with a replacement string; the grep subsystem needs to compute the substitutions. The grep substitution event is provided for this purpose; given a preceding successful Grep search, it will return the appropriate replacement string. So if you perform a grep search, the script would look like:

tell application "BBEdit"
	set result to find "BBEdit(.+)$" searching in text window 1 ¬
		options {search mode:grep}
	if (found of result) then
		set text of (found object of result) to ¬
			grep substitution of "\\1"
	end if
end tell
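
To see what a "computed replacement string" means outside BBEdit, here is a small sketch with sed, where `\1` plays the same role as in the manual's example: the replacement template is expanded using the text captured by the group in the match. The input line and the function names are hypothetical, and sed's regex flavor differs from BBEdit's in other respects.

```shell
#!/bin/sh
# Hypothetical input line; only the pattern BBEdit(.+)$ comes from the manual excerpt.
line='BBEdit makes grep easy'

# Replace the whole match with the template "\1", i.e. the text captured by (.+).
expand_group() { sed -E 's/BBEdit(.+)$/\1/'; }

# A template can also mix literal text with the captured group.
expand_template() { sed -E 's/BBEdit(.+)$/It\1/'; }

printf '%s\n' "$line" | expand_group
printf '%s\n' "$line" | expand_template
```

In both cases the replacement text is not fixed in advance; it is only known once the match has been made, which is why the editor has to compute it for each match.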

Marc_Anthony · December 19, 2013, 1:25am

Thanks.