A text document in TextWrangler contains many instances such as:
The grapes of wrath
Author: John Steinbeck
I need to get a list of all the titles and another of the corresponding authors to eventually export into a database.
The following script gets me the title and the author of the first match.
Unfortunately, I can’t find a way to repeat the process with the subsequent matches. Any ideas would be much appreciated. Thanks in advance.
tell application "TextWrangler"
set book_reference to find "\\r(.*)\\rAuthor:.(.*)\\r" searching in text document 1 ¬
options {search mode:grep, starting at top:true, returning results:true, showing results:false}
set book_title to grep substitution of "\\1"
set book_author to grep substitution of "\\2"
end tell
set book_titles to {}
set book_authors to {}
tell application "TextWrangler"
set book_reference to find "\\r[^\\r]+\\rAuthor:.[^\\r]+\\r" searching in text document 1 ¬
options {search mode:grep, starting at top:true, returning results:true, showing results:false}
if (book_reference's found) then
repeat with this_match in book_reference's found matches
set title_and_author to this_match's match_string
-- Because of the leading and trailing returns in the regex,
-- the matching text begins and ends with blank paragraphs.
set end of book_titles to paragraph 2 of title_and_author
set end of book_authors to text 9 thru -1 of paragraph 3 of title_and_author
end repeat
end if
end tell
return {book_titles, book_authors}
Truth be told, I wasn’t sure whether it could return a list of results for a single document, so I came up with this. It of course needs well-formed text, since the offset for the author is hardcoded and stray spaces or tabs here and there aren’t handled.
tell application "BBEdit"
set {ctr, parList, resList} to {0, (contents of its text document 1), {}}
repeat with aPar in every paragraph of parList
if contents of aPar is not "" then
set ctr to ctr + 1
if (ctr mod 2) = 1 then
set thisLine to {}
set end of thisLine to (contents of aPar as text)
else
set end of thisLine to text 9 thru -1 of (contents of aPar)
set end of resList to thisLine
end if
end if
end repeat
end tell
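-- provisional code to show results: "text 0 of resList" deliberately raises an error
-- whose message contains resList written out as text, which the dialog then displays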
try
text 0 of resList
on error e
display dialog (text ((offset of "{" in e) + 1) thru -2 of e)
end try
Rather than trying to script it, can’t you just use ‘Process Lines Containing’ under the ‘Text’ menu?
Searching for Author:(.*) (with grep enabled) would find all the authors. You could then run the same search with the ‘Delete matched lines’ option checked. That would leave you with the titles.
This solution assumes you always have a title and an author for each book.
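For anyone who wants to see the logic of that menu approach spelled out, here is a minimal sketch in Python (not AppleScript, and not anything TextWrangler runs itself). It assumes the same layout as the sample: every non-blank line is either a title or an "Author:" line. The function name split_titles_and_authors is made up for the illustration.

```python
import re

# Lines that start with "Author:" are author lines; group 1 is the name.
AUTHOR_LINE = re.compile(r"^Author:\s*(.*)$")


def split_titles_and_authors(text):
    """Partition non-blank lines into titles and authors, in document order."""
    titles, authors = [], []
    # splitlines() copes with \n, \r and \r\n line endings alike.
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        match = AUTHOR_LINE.match(line)
        if match:
            authors.append(match.group(1).strip())
        else:
            titles.append(line.strip())
    return titles, authors


sample = (
    "The grapes of wrath\n"
    "Author: John Steinbeck\n"
    "\n"
    "East of Eden\n"
    "Author: John Steinbeck\n"
)
titles, authors = split_titles_and_authors(sample)
print(titles)
print(authors)
```

As with the menu command, the two lists only line up if every book really has both a title and an author.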
Thank you very much for your replies. Your workarounds, especially Nigel’s, made me realize that I could actually forgo grep altogether:
set book_titles to {}
set book_authors to {}
tell application "TextWrangler"
set book_reference to find "Author: " searching in text document 1 ¬
options {starting at top:true, returning results:true, showing results:false}
repeat with current_match in (found matches of book_reference)
set l to result_line of current_match
set book_title to contents of line (l - 1) of text document 1
set book_author to contents of line l of text document 1
set book_author to text 9 thru -1 of book_author -- unlike "characters … as string", this doesn't depend on the text item delimiters
set end of book_titles to book_title
set end of book_authors to book_author
end repeat
end tell
However, such a trick only works because in my case the text between the two search strings is of constant length. I can imagine (and will surely encounter in the near future) more complex searches absolutely requiring grep and returning many matches. Therefore my initial question remains: is there a way to get the grep substitutions for all matches of a grep search? Thanks again!
If there had been an easier way to do it in TextWrangler I’m sure Nigel would have posted that.
Maybe TextWrangler’s replace command (in the dictionary) can make life a little easier for you.
If you want something more compact than that, then I suggest you use sed or something similar; after all, there is an abundance of Unix command-line tools that handle text substitutions.
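As a rough illustration of that command-line route, here is a minimal sketch in Python rather than sed (which tool is available is an assumption on my part). It collects both capture groups for every match instead of only the first. The names PATTERN and extract_books and the sample text are invented for the example, and the document text would first have to be saved or piped out of TextWrangler.

```python
import re

# A title line followed directly by an "Author:" line.
# Group 1 is the title, group 2 the author. With re.MULTILINE,
# ^ and $ match at the start and end of each line.
PATTERN = re.compile(r"^(.+)\nAuthor:[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def extract_books(text):
    """Return (titles, authors) for every title/author pair in text."""
    # Normalize Windows (\r\n) and classic Mac (\r) line endings to \n.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    titles, authors = [], []
    # finditer yields one match object per occurrence, not just the first.
    for match in PATTERN.finditer(text):
        titles.append(match.group(1).strip())
        authors.append(match.group(2))
    return titles, authors


sample = (
    "The grapes of wrath\n"
    "Author: John Steinbeck\n"
    "\n"
    "East of Eden\n"
    "Author: John Steinbeck\n"
)
titles, authors = extract_books(sample)
print(titles)
print(authors)
```

Unlike the hardcoded "text 9 thru -1" offset, the pattern tolerates extra spaces or tabs after "Author:", but it still expects the title on the line immediately above.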
You can also use an ASObjC handler that takes a template string and returns the find groups, if you are on Mavericks.
Edit
Conscience compels me to say that I believe TechNik’s approach is the best one when it comes to transforming some text. You’ll probably have some trial and error anyway. Doing the filtering directly in TextWrangler/BBEdit gives you live updates, and it is all undoable until you get your results. Then you might record your actions into a script, for all I know.
This approach is the most rational one, especially if it is just a “one-off” script, though it doesn’t give you any scripting experience.
Would someone please elaborate on what comprises a grep substitution and for what purpose it might be used? TextWrangler’s dictionary says it’s the computed replacement string based on a grep search (which is, itself, a pattern), but that just leaves me scratching my head. I ran the OP’s code and both of the “substitutions” appear to be empty texts. I ran Nigel’s code, which appears to only capture the second author/title in a series. I changed the grep pattern and put results into a record, although I’m not sure if this is what is sought.
set recordlist to {}
tell application "TextWrangler"
set matchList to (find "(?(?=.+\\rAuthor).+|(?<=Author: ).+)" searching in text document 1 options {search mode:grep, starting at top:true, returning results:true, showing results:false})'s found matches
repeat with Numbr from 1 to count matchList by 2
set recordlist's end to {Book:(matchList's item Numbr's match_string), Author:(matchList's item (Numbr + 1)'s match_string)}
end repeat
end tell
recordlist
I don’t mean to importune, but I’m giving this thread a one-time bump to see if anyone can explain the grep substitution question I posed. I’ll delete the question, if nobody knows.
I don’t know offhand, and I don’t have time to look into the manual right now, but my guess is that there will be some explanation, either explicit or implicit, in the regular-expression section of TextWrangler’s manual.
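For what it is worth, going only by the dictionary wording quoted above, a grep substitution reads like the replacement half of a grep find-and-replace, worked out for a match without actually replacing anything: a template such as \1 is filled in with whatever the first parenthesized group captured. I can’t vouch for how TextWrangler implements it, so the sketch below is only an analogy, using Python’s re module, where Match.expand does the equivalent job. The sample text is made up.

```python
import re

text = "The grapes of wrath\nAuthor: John Steinbeck\n"

# The search pattern: group 1 captures the title, group 2 the author.
match = re.search(r"(.+)\nAuthor: (.+)", text)

# A substitution template refers back to the captured groups by number.
# Expanding it against the match gives the "computed replacement string".
title = match.expand(r"\1")
author = match.expand(r"\2")
combined = match.expand(r"\2 wrote \1")

print(title)     # → The grapes of wrath
print(author)    # → John Steinbeck
print(combined)  # → John Steinbeck wrote The grapes of wrath
```

Seen that way, the purpose is to pull pieces out of a match, or rearrange them, using the same backreference syntax you would type into the Replace field.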