I’m struggling to find a way to delete all lines in a text document that do NOT begin with an alphabetical character (upper OR lowercase, a to z only) in TextWrangler (v5).
I simply need to strip out any lines which begin with numerical characters or special characters like \
Regular expressions are your friend for things like this. You will find a grep reference in TextWrangler’s Help menu. Something like:
^\d+.*
Should find all lines starting with numerics; simply replace them with a null string. Special characters like “/” can be listed in a character class to accomplish the same for them. Since you didn’t list the specific characters or provide a sample file to work with, I hesitate to give any example for them.
The most reliable regex I’ve been able to produce for this this morning is:
[format]^(\r|[^A-Za-z][^\r]*\r?)[/format]
It matches lines beginning with line breaks (ie. empty lines) or lines beginning with anything other than A-Z or a-z, up to and including the line breaks at the ends (if any).
You can type it into TextWrangler’s “Find” dialog, make sure the “Replace” field is empty, check the “Grep” checkbox, and click on “Replace All”. Unfortunately, it doesn’t delete any trailing line break at the end.
You can also do it by script with this, which does also remove any trailing line break:
tell application "TextWrangler"
    tell front text window
        -- Remove empty lines or lines not beginning with A-Z in either case.
        replace "^(\\r|[^A-Za-z][^\\r]*\\r?)" using "" options {search mode:grep, starting at top:true}
        -- Remove any trailing line break at the end.
        replace "\\r\\z" using "" options {search mode:grep, starting at top:true}
    end tell
end tell
TextWrangler’s searches are case-insensitive by default, so you could shorten [^A-Za-z] to either [^A-Z] or [^a-z]. If you also want to allow lines beginning with accented or non-English letters, change it to [^[:alpha:]].
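For anyone who wants to try the two patterns outside TextWrangler, here is a rough Python sketch of the same two-pass idea. It is not TextWrangler's own engine: it assumes "\n" line endings instead of "\r", spells out both cases because Python's `re` is case-sensitive by default, and the function name `strip_non_alpha_lines` is just made up for the illustration.

```python
import re

# Assumption: "\n" line endings (TextWrangler's grep uses "\r" internally).
# Both cases are listed explicitly because Python's re is case-sensitive.
NON_ALPHA_LINE = re.compile(r"^(\n|[^A-Za-z][^\n]*\n?)", re.MULTILINE)
# \Z in Python matches only at the very end of the string.
TRAILING_BREAK = re.compile(r"\n\Z")


def strip_non_alpha_lines(text: str) -> str:
    """Remove empty lines and lines that do not start with A-Z or a-z."""
    cleaned = NON_ALPHA_LINE.sub("", text)
    # Second pass: drop a single line break left at the end of the text.
    return TRAILING_BREAK.sub("", cleaned)


sample = "abc\n123 junk\n\n/path\nDef\n"
print(repr(strip_non_alpha_lines(sample)))  # 'abc\nDef'
```

As in the AppleScript version, the first pattern leaves the final line break alone, so the second substitution is what removes it.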
Nigel has covered the find/replace scripting perfectly well, but don’t forget about the “Process Lines Containing” command.
It can be operated manually or scripted:
tell application "TextWrangler"
    tell front text document
        process lines containing matching string "^(?:[^a-z]+|[[:blank:]]*$)" matching with grep true ¬
            output options {deleting matched lines:true}
    end tell
end tell
Like find/replace it is case-insensitive by default.
Nice, Chris. It explicitly deletes lines, which is neater than my approach of replacing the lines’ contents and endings with “”, even though it’s essentially the same thing. Fooling around with it myself, I see that ‘process lines’ only has to match something in a line, not the whole line. This allows the regex to be simpler.
It still seems that the only way to delete a spare line break at the end of the document is to use a separate instance of ‘replace’. Annoyingly, ‘process lines’ only works in documents and ‘replace’ only in windows.
Edit: Both commands do in fact work in either text documents or text windows, provided they’re addressed to the ‘text’ of those containers. I’ve replaced the original scriptwork below in the light of Chris’s reply in post #13.
tell application "TextWrangler"
    tell text of front text document -- or: tell text of front text window
        process lines containing matching string "^(?=[^a-z]|$)" matching with grep true ¬
            output options {deleting matched lines:true}
        replace "\\r\\z" using "" options {search mode:grep, starting at top:true}
    end tell
end tell
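The “match something in the line, then delete the whole line” behaviour can be imitated in a few lines of Python, if that helps to see what the regex has to do. This is only a sketch of the idea, not how TextWrangler implements ‘process lines’; the name `delete_matching_lines` is hypothetical, "\n" line endings are assumed, and both letter cases are written out because Python's `re` is case-sensitive by default.

```python
import re

# Matches at the start of a line that is empty or starts with a non-letter.
# The lookahead consumes nothing; it only has to succeed somewhere in the line.
LINE_TEST = re.compile(r"^(?=[^A-Za-z]|$)")


def delete_matching_lines(text: str, pattern: re.Pattern = LINE_TEST) -> str:
    """Drop every line in which the pattern matches anywhere."""
    kept = [line for line in text.split("\n") if not pattern.search(line)]
    return "\n".join(kept)


print(repr(delete_matching_lines("abc\n123\n\nDef\n")))  # 'abc\nDef'
```

One difference from the script above: splitting on "\n" turns a trailing line break into a final empty item, which this filter also drops, so no separate step is needed for it here.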
As you can see, grep is new to me, but it’s beginning to make (some) sense now.
re: “Annoyingly, ‘process lines’ only works in documents and ‘replace’ only in windows”
I found that this appears to work to strip out any blank lines:
[format]
tell application "TextWrangler"
    tell front text document
        process lines containing matching string "^[[:space:]]*$" matching with grep true ¬
            output options {deleting matched lines:true}
    end tell
end tell[/format]
I still have one small problem - what I’m trying to delete is any line that does not start with a single space followed by an alphabetical character.
I tried adapting your search string:
[format]^(?=[^a-z]|$)[/format]
like this:
[format]^(?=[^[[:space:]][a-z]]|$)[/format]
but it doesn’t work, even though this does:
[format]tell application "TextWrangler"
    tell front text document
        process lines containing matching string "^[[:space:]][a-z]" matching with grep true ¬
            output options {deleting matched lines:true}
        -- ^ means start of line
    end tell
end tell[/format]
One final question - [format]grep -v[/format] inverts the search, but why can’t I type [format]matching with grep -v true[/format] ?
Yes. I think all of the scripts above strip out empty lines within text. It’s just if there’s one left over at the end (i.e. the text actually ends with a line break) that an extra step’s needed to remove it. I think the BBEdit people decided to leave it be normally in case people wanted to keep it.
The regex (grep) in the script is correctly written to match a “white space” character (which could be either a space or a tab, or in some cases a line break) at the beginning of a line, followed by a letter between a and z. The script would delete any lines which do begin this way.
The regex you tried immediately above it isn’t correctly formed. There’s an additional layer of square brackets which shouldn’t be there. (Posix shortcuts like [:space:] can be a bit confusing!)
As Marc said, the form (?=…) is what’s known as a positive lookahead. The regex it contains doesn’t form part of the final match but indicates what must come immediately after what is matched:
[format]^(?=[^a-z]|$)[/format]
… matches the beginning of a line in which the beginning is followed either by a character which isn’t a letter or by the end of the line.
[format]^([^a-z]|$)[/format]
… includes the non-letter or line end in the match. The difference is academic in the case of an empty line. It’s actually also academic with ‘process lines’, where the regex is only used to identify the line, not an exact piece of text.
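If it helps, the difference between the two forms can be seen by printing what each one actually matches. The snippet below uses Python's `re` module as a stand-in, which is an assumption on my part (TextWrangler's grep is a different engine), and `matched_text` is just a helper name invented for the demonstration.

```python
import re


def matched_text(pattern: str, line: str):
    """Return the text the pattern matched in the line, or None if no match."""
    found = re.search(pattern, line)
    return None if found is None else found.group(0)


line = "123 abc"
# Lookahead form: matches at the line start but consumes nothing.
print(repr(matched_text(r"^(?=[^a-z]|$)", line)))  # ''
# Grouping form: the non-letter is part of the match.
print(repr(matched_text(r"^([^a-z]|$)", line)))    # '1'
# On an empty line both forms match the same empty string.
print(repr(matched_text(r"^(?=[^a-z]|$)", "")))    # ''
print(repr(matched_text(r"^([^a-z]|$)", "")))      # ''
```

For a find-and-replace the two would therefore remove different amounts of text, whereas a line filter only cares whether there was a match at all.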
A negative lookahead matches something not followed by whatever the lookahead matches, which is what you seem to need here:
[format]^(?![[:space:]][a-z])[/format]
Since an empty line is one of those not beginning with a space and an alphabetic character, it’s already covered. If you want the space to be literally a space character, not a tab, use a literal space in the regex:
tell application "TextWrangler"
    tell front text document
        -- Delete lines not beginning with a space and a letter.
        process lines containing matching string "^(?! [a-z])" matching with grep true ¬
            output options {deleting matched lines:true}
    end tell
end tell
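A quick way to convince yourself that the negative lookahead picks the right lines is to run it over a handful of test lines. The following is a rough Python sketch under a few assumptions: Python's `re` rather than TextWrangler's grep, both letter cases written out since `re` is case-sensitive by default, and `keep_space_letter_lines` as a made-up name.

```python
import re

# Matches the start of any line that does NOT begin with a literal space
# followed by a letter. Lines where this matches are the ones to delete.
DELETE_TEST = re.compile(r"^(?! [A-Za-z])")


def keep_space_letter_lines(lines):
    """Keep only lines beginning with one space and then a letter."""
    return [line for line in lines if not DELETE_TEST.search(line)]


lines = [" abc", "abc", "", " 1x", "\tabc", " Zed"]
print(keep_space_letter_lines(lines))  # [' abc', ' Zed']
```

The empty line and the tab-indented line are both removed, which matches the points above about empty lines being covered and a literal space excluding tabs.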
grep -v is a command-line command which invokes the system’s ‘grep’ program with the -v option. It’s not the same as the ‘grep’ term used in TextWrangler’s AppleScript implementation.
By the way, this forum has special tags for posting AppleScript code: [applescript] and [/applescript]. There’s a button for them on the posting page. Enclosing AppleScript code in them causes it to be displayed as above with a clickable link which opens it in the clicker’s default script editor.
Starting with this text in the front TextWrangler 5.0.1 document:
[format]
01 Now is the time for all good men to come to the aid of their country.
02 Now is the time for all good men to come to the aid of their country.
03 Now is the time for all good men to come to the aid of their country.
[/format]
All of these work.
You do have to reference the text object in a couple of places though.
(This seems a bit inconsistent to me, so I think I’ll report it to Bare Bones.)
Look for the lines marked "--> Note".
-------------------------------------------------------------------------------------------
tell application "TextWrangler"
    tell text of front text document --> Note the use of text here.
        replace "Now" using "¢¢¢" options {search mode:grep, case sensitive:false, starting at top:true}
    end tell
end tell
tell application "TextWrangler"
    tell front text window
        replace "Now" using "¢¢¢" options {search mode:grep, case sensitive:false, starting at top:true}
    end tell
end tell
-------------------------------------------------------------------------------------------
tell application "TextWrangler"
    tell front text document
        process lines containing matching string "2" output options {deleting matched lines:true} ¬
            with matching with grep
    end tell
end tell
tell application "TextWrangler"
    tell text of front text window --> Note the use of text here.
        process lines containing matching string "5" output options {deleting matched lines:true} ¬
            with matching with grep
    end tell
end tell
-------------------------------------------------------------------------------------------