Regex Satimage question

professore · June 8, 2013, 1:19pm

Does anyone know if there is a way to use Satimage’s change feature to change the case of a backreference? In at least some regex implementations, it appears that something akin to “\L1” will change the case of backreference 1 to lowercase…

If not, is there a way to use an AppleScript command to do the same thing?


set newText to change "((min)(or)?)" into " min" in newText with regexp without case sensitive
set newText to change "((maj)(or)?)" into " maj" in newText with regexp without case sensitive

I would like the capture of “minor” and “major” to match any capitalization, but the replacement string should always be lowercase. I want to combine these types of statements so that I don’t have to separately search for “maj” and “min”, just find either and change the capitalization.

Thanks,

Nigel_Garvey · June 8, 2013, 2:26pm

Hi Eric.

This isn’t quite what you meant, but you can combine Satimage’s ‘change’ commands using parallel lists:

set newText to "Symphony No. 1 in B MAJOR - or was it C Minor?"
set newText to change {"min(or)?", "maj(or)?"} into {"min", "maj"} in newText with regexp without case sensitive
--> "Symphony No. 1 in B maj - or was it C min?"

professore · June 8, 2013, 4:55pm

Aha! Still far better than the cumbersome multi-line versions I have been using.

Thanks!

ccstone · June 8, 2013, 10:22pm

Nice. I forgot you could do that with regex in addition to plain text.

However it’s easy to handle the case with the lower operator: ‘\l’:

set newText to "Symphony No. 1 in B MAJOR - or was it C Minor?"
set newText to change "((min|maj)(or)?)" into "\\l\\1" in newText with regexp without case sensitive
--> "Symphony No. 1 in B major - or was it C minor?"

Nigel_Garvey · June 9, 2013, 12:02am

Nicer still.

ccstone · June 9, 2013, 1:26pm

I decided to fiddle with the lists of regex find/replace specs.

It’s not exactly pretty, but the list-based find/replace seems decidedly faster than multiple calls to the change command, though I haven’t yet produced a big enough sample to demonstrate the difference definitively.

At the moment I’m seeing times under 1/1000 of a second.

set _text to "
01 Now is the time for all good men to come to the aid of their country.
02 Now is the time for all good men to come to the aid of their country.
03 Now is the time for all good men to come to the aid of their country.
"
set _find to items 1 thru -2 of {¬
	"Now+", ¬
	"good.*men", ¬
	" *come to the *", ¬
	"\\b(aid)\\b", ¬
	"(countr)y", ¬
	"^\\d+ *", ¬
	""}
set _replace to items 1 thru -2 of {¬
	"NOW", ¬
	"malicious politicians", ¬
	" ", ¬
	"\\1 in the sacking", ¬
	"\\1ies", ¬
	"", ¬
	""}
set newText to change _find into _replace in _text with regexp
# return result
set newText to change "(?i)^now.+?all *(.)" into "\\u\\1" in newText with regexp
# return result
set newText to change " +(to|the|of) +" into " " in newText with regexp
# return result
set newText to change "(?m)^\\s+([^\\n]+).+" into "\\1" in newText with regexp

ccstone · June 9, 2013, 1:55pm

I decided that trying to maintain those lists would be a nightmare, so I built a table. It’s a little ugly on the board but will align properly in any of the AppleScript editors. (I used spaces for the table separators in my local version, but the board squashed them, so I used a mid-dot in this version.)

I ran speed tests to check the difference between the list-based patterns and your basic call to change.

The text sample was a 10,000 line file in BBEdit:

000001 Now is the time for all good men to come to the aid of their country.
…
010000 Now is the time for all good men to come to the aid of their country.

The speed difference was not a lot: about 5/100 of a second.

~ 0.25 seconds - simple-call
~ 0.20 seconds - list-based

Tests were run in Smile, but I used the LapTime.osax instead of Smile’s chrono function for timing.

The regex is Satimage.osax-dependent of course.

set _collate to {}
set findRepl to text 2 thru -2 of "
'Now+'····························'NOW'
'good.*men'·······················'malicious politicians'
' *come to the *'·················' '
'\\b(aid)\\b'·····················'\\1 in the sacking'
'(countr)y'·······················'\\1ies'
'^\\d+ *'·························''
'(?i)now.+?all *(.)'··············'\\u\\1'
' +(to|the|of) +'·················' '
'(?m)^\\s+([^\\n]+).+'············'\\1'
"
set _find to find text "^'(.+?)'" in findRepl using "\\1" with regexp, all occurrences and string result
set _repl to find text "·{3,}'(.*?)'" in findRepl using "\\1" with regexp, all occurrences and string result
tell application "BBEdit"
	set _text to text of front document
end tell
repeat 20 times
	set _tmr to start timer
	set newText to change _find into _repl in _text with regexp
	set end of _collate to format ((stop timer _tmr) / 1000) into "##.####"
end repeat
mean of (statlist _collate)

--> ~ 0.20 seconds

set _collate to {}
tell application "BBEdit"
	set _text to text of front document
end tell
repeat 20 times
	set _tmr to start timer
	set newText to change "Now+" into "NOW" in _text with regexp
	set newText to change "good.*men" into "malicious politicians" in newText with regexp
	set newText to change " *come to the *" into " " in newText with regexp
	set newText to change "\\b(aid)\\b" into "\\1 in the sacking" in newText with regexp
	set newText to change "(countr)y" into "\\1ies" in newText with regexp
	set newText to change "^\\d+ *" into "" in newText with regexp
	set newText to change "(?i)now.+?all *(.)" into "\\u\\1" in newText with regexp
	set newText to change " +(to|the|of) +" into " " in newText with regexp
	set newText to change "(?m)^\\s+([^\\n]+).+" into "\\1" in newText with regexp
	set end of _collate to format ((stop timer _tmr) / 1000) into "##.####"
end repeat
mean of (statlist _collate)

--> ~ 0.25 seconds

professore · June 10, 2013, 2:26am

GAAAAAARRRR!!

You mean that, all these hours of debugging and stupid workarounds were the result of the fact that Satimage’s suite does not allow a CAPITAL letter in the case-change escape character???

So


	set newText to change "((min|maj)(or)?)" into "\\l\\1" in newText with regexp without case sensitive

works fine. But,


	set newText to change "((min|maj)(or)?)" into "\\L\\1" in newText with regexp without case sensitive

throws an error!!

Crap. I know Satimage is free, but I searched its manual and Googled in vain for any reference for the use of case-changing back references in Satimage, and it never occurred to me to try lowercase. I saw Nigel’s answer, scurried off to code workarounds, and didn’t check back in time to see Chris’.

Chris, thanks once again. That is amazing.

Out of pique, I feel that I should share my workaround. It requires a ChangeCase function. Here:


-- Accepts a regex to search for, text to search in, and a case in {"UPPER", "lower", "Title", "Sentence"}, changing the found text to 
-- the specified case and returning the entire string.
on FindandChangeCase(thisRegex, searchText, newCase)
	local foundText, changeText
	
	set foundText to find text thisRegex in searchText with regexp, all occurrences, string result and case sensitive -- Does the text exist
	if foundText's length is greater than 0 then -- yes, it exists
		set foundText to the first item of foundText
		set changeText to my ChangeCase(foundText, newCase) -- change the case
		set searchText to change thisRegex into changeText in searchText with regexp and case sensitive -- replace in the search text
	end if
	return searchText
end FindandChangeCase
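
The ChangeCase handler called above is not shown in the post. As a rough, hypothetical stand-in (not the original author's code), something like the following would fill that role. It assumes OS X 10.10 or later so that AppleScriptObjC and Foundation's NSString methods are available in an ordinary script; the handler name and the four case labels are taken from the comment above, and everything else is an assumption.

```applescript
use AppleScript version "2.4" -- assumption: OS X 10.10 or later
use framework "Foundation"
use scripting additions

-- Hypothetical stand-in for the ChangeCase handler used by FindandChangeCase.
-- newCase is one of "UPPER", "lower", "Title" or "Sentence" (compared ignoring case).
on ChangeCase(theText, newCase)
	if (count theText) is 0 then return theText
	set nsText to current application's NSString's stringWithString:theText
	if newCase is "UPPER" then
		return (nsText's uppercaseString()) as text
	else if newCase is "lower" then
		return (nsText's lowercaseString()) as text
	else if newCase is "Title" then
		-- capitalizedString uppercases the first letter of every word
		return (nsText's capitalizedString()) as text
	else if newCase is "Sentence" then
		-- lowercase everything, then uppercase only the first character
		set lowered to nsText's lowercaseString()
		set firstChar to (lowered's substringToIndex:1)'s uppercaseString()
		return (firstChar's stringByAppendingString:(lowered's substringFromIndex:1)) as text
	end if
	error ("ChangeCase: unknown case " & newCase)
end ChangeCase
```

Note that FindandChangeCase, as posted, case-changes only the first match and then writes that one string over every match of the regex, so it is best suited to patterns that match a single word.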

Sigh.

ccstone · June 10, 2013, 5:32am

Hey Eric,

The moral of that story is to never spend hours in lieu of asking questions. Sometimes things are just not reasonably discoverable. I’ve fallen into that trap many times me’self, but these days I try to restrict frustrations by imposing time limits on how much effort I’ll put into something before asking questions.

The Satimage.osax supports a number of different flavors of regex:

syntax ("POSIX" | "POSIX_EXTENDED" | "EMACS" | "GREP" | "GNU_REGEX" | "JAVA" | "PERL" | "RUBY") » Default: "RUBY"

The default mode is Ruby, and I only occasionally step out to Perl.

Unfortunately the Perl is not PCRE standard, and I don’t know where the differences lie. (There are no exact specifications of what regex libraries are used, so some trial and error is required.)

You should pick up my regex cheat sheet for BBEdit and TextWrangler: https://gist.github.com/ccstone/5385334

This might also be of use: http://net.tutsplus.com/tag/regular-expressions/

But there are many regex tutorials on the net.

This very issue that has had you pulling your hair had me doing so about a decade ago when I began using the Satimage.osax. (I had used the Regular-Expressions.osax for a number of years before adopting OSX, but it was never updated. Fortunately Satimage filled the gap, and I’ve been a relatively happy user ever since.)

You should join the Satimage User List (SUL): http://www.satimage.fr/software/en/support/sul.html

It’s low-volume, but most of the Satimage experts are members.

set _text to "
01 Now is the time for all good men to come to the aid of their country.
02 Now is the time for all good men to come to the aid of their country.
"
change "(?imsx)

    time.+     # 'x' Free-spacing mode: ignores unescaped white space
                # allows inline comments in grep patterns.

" into "¢" in _text syntax "PERL" with regexp

This demonstrates both the ‘syntax’ parameter and free-spacing mode in the regex.

Free-spacing is freaky if you’ve never seen it before, but it can be a godsend when composing a very complex regular expression.

professore · June 10, 2013, 1:33pm

Chris:

All those refs are very helpful, so thank you. I did ask the question, here! – I just didn’t give it enough time, and incorrectly assumed that Nigel’s answer meant that the \l and \u operators were not supported. Should have followed that old advice about ass-uming things.

I very much like the idea of free spacing. I have learned over the years that overcommenting my code is far easier and more efficient than trying to reconstruct my thought process (sometimes years) later.

Well, live and learn. Thanks for all the help.

PS: Yes, the Beethoven Sym #1 is in C min. But the text I was pasting was from www.regexpal.com, and I had altered a whole mess of copied names to test regex for finding different sections… I didn’t think about anyone looking at the actual text!

Warmly

Nigel_Garvey · June 10, 2013, 2:11pm

Sorry to have inadvertently misled you, Eric. :rolleyes: I’d personally not heard of these operators before, since they’re not mentioned (as far as I can see) on the site from which I learned regex. Glad you got your answer in the end.

StefanK · June 10, 2013, 2:20pm

Objection: Beethoven Sym #1 is in C maj. (Maybe it’s a confusion with Brahms)

professore · June 14, 2013, 2:35am

Correct you are! Sigh. Too much programming, not enough listening to the music.