GNU SED behaves differently in AppleScript and in Terminal

I need a quick and robust way in AppleScript to turn a string into lowercase, taking into account unicode accented characters.

Running macOS Catalina 10.15.4, I installed gnu-sed in order to have more flexibility manipulating text strings.

In Terminal, the following command turns ‘ÉTÉS’ into ‘Étés’.

echo ÉTÉS | gsed -E 's_^(.)(.*)$_\U\1\L\2_'

However in AppleScript, the same command returns ‘ÉtÉs’. The second ‘É’ did not switch case.

set cmd to "echo ÉTÉS | /usr/local/Cellar/gnu-sed/4.8/bin/gsed -E 's_^(.)(.*)$_\\U\\1\\L\\2_'"
set str2 to do shell script cmd

Whether going from lowercase to uppercase or the reverse, all accented characters seem to keep their original case when the command is run in AppleScript, whereas they change case when the command is run in Terminal.

Any help to understand and solve this issue would be greatly appreciated. Thanks in advance. W.

PS: another irritation is that AppleScript won’t accept just ‘gsed’, as Terminal does, but requires the full path to the command. I can live with that if I must, but if there is a better way, I am all ears…

Hi.

I can’t help with GNU sed, but you can do this with ASObjC, which is probably a tad faster:

use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"

set str to "ÉTÉS"
set str2 to ((current application's class "NSString"'s stringWithString:(str))'s capitalizedString()) as text

I don’t know a better way to handle this but, FWIW, Apple’s explanation is:

https://developer.apple.com/library/archive/technotes/tn2065/_index.html#//apple_ref/doc/uid/DTS10003093-CH1-TNTAG1

Thank you very much. Both your answers open interesting paths.

Nigel, can you recommend a good starting point to learn ASObjC?

I am not a programmer and I am not interested in writing full applications, be they in Swift or in C. I don’t fully understand what Cocoa and Xcode are.
I am just familiar with AS and with shell scripting. My goal is to understand macOS better in order to automate my workflows with more power and flexibility.

Shane Stanley wrote what appears to be the only book on ASObjC but it dates to El Capitan (>4y) and I fear that most of the examples and many of the concepts will be obsolete by now.

I’d be interested in anything: books, tutorials, online courses.

Thanks again for your help so far, and any more tips if you have the time. W.

Not at all. It covers the basics and the “must know” stuff, all of which (I think) is still relevant. The few things it doesn’t cover (one or two frameworks and classes) you can add to your knowledge later if you need them, either by doing your own research in the developer documentation (which Shane’s book will empower you to do) or by asking here. :)

When GNU sed is installed on a Mac, a symlink to the gsed executable is created in /usr/local/bin, so the script can be shortened to:


set cmd to "echo ÉTÉS | /usr/local/bin/gsed -E 's_^(.)(.*)$_\\U\\1\\L\\2_'"
set str2 to do shell script cmd

I bring up the original script, though, not because it produces an incorrect result on my Mac but because it returns an empty string.

And on other runs of the original code, I usually get this error:

error “/usr/local/Cellar/gnu-sed/4.8/bin/gsed: line 1: syntax error near unexpected token `(' /usr/local/Cellar/gnu-sed/4.8/bin/gsed: line 1: `ÉTÉS -E s_^(.)(.*)$_\U\1\L\2_ ÉTÉS'” number 2

Can someone explain to me why this is happening? This oddity may be the key to unraveling the behavior of GNU sed with regular expressions.

If you’re going to use a shell command to change case, I would use tr, as it has ready-made character classes for it.

do shell script "echo ÉTÉS | LANG='fr_FR.UTF-8' tr '[:upper:]' '[:lower:]'"

Nigel posted one of the settings available.
Below is the complete set as coded by Shane Stanley.

use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"

on changeCaseOfText(sourceText, caseIndicator)
	-- create a Cocoa string from the passed text, by calling the NSString class method stringWithString:
	set sourceString to current application's NSString's stringWithString:sourceText
	-- apply the indicated transformation to the Cocoa string
	if caseIndicator is 0 then
		set adjustedString to sourceString's uppercaseString()
	else if caseIndicator is 1 then
		set adjustedString to sourceString's lowercaseString()
	else
		set adjustedString to sourceString's capitalizedString()
	end if
	-- convert from Cocoa string to AppleScript string
	return (adjustedString as string)
end changeCaseOfText

my changeCaseOfText("Please enter no more than 4 comma-delineated tags.", -1) -- may be 0, 1 or any other value

Yvan KOENIG running High Sierra 10.13.6 in French (VALLAURIS, France) vendredi 15 mai 2020 13:55:02