In an AppleScript, I’m trying to determine if a string contains a capital letter.
I see many references to changing an entire string from lower to upper and vice versa. However, I do not see any references on how to determine if a string contains a particular case.
For example, I’d like to determine whether the string “sTring” contains any capital letters.
use AppleScript version "2.3.1"
use scripting additions
use framework "Foundation"
my isItAllLower:"string" --> true
my isItAllLower:"sTring" --> false
on isItAllLower:aString
	set allLower to (current application's NSString's stringWithString:aString)'s lowercaseString() as text
	-- AppleScript ignores case in text comparisons by default, so make this one case-sensitive.
	considering case
		return aString = allLower
	end considering
end isItAllLower:
Yvan KOENIG running High Sierra 10.13.3 in French (VALLAURIS, France) vendredi 16 mars 2018 14:29:53
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions
set theString to "sTring"
set theString to current application's NSString's stringWithString:theString
return not (theString's isEqualToString:(theString's lowercaseString())) as boolean
Or:
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions
set theString to "sTring"
set theString to current application's NSString's stringWithString:theString
set theRange to theString's rangeOfCharacterFromSet:(current application's NSCharacterSet's uppercaseLetterCharacterSet())
return (|length| of theRange > 0) -- the length is 0 when no uppercase letter is found
Here is a sed shell script solution that returns a bit more case information. Not as slick or fast as ASObjC, but it works:
on caseInfo(theString)
-- Returns 1 if lowercase only, 2 if uppercase only, 3 if both lowercase and uppercase, and 0 if neither (i.e., no alphabetic characters)
return (do shell script "echo $(( $(sed -E 'h ; s/[^a-z]//g ; s/.+/+1/ ; x ; s/[^A-Z]//g ; s/.+/+2/ ; G ; ' <<<" & theString's quoted form & ") ))") as integer
end caseInfo
caseInfo("string") --> 1
caseInfo("STRING") --> 2
caseInfo("sTring") --> 3
caseInfo("123456") --> 0
Applying Nigel’s technique of prefixing sed with LC_ALL='en_US' (or LC_ALL='en_GB', if you prefer) and using the POSIX [:lower:] and [:upper:] character classes to match lowercase and uppercase characters robustly, the modified sed solution now handles both ASCII and non-ASCII text.
Two further refinements were made to the sed command:
It now handles multiline strings properly by loading all lines before performing case testing.
It now performs fewer text substitutions and eliminates one unnecessary hold space read, so it is a bit more efficient (although the gain in execution speed is small compared with the fixed overhead of 0.02 seconds or so for executing the do shell script command).
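For readers who want to experiment, here is a minimal sketch of how those ideas might fit together. It is not the exact command referred to above. The handler name caseInfoPOSIX is hypothetical, and the locale name 'en_US.UTF-8' is my assumption (the post mentions 'en_US' and 'en_GB'). It only illustrates the locale prefix, the POSIX character classes, and collecting all lines before testing. It does not attempt the hold space optimisation.

```applescript
on caseInfoPOSIX(theString)
	-- Hypothetical handler. Returns 1 if lowercase only, 2 if uppercase only, 3 if both, and 0 if neither.
	-- "H ; $!d ; x" appends every line to the hold space and only continues once the last line has been read,
	-- so the case tests that follow see the whole (possibly multiline) text.
	set sedScript to "H ; $!d ; x ; h ; s/[^[:lower:]]//g ; s/.+/+1/ ; x ; s/[^[:upper:]]//g ; s/.+/+2/ ; G"
	-- Assumed locale name; substitute whichever locale suits your system.
	set shellCommand to "echo $(( $(LC_ALL='en_US.UTF-8' sed -E " & quoted form of sedScript & " <<<" & quoted form of theString & ") ))"
	return (do shell script shellCommand) as integer
end caseInfoPOSIX

caseInfoPOSIX("sTring") --> 3
caseInfoPOSIX("abc" & linefeed & "DEF") --> 3
caseInfoPOSIX("123456") --> 0
```

As in the earlier handler, sed emits "+1" and/or "+2" and the shell's arithmetic expansion adds them up.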
Although the previously posted sed solution works, the following ASObjC solution, adapted from Shane Stanley’s handler, returns the same case information as the sed handler but runs 80 to 90 times faster on my machine, no doubt because of the overhead of the do shell script command. (When will I ever learn?)
use framework "Foundation"
use scripting additions
on caseInfo(theString)
	-- Returns 1 if lowercase only, 2 if uppercase only, 3 if both lowercase and uppercase, and 0 if neither (i.e., no alphabetic characters)
	tell ((current application's NSString)'s stringWithString:theString)
		-- The string differs from its uppercased copy only if it contains a lowercase letter, and vice versa.
		set hasLowercase to not ((its isEqualToString:(its uppercaseString())) as boolean)
		set hasUppercase to not ((its isEqualToString:(its lowercaseString())) as boolean)
	end tell
	return (hasLowercase as integer) + 2 * (hasUppercase as integer)
end caseInfo
The one scenario where the sed approach might make sense is when case information is needed in the midst of a larger shell script that one doesn’t want to break up. Otherwise, the ASObjC approach presented here is the preferred of the two methods.
FWIW, do shell script shouldn’t shoulder all the blame. If I run your code here it takes about 0.26 seconds. Subtract 0.02 * 8 for the do shell script overhead (and I think 0.02 might be on the high side outside an editor) and you still get 0.1. The ASObjC code takes less than 0.002, so there’s still a factor of about 50 times. (Timings done in Script Geek.app.)
I used gdate, GNU’s version of the date command, to measure actual sed command execution time within the do shell script command with an accuracy in the range of about a millisecond. (Accuracy beyond that is limited by the time it takes to execute the gdate command itself.) I also measured total do shell script command execution time with the LapTime osax with an accuracy in the range of about a tenth of a millisecond. In this case, do shell script contained only the sed command without the time-testing commands so that it could be compared to ASObjC equivalently. Finally, I measured ASObjC command execution time with the LapTime osax.
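For anyone without the LapTime osax or gdate installed, an accumulated time can also be approximated with NSDate. This is a swapped-in technique, not the one used for the figures in this post, and the handler name accumulatedSeconds is hypothetical. It is a rough sketch only; its resolution and overhead will differ from LapTime’s.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Hypothetical helper: accumulated seconds for a number of repetitions of a simple ASObjC case test.
on accumulatedSeconds(theString, repetitions)
	set startDate to current application's NSDate's |date|()
	repeat repetitions times
		-- Code under test: compare the string with its lowercased copy.
		set theNSString to current application's NSString's stringWithString:theString
		(theNSString's isEqualToString:(theNSString's lowercaseString())) as boolean
	end repeat
	-- timeIntervalSinceNow is negative for a date in the past, so negate it.
	return -(startDate's timeIntervalSinceNow())
end accumulatedSeconds

accumulatedSeconds("sTring", 100)
```

Swapping the body of the repeat loop for a call to either handler gives figures comparable in kind to the accumulated times reported here.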
Here are the accumulated times to perform 100 repetitions of the sed vs ASObjC algorithms I posted earlier:
Accumulated time for 100 repetitions of the sed handler containing the do shell script command and its sed command:
3.682 seconds
3.747 seconds
3.718 seconds
Accumulated time for 100 repetitions of the sed command itself:
0.371 seconds
0.410 seconds
0.411 seconds
Accumulated time for 100 repetitions of the ASObjC handler containing the ASObjC commands:
0.023 seconds
0.022 seconds
0.023 seconds
Ratio of do shell script / ASObjC:
160
170
162
Ratio of sed command alone / ASObjC:
16
19
18
I’m not sure why my do shell script command execution times are about double what they were when I measured them previously, but the results are telling nonetheless. Just as you point out, sed is much slower than ASObjC, about 16 to 19 times slower in the current tests. But even that slowness is exacerbated another 10-fold or so by the overhead of do shell script, a veritable double whammy.
Bottom line: Outside of a larger shell script where the sed solution might be convenient, ASObjC is the way to go.
P.S. What I have been calling sed is actually a combination of a sed command, bash arithmetic expansion, and an echo command. It’s hard to imagine a shell solution that would be dramatically more efficient. Even if such a solution were available, do shell script imposes such a time burden (in this case, about 90% of the burden) that it wouldn’t have a chance in a speed test against ASObjC.
It might be slower using sed, but your approach of using regex might still be the better one. For example, this is nearly 20% faster than the previous ASObjC method:
That forced me to read about Unicode Property Names and the “\p{ Lu}” and “\p{ Ll}” search expressions. It turned into a great learning exercise! What a great idea. The search will stop as soon as it encounters the first character of the matching case. That is very efficient.
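To make the idea concrete, here is a minimal sketch of a regex-based test using Unicode property expressions. It is not necessarily the code discussed above. The handler name caseInfoRegex is hypothetical, and the patterns are written without a space inside the braces. It relies on NSString's rangeOfString:options: with the NSRegularExpressionSearch option.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

on caseInfoRegex(theString)
	-- Hypothetical handler. Returns 1 if lowercase only, 2 if uppercase only, 3 if both, and 0 if neither.
	set theNSString to current application's NSString's stringWithString:theString
	set regexOption to current application's NSRegularExpressionSearch
	-- Each search can stop at the first matching character; the range length is 0 when nothing matches.
	set lowerRange to theNSString's rangeOfString:"\\p{Ll}" options:regexOption
	set upperRange to theNSString's rangeOfString:"\\p{Lu}" options:regexOption
	set hasLowercase to (|length| of lowerRange) > 0
	set hasUppercase to (|length| of upperRange) > 0
	return (hasLowercase as integer) + 2 * (hasUppercase as integer)
end caseInfoRegex

caseInfoRegex("string") --> 1
caseInfoRegex("STRING") --> 2
caseInfoRegex("sTring") --> 3
caseInfoRegex("123456") --> 0
```

Unlike the lowercaseString and uppercaseString comparisons, this does not build converted copies of the whole string, which may be part of why a regex search can come out ahead.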
Can you please answer one silly question: Why do you place a space character before Lu and Ll inside the curly braces?
Having grown comfortable over the years with the ability to express powerful regular expressions in tersely coded grep and sed commands, I at first balked at the verbosity and clunkiness of NSRegularExpression, NSRegularExpressionSearch, and related items. But, my goodness, what powerful animals they are! Not only do they offer Perl-like regex features such as lookahead and lookbehind searching and so much more, but they also make full use of the Unicode standard, a nice example being your use of Unicode Property Names to find case-specific information efficiently to solve the current problem. I suspect that through repetition, NSRegularExpression will become just as comfortable to use over time, and the effort will be well rewarded.