Detect case of text

Hello,

I need a way to detect the case of a string of text. I am aware of the considering case method, but I could not find a way to actually return the case in use. A simple boolean true/false value for upper/lower case returned would be ideal.

Can someone please help?

Hello.

I would use the string functions in hubi’s address book scripts for doing that, since he has made a very thorough representation of the Unicode set in upper and lower case.
Hopefully we can one day use Unicode text classes in AppleScript like in Java.

Have a look in the file _hubionmac’s AddressBook-scripts-lib.

There you should find functions named makesmall and makebig. Use these as the base for yours.

(If a char is in neither the normal small ASCII nor the normal big ASCII, then look it up in the special character sets.)

Doing this should give you some very accurate isUC and isLc functions. If you ask me, I think the functions should take parameters of data type text.

Best Regards

McUsr

This is a small handler which tests if a character is uppercase.


CharacterIsCase("A") --> true
CharacterIsCase("a") --> false
CharacterIsCase("Abc") --> true
CharacterIsCase("aBC") --> false
CharacterIsCase("") --> missing value

on CharacterIsCase(aChar)
	try
		set aChar to aChar as string
		if aChar is "" then error
	on error
		return missing value
	end try
	
	local upperCaseABC
	local lowerCaseABC
	set upperCaseABC to "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
	set lowerCaseABC to "abcdefghijklmnopqrstuvwxyz"
	
	
	set realChar to character 1 of aChar
	
	considering case
		set isCase to realChar is in (every character of upperCaseABC)
	end considering
	
	return isCase
end CharacterIsCase

Hope it helps,
ief2

Hello.

I stole the char arrays from hubi’s address book scripts and made this function for you.
I’m emailing hubi and giving him the function to include in his library.
It should work pretty well in the Americas and Eurasia, where Unicode text is starting to become a standard. The character lists might not be perfect, but they are still better than plain ASCII.

This function assumes it is fed one word at a time. It reports a whitespace character as a character whose case it can’t deduce, and returns null.


property special_bigChars : {"Ä", "Å", "Ç", "É", "Ñ", "Ö", "Ü", "À", "Ã", "Õ", "Ÿ", "Â", "Ê", "Á", "Ë", "È", "Í", "Î", "Ï", "Ì", "Ó", "Ô", "Ò", "Ú", "Û", "Ù"} as text
property special_smallChars : {"ä", "å", "ç", "é", "ñ", "ö", "ü", "à", "ã", "õ", "ÿ", "â", "ê", "á", "ë", "è", "í", "î", "ï", "ì", "ó", "ô", "ò", "ú", "û", "ù"} as text

log "" & isUC("FÅRIKÅL") -->true

on isUC(someText) -- or a string.
	local theChars, thisChar, chrNum
	
	set theChars to characters of someText
	
	considering case -- without this, "is in" ignores case and "ä" would match the uppercase set
		repeat with aChar in theChars
			repeat -- runs once; "exit repeat" skips ahead to the next character
				set thisChar to contents of aChar
				set chrNum to (id of thisChar) -- thanks to Nigel Garvey.
				if (chrNum is greater than 64) and (chrNum is less than 91) then
					exit repeat -- vanilla uppercase
				else if (chrNum is greater than 96 and chrNum is less than 123) then
					return false -- vanilla lowercase
				else if thisChar is in (my special_bigChars) then
					exit repeat -- unicode uppercase
				else if thisChar is in (my special_smallChars) then
					return false -- unicode lowercase
				else
					return null -- when none of above.
				end if
			end repeat
		end repeat
	end considering
	return true
end isUC

Best Regards

McUsr

AppleScript has many advantages, but this is not one of them, and it’s one more reason I love Ruby. You can open up a class and add methods to it just like you can in Objective-C.

In this example, I add two new methods onto the String class.

Hello

I love AppleScript :slight_smile: and I’ll stick with it forever!
Maybe because it is so difficult that I’m always challenged by it. If you ask me about small languages, I would say that C is a small language and AppleScript is not. And though it is an ECMA language, I find it pretty far from JavaScript in difficulty, although JavaScript is hard to debug until you get a decent debugger.

It would have been nice to extend a text object like you do in Ruby (or derive a new class like that).
Uppercase and lowercase properties of text should be intrinsic even in AppleScript.
You can do much with AppleScript objects, but deriving a new class from a built-in one like that is impossible. But then again: with AppleScript you have access to objects like MsWord and Excel and so on … :slight_smile: You can pretty much use that language to get the environment you want.

I bet both Ruby and Objective-C work with Unicode by default, so your example will just work with any kind of text. Maybe I should have a look at Ruby, or Python.

I’ve worked a little bit with utf-encoding in Java (through regexp?), and to tell the truth: that doesn’t really recall any good memories.

It is a pity that there are so few scripting additions that work well with Snow Leopard at the moment, because functions like these truly belong in a scripting addition. But even Satimage’s osaxen produced output in the console log under Snow Leopard.

Best Regards

McUsr.

Hi.

The “case of a string of text” may be upper, lower, mixed, or none of these. Here’s a version with that interpretation:


property special_bigChars : {"Ä", "Å", "Ç", "É", "Ñ", "Ö", "Ü", "À", "Ã", "Õ", "Ÿ", "Â", "Ê", "Á", "Ë", "È", "Í", "Î", "Ï", "Ì", "Ó", "Ô", "Ò", "Ú", "Û", "Ù"} as text
property special_smallChars : {"ä", "å", "ç", "é", "ñ", "ö", "ü", "à", "ã", "õ", "ÿ", "â", "ê", "á", "ë", "è", "í", "î", "ï", "ì", "ó", "ô", "ò", "ú", "û", "ù"} as text

on caseOf(txt)
	set upperIDs to id of ("ABCDEFGHIJKLMNOPQRSTUVWXYZ" & special_bigChars)
	set lowerIDs to id of ("abcdefghijklmnopqrstuvwxyz" & special_smallChars)
	
	set uc to false
	set lc to false
	
	repeat with thisID in (id of txt) as list
		if (thisID is in upperIDs) then
			set uc to true
		else if (thisID is in lowerIDs) then
			set lc to true
		end if
		if ((lc) and (uc)) then return "mixed"
	end repeat
	
	if (uc) then
		return "upper"
	else if (lc) then
		return "lower"
	else
		return "none"
	end if
end caseOf

caseOf("Hello world!") --> "mixed"
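For comparison, here is a minimal sketch of the same four-way classification in Python 3, the language family that the shell-script approach later in this thread relies on. It uses Python's built-in Unicode case tests instead of hand-made character lists. The function name case_of is made up for this illustration, and the sketch assumes a Python 3 interpreter is available; it is not part of the AppleScript handler above.

```python
def case_of(text: str) -> str:
    """Classify text as "upper", "lower", "mixed" or "none".

    Characters without case (digits, spaces, punctuation) are ignored,
    mirroring the AppleScript caseOf handler.
    """
    has_upper = any(ch.isupper() for ch in text)
    has_lower = any(ch.islower() for ch in text)

    if has_upper and has_lower:
        return "mixed"
    if has_upper:
        return "upper"
    if has_lower:
        return "lower"
    return "none"


if __name__ == "__main__":
    for sample in ("Hello world!", "FÅRIKÅL", "ÿ", "123 !"):
        print(sample, "->", case_of(sample))
```

Because str.isupper and str.islower consult the Unicode character database, this covers accented characters that are missing from the two property lists.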

Hello.

Nice!

Some questions I hope you take your time to answer.

1.) I just don’t understand why it works without the ‘contents of’ operator inside the repeat loop.
So I guess it is some kind of automatic coercion for a primitive datatype in a list?

2.) And since this is you doing this - I also believe that the list created at the top of the repeat loop is only evaluated once.
Am I right in my guess that this is also an effect of the list consisting of primitive elements?
Or is this because it is a value, which is then coerced to a list?

I'm not finished with reading the manual yet.:|

I also long for the day when the expression that creates the lists of id’s of characters comes naturally.

Good Afternoon from Southern Norway

Best Regards

McUsr

Thanks everyone for your helpful comments.

Thanks especially to ief2 for your solution. Ideal for my purposes.

:smiley:

Hello.

I have been reading up on both the repeat loop and Speed in Matt Neuburg’s book “AppleScript The Definitive Guide”.

What I believe I have understood from my question 2 in my previous post, please correct me if I’m wrong:

AppleScript distinguishes between a static list and a dynamic list when evaluating the list in a repeat loop.
By "dynamic list" I mean a list which contains references to objects in an application.
If we have a dynamic list, it is useful to store that list in a variable outside the repeat loop
so that it doesn’t get evaluated on every iteration.
This:


tell application "Finder"
	set L to ( every folder of target of finder window 1)
	repeat with aF in L
		....
	end repeat
end tell

executes faster than this:

tell application "Finder"
	repeat with aF in (get  every folder of target of finder window 1)
		....
	end repeat
end tell

whereas this


repeat with aNum in {1,2,3,4,5}
.....
end repeat

does not execute any faster than this


set L to {1,2,3,4,5}
repeat with aNum in L
.....
end repeat

Best Regards

McUsr

Hi, McUsr.

The value of the loop variable in that kind of repeat is a reference to a particular item in the list. Something like:

item 1 of {72, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100, 33}

In most cases, the reference is followed to the referenced value (72 in this case), as it would be if it were written directly into the script. It’s only necessary to use ‘contents of’ when testing for equality or when storing the value the reference points to somewhere else:


set aList to {72, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100, 33}
set thisID to a reference to item 1 of aList

thisID is in {71, 72, 73} --> true (the stored reference is resolved in most cases)
thisID = 72 --> false (the stored value is the reference, not 72)
contents of thisID = 72 --> true (the stored reference is explicitly resolved before being compared with 72)

set anotherList to {}
set end of anotherList to thisID
set end of anotherList to contents of thisID
anotherList --> {item 1 of {72, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100, 33}, 72}

That’s right.

I don’t quite understand the question. The explicit coercion to list does two things.

  1. It resolves the reference ‘id of txt’ to the actual list, so that it’s only evaluated once and not every time round the repeat. (In this case, the reference is written into the script code, not stored in a variable. If you were to write instead ‘repeat with thisID in contents of (id of txt)’, you’d be dealing instead with the reference ‘contents of id of txt’!)

  2. It coerces the integer returned when txt has only one character to the single-item list required by the repeat.

I’ve just seen your later post immediately above this one. I’ll reply to that as soon as I can.

Thank you very much!

I think I got it.

Matt Neuburg has this example on page 290, showing a loop which generates a list on every iteration
and is thereby highly inefficient (it is meant as an example of inefficiency).


tell application "Microsoft Entourage"
	repeat with x in every contact -- gets evaluated for each iteration
		get name of x
	end repeat
end tell

This could have been a lot faster, but this example is made to illustrate list variables.
So a more efficient way to do this, as you did, would be to get it evaluated only once, as he also states on page 291. Somebody should have gotten some spectacles some years ago!


-- like this
tell application "Microsoft Entourage"
	repeat with x in (get every contact) as list	
		-- effectively "freezes" list variable when loop is first encountered.
		get name of x
	end repeat
end tell

I should add that he uses get, and get also resolves references, which is a deeper way to
freeze the state of the list, making its contents non-volatile.
So I wonder whether “as list” is really an unnecessary coercion,
and whether I could get by with the example below if I wanted a list with “frozen” references.
If some other script were running in a background process at the same time, renaming my contacts,
the intent is that I should still get correct names if the other process talking to Entourage caught up with this one and overtook it. But this script would not be able to reflect contacts that the other script added or deleted.


-- like this
tell application "Microsoft Entourage"
	repeat with x in (every contact) as list
		get name of x
	end repeat
end tell

Thank you very much for explaining this to me.

Best Regards

McUsr

This can’t be an original Matt Neuburg script. He would never write this needless coercion :wink:

I am interested in more accented characters than the ones treated above.

This is why I use a more powerful tool.


set thestring to "o"

set theCase to my isItUpperCase(thestring)

on isItUpperCase(t)
	copy t to maybe
	set upperT to do shell script "/usr/bin/python -c \"import sys; print unicode(sys.argv[1], 'utf8').upper().encode('utf8')\" " & quoted form of (t)
	considering case
		set maybe to upperT = t
	end considering
	return maybe
end isItUpperCase


It returns true if the character tested is an uppercase one, false if it isn’t.

Of course, you may call it in a loop.
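The handler above calls /usr/bin/python with Python 2 syntax, and recent macOS releases no longer include that interpreter. A rough Python 3 sketch of the same test (upper-casing leaves the text unchanged) might look like the following. The name is_upper_case is invented for this illustration, and a python3 interpreter is assumed to be installed.

```python
def is_upper_case(text: str) -> bool:
    """Return True when converting text to uppercase leaves it unchanged.

    This is the same comparison the AppleScript handler makes, so caseless
    characters such as digits and punctuation also count as "uppercase".
    """
    return text.upper() == text


if __name__ == "__main__":
    for sample in ("o", "O", "é", "É", "7"):
        print(sample, "->", is_upper_case(sample))
```

If digits and punctuation should not count as uppercase, Python's str.isupper() is stricter: it requires at least one cased character.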

Yvan KOENIG (VALLAURIS, France) mardi 8 juin 2010 14:28:14

That’s correct, Stefan.

It is true that I added the as list, but it does no harm, and it adds readability
for people like me who don’t have full control over what is going on all of the time.
I hope you all forgive me for that one, including Matt Neuburg. I really shouldn’t have done that,
but it was clear that I had already tinkered with the code by commenting it. I won’t do it again without
stating explicitly what I have done.

Thanks for showing your way, Yvan. I’ll ponder stealing it and using your lines for myself,
maybe creating some lists with all characters, if a faster solution is needed.
I’ll test the code on my old G4; if it is acceptable there, it is acceptable everywhere (at this time, I guess).

Best Regards

From McUsr that currently listens to the WWDC podcast.

I don’t agree. Both the loop form repeat ... in and the element specifier every ... imply that the parameter is a list.

When you put it that way, I disagree with myself and agree with you.
:slight_smile:

Best Regards

Mcusr

I think you could have served both of your purposes by commenting, instead of coercing…

tell application "Microsoft Entourage"
   repeat with x in (get every contact) --repeat through list of contacts    
       -- effectively "freezes" list variable when loop is first encountered.
       get name of x
   end repeat
end tell

I totally agree :slight_smile:

Best Regards

McUsr