Automating word entry with AppleScript - Umlauts

Hello to all,

I use the following script on translation projects when I don’t want to have to keep retyping repeated company names from scratch:

set word_ to ("Company Name") as string -- word to enter
tell application "System Events"
	tell application "OmegaT" to activate
	tell process "OmegaT" to keystroke word_
end tell

The problem is that if the word contains an umlaut (ä, ö, ü), the letter is not typed correctly (ü, for instance, comes out as “a”).

Is there a short and easy way to make sure these appear correctly without having to script some sort of long conversion process?

Thanks,

Bowjest

Hello.

See if this works, I haven’t tested it.


-- limitation: the word may contain at most one umlaut
set word_ to ("München") as string -- word to enter
tell application "OmegaT" to activate
typeWithUmlauts for word_

to typeWithUmlauts for aWord
	set l to length of aWord
	considering case and diacriticals
		repeat with i in {"ä", "ö", "ü", "Ä", "Ö", "Ü"}
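			set c to contents of i -- dereference the loop variable so that comparisons work
			set a to offset of c in aWord
			if a = 0 then
				-- no match: once the last candidate has been tried, keystroke as usual
				if c = "Ü" then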
					tell application id "sevs"
						tell application process "OmegaT"
							keystroke aWord
						end tell
					end tell
					exit repeat
				end if
			else if a = 1 then
				typeUmlaut for i
				tell application id "sevs"
					tell application process "OmegaT"
						keystroke text 2 thru -1 of aWord
					end tell
				end tell
				exit repeat
			else if a = l then
				tell application id "sevs"
					tell application process "OmegaT"
						keystroke text 1 thru -2 of aWord
					end tell
				end tell
				typeUmlaut for i
				exit repeat
			else -- in the middle 
				tell application id "sevs"
					tell application process "OmegaT"
						keystroke text 1 thru (a - 1) of aWord
					end tell
				end tell
				typeUmlaut for i
				tell application id "sevs"
					tell application process "OmegaT"
						keystroke text (a + 1) thru -1 of aWord
					end tell
				end tell
				exit repeat
			end if
		end repeat
	end considering
end typeWithUmlauts

to typeUmlaut for aChar
	-- the caller passes a list-item reference; coerce it to plain text so the comparisons below can match
	set aChar to aChar as text
	-- key code 30 is presumably the ¨ dead key on the poster's keyboard layout; other layouts differ
	tell application id "sevs"
		tell application process "OmegaT"
			considering case
				if aChar = "ä" then
					key code 30
					key code 0
				else if aChar = "ö" then
					key code 30
					key code 31
				else if aChar = "ü" then
					-- i = ü
					key code 30
					key code 32
				else if aChar = "Ä" then
					key code 30
					keystroke "A"
				else if aChar = "Ö" then
					key code 30
					keystroke "O"
				else -- achar = "Ü"
					key code 30
					keystroke "U"
				end if
			end considering
		end tell
	end tell
end typeUmlaut

Edit
Added considering clauses where it should be necessary, and code for uppercase characters.

Or possibly:

set word_ to ("Don't München It") -- word to enter
set the clipboard to word_
tell application "OmegaT" to activate
tell application "System Events" to keystroke "v" using command down

Why not replicate the manual entry?


tell application "TextEdit" to activate
tell application "System Events" to tell application process "TextEdit"
	keystroke "¨o"
	keystroke "¨O"
	keystroke "¨a"
	keystroke "¨A"
	keystroke "¨e"
	keystroke "¨E"
end tell

Yvan KOENIG (VALLAURIS, France) jeudi 11 avril 2013 20:38:52

Now, that was a cheap way to get three different representations of the same word. I really hope this works, because this is so much less. :slight_smile:

Edit

And if that doesn’t work, then I hope Yvan’s method works, because then it is just a question of processing the words with text item delimiters. Given the “a” the OP got when feeding ü to the app, which is what key code 0 would produce, I think the app expects precomposed characters but gets decomposed ones, and that Yvan’s solution will work, since nothing is decomposed when the character is sent as two separate keystrokes (said with hindsight). :slight_smile: Maybe I am wrong about precomposed versus decomposed, but I am sure the problem lies somewhere close to that explanation. If ü is not in the first Unicode plane, it would need more bytes for its representation than the usual two. I am not totally confident about how that sequence is built up, but it would be natural for the first byte to be empty to signal that something special is about to happen. :slight_smile:
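For anyone who wants to check the precomposed/decomposed guess, here is a minimal sketch that only inspects code points and types nothing. It assumes AppleScript 2.0 or later (where `id of` text returns Unicode code points); the variable names are just for illustration. For what it is worth, ü is U+00FC, which sits in the first plane and fits in a single UTF-16 unit.

```applescript
-- Build both forms explicitly so the result does not depend on how the editor stores the literal.
set precomposed to character id 252 -- U+00FC LATIN SMALL LETTER U WITH DIAERESIS
set decomposed to string id {117, 776} -- "u" followed by U+0308 COMBINING DIAERESIS

-- id of text returns one integer for a single code point and a list for several.
set report to {id of precomposed, id of decomposed}
return report
```

If the first item of the result is a single number and the second is a pair, the two strings look the same on screen but are stored differently.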

Hello

The code which I posted was tested under 10.8.3.

I also tested Nigel’s code under 10.8.3, targeting TextEdit.

I tested both in a single pass, targeting Pages, with:


set word_ to ("Don't München It") -- word to enter
set the clipboard to word_


tell application "Pages" to activate
tell application "System Events" to tell application process "Pages"
	keystroke "¨o"
	keystroke "¨O"
	keystroke "¨a"
	keystroke "¨A"
	keystroke "¨e"
	keystroke "¨E"
	keystroke return
	keystroke "v" using command down
end tell

When the text to insert is already defined in a string or in a file, Nigel’s code relying upon paste is the easier scheme.

Yvan KOENIG (VALLAURIS, France) jeudi 11 avril 2013 22:06:02

Hello.

I’ll post this handler anyway. It uses your scheme, Yvan, which is easier than mine, in case Nigel’s way doesn’t work. I think that depends on how OmegaT accepts the clipboard: if it accepts UTF-8, I think it will be fine; if not, I think it will fail, but I am not sure! :slight_smile:

set AppleScript's text item delimiters to ""
set aword to "üöüÄÖÜ"
tell application "OmegaT" to activate
set aword to processWord(aword)
tell application "System Events"
	tell application process "OmegaT" to keystroke aword
end tell

to processWord(aword)
	set tids to AppleScript's text item delimiters
	considering case and diacriticals
		repeat with i in {{"ä", "¨a"}, {"ö", "¨o"}, {"ü", "¨u"}, {"Ä", "¨A"}, {"Ö", "¨O"}, {"Ü", "¨U"}}
			set AppleScript's text item delimiters to item 1 of i
			set worditems to text items of aword
			set AppleScript's text item delimiters to item 2 of i
			set aword to worditems as text
		end repeat
	end considering
	set AppleScript's text item delimiters to tids
	return aword
end processWord

Guys! Thanks so much for your very generous help!

Nigel’s seems to have done the job perfectly and is short and easy for me, the non-programmer, to understand. I’m going to try the other options you’ve all presented and save them for possible future use.

I’m really chuffed at everyone’s input. I really appreciate it.

Bowjest

Hello.

:slight_smile: I am glad the nicest solution worked for you, and I don’t think you’ll ever need any other.

I am curious.

I wonder why that solution works, is it because the clipboard contains decomposed utf-16, or because the app accepted the clipboard as utf-8, or did it work for some other reason?

What I guess happens is that the paste operation silently decomposes it.

The more interesting question is “Why doesn’t the original ‘keystroke’ method work?” The clue seems to be in the name: ‘keystroke’. For those of us using an English-language keyboard (or one of many others), there is no key for a character with an umlaut. The same goes for other diacritical characters. Yvan understood that when he reproduced the multiple keystrokes required to enter each problem character. However, if you switch to a German keyboard, which has keys for six umlaut characters, the script works for those characters:

set word_ to ("äëïöü")
tell application "TextEdit" to activate
tell application "System Events" to keystroke word_

-- Result with British English keyboard selected: aaaaa
-- Result with German keyboard selected: äaaöü
-- The German keyboard has "ä", "ö", and "ü" (both cases), but not "ë" or "ï".

Edit Sorry. The Mac’s British English keyboard does have the upper-case characters “Ë” and “Ï” for some reason and these can be ‘keystroked’ when it’s active.

Hello.

That angle never occurred to me, though my first solution with the key codes sought to remedy it, as did the second, since I also saw the key codes as the problem. But I thought the characters were typed one at a time, as byte codes; hence I took the ü being rendered as “a” by OmegaT to be caused by the ü being sent as a precomposed character from the second Unicode plane. Hehe, creating complicated hypotheses is never difficult.

But your solution, when you paste into the application, must decompose, anyhow, to create edible characters, like Yvan did (and I) manually, but behind the scenes, sly. :slight_smile:

Because, the pasteboard must be fed to the keyboard, when you paste into the app, when the app isn’t something that has an NSTextField, which is how I have understood the OmegaT application to be (Java), due to previous encounters with it.

Hello McUsr

Look at this sample :


set word_ to ("Don't München It") -- word to enter
set the clipboard to word_


the clipboard as record

--> {Unicode text:"Don't München It", string:"Don't München It", scrap styles:«data styl01000000000010000E00030000000C00000000000000», «class utf8»:"Don't München It", «class ut16»:"Don't München It"}

As you can see, when we store the string in the clipboard, the clipboard receives several representations of it:
{
Unicode text:“Don’t München It”,
string:“Don’t München It”,
scrap styles:«data styl01000000000010000E00030000000C00000000000000»,
«class utf8»:“Don’t München It”,
«class ut16»:“Don’t München It”
}

When we paste, the target application grabs one of these representations.

TextEdit and Pages grab the «class utf8» and the scrap styles.
They have no other special job to achieve.

Quite often, when I grab data from existing documents, I edit the clipboard’s contents.

For instance, when the data are grabbed from Safari, I have to remove some items because they are badly formed. If I don’t, the pasted “tables” are turned into an awful text in which tab characters are replaced by linebreaks. This happens if I copy the table describing the state of my bank account or the list of songs displayed by Amazon for CDs.
The clipboard receives:
«class weba»
«class rtfd»
«class RTF »
«class utf8» in which tab chars are correctly used
«class ut16» in which tab chars are replaced by linebreaks
uniform styles
string in which tab chars are replaced by linebreaks
scrap styles
Unicode text in which tab chars are correctly used

So, when I want to keep the table structure I use


set the clipboard to (the clipboard as «class utf8»)

That way, I’m sure that the target app will receive a table, not a list of single values.
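A guarded variant of the same idea, as a minimal sketch: it uses the Standard Additions `clipboard info` command to check that a «class utf8» representation is really on the clipboard before coercing. The handler names `clipboardHasFlavor` and `keepOnlyUTF8` are made up for this example.

```applescript
-- Returns true when the clipboard currently holds data of the given class.
-- clipboard info returns a list of {class, size} pairs.
on clipboardHasFlavor(flavorClass)
	repeat with entry in (clipboard info)
		if (item 1 of entry) is flavorClass then return true
	end repeat
	return false
end clipboardHasFlavor

-- Replaces the clipboard contents with their UTF-8 representation only.
-- Returns false and leaves the clipboard alone when there is no such representation.
on keepOnlyUTF8()
	if clipboardHasFlavor(«class utf8») then
		set the clipboard to (the clipboard as «class utf8»)
		return true
	end if
	return false
end keepOnlyUTF8
```

Calling `keepOnlyUTF8()` just before the paste keeps the script from throwing an error when the clipboard holds, say, only a picture.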

Yvan KOENIG (VALLAURIS, France) vendredi 12 avril 2013 17:05:53

Hello Yvan.

Now that was interesting, about the tab characters in utf16 that get transformed into linebreaks, and that you can bypass it by coercing the clipboard contents to utf8. :slight_smile:

I was actually thinking along those lines earlier, when I said something about utf8, and, if Pages does it, and TextEdit does it, then I see no reason why any Cocoa application shouldn’t do it. (I take it every application uses Cocoa for its GUI on OS X by now.)

What puzzles me at the moment is how to look at the “Cocoa Application model”: What really happens when I paste text into a Terminal window? This should result in a number of bytes filling up a buffer, like text having been entered by keystrokes, but really bypassing the keyboard and filling that buffer directly. If this scheme is to be bulletproof with regard to input, then the clipboard must also pass the text as it would have been passed if the text had been typed from the keyboard, taking the essence of what Nigel explained above; the clipboard is to work with every language, so at least when outputting the contents of the clipboard, it is output with “type-able” characters.

I think the keystroke command should see to that what it tries to type is type-able too. :slight_smile:

Alas, keystroke doesn’t behave like paste.

When I try to keystroke a ü, I get a q.

This is why, when a piece of text may embed non-ascii characters, I use paste or of course dedicated syntax when there is an available one.

Example:


tell application "Pages" to tell document 1
	set body text to "When I try to keystroke a [b]ü[/b], I get a [b]q[/b]."
end tell

tell application "Numbers" to tell document 1
	tell sheet 1 to tell table 1
		tell cell 3 of column 2
			set value to "When I try to keystroke a [b]ü[/b], I get a [b]q[/b]."
		end tell
	end tell
end tell

Yvan KOENIG (VALLAURIS, France) vendredi 12 avril 2013 18:55:11

That’s interesting, of course. The rest of us have been getting a. I see that q on the French keyboard is where the a is on mine, and the keyboard code for that key is 0, so that’s presumably significant.

Hello Nigel


activate application "TextEdit"
tell application "System Events" to tell application process "TexEdit"
	
	keystroke "ä "
	keystroke "â "
	keystroke "ë "
	keystroke "ê "
	keystroke "ï "
	keystroke "î "
	keystroke "ö "
	keystroke "ô "
	keystroke "ü "
	keystroke "û "
end tell

--> Inserted characters when running the system in French with a French keyboard
--> q q ë ê ï î q ô q q 

If I change the keyboard setting to an English one, I get :

a a a a a a a a a a

If I change the keyboard setting to German one, I get :

ä a a a a a ö a ü a

Yvan KOENIG (VALLAURIS, France) vendredi 12 avril 2013 22:59:11

Hello.

So keystroke is localization agnostic, and key code isn’t, implying that key code should just be used for “rigid stuff” like function keys, return, escape, and the like. Keystroke for the rest.

This has been enlightening!

:slight_smile:

Thanks Yvan. Those are the results I get on my own machine.

Interesting. I hadn’t realised before this came up that the operation of ‘keystroke’ itself (as opposed to its effect on the receiving application) depended on the software keyboard in use at the time.

Software: yes it is. I believe that it is the Cocoa framework that uses CoreFoundation to call IOKit, which in turn calls the HID manager, which in turn sets the USB keyboard layout page (everything is translated into USB, as Bluetooth works through USB).

What character is returned by a keyboard code, is actually configured, as deep down as you can get with software, just a tad above firmware. :slight_smile:

I really don’t care whether it is the character that is translated into non-existing characters on the keyboard, or if the problem lies with passed text that keystroke can’t type, though I find Nigel’s and Yvan’s explanation to be the most probable one. Still, I can’t see why ü on my keyboard would translate into key code 0, unless an offending byte (0x00) were put there.

The most important thing is to know about the issue, and the fix for it, which really is using the clipboard when the text is likely to contain diacriticals, and only using key code for “special keys” that are “hardwired” on every keyboard. :slight_smile:

That doesn’t convey anything to a native English speaker! :wink:

The thrust of the last few posts has been the realisation that ‘keystroke’ can only directly type characters or modifiers which exist in the software “keyboard” in use at the time. Previously, most of us had assumed it simply typed whatever was passed to it in a script.

‘key code’, of course, directly addresses the keys of the physical keyboard, whatever they’ve got printed on them and whatever software “keyboard” is in use at the time. 0, for example, is the code for the key immediately to the right of the caps lock key (on my current machines), which is used for “A” on English, German, and Norwegian keyboards, but for “Q” on French keyboards. The character actually produced when this key’s pressed (or when ‘key code 0’ is used in a script) depends on what’s mapped to it in the prevailing software “keyboard”.

This all suggests that ‘keystroke’ actually works by looking up each character in the current software keyboard and issuing the equivalent ‘key code’ command to the computer. The failure to find a character in the software “keyboard” results in a zero, which, by accident or design, is output as a keyboard code, producing the “a” or “q” observed by the contributors to this thread.
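Following that reasoning, here is a minimal sketch of a cautious typing routine: plain ASCII text goes through ‘keystroke’, anything else goes through the clipboard. The handler names `isPlainASCII` and `typeOrPaste` and the 0.2 second delay are assumptions for illustration, and the ASCII test is only a rough stand-in for “exists on the current software keyboard”, since a layout may lack some ASCII characters too.

```applescript
-- Returns true when every code point in theText is below 128 (plain ASCII).
on isPlainASCII(theText)
	-- id of text gives an integer for one code point and a list for several
	set codePoints to (id of theText) as list
	repeat with n from 1 to (count of codePoints)
		if (item n of codePoints) > 127 then return false
	end repeat
	return true
end isPlainASCII

-- Types theText into the frontmost process, pasting when it is not plain ASCII.
on typeOrPaste(theText)
	if isPlainASCII(theText) then
		tell application "System Events" to keystroke theText
	else
		set the clipboard to theText
		delay 0.2 -- assumption: give the clipboard a moment to update
		tell application "System Events" to keystroke "v" using command down
	end if
end typeOrPaste

-- Usage (activate the target application first):
-- tell application "OmegaT" to activate
-- typeOrPaste("Don't München It")
```

The paste branch overwrites whatever was on the clipboard, so save and restore it around the call if that matters.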

‘keystroke’ and ‘key code’ have been part of System Events’s Processes Suite since at least Tiger, but I’m pretty sure they existed outside that suite on my now-defunct Jaguar system. In any case, they still work whether GUI Scripting’s enabled or not and their effect is received by the frontmost process, whatever that may be. In Yvan’s script four posts up, telling the application process to ‘keystroke’ is both superfluous and ineffective:


activate application "TextEdit"
activate application "Stickies"

tell application "System Events" to tell application process "TexEdit"
	
	keystroke "I've been typed here!"
end tell