Talkin' The Talk with Apple's Speech Tools

Kevin_Bradley · October 22, 2007, 11:00am

Speaking of History…
The most overlooked, underrated feature of the Mac operating system has to be the speech recognition and text-to-speech tools. Most folks activate them, play with them for a few minutes, then turn them off and forget about them. But with a little help from AppleScript (and a good microphone) you can control your Mac verbally and create your own additions to Apple’s speakable items.

The text-to-speech part of the tools goes all the way back to 1984 and the introduction of the Mac, believe it or not! Yep, even though MacinTalk (as it was called then) didn’t make it into the OS until System 6, the software for text-to-speech had already been written at Apple and was being licensed to software developers.

Speech recognition didn’t come along until the early ’90s, with the Casper project. Later refined into what we now know as Mac Speech Recognition, Casper was ahead of its time, as most Mac technology usually is (think 3.5 inch floppy disks, SCSI, USB, etc.).

System 7 brought the release of both AppleScript and Speech Recognition, so naturally Apple provided the ability to extend the effectiveness of the recognition software by adding AppleScript support. When OS X came along, that support was carried over along with the text-to-speech software and new voices.

Say What You Will
The simplest command to play with is the say command. Although it lives in the Standard Additions dictionary rather than the speech recognition one, it is still a necessary part of the AppleScript speech tools. After all, when you write a cool script that uses speech recognition, do you want your Mac to respond with a dialog box? Or with a spoken reply?

Here’s a fun one that will make your friends and family do a double-take. Save the script below as “Thank You” in your Speakable Items folder (~/Library/Speech/Speakable Items/). This is where you will always save additions to speech recognition; the file name will be the words you must say to start the script.


set theOptions to {"You are very welcome.", "You're welcome.", "No problem, dude!", "Don't mention it.", "Forget it.", "I hope you tip well!"}
set theChoice to some item of theOptions
say theChoice displaying theChoice with waiting until completion

You’ve taught your Mac some manners! When you say, “Thank you,” your Mac will choose a reply from the list and say it back to you. Try it sometime when there’s someone looking over your shoulder, right after you have asked your Mac to perform some command.

You may have used say previously for giving feedback. The two optional clauses I’ve added above are displaying and waiting until completion. These two are only meaningful if speech recognition is on. The first displays the spoken phrase (or some other text, if you choose) above the round speech recognition floating doodad (I won’t say “widget,” since those belong on the Dashboard). The second one determines if the speech recognition waits until the spoken string is finished before continuing to listen for new input. And don’t forget that you can add the using clause to select a different voice for speaking.
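
As a quick illustration of the using clause, here is a minimal sketch. The voice name "Victoria" is only my assumption about what is installed on your Mac; substitute any voice listed in your Speech preferences.

```applescript
-- "Victoria" is an assumed voice name; use any voice installed on your Mac
say "You are very welcome." using "Victoria"
```

The using clause can be combined with displaying and waiting until completion in the same say statement.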

Brave New Word
If, like the folks I mentioned above, you’ve played with speech recognition and couldn’t find a use for it, take a look at what you can do with a little help from AppleScript. Let’s start with a simple script:


tell application "SpeechRecognitionServer"
	set theResponse to listen for {"yes", "no"} with prompt "Hello. Do you like me?"
	if theResponse is "yes" then
		say "I like you, too."
	else
		say "I don't care whether you like me or not."
	end if
end tell

The Speech Recognition Server is another Apple “helper” application, like System Events, Image Events and Database Events, designed specifically for use with AppleScript. It has only three commands, but within those commands lies the power to create lots of speech-driven fun and usefulness.

This script uses the simplest of the Speech Recognition Server’s commands, listen for. It listens for any item in a list of phrases, words, or numbers and returns the item that was spoken. When you run the script, you will be presented with the speech recognition doodad. You’ll need to press the escape key to get the Mac to “listen” to you (unless you’ve customized your Speech Preferences, in which case use your setup as you usually do). The Mac will wait for your answer and won’t respond to any speech except the two words we asked it to listen for, “yes” and “no.”

Here’s a more practical example. I often forget to eject my iPod before I quit iTunes. And I hate having to use Exposé to find the desktop and then right-click the iPod to use the context menu to eject it. There had to be an easier way, and here it is:


--get mounted disks
set theDisks to list disks
set filteredDisks to {}

--filter the list for only ejectable items
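repeat with aDisk in theDisks
	--aDisk is a reference to a list item, so get its contents (the disk name)
	set diskName to contents of aDisk
	tell application "Finder"
		if disk diskName is ejectable then set end of filteredDisks to diskName
	end tell
end repeat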

set theCount to count items of filteredDisks

--if only 1 item, we'll eject without question
if theCount = 1 then
	set ejectMe to item 1 of filteredDisks
else if theCount > 1 then
	--otherwise, we'll ask which one to eject
	tell application "SpeechRecognitionServer"
		set ejectMe to (listen for filteredDisks with prompt "Which disk do you want to eject?" displaying filteredDisks)
	end tell
else
	say "No ejectable disks." displaying "No ejectable disks."
	return --nothing to eject, so stop here instead of falling through to the eject command
end if

tell application "Finder" to eject disk ejectMe
delay 2
say ("Ejected disk " & ejectMe) displaying ("Ejected disk " & ejectMe)

Save this as “Eject a disk” in the Speakable Items folder.

If you want to show your user the acceptable responses you can use the displaying {list of string} addition to the listen for command. However, the user will only see the list if the Speech Commands Window is open. And if you only want to wait a short time for a response and then go on, you can use giving up much like you do with display dialog.
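
Here is a minimal sketch of giving up after combined with displaying. The ten-second limit and the prompt text are arbitrary choices of mine, and I am assuming that running out of time raises an error rather than returning a value, which is why the call sits inside a try block; check the behavior on your own system.

```applescript
tell application "SpeechRecognitionServer"
	try
		-- wait at most 10 seconds (an arbitrary limit) for one of the phrases
		set theAnswer to listen for {"yes", "no"} with prompt "Shall I continue?" displaying {"yes", "no"} giving up after 10
	on error
		-- assumed behavior: we land here if nothing was recognized in time
		set theAnswer to "no"
	end try
end tell
```

Falling back to "no" keeps the rest of the script simple, since theAnswer always holds one of the two expected phrases.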

Apple’s speech recognition is only designed for executing commands or scripts and not for dictation or data entry. But using some scripting, you can create scripts that fill in things for you. Here’s an example of a number entry script:


--set up variables
property numList : {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, "hundred", "thousand", "million", "negative", "minus", "done"}
set theText to {}
set negative to 1
set theChoice to ""

--give feedback that we're listening
say "number"

tell application "SpeechRecognitionServer"
	--loop until we're done
	repeat until theChoice is "done"
		--listen for input
		set theChoice to listen for numList
		--accumulate our input for later processing
		if theChoice is in {"negative", "minus"} then
			set negative to -1
		else if theChoice is not "done" then
			set end of theText to theChoice
		end if
	end repeat
end tell

--figure out what I just said!
set theInput to interpretNumber(theText, negative)

--now input it in the current document
tell application "System Events"
	keystroke (theInput as text)
end tell

on interpretNumber(theList, negative)
	local hundreds, thousands, millions
	
	set hundreds to 0
	set thousands to 0
	set millions to 0
	
	--loop through the items
	repeat with anItem in items of theList
		if anItem < 99 then
			set hundreds to hundreds + (anItem as integer)
		else if anItem as text is "hundred" then
			--{2,"hundred"} would be 2*100
			set hundreds to hundreds * 100
		else if anItem as text is "thousand" then
			--bump the hundreds number to thousands
			set thousands to hundreds
			set hundreds to 0
		else if anItem as text is "million" then
			--same idea as above - allows {1,"hundred","million"} to work right
			set millions to hundreds
			set hundreds to 0
		else
			display dialog "Don't understand " & (quoted form of anItem as text) & "."
		end if
	end repeat
	return (negative * ((millions * 1000000) + (thousands * 1000) + hundreds))
end interpretNumber

Save this as “Number Input” or something else you’ll remember that doesn’t sound like another file in the Speakable Items folder. It’s not a terribly long script, but you’ve now given yourself the ability to input numbers – to dictate to your Mac! If you have specialized input chores that you hate, like entering passwords or other data, you can script them the same way: use the speech recognition software to capture the data and System Events to type it into your current document.

Now, if you’ve opened the dictionary for the recognition application, you’ll see that it also contains listen continuously for and stop listening for identifier. These two are used together. The difference between listening “continuously” and just listening is this: while both cause your script to pause during execution and wait for a phrase to be said, “continuously” also allows other phrases to be spoken while your script is in “listening” mode.

Why would you want this? If you are already using speech recognition, you may already understand why: Speech recognition has commands for menus, front windows, the current application, etc. all resident at the same time. If you want NO OTHER commands to be executed while your script waits, then don’t use “continuously.” If, on the other hand, you create a “stay open” script application that will execute custom commands you write, and you still want all the regular speech commands available, then use listen continuously for.

This command also requires an identifier, even though it’s shown as optional in the dictionary. The identifier is used so that when you want to quit using the list of phrases you were listening for, you can tell your script to stop listening for that list of phrases. If you want your phrases listed in the speech commands window, use with section title.

Here’s an example of a stay open script application that you might use to input things you use frequently, saving you some typing:


on run
	repeat --keep listening until we're done
		tell application "SpeechRecognitionServer"
			--listen for phrases
			set theChoice to listen continuously for {"insert date", "insert my home address", "insert my email address", "close my info"} with identifier "mine" with section title "My Info"
		end tell
		
		if theChoice is "insert date" then
			--insert date
			tell application "System Events"
				keystroke ((current date) as text)
			end tell
		else if theChoice is "insert my home address" then
			--insert address
			set myAddress to "123 Main St." & return & "Independence, MO 64055"
			tell application "System Events"
				keystroke myAddress
			end tell
		else if theChoice is "insert my email address" then
			--insert email addr.
			tell application "System Events"
				keystroke "kevinb@macscripter.net"
			end tell
		else
			exit repeat
		end if
	end repeat
end run

on quit
	-- stop listening
	tell application "SpeechRecognitionServer"
		stop listening for identifier "mine"
	end tell
	--remember to continue quit!
	continue quit
end quit

As you can see, by combining speech recognition with System Events, you can build some really powerful timesavers!

The Last Word
I hope I’ve shown you some fun things to think about and maybe try. Apple’s speech tools are pretty darned nice, considering that for a long time no other PC OS had anything like them. Furthermore, if you enable the entire set of vocal commands (look under the “Commands” tab in the “Speech Recognition” tab of the Speech system prefs), you get a whole host of abilities, like adding appointments to iCal, getting information from Address Book, the ability to use the menu system verbally, and others.

The best piece of advice I can give you is to get a good microphone if you want to use the speech recognition. I have a set of USB headphones known as the “C-Media USB Headphones” that has a nice microphone (very sensitive) and I’ve got the sound input ratcheted all the way down to almost zero, and it hears me just fine. I do recommend either a noise-cancelling mic or a very quiet place to experiment, otherwise your TV will start launching programs!

'Til next time, have fun. Crunch code!

JSzaszIV · April 11, 2010, 3:06am

I seem to be having a problem. I copied and pasted your example for the “yes”/“no” prompt into my AppleScript and ran the script, and it just sat there. The result just showed that it was still “Running”, then it timed out with “number -1712”.
Is there something I am doing wrong? None of the examples that refer to “SpeechRecognitionServer” seem to be working for me. Thank you in advance.
-John

Ray_Barber · April 11, 2010, 9:10am

When Speech Recognition starts, by default it listens only while the user presses the Escape key. Maybe you have to do that when the script expects you to say something.

Hope it works,
ief2

JSzaszIV · April 11, 2010, 12:42pm

No such luck. I’ve pressed the Escape key and still nothing. I’ve also gone into the script and tried to make it “…listen continuously…” for the user response, and it still just sits there running the script, then times out. I’ve even increased the timeout to 300 seconds and, sadly, beyond, and still nothing.
Could it be that it just can’t find the “SpeechRecognitionServer” on my computer and it’s trying to look for it? Or could it be something else entirely?

Adam_Bell · April 11, 2010, 2:29pm

It sounds like the speechrecognitionserver is not getting any sound input so it just keeps on listening. Make sure that in the Speech pref pane you’ve turned speakable items ON, and that in the Sound pref pane you’ve got your mike selected as the sound source and that the input volume is not zero.

JSzaszIV · April 11, 2010, 8:17pm

Nope, still the same issue. Here’s an example of what I’m dealing with. I’m typing this into AppleScript:

tell application "SpeechRecognitionServer"
	with timeout of 300 seconds
		set user_response to listen continuously for {"Sal", "Sue", "Bob", "Wanda"} with prompt "Who's your friend?"
	end timeout
end tell
say the user_response & " is my friend too!"

It’s not prompting me or anything, yet it still replies to my normal voice commands (“What time is it?”, etc.). In the end, in the Result field, I’m getting: error “SpeechRecognitionServer got an error: AppleEvent timed out.” number -1712

Any clues??

Dylan_Weber · April 11, 2010, 8:44pm

When the script runs, a little circle appears showing you voice controls. On the small circle-window, there is a green bar with the keystrokes you have to press while saying the words.
Or, go into system preferences, go into “Speech”, and look for “Listening Key.”

Adam_Bell · April 11, 2010, 9:13pm

And also, you have to hold the “listening key” down while speaking; not just press it once.

Dylan_Weber · April 11, 2010, 9:47pm

Sorry, that, too.

JSzaszIV · April 17, 2010, 11:56pm

Check, Check and Check. No matter what I do I am still getting the same error code and what not. Why isn’t it prompting me with “Who’s your friend?” when I hit “run” in the applescript anyways?

Kevin_Bradley · April 27, 2010, 6:41pm

When I wrote this article, in early 2008, this still worked for me on OS X 10.4 Tiger. Since then, Apple’s broken the Speech Recognition Server (it happened in Leopard and it is still not fixed in Snow Leopard). The problem is the “say” lines. If you re-write the “Hello” script as I have below, it will work.

say "Hello. Do you like me?"

tell application "SpeechRecognitionServer" to set theResponse to listen for {"yes", "no"}
if theResponse is "yes" then
	say "I like you, too."
else
	say "I don't care whether you like me or not."
end if

Kind of a pain, I know. It doesn’t seem that Apple’s very concerned about this; it’s been broken quite a while. Also, make sure your mic has been calibrated in the Speech pref pane in System Prefs.

JSzaszIV · May 9, 2010, 7:17pm

Thank you so much! That worked perfectly, but of course after a few days of testing I have run into another issue that I can’t figure out, and again it’s this blasted SpeechRecognitionServer.

This is what I’m trying to do;

Run a script that, based on the time of day, runs another script. That one prompts me with a question and, based on my answer, runs another script, which in turn prompts me and runs yet another script based on that answer. After the first prompt and my answer it runs the next script, but then it only prompts me; the speech circle closes and won’t accept my response.

Hopefully this explanation makes sense. Thank you.

Kevin_Bradley · May 9, 2010, 8:48pm

UM, no not really clear. Try posting your code, that might help.

lemuralex13 · June 23, 2011, 5:15am

Hey,
I know this is a little late, and you’ve probably figured it all out by now, but I use the following code in between every SpeechRecognitionServer tell. What it does is wait for the user to say nothing, timing out after almost zero seconds, with no prompt. It’s pretty much just a placeholder. I find it takes about 5 seconds to execute, so that gets a little annoying, but it fixes the problem. Only every other speech recognition call works (all the odd ones: first, third, fifth…), so if you put this in between them all, your code works, the placeholder doesn’t, yours works, the placeholder doesn’t… And since the placeholder times out quickly, the code still runs. And thanks to every other person on this post for the killall idea!

try
	tell application "SpeechRecognitionServer"
		listen for {} with prompt "" giving up after 1.0E-27
	end tell
end try

Model: Macbook
Browser: Safari 533.21.1
Operating System: Mac OS X (10.6)

Oledan_ethey · September 27, 2012, 2:47am

I’m trying to help a friend with a physical disability to communicate using speech recognition and VoiceOver. Can anyone share an AppleScript that would be able to reply to questions like “what’s your name”, “how old are you”, and “are you ok” using speech recognition/VoiceOver? She’s using a Modbook with Lion. Thank you in advance.

tsinn · October 2, 2012, 6:15pm

@Oledan_ethey

Use Google Chrome. It implements speech-to-text for any text field automatically. (You should see a microphone icon on the right side of the text field. Click it and you can speak your search request.)

You can also send text to Google Translate via regular HTTP POSTs/GETs and specify English in and English out (i.e., it’s “translating” from English to English) and it will generate audio of the text. It’s a bit of a hack, but it’s a fairly simple, brute-force way to hopefully help your friend.

Oledan_ethey · October 24, 2012, 3:45am

Google Chrome will only display what you have said and will not reply to questions like “what’s your name?” but thanks anyway…

diefledermaus · August 25, 2013, 6:45pm

Has anyone been able to make the above scripts work in 10.8.4? I would like to use this ability, but when compiled the scripts ask where SpeechRecognitionServer is. My guess is that it has been replaced with the Siri-like dictation piece. Can I add the server back in? How can I script this Siri-like ability?