Speech Plugin Tips and Tricks (thread, perhaps?)



headshotfairy
March 11th, 2005, 12:35 AM
Hey again...

As you all know, I'm ab-so-lutely nuts over this speech plugin like a man with no arms!

Anyways, I've been working with it for a few days and I found out a couple (obvious and not-so-obvious) ways to make it work more efficiently. Add your own to this thread if you want to!

#1: Train your speech profile! Yes, really. If you don't do even the basic training, the speech plugin is quite useless.

#2: Don't use trigger words that are easily picked up by accident. Words like "up", "boat" and "stuff" are easily misrecognized by the software when the microphone is exposed to general noise and clatter from your environment. Use words like "terminate" instead of "close", or "lockdown" instead of "lock"... it will reduce the number of accidental events sent.
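
If you want a quick sanity check on a list of candidate trigger words, something like the sketch below might help. It is only a rough heuristic of my own, not anything the speech plugin does: the function names, the vowel-run syllable estimate and the thresholds (2 syllables, 6 letters) are all assumptions picked so that the examples from tip #2 come out the way the tip describes.

```python
import re

def rough_syllables(word):
    """Very rough syllable estimate: count runs of vowels (y included)."""
    return len(re.findall(r"[aeiouy]+", word.lower()))

def is_risky_trigger(word, min_syllables=2, min_letters=6):
    """Flag words that are short or low on syllables (assumed thresholds)."""
    return rough_syllables(word) < min_syllables or len(word) < min_letters

# Words taken from tip #2 above.
candidates = ["up", "boat", "stuff", "close", "lock", "terminate", "lockdown"]
risky = [w for w in candidates if is_risky_trigger(w)]
safer = [w for w in candidates if not is_risky_trigger(w)]

print("risky:", risky)
print("safer:", safer)
```

The vowel-run count is crude (it miscounts silent letters), so treat the result as a hint and still test the word with your own microphone.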

#3: Put powerful commands (like ones that kill or start processes, lock the desktop or nuke files) into a separate group that is unlocked by a special keyword, and make the group lock itself afterwards.... Again, this helps to prevent accidental catastrophes. I use the most excellent OSDMenu plugin to automate some of this: it can re-lock the "dangerous commands" group automatically, the menu can be populated with all the commands that the group contains (as a reminder, or you could rig it so that you can click them), and it can be rigged with a timeout that re-locks the group if you don't say anything...

#4: Better yet, I have almost all my commands in a toplevel group that is unlocked when I say "computer" (I even have nerdy Star Trek computer beeps that play when I do this :D )
When I say "dismissed", this toplevel group locks itself. Common sense dictates that you put the "computer" command outside the toplevel group so that you can always re-enable your commands by voice.
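
The locking idea from tips #3 and #4 boils down to a tiny state machine. Here is a minimal sketch of that logic in plain Python, just to show the flow; it is not Girder code and does not touch the speech plugin. The class name `CommandGate`, the `hear` method and the 10-second timeout are made up for illustration; only the words "computer" and "dismissed" come from the post.

```python
class CommandGate:
    """Ignore spoken commands unless the group was unlocked recently."""

    def __init__(self, unlock_word="computer", lock_word="dismissed", timeout=10.0):
        self.unlock_word = unlock_word
        self.lock_word = lock_word
        self.timeout = timeout    # seconds of silence before re-locking (assumed value)
        self.unlocked_at = None   # None means the group is locked

    def is_unlocked(self, now):
        if self.unlocked_at is None:
            return False
        if now - self.unlocked_at > self.timeout:
            self.unlocked_at = None  # timed out: re-lock the group
            return False
        return True

    def hear(self, word, now):
        """Return the command to run, or None if the word is ignored."""
        if word == self.unlock_word:
            # The unlock word lives outside the group, so it always works.
            self.unlocked_at = now
            return None
        if not self.is_unlocked(now):
            return None
        if word == self.lock_word:
            self.unlocked_at = None
            return None
        self.unlocked_at = now  # an accepted command restarts the timeout
        return word


gate = CommandGate()
print(gate.hear("terminate", now=0.0))   # locked, so ignored
gate.hear("computer", now=1.0)           # unlock
print(gate.hear("terminate", now=2.0))   # accepted
gate.hear("dismissed", now=3.0)          # lock again
```

Time is passed in as a plain number so the behaviour is easy to follow; a real setup would read a clock instead.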

#5: If you are still dealing with many false events being sent, leave the Girder Window open, disable all speech commands, and just keep an eye on the bottom right corner. Every time an erroneous event is sent, you can see a number indicating the index number of the word being picked up. Edit your girderspeech.xml file and make that word more complex.

#6: When choosing words, don't always spell them "properly".... instead, spell them phonetically! The reason behind doing this is that the speech recognition software does not always know when certain vowels have a long or short sound. A "long" vowel sounds like the letter itself, while a "short" vowel is more of a terse utterance...

examples:
Long "A" sounds like "Ay" (like "plate"), Short "A" sounds like "a" (like "flat")
Long "E" sounds like "EEE" (like "steep"), Short "E" sounds like "eh" (like "step")
Long "I" sounds like "Aye" (like "fight"), Short "I" sounds like "ih" or "eeh" (like "fit")

You get the picture.... in addition, there are tons of silent letters and weird spelling conventions used in the English language, and the speech recognition software doesn't know them all. Spell things phonetically so that you can define explicitly the sounds the plugin will react to.

Instead of using the word "vocabulary", spell it "vokahbewleree"... or instead of "firefox" use "farfawks" -- this is important, because last time I told it to launch my web browser with the word "firefox", I had to pronounce it "FEER-FAWKS" because it thought the letter "I" was a short "I"...

Even better, if you really want a good idea of what your computer will expect you to say, launch the Speech control panel, go to the tab that says "Text To Speech" and enter the word you want to test into the test textbox... Then hit Preview Voice... The computer will try to say what you have entered, and if it's saying something different than what you expect, modify the spelling to be more phonetic until both you and the computer agree on the pronunciation of the word. Then copy and paste the modified word back into the girderspeech.xml file. Results Guaranteed!
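
Once you have found respellings that work, it can be handy to keep them in one lookup table so you don't have to rediscover them. A minimal sketch in plain Python follows; the names `PHONETIC` and `respell` are made up for this example, the two entries are the respellings from this tip, and nothing here reads or writes girderspeech.xml.

```python
# Respellings taken from the examples in tip #6; add your own as you find them.
PHONETIC = {
    "vocabulary": "vokahbewleree",
    "firefox": "farfawks",
}

def respell(word):
    """Return the phonetic respelling if one is listed, else the word unchanged."""
    return PHONETIC.get(word.lower(), word)

commands = ["Firefox", "vocabulary", "terminate"]
respelled = [respell(w) for w in commands]
print(respelled)
```

Words that are not in the table pass through untouched, so you only need entries for the ones the recognizer gets wrong.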

#7: If you are lazy (like me) and can't be bothered to turn off the microphone when you aren't planning to send voice commands, go to your Speech control panel, click the voice profile you are using, and hit the Settings Button...

At the bottom of the window there is a checkbox for "Background Adaptation" -- uncheck this -- having it checked makes the speech recognizer adapt to your voice over time as you use it, but a lot of other noises and general clatter get through to the mic when you're not using it and wind up "un-training" your profile.

Of course, I am really only GUESSING this, but it seems like a common sense thing to do, no?

#8: Make two actions that will save you a lot of time when messing around with your speech files:

One, that will load the girderspeech.xml into your favorite text editor -- the purpose of this is obvious... I say "vocabulary" to do this...

Another, that will unload and reload the speech plugin -- this will make the speech plugin reread any changes you applied to your girderspeech.xml... You can do this by creating a Variable Manipulation Script containing the following code:



-- Unload, then reload, plugin 88 so that girderspeech.xml is reread
UnloadPlugin (88)
LoadPlugin (88)


Or, if you are MEGA lazy, load the attached GML and stick it where you need it.