Speech Recognition Plugin v1.0 beta release



gbumgard
May 23rd, 2006, 05:50 PM
I've attached a zip containing the initial distribution of the new Girder 4 Speech Recognition plugin. The manual for the plugin is also attached.

A detailed description of the usage and capabilities of the plugin can be found in the manual.

I would like some feedback on whether the plugin distribution is complete (the plugin can be installed, configured and generate events) prior to uploading the plugin to the plugin download area.

Thanks in advance for your help!

-g.b.

Promixis
May 23rd, 2006, 09:22 PM
it looks like you compiled against the debug version of MFC. going to look for it ;)

gbumgard
May 23rd, 2006, 09:23 PM
Yep. I did. Does it matter? If so, I will recompile as Release and repost.

Promixis
May 23rd, 2006, 09:26 PM
http://www.dll-files.com/dllindex/download.php?msvcp71ddownload0UGkREVGhT

downloaded it but no go...

gbumgard
May 23rd, 2006, 09:32 PM
Hmmm. I wonder if I'm going to have to redist the latest mfc and msvcrt libraries. You may need all three. Mine were probably installed with Visual Studio.

I've attached a Release build of the DLL.

-g.b.

Promixis
May 23rd, 2006, 09:33 PM
maybe, i do not think we include the 7.1 mfc dll's. will have to check with Ron. found the other debug dll. testing now..

gbumgard
May 23rd, 2006, 09:44 PM
Now that I think about it, I shouldn't be linking in MFC.
A C-runtime (MSVCRT) library version mismatch might be a problem though.

The following is the linker command line produced by the project settings:

/OUT:"C:\Program Files\Promixis\Girder\plugins\Speech.dll" /INCREMENTAL:NO /NOLOGO /LIBPATH:"C:\Program Files\Promixis\Girder\lib" /DLL /DEF:".\SpeechPlugin.def" /DEBUG /PDB:"Debug/SpeechPlugin.pdb" /SUBSYSTEM:WINDOWS /IMPLIB:"Debug/Speech.lib" /MACHINE:X86 girder.lib shared.lib dui.lib lua5.lib lualib5.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib "\Program Files\Promixis\Girder\lib\girder.lib" "\Program Files\Promixis\Girder\lib\Shared.lib" "\Program Files\Promixis\Girder\lib\dui.lib" "\Program Files\Promixis\Girder\lib\lua5.lib" "\Program Files\Promixis\Girder\lib\lualib5.lib" "\Program Files\Microsoft Platform SDK\Lib\Kernel32.Lib" "\Program Files\Microsoft Platform SDK\Lib\User32.Lib" "\Program Files\Microsoft Platform SDK\Lib\Gdi32.Lib" "\Program Files\Microsoft Platform SDK\Lib\WinSpool.Lib" "\Program Files\Microsoft Platform SDK\Lib\ComDlg32.Lib" "\Program Files\Microsoft Platform SDK\Lib\AdvAPI32.Lib" "\Program Files\Microsoft Platform SDK\Lib\Shell32.Lib" "\Program Files\Microsoft Platform SDK\Lib\Ole32.Lib" "\Program Files\Microsoft Platform SDK\Lib\OleAut32.Lib" "\Program Files\Microsoft Platform SDK\Lib\Uuid.Lib" "\Program Files\Microsoft Platform SDK\Lib\odbc32.lib" "\Program Files\Microsoft Platform SDK\Lib\odbccp32.lib"

gbumgard
May 23rd, 2006, 09:46 PM
I guess that one or more of the other libs might have an MFC dependency.

Promixis
May 24th, 2006, 05:43 AM
getting the window below on setup

Promixis
May 24th, 2006, 05:47 AM
it goes away on G4 restart.

gbumgard
May 24th, 2006, 12:37 PM
It looks like the form was not initialized by the plugin.
What shows up after G4 restart?
Does the plugin post any messages to the log?

quixote
May 29th, 2006, 10:28 PM
If you need another system to test on, I'm here too.
I got the same message about MSVCP71D.dll not being found, but I assumed that it was because I didn't have any recognition software installed. Needless to say, installing it made no difference.
I'll be following the thread.

gbumgard
May 30th, 2006, 01:02 AM
My copy of that library was installed with Visual Studio C++ (MSVC). I've found that one of my systems doesn't have MSVC installed on it, so I may be able to track down, and hopefully eliminate, the dependencies on the newer runtime libraries.

I may need to create an installer that installs those DLLs along with the plugin files.

I will look at this sometime within the next couple of days.

Thanks for the feedback!

gbumgard
June 1st, 2006, 05:58 PM
To solve the problem with library dependencies, I have created a Windows installer for the plugin. This installer will install the redistributable MSVC DLLs that the plugin requires but that are missing on many systems.

The installer package can be downloaded from:

http://www.avepro.com/SpeechPlugin/SpeechPlugin.2006.06.01.zip

The manual can be downloaded from:

http://www.avepro.com/SpeechPlugin/SpeechPluginManual.2006.06.01.zip

The manual can be browsed online at:

http://www.avepro.com/SpeechPlugin/manual.html

There is a known, but non-fatal error that occurs when the plugin is first enabled. See the README for details.

YOU MUST HAVE A SPEECH RECOGNITION ENGINE INSTALLED TO USE THIS PLUGIN. See the README for information on downloading the Microsoft engine.
(UPDATE - You will likely need to do some training of your engine before using it with the demo. Use the Speech Control Panel to access the training wizard)

Please let me know if you can or cannot successfully install the plugin. I would appreciate any feedback.

Thanks!

-g.b.

barca0
June 2nd, 2006, 08:41 AM
hi gbumgard,

wow, what an excellent plugin!! I use Dragon with Natlink and Vocola to control a mediacenter and I wrote girder scripts to dynamically change and extend grammars. It works pretty good for now, but since I am not a programmer, I have a feeling it could fall apart any minute :-)

I installed your plugin and it seems to work fine. I had to use the MS engine because it did not find the Dragon profiles. Anyways, I will keep on playing with it and post some feedback here. Bearing in mind that Nuance is now more enterprise oriented and Vista SR apparently comes close to Dragon, I might eventually switch.

By the way, do you know of a more comfortable grammar editor, so that I don't have to write the XML files directly? That would help a lot.

so long

barca

gbumgard
June 2nd, 2006, 12:01 PM
(UPDATED token discussion and Vista note)

A couple of years ago I started working on a VoiceXML grammar editor for use in the Open Source Java-based Eclipse IDE framework. I never finished this effort, and I have abandoned support for VoiceXML at the moment. An editor would be nice, but I'm not sure I could find the time to create one. I will give it some thought.

As for the Dragon profiles - I'll have to look into it. To appear in the plugin, a recognition profile must currently be defined in the SPCAT_RECOPROFILES category as recognized by SAPI. These tokens are simply registry entries. I used some SAPI helper functions to enumerate the list of tokens. These functions only enumerate categories listed under the HKEY_LOCAL_MACHINE\Software\Microsoft\Speech registry key. The Dragon profiles may not be listed there. If you are interested in the details, take a look at this document:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/SAPI51sr/html/ResourceManager_topnode.asp

If you feel comfortable doing so, you might run regedt32 to inspect the registry tree to see where (or if) Dragon keeps its profile tokens. If you do, let me know what you discover.

I'm not sure how the plugin will work under Vista. The plugin was written using the SAPI v5.1 API. Vista provides a new and improved SAPI API (v5.3). I assume Vista will provide backwards compatibility, but some functions may be deprecated and there may be additional functions/features available that could be made available through the plugin. I won't have access to Vista until it is released, so I will have to rely on feedback from beta users until that time.

-g.b.

barca0
June 2nd, 2006, 04:38 PM
...just found out that Dragon Speaking (and I guess also ViaVoice) use SAPI 4. That's why you don't find them. But if you don't do dictation the girder/sapi5 combination seems to have advantages anyways...

quixote
June 2nd, 2006, 04:55 PM
What kind of advantages? I was thinking of somehow adding a command that would pass on what I say to another program if it is not recognized as a Girder command. The other program is a natural language processor, so I would be able to teach it how to respond appropriately and even do useful things like tell me how to make drinks at my bar, should I request such info.

gbumgard
June 3rd, 2006, 01:07 AM
The Microsoft engine and the plugin both support dictation.

Lua code could be used to test dictation phrases for recognizable patterns. Regular expression evaluation could be used to test for key word patterns within a more complex phrase.
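
As a rough illustration of that idea (written in Python for brevity rather than Girder's Lua, with hypothetical pattern and event names that are not part of the plugin), a dictated phrase can be tested against a list of keyword patterns, and anything that matches none of them can be handed to a fallback such as a natural language processor:

```python
import re

# Hypothetical keyword patterns; each one maps to an illustrative event name.
PATTERNS = [
    (re.compile(r"\b(?P<room>kitchen|bedroom)\s+lights?\s+(?P<state>on|off)\b", re.I), "Lights"),
    (re.compile(r"\bvolume\s+(?P<direction>up|down)\b", re.I), "Volume"),
]


def route(phrase, fallback):
    """Return (event, captured values) for the first pattern found in the phrase.

    If no pattern matches, the phrase is passed to `fallback` unchanged,
    which stands in for forwarding the text to another program.
    """
    for pattern, event in PATTERNS:
        match = pattern.search(phrase)
        if match:
            return event, match.groupdict()
    return fallback(phrase)


def unhandled(phrase):
    # Placeholder for "pass it on to the other program".
    return "Unhandled", {"text": phrase}


if __name__ == "__main__":
    print(route("please turn the kitchen lights off now", unhandled))
    print(route("how do I make a martini", unhandled))
```

Because `search` is used instead of a full match, the keywords can sit anywhere inside a longer dictated sentence.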

Dictation can be used to supply names for searching within a music or contacts database. Exact matches would be hard to obtain, especially for unusual names (e.g. albums). I think some sort of girder indexing application could be constructed that dynamically lists the best matches for each recognizable word from which the speaker could select the desired match. The dictation topic can also be changed to "spelling" to allow the speaker to spell out a name that is difficult to pronounce or is not recognized by the engine. Use of a soundex or other relaxed matching algorithms may be required for general purpose indexing.
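
A minimal sketch of the soundex idea, again in Python and with made-up album titles: each word of every title is indexed by its Soundex code, so a misheard word can still find candidates that sound alike. The helper names (`soundex`, `build_index`, `candidates`) are hypothetical and nothing here is part of the plugin.

```python
from collections import defaultdict

# Standard American Soundex digit groups.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of a word ('' if it has no letters)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels have no code and reset `prev`
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]


def build_index(titles):
    """Map each Soundex code to the titles containing a word with that code."""
    index = defaultdict(list)
    for title in titles:
        for word in title.split():
            code = soundex(word)
            if code and title not in index[code]:
                index[code].append(title)
    return index


def candidates(index, heard_word):
    """Titles with at least one word that sounds like the heard word."""
    return index.get(soundex(heard_word), [])


if __name__ == "__main__":
    idx = build_index(["Abbey Road", "Rumours", "Robert Plant"])
    print(candidates(idx, "Rupert"))
```

Soundex was designed for English surnames, so it is only one possible relaxed matcher; the list it returns would still need to be presented to the speaker for selection.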

-g.b.

gbumgard
June 3rd, 2006, 02:47 PM
I have discovered that the plugin will not initialize properly immediately after the Microsoft SR engine is installed. The plugin reports an error because it can't find a token value for the default user profile or audio input device. This appears to occur because the Microsoft Speech Engine installer did not properly initialize the registry.

I was able to get the plugin to initialize properly by opening the Speech Control Panel and running the mic training wizard. It may be as simple as just opening and closing the control panel. I'm not sure.

...

The Load Command Grammar action in the SpeechRecognitionDemo.gml file specifies the wrong location for the demo grammar file. It should look under a directory called UserData, not GML. This is where the plugin installer has placed the file. The log will indicate whether or not the file was found. I will fix this in the next installer build.

The speech recognition plugin will log most unexpected errors. It is probably a good idea to keep the log window open when first setting up the plugin.
...

I seem to be having quite a bit of difficulty transferring GML files between Girder installations. Action forms that work on one machine appear blank or are missing stored values when copied to another machine. This seems to occur even though I'm running the same version of Girder on both machines. No clue as to why this is happening.

If this happens, you may find that some actions and conditions are not initialized with the values I intended. This was especially true for the rule and property conditionals that appear in the demo GML. These nodes may need to be edited to ensure that a recognizer, context, and grammar are specified (or "any" is checked).

...

Found another error in the GML file. The trigger event action in the translate event macro specifies pld5 when it should be pld6, which contains the value associated with the eventstring property in the grammar file.

...

NOTE: Links to the plugin installer package appear up-thread.

gbumgard
June 3rd, 2006, 04:22 PM
The plugin itself has not changed.

I've fixed some errors in the demo GML file and the manual.

- The demo GML file has been changed to fix some nodes.
- The manual was changed to correct errors in the RuleRecognized and PropertyChanged event descriptions.

The installer package can be downloaded from:

http://www.avepro.com/SpeechPlugin/SpeechPlugin.2006.06.03.00.zip

The manual can be downloaded from:

http://www.avepro.com/SpeechPlugin/SpeechPluginManual.2006.06.03.zip

The manual can be browsed online at:

http://www.avepro.com/SpeechPlugin/manual.html

Thanks!

-g.b.

PooC-Maw
June 7th, 2006, 08:17 AM
Hi,

I've installed the plugin yesterday, messed a bit with the settings, modified the grammar.xml file to my needs, and what can I say, it works :)

I already wrote 20 or so scripts to perform all kinds of functions...

Very impressive!

Promixis
August 27th, 2006, 12:06 PM
hi g.b.

wondering if you have seen

http://www.way2call.com/

and if we could use your plugin with a device like this and a phone.

gbumgard
August 28th, 2006, 01:39 PM
I would need to add TAPI stream support.

I had planned to work on a TAPI plugin for Girder but have not gotten around to it. I was playing around with using a VOIP client on a PocketPC to provide voice input to the SAPI plugin (instead of a Mic on the Girder PC). This was a few years ago.

I think a TAPI plugin would be quite useful as a modem/phone could be used to provide input to the SAPI plugin as well as touch-tone digit events. You could control your system by calling your computer (whether by phone or VOIP). Very cool.

I'm not feeling very motivated to work on a TAPI plugin while the weather is nice, but come fall/winter I'm sure I will have more time to look into it.

-g.b.

Promixis
August 29th, 2006, 12:56 PM
Thanks for the update. I sent you a PM.

Promixis
August 29th, 2006, 01:07 PM
Quick question: out of the SAPI SDK, are there files we should include so that voice can work without requiring everyone to download the entire SDK?

gbumgard
August 29th, 2006, 01:37 PM
Yes, Microsoft allows one to install individual components of the SAPI SDK. See http://msdn.microsoft.com/library/default.asp?url=/library/en-us/SAPI51sr/WhitePapers/WP_Setup_Whitepaper_51.asp

The required components are distributed as MSM (merge) files. I can probably install these using the plugin installer. I'm not sure how big these files are, but they might substantially increase the size of the installer .exe. I should probably have done this in the first place, but requiring a separate SDK download was easier for me.

-g.b.

mhwlng
August 31st, 2006, 11:31 AM
you can also get the latest speech recognition engine (6.1) from the office 2003 cd.
Click on Custom Install
Click on Choose advanced customization of applications
Office Shared Features -> Alternative User Input -> Speech

Promixis
August 31st, 2006, 05:49 PM
> Yes, Microsoft allows one to install individual components of the SAPI SDK. See http://msdn.microsoft.com/library/default.asp?url=/library/en-us/SAPI51sr/WhitePapers/WP_Setup_Whitepaper_51.asp
>
> The required components are distributed as MSM (merge) files. I can probably install these using the plugin installer. I'm not sure how big these files are, but they might substantially increase the size of the installer .exe. I should probably have done this in the first place, but requiring a separate SDK download was easier for me.
>
> -g.b.

Likely easier for the user :D -> like me :D

We can host the file so that the bandwidth is not a problem for you.

gbumgard
October 10th, 2006, 12:41 PM
Yeah, you are right, the example is pretty simplistic. I will probably add
more to the example later this fall (too busy now). Look at the lua actions
that are fired by the channel change grammar. You could easily translate the
property/payload values into IR events (instead of displaying the results).

There will always be a trade-off between managing state in the GML or
managing state in the Grammar. I would try to define top-level rules that
define a phrase hierarchy that can be used to differentiate between
individual commands rather than use the GML to constantly disable and enable
rule states. You can also have multiple top-level rules active at the same
time. There will always be some trade-off between using the GML or the
grammar to track the state changes that lead to a specific command.

And yes, you will probably need to mute your audio while speaking commands.
The example uses the trigger phrase "[Sam/Mary] can you hear me?" to trigger
GML actions that generate mute command(s) and enable the top-level rule used
to parse speech commands. You will need to choose a trigger phrase that
contains words that have high recognition probability and is of sufficient
length so as to prevent false positives. If, for example, you want to
address your system by name, use names or words with multiple
syllables/phonemes, e.g. a name like "Veronica". Once triggered, your GML
should include an action that enables your command rules. The active rule
set should also include a command to disable your command rules, reenable
the trigger phrase rule and unmute your audio; the example uses "thank you
[Sam/Mary]" to accomplish this. Even if you get an occasional false positive
on the trigger phrase, you can simply recite the phrase used to end speech
command input to restore things to where they were (unmuted).
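
The trigger-phrase flow described above amounts to a small two-state machine. Here is a minimal sketch in Python, assuming exact phrase matches and using a hypothetical `VoiceSession` class as a stand-in for the GML actions and grammar rule states (none of these names come from the plugin):

```python
class VoiceSession:
    """Tracks whether command rules are active and whether audio is muted."""

    def __init__(self, names=("sam", "mary")):
        self.names = names
        self.listening = False  # True while the command rules are enabled
        self.muted = False      # True while system audio is muted
        self.commands = []      # phrases accepted as commands

    def _matches(self, phrase, template):
        # True if the phrase equals the template for any configured name.
        return any(phrase == template.format(name=n) for n in self.names)

    def hear(self, phrase):
        """Feed one recognized phrase; returns what the session did with it."""
        phrase = phrase.lower().strip(" ?!.")
        if not self.listening:
            if self._matches(phrase, "{name} can you hear me"):
                self.listening = True
                self.muted = True
                return "started"
            return "ignored"  # only the trigger phrase rule is active
        if self._matches(phrase, "thank you {name}"):
            self.listening = False
            self.muted = False
            return "stopped"
        self.commands.append(phrase)
        return "command"


if __name__ == "__main__":
    session = VoiceSession()
    for spoken in ("kitchen lights on", "Sam can you hear me?",
                   "kitchen lights on", "Thank you Sam"):
        print(spoken, "->", session.hear(spoken))
```

Keeping the end phrase in the active rule set is what makes a false positive on the trigger cheap to undo: one "thank you" returns the session to its unmuted, idle state.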

And yes, it would be beneficial to all if you use the forums so everybody
can see this stuff. When you post to a thread for which I have registered an
interest, I will receive an e-mail. If you start a new thread, send me a
private message identifying the thread so I can start monitoring it...

Good luck!

-g.b.

On 10/10/06, Jim Nickel <jim@digitaldigs.ca> wrote:
>
> Thanks,
>
> I will post some stuff in the forums.
>
> You have some excellent documentation, but I am still confused about the
> grammar.xml file as you don't seem to have any examples of commands that
> get activated by voice.
>
> For example, you have a channel changing piece, but there is nothing in
> the GML that corresponds to it, so I am unsure how to add additional ones.
>
> I think I may have set one up correctly, but my method looks kludgy. I
> replaced the word Commands, with my command and then enabled it. However, I
> doubt you intended us to enable/disable every command individually....but I
> don't understand how to create a sub-command so that I can make a Commands
> section and then create KitchenLightsOn or KitchenLightsOff underneath it.
>
> I am also having a problem where the response from the computer is being
> recognized as the command phrase and is activating the command again
> resulting in an endless loop. Obviously, I need to disable recognition while
> the computer is speaking and then enable it again after. I did try to do
> this, without complete success. If you had any suggestions on this part
> too, I would appreciate it.
>
> I can definitely post this in the forum, so I apologize if it is wrong of
> me to email you directly.
>
> Jim
>
>
> At 11:16 PM 10/9/2006 -0700, Gregory Bumgard wrote:
>
> Cool! Be sure to post what you learn on the Promixis forums.
>
> If you are using the MS engine, you will probably need to train it before
> you will get satisfactory results. Enable the "continuous" training option
> in the Speech control panel. This will let you continue to train the engine
> while developing and testing a C&C grammar.
>
> Good Luck!
>
> -g.b.
>
> On 10/9/06, *Jim Nickel* <jim@digitaldigs.ca> wrote:
>
> Hiya!
>
> Thanks for writing this. I am going to give it a whirl and see how
> successfully I can integrate Crown PZM-11's into my Girder system for whole
> house control with my/my family's voice.
>
> Just thought I would let you know to tell others not to try Dragon
> Naturally Speaking.
>
> As of version 9, it still isn't SAPI 5.1 compliant, only version 4.0a.
>
> See this post for reference:
>
>
> http://support.lhsl.com/databases/dragon/webdisc.nsf/2b03ac191573b70a85256b09006fc459/df424bbb15eb065a852571ee00356841?OpenDocument
>
> Thanks,
>
> Jim

blubberhoofd
December 30th, 2006, 05:58 AM
hi,

I've just tried to get the plugin to work, followed all instructions from the readme, but I ran into a problem upon enabling the plug-in.

this is my logger output:
Time          Date        Source  Details                                                                                Payloads
12:51:10:531  12/30/2006  Girder  Plugin Open failed: C:\Program Files\Promixis\Girder\plugins\Speech.dll
12:51:10:515  12/30/2006  Speech  Initialization failed!
12:51:10:500  12/30/2006  Speech  Cannot get default token: The system cannot find the file specified. (0x80070002)


can you tell me what file(s) it is missing?

gbumgard
December 30th, 2006, 01:38 PM
I think there are three possibilities for this error:

1. You don't have a speech recognition engine selected.
2. You don't have a speech recognition training profile selected.
3. You don't have a default audio/microphone input source selected.

I would not think any of these situations would be possible, but because the error refers to a token, those are the three most likely possibilities. Tokens, which are really just registry entries, are used in Windows to identify various system resources, including speech recognition engines, training profiles, and audio devices.
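
The error code in the log is consistent with a missing registry entry. 0x80070002 is an HRESULT that wraps the Win32 error ERROR_FILE_NOT_FOUND (2), which Windows registry calls also return when a key or value does not exist, so the "file" in the message is not necessarily a file on disk. A small Python sketch of the decoding, with a hypothetical helper name:

```python
FACILITY_WIN32 = 7        # facility used for wrapped Win32 error codes
ERROR_FILE_NOT_FOUND = 2  # Win32 error code, also used for missing registry keys


def decode_hresult(hr):
    """Split a 32-bit HRESULT into its failure flag, facility, and error code."""
    hr &= 0xFFFFFFFF
    return {
        "failure": bool(hr >> 31),         # top bit set means failure
        "facility": (hr >> 16) & 0x1FFF,   # same mask as the HRESULT_FACILITY macro
        "code": hr & 0xFFFF,               # low 16 bits carry the error code
    }


if __name__ == "__main__":
    parts = decode_hresult(0x80070002)
    print(parts)
    print(parts["facility"] == FACILITY_WIN32 and parts["code"] == ERROR_FILE_NOT_FOUND)
```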

Bring up the Speech control panel (from the Windows Control Panel). If you don't find one, you don't have an SR engine installed and will need to download the free one provided by Microsoft (see plugin docs). If you can bring up the panel, check the engine selection. You should also check that a training profile exists. If not, create one and perform at least one training session. The default profile should be acceptable (default profile refers to the one created for the current user).

The speech panel also lets you select the audio input source for speech recognition - verify that it is set (the Audio Input... button). I usually select the default input device (which is set in the Sounds and Audio Devices Control Panel). It's also a good idea to bring up the microphone configuration dialog to automatically adjust the input gain for your mic. This dialog will give you an indication as to whether you have the correct input device selected.

Please let me know if any of these suggestions solves your problem.

Thanks!

-g.b.

blubberhoofd
December 30th, 2006, 04:23 PM
hi,

thanks for your answer, I'm a complete newbie in this field...
turns out that the settings I've made in the 'speech' part of the control panel weren't saved properly, re-configured and all is well.

will do some testing the next couple of days to see if I can integrate speech recognition in my setup.

I'm using one of those cheap generic microphones right now, which will be useless for the use in a domotics/home-automation role, so do you have any tips on what microphone hardware to use if you don't want to be holding your mike or wearing a headset?

btw: do you have any tips on how to change the language to my native dutch?

hope you can help ;)

gbumgard
December 30th, 2006, 05:38 PM
Try googling for "array microphone" or "video conference microphone". What you want is a microphone that's designed for open-air environments. They are pretty expensive. I think I identify one such mic in the plugin docs.

As for supporting Dutch -- I don't believe Microsoft provides a Dutch speech recognizer engine. You would likely need a third-party engine (you want a speech recognition (SR) engine, not a text-to-speech (TTS) engine).

Microsoft lists a few third-party vendors on their site:

http://www.microsoft.com/speech/evaluation/thirdparty/engines.mspx

You should try googling for Dutch/SAPI/Speech Recognition to see what else might be available.

-g.b.

gbumgard
December 30th, 2006, 05:56 PM
I forgot about an alternative method for recognizing non-English phrases. The grammar file syntax allows you to specify a phrase rule using phonemes (the PRON XML attribute). You might try creating a phonetic implementation using the English language phoneme set:

http://msdn2.microsoft.com/en-us/library/ms717239.aspx

-g.b.

blubberhoofd
December 30th, 2006, 07:37 PM
hi,


thanks for your reply ;)

already found a good dutch text-to-speech engine with semi-natural sounding voices... finding a good speech recognizer seems more challenging.

will give 'Naturally speaking' a try tomorrow.

will have a look around for microphone hardware... it strikes me as odd that these should be expensive, since hands-free kits, mobile phones, etc. incorporate them.

hope you can give me some pointers ;)