Posted: Wed May 14, 2008 7:04 pm Post subject: [asterisk-speech-rec] Speech Recognition Problems
Hello,
Kudos to the Asterisk community.
I am trying to integrate Nuance speech recognition and Microsoft Speech Server with Asterisk and have been partly successful. The present solution records voice with silence detection set to 2, sends the recorded file to the speech servers along with the voice grammar, and then acts on the result. Everything is done in AGI. We use Cajo for load distribution.
- The present solution has a big drawback: users have to wait until the end of the prompt before they can start recording. They cannot speak in between, and this is causing a problem. Is there a way to stop playback of the prompt when speech is detected from the callee end?
- I see that the speech API is available and connectors can be written, but there is no proper documentation. Can we check how LumenVox has done the connector? Since it's GPL licensed, I am assuming it should be shared.
- BackgroundDetect does stop during playback, but it jumps to the talk extension. We want the entire speech to be recorded; leaving out the first fragment that triggered the jump to the talk extension will not serve the purpose in speech detection. Further, since it's AGI, it's not extension driven.
Quote:
- The present solution has a big drawback: users have to wait until the end of the prompt before they can start recording. They cannot speak in between, and this is causing a problem. Is there a way to stop playback of the prompt when speech is detected from the callee end?
There's nothing built in to do speaker detection and stop the prompt in this instance.
Quote:
- I see that the speech API is available and connectors can be written, but there is no proper documentation. Can we check how LumenVox has done the connector? Since it's GPL licensed, I am assuming it should be shared.
The LumenVox connector is a binary module and is not under a GPL license, so the source is not available. As for documentation for the API: it is correct that there is no example connector module, but the API is intuitive enough that you should be able to figure it out if you are a developer.
You create an ast_speech_engine structure with callbacks for everything that your engine can handle. Create/destroy callbacks exist for when a speech object is created and destroyed. Load/unload/activate/deactivate callbacks exist for grammars. The start callback is called when res_speech is going to start feeding audio into your engine. The write callback gets called with audio that should be fed into your engine. The get callback is called when res_speech wants to get the results of the decode. It is up to your engine to set flags on the speech object to indicate various things: AST_SPEECH_QUIET signals that the person is speaking (so any playing prompt should stop), and AST_SPEECH_HAVE_RESULTS signals that your engine has results from the decode.
This is a rough view of things.
Quote:
- BackgroundDetect does stop during playback, but it jumps to the talk
extension. We want the entire speech to be recorded; leaving out the
first fragment that triggered the jump to the talk extension will not
serve the purpose in speech detection. Further, since it's AGI, it's
not extension driven.
If you want to approach it this way, you will need to do some custom coding.
Joshua Colp
Software Developer
Digium, Inc.
_______________________________________________
--Bandwidth and Colocation Provided by http://www.api-digital.com--
Posted: Mon May 19, 2008 12:40 pm Post subject: [asterisk-speech-rec] Speech Recognition Problems
Joshua,
Thanks for your insight.
I started looking at your app_speech_utils.c and wanted to use it as base code to customize, as my first version of the Nuance decoder just takes a file and outputs the result.
After looking at the code and making a first attempt, I realized that it's much easier to add a connector module rather than customizing SpeechBackground, since it relies on so many things (speech_create, etc.).
I made a dummy module which just prints the status of each callback. Here is the trace.
The write callback is never called. I see in your speech_utils code that you write to the engine only when AST_SPEECH_STATE_READY is set. When is the right time to set this? Should we set it in speech_create? Which function call is the right place? I cannot use the change callback, as it would in turn call me again.
I thought the internal engine should set this when it detects that the caller has started talking, and that it would happen automatically.
if (ast_test_flag(speech, AST_SPEECH_QUIET)): who sets this flag? Does the underlying DSP engine in Asterisk process the audio and set it, or is the connector module in charge of detecting it with its own algorithm and setting it? I hope it's the DSP engine. The reason I ask is that the speech playback didn't stop in my case at all.
I'm getting there. Thanks Joshua.
On Thu, May 15, 2008 at 9:44 PM, Joshua Colp <jcolp@digium.com> wrote: