[asterisk-speech-rec] Speech Recognition Problems


pbx.kumar at gmail.com
Guest

Posted: Wed May 14, 2008 7:04 pm Post subject: [asterisk-speech-rec] Speech Recognition Problems

Hello,

Kudos to the Asterisk community.

I am trying to integrate Nuance speech recognition and Microsoft Speech Server with Asterisk, and I have been partly successful. The present solution records the caller's voice with silence detection set to 2, sends the recorded file to the speech servers along with the voice grammar, and then acts on the result. Everything is done in AGI. We use cajo for load distribution.

- The present solution has a big drawback. Users have to wait until the end of the prompt before they can start recording. They cannot speak during the prompt, and this is causing a problem. Is there a way to stop playback of the prompt when speech is detected from the callee end?

- I see that a speech API is available and that connectors can be written, but there is no proper documentation. Can we check how Lumenvox has done its connector? Since it's GPL licensed, I am assuming it should be shared.

- BackgroundDetect does stop during playback, but it jumps to the talk extension. We want the entire speech to be recorded; leaving out the first fragment, which triggered the jump to the talk extension, will not serve the purpose for speech detection. Further, since this is AGI, it is not extension driven.

Any pointers will be appreciated.

Thanks.

jcolp at digium.com
Guest

Posted: Thu May 15, 2008 3:20 pm Post subject: [asterisk-speech-rec] Speech Recognition Problems

----- "praveen kumar" <pbx.kumar@gmail.com> wrote:

Quote:

Hello,

Kudos to the asterisk community.

Greetings and salutations.

Quote:

- the present solution has a big drawback. The users have to wait till
the end of the prompt before they can start recording. They cannot
speak in between and this is causing a problem. Is there a way to stop
playback of prompt when the speech is detected from the callee end.

There's nothing really built in that detects the caller speaking in this situation and stops playback.

Quote:

- I see that speech api is available and connectors can be written but
there is no proper documentation. Can we check how lumenvox has done
the connector? Since its GPL licensed - I am assuming it should be
shared.

The Lumenvox connector is a binary module and is not under a GPL license, so the source is not available. As for documentation for the API, it is correct that there is no example connector module, but the API is intuitive enough that you should be able to figure it out if you are a developer.

You create an ast_speech_engine structure with callbacks for everything that your engine can handle. Create/destroy callbacks exist for when a speech object is created and destroyed. Load/unload/activate/deactivate callbacks exist for grammars. The start callback is called when res_speech is going to start feeding audio into your engine. The write callback gets called with audio that should be fed into your engine. The get callback is called when res_speech wants to get the results of the decode. It is up to your engine to set flags on the speech object to indicate various things. AST_SPEECH_QUIET signals that the person is speaking and AST_SPEECH_HAVE_RESULTS signals that your engine has results from the decode.
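
To make the callback layout described above concrete, here is a minimal standalone sketch. It does not use the real Asterisk headers: every `sketch_` and `dummy_` name, the struct fields, the callback signatures, and the amplitude threshold are hypothetical stand-ins, so the actual ast_speech_engine definition in the Asterisk source should be checked before writing a connector. The toy engine raises a "quiet" flag once it hears loud samples and a "have results" flag once silence follows speech.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins for AST_SPEECH_QUIET / AST_SPEECH_HAVE_RESULTS (values assumed). */
enum {
    SKETCH_QUIET        = 1 << 0, /* caller is talking, prompt should quiet down */
    SKETCH_HAVE_RESULTS = 1 << 1  /* engine finished decoding */
};

/* Stand-in for the speech object handed to every callback. */
struct sketch_speech {
    unsigned int flags; /* set by the engine, read by the caller side */
    int heard_speech;   /* hypothetical engine-private state */
};

/* Stand-in for the engine structure: a name plus a table of callbacks. */
struct sketch_engine {
    const char *name;
    int (*create)(struct sketch_speech *speech);
    int (*destroy)(struct sketch_speech *speech);
    int (*activate)(struct sketch_speech *speech, const char *grammar_name);
    int (*start)(struct sketch_speech *speech);
    int (*write)(struct sketch_speech *speech, const short *samples, int count);
    const char *(*get)(struct sketch_speech *speech);
};

static int dummy_create(struct sketch_speech *speech)
{
    speech->flags = 0;
    speech->heard_speech = 0;
    return 0;
}

static int dummy_destroy(struct sketch_speech *speech)
{
    (void)speech;
    return 0;
}

static int dummy_activate(struct sketch_speech *speech, const char *grammar_name)
{
    (void)speech;
    return grammar_name != NULL ? 0 : -1;
}

/* Called before audio is fed in: reset per-utterance state. */
static int dummy_start(struct sketch_speech *speech)
{
    speech->flags = 0;
    speech->heard_speech = 0;
    return 0;
}

/* Toy detector: any sample above an arbitrary threshold counts as speech. */
static int dummy_write(struct sketch_speech *speech, const short *samples, int count)
{
    int loud = 0;
    for (int i = 0; i < count; i++) {
        if (abs(samples[i]) > 1000) {
            loud = 1;
            break;
        }
    }
    if (loud) {
        speech->heard_speech = 1;
        speech->flags |= SKETCH_QUIET;
    } else if (speech->heard_speech) {
        /* Silence after speech: pretend the decode is complete. */
        speech->flags |= SKETCH_HAVE_RESULTS;
    }
    return 0;
}

/* Returns a fixed placeholder result once results are flagged, else NULL. */
static const char *dummy_get(struct sketch_speech *speech)
{
    return (speech->flags & SKETCH_HAVE_RESULTS) ? "dummy-result" : NULL;
}

static struct sketch_engine dummy_engine = {
    .name = "dummy",
    .create = dummy_create,
    .destroy = dummy_destroy,
    .activate = dummy_activate,
    .start = dummy_start,
    .write = dummy_write,
    .get = dummy_get
};
```

In this sketch the flags are owned by the engine, which matches the description above: the caller side only reads them to decide when to quiet the prompt and when to ask for results.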

This is a rough view of things.

Quote:

- backgrounddetect does stop during play but it jumps to talk
extension. we want the entire speech to be recorded and leaving out
the first fragment which triggered to jump to talk extension will not
serve the purpose in speech detection. further, since its agi , its
not extension driven.

If you want to approach it this way you will need to do some custom coding.

Joshua Colp
Software Developer
Digium, Inc.

_______________________________________________
--Bandwidth and Colocation Provided by http://www.api-digital.com--

asterisk-speech-rec mailing list
To UNSUBSCRIBE or update options visit:
http://lists.digium.com/mailman/listinfo/asterisk-speech-rec

pbx.kumar at gmail.com
Guest

Posted: Mon May 19, 2008 12:40 pm Post subject: [asterisk-speech-rec] Speech Recognition Problems

Joshua,

Thanks for your insight.

I started looking at your app_speech_utils.c and wanted to use it as base code to customize, since my first version of the Nuance decoder just takes a file and outputs the result.

After looking at the code and making a first attempt, I realized that it's much easier to add a connector module than to customize SpeechBackground, as it relies on so many other things (SpeechCreate etc.).

I made a dummy module which just prints the status of each callback. Here is the trace.

   -- Executing [3010@default:1] SpeechCreate("SIP/station1-08ce3998", "nuance") in new stack
[May 19 18:27:17] WARNING[8638]: app_mc.c:94 create: Creating Speech Engine [May 19 18:27:17] WARNING[8638]: app_mc.c:96 create: nuance
    -- Executing [3010@default:2] SpeechActivateGrammar("SIP/station1-08ce3998", "company-directory") in new stack
[May 19 18:27:17] WARNING[8638]: app_mc.c:137 activate: Activating Grammar Name [May 19 18:27:17] WARNING[8638]: app_mc.c:139 activate: nuance company-directory
    -- Executing [3010@default:3] SpeechStart("SIP/station1-08ce3998", "") in new stack
[May 19 18:27:17] WARNING[8638]: app_mc.c:163 start: Starting Engine [May 19 18:27:17] WARNING[8638]: app_mc.c:165 start: nuance
    -- Executing [3010@default:4] SpeechBackground("SIP/station1-08ce3998", "AppointmentTomorrow") in new stack
[May 19 18:27:17] WARNING[8638]: app_mc.c:163 start: Starting Engine [May 19 18:27:17] WARNING[8638]: app_mc.c:165 start: nuance
[May 19 18:27:17] WARNING[8638]: format_wav.c:156 check_header: Unexpected freqency 16000
[May 19 18:27:17] WARNING[8638]: file.c:316 fn_wrapper: Unable to open format wav
    -- Saved useragent "SJphone/1.65.377a (SJ Labs)" for peer station1
[May 19 18:28:00] WARNING[8638]: app_mc.c:105 destroy: Destroying Speech Engine [May 19 18:28:00] WARNING[8638]: app_mc.c:107 destroy: nuance
    --

I saw two problems and have a few questions. First, the start callback is invoked twice: once from SpeechStart and again from SpeechBackground. I am not sure why it is being called the second time.

int start(struct ast_speech *speech)
{
        /* No trailing newline here, so the next log entry continues on the same line of the trace. */
        ast_log(LOG_WARNING, "Starting Engine ");
        if (speech != NULL && speech->engine != NULL)
                ast_log(LOG_WARNING, "%s\n", speech->engine->name);

        return 0;
}

Second, the write callback is never called. I see in your speech utils code that you write to the engine only when AST_SPEECH_STATE_READY is set. When is the right time to set this state? Should we set it in SpeechCreate? Which function call is the right way to do it? I cannot use the change callback, as it will in turn call me again.

I thought the internal engine should call this when it detects that the caller has started talking and then does it automatically.
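
The gating described above can be modelled in a few lines of standalone C. This is only a toy model, not Asterisk code: the `sketch_` names, the state enum, and the idea that the engine marks itself ready inside its start callback are all assumptions made for illustration, and the thread does not confirm where the real state change is supposed to happen. It shows why a start callback that only logs would never see a write call if audio is forwarded only in the ready state.

```c
#include <stddef.h>

/* Stand-ins for the speech states (names and order assumed). */
enum sketch_state { SKETCH_NOT_READY, SKETCH_READY, SKETCH_WAIT, SKETCH_DONE };

struct sketch_speech {
    enum sketch_state state;
    int writes; /* how many times the write callback ran */
};

typedef int (*sketch_start_fn)(struct sketch_speech *speech);

/* Like the dummy module in this post: does nothing except return success. */
static int start_without_ready(struct sketch_speech *speech)
{
    (void)speech;
    return 0;
}

/* Assumed alternative: the engine declares itself ready for audio in start. */
static int start_marks_ready(struct sketch_speech *speech)
{
    speech->state = SKETCH_READY;
    return 0;
}

/* Stand-in for the engine's write callback: just counts invocations. */
static int engine_write(struct sketch_speech *speech, const void *data, int len)
{
    (void)data;
    (void)len;
    speech->writes++;
    return 0;
}

/* Model of the caller side: audio is forwarded only while the state is ready.
 * Returns the number of frames that reached the engine, or -1 if start failed. */
static int feed_frames(struct sketch_speech *speech, sketch_start_fn start, int frames)
{
    speech->state = SKETCH_NOT_READY;
    speech->writes = 0;
    if (start(speech) != 0) {
        return -1;
    }
    for (int i = 0; i < frames; i++) {
        if (speech->state == SKETCH_READY) {
            engine_write(speech, NULL, 0);
        }
    }
    return speech->writes;
}
```

With `start_without_ready`, `feed_frames` forwards nothing, which matches the symptom of write never being called; with `start_marks_ready`, every frame reaches the engine.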

if (ast_test_flag(speech, AST_SPEECH_QUIET))

Who sets this flag? Does the underlying DSP engine in Asterisk process the audio and set it, or is the connector module in charge of detecting speech with its own algorithm and then setting it? I hope it's the DSP engine. I ask because the speech didn't stop in my case at all.

I'm getting there. Thanks Joshua.

