Speech recognition in ARI

civicharles · January 9, 2019, 1:35pm

We’re trying to implement speech recognition with barge-in in an ARI Stasis application.
The speech dialplan applications can’t be accessed from ARI, but there must be a way to implement it.
Is anyone working on this?

jcolp · January 9, 2019, 1:37pm

It’s been brought up a few times, but I don’t know of anyone working on such a thing.

civicharles · January 9, 2019, 2:26pm

So, then the question is, what would it take to:

add new functionality to ARI?
through that functionality, access the SpeechBackground dialplan application, or replicate its code so that it can be called from an ARI method?

The code for channel actions is in res_ari_channels.c, and the code for SpeechBackground is in app_speech_utils.c.

Something simple like Mute is easy to trace and understand: it goes from res_ari_channels.c to resource_channels.c to control.c, where it finally calls ast_channel_suppress.
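To make the entry point of that chain concrete, here is a minimal Python sketch of the HTTP request a Stasis application sends to trigger it: ARI's `POST /channels/{channelId}/mute` operation with its `direction` parameter (`both`, `in` or `out`). The helper name `ari_mute_request`, the base URL `http://localhost:8088/ari` (Asterisk's default HTTP port plus the ARI prefix) and the channel id are assumptions for illustration only. Authentication (`api_key` or HTTP basic auth) is left out, and the function only builds the request rather than sending it.

```python
from urllib.parse import quote, urlencode


def ari_mute_request(base_url: str, channel_id: str, direction: str = "both"):
    """Build (method, url) for ARI's channel mute operation.

    Hypothetical helper. In Asterisk the request is handled by
    res_ari_channels.c, which hands off to resource_channels.c and then
    control.c, where ast_channel_suppress is eventually called.
    """
    if direction not in ("both", "in", "out"):
        raise ValueError("direction must be 'both', 'in' or 'out'")
    # Percent-encode the channel id so it is safe as a single path segment.
    path = f"/channels/{quote(channel_id, safe='')}/mute"
    query = urlencode({"direction": direction})
    return "POST", f"{base_url.rstrip('/')}{path}?{query}"


# Example channel id; real ids come from the StasisStart event.
method, url = ari_mute_request("http://localhost:8088/ari", "1547040000.42")
print(method, url)
# POST http://localhost:8088/ari/channels/1547040000.42/mute?direction=both
```

A new speech method would enter Asterisk the same way, as an HTTP request dispatched by the generated code in res_ari_channels.c.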

If that holds, tracing playback is only slightly harder; it too ends at calls to the lower-level ast_… functions.

Based on that, what would it take to make SpeechBackground work in the same context, and then to add an ARI method to access it?

jcolp · January 9, 2019, 2:31pm

ARI doesn’t execute dialplan applications or anything like that; doing so is largely undefined because of how channel ownership and control interact.

For ARI, you update the JSON API description[1] and generate code from it. (This case wouldn’t be a new resource, since it concerns the channel, but the process is the same: you add the new methods to the channel JSON.) The generated code calls handler functions that you have to implement, and those would need to do the same kind of work that app_speech_utils does.
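As a rough sketch of the "update the JSON" step: ARI's API descriptions are Swagger 1.2 style JSON files (channels.json and friends, under rest-api/api-docs/ in the Asterisk source), where each operation lists its HTTP method, nickname, parameters and error responses. The entry below is entirely hypothetical: the `/channels/{channelId}/speech` path, the `startSpeech` nickname and the `engine`, `grammar` and `media` parameters do not exist in ARI and only illustrate what a speech method might look like. It is written as a Python dict dumped to JSON; the field names mirror the existing channel operations as far as I know, so check them against the real channels.json before relying on them.

```python
import json


def path_param(name, description):
    # Path parameters in ARI's api-docs are required, single-valued strings.
    return {
        "name": name,
        "description": description,
        "paramType": "path",
        "required": True,
        "allowMultiple": False,
        "dataType": "string",
    }


def query_param(name, description, required=False):
    return {
        "name": name,
        "description": description,
        "paramType": "query",
        "required": required,
        "allowMultiple": False,
        "dataType": "string",
    }


# HYPOTHETICAL operation: nothing below exists in ARI today.
speech_api = {
    "path": "/channels/{channelId}/speech",
    "description": "Speech recognition on a channel (hypothetical)",
    "operations": [
        {
            "httpMethod": "POST",
            "summary": "Start speech recognition, optionally with barge-in over a prompt.",
            "nickname": "startSpeech",
            "responseClass": "void",
            "parameters": [
                path_param("channelId", "Channel's id"),
                query_param("engine", "Speech engine to use"),
                query_param("grammar", "Grammar to activate", required=True),
                query_param("media", "Prompt to play while listening (barge-in)"),
            ],
            "errorResponses": [
                {"code": 404, "reason": "Channel not found"},
                {"code": 409, "reason": "Channel not in a Stasis application"},
            ],
        }
    ],
}

print(json.dumps(speech_api, indent=2))
```

Agreeing on exactly these names, parameters and response types is the API definition work that has to happen before any C is written.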

Before doing any of this, though, the API as exposed through ARI would need to be defined and explored to make sure it fits and that people are happy with it.

[1] https://wiki.asterisk.org/wiki/display/AST/Create+a+new+resource+with+ARI

civicharles · January 9, 2019, 2:51pm

It doesn’t seem insurmountable to replicate the existing code in app_speech_utils.c so that it can be accessed from a new ARI resource.
And version one of an API doesn’t have to make everyone happy; it just needs to work.

Many people are looking for this functionality, but no one is working on it. It’s open source and this is a forum; this is the perfect place to work it out.

jcolp · January 9, 2019, 3:13pm

The developer community and the people who have been interested in this don’t hang out here, so discussing it on this forum wouldn’t yield much feedback. There is a mailing list that people in that area are on[1]. As for version one of the API, it should still be well defined and agreed upon. We generally don’t include things in the tree that are in flux, because changing them significantly afterwards is problematic: we have to remain backwards compatible and can’t break the people using them. It would also need to be well defined to be eligible for release branches (13 or 16, for example).

[1] http://lists.digium.com/cgi-bin/mailman/listinfo/asterisk-app-dev
