I am developing a telephony application using Asterisk, and I need help with a specific scenario. I want to create an IVR-like experience where a pre-recorded message is played to the user, and playback stops as soon as the user starts speaking. I then want to capture the user’s speech, send it to an API for processing, and retrieve the response.
Here’s a high-level overview of what I want to achieve:
1. Play a pre-recorded message to the user.
2. Stop the pre-recorded message as soon as the user starts speaking.
3. Record the user’s speech input.
4. Pass the recorded speech to an external API for processing.
5. Retrieve and handle the API response.
How can I achieve this? Can it be done in the dialplan alone, or does it need programming, for example AGI or an asynchronous approach?
Also, architecturally, is this approach sound, or are there issues with it? How can I build a robust IVR system with speech-to-text?
This can be done with the BackgroundDetect() application, which plays a sound file while listening to the incoming audio; when it detects speech, it aborts playback and jumps to the talk extension in the current context. From there, send the caller to the Record() application, and then pass the recorded file to your own application for further processing. If you want to do everything in pure dialplan, that last step can be handled with System() and the Linux curl command.
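A minimal dialplan sketch of that flow could look like the following. Several things here are assumptions for illustration only: the context name ivr-speech, the prompt file welcome-prompt, the /tmp file paths, and the https://api.example.com/stt endpoint with its "file" form field. The silence and duration values are also just starting points to tune. One thing to verify on your own system: as far as I understand, BackgroundDetect() only jumps after the caller has spoken and then paused, so the words that triggered the jump are not part of what Record() captures afterwards.

```ini
; extensions.conf (sketch, not a tested production config)
[ivr-speech]                      ; hypothetical context name
exten => s,1,Answer()
 ; Play the prompt while watching inbound audio for speech.
 ; Arguments: file, silence ms (1000), minimum talk ms (100).
 ; On detected speech, playback is aborted and the call jumps to "talk".
 same => n,BackgroundDetect(welcome-prompt,1000,100)
 ; Reached only if the prompt finished and nobody spoke.
 same => n,Hangup()

exten => talk,1,NoOp(Caller started speaking)
 ; Record until 3 seconds of silence or 30 seconds total.
 same => n,Record(/tmp/utterance-${UNIQUEID}.wav,3,30)
 ; Upload the recording; endpoint and form field are hypothetical.
 same => n,System(curl -s -F "file=@/tmp/utterance-${UNIQUEID}.wav" -o /tmp/reply-${UNIQUEID}.json https://api.example.com/stt)
 ; SYSTEMSTATUS is set by System() to SUCCESS, FAILURE or APPERROR.
 same => n,NoOp(Upload result: ${SYSTEMSTATUS})
 same => n,Hangup()
```

Note that System() blocks the channel until curl returns, so the caller hears nothing while the API call is in progress; handling the response (for example, playing back a result) would go in place of the final Hangup().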
To be honest, this is a very rudimentary way to do it, and it is how things were done before ARI was introduced. I believe ARI allows a more sophisticated approach. However, I don’t use ARI very often.