I’m currently using AEAP to handle speech processing for Asterisk: speech-to-text (STT) on caller audio and text-to-speech (TTS) for Asterisk playback. The setup works well, but I’d like to enhance it to support more seamless voice AI conversations with users and to reduce latency.
I’ve come across OpenAI’s Realtime API, which seems promising for low-latency interactions (OpenAI Realtime API). My goal is to stream audio from Asterisk to OpenAI (using AEAP), receive the audio response in real time, and immediately play that streamed response back to the user, all without saving it as a file.
Question:
How can Asterisk play the audio response stream directly from the Realtime API without saving it as a file?
I would be grateful for any insights, advice, or implementation examples that could help me achieve this real-time streaming solution!
You can’t easily do this and use AEAP at the same time; it goes beyond what AEAP, and the dialplan functionality around it, was intended to do. I’m a broken record, but external media in ARI provides a bidirectional realtime RTP stream that you can do whatever you want with: stream audio out of Asterisk, stream audio in, attach it to a snoop channel and do live transcription, or stick a caller and the external media channel in a bridge and build some kind of conversational application. When the external media channel is created, the RTP target for the ARI application to send media to is provided.
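For reference, here’s a minimal sketch of that external media flow in Python. It assumes Asterisk with ARI enabled at http://localhost:8088, an ARI user asterisk/asterisk, and a Stasis app named voice-ai (all placeholders for your own setup); the ARI WebSocket event loop that dispatches StasisStart events, error handling, and the actual OpenAI Realtime API relay are omitted, and the send_to_openai/build_rtp_packet helpers are hypothetical stubs.

```python
# Sketch of bridging a caller with an ARI externalMedia channel so the
# application gets a raw bidirectional RTP stream to relay to/from an AI.
import socket

import requests

ARI = "http://localhost:8088/ari"       # assumption: default ARI location
AUTH = ("asterisk", "asterisk")          # assumption: placeholder credentials
APP = "voice-ai"                         # assumption: your Stasis app name
RTP_HOST, RTP_PORT = "127.0.0.1", 4000   # where Asterisk sends RTP to us


def get_var(channel_id: str, name: str) -> str:
    """Read a channel variable over ARI."""
    r = requests.get(
        f"{ARI}/channels/{channel_id}/variable",
        auth=AUTH,
        params={"variable": name},
    )
    return r.json()["value"]


def handle_caller(caller_id: str) -> None:
    """Run this when a caller's channel enters the Stasis app."""
    requests.post(f"{ARI}/channels/{caller_id}/answer", auth=AUTH)

    # Create the external media channel; Asterisk sends the bridged audio
    # as RTP to external_host.
    em = requests.post(
        f"{ARI}/channels/externalMedia",
        auth=AUTH,
        params={
            "app": APP,
            "external_host": f"{RTP_HOST}:{RTP_PORT}",
            "format": "ulaw",  # pick a format your AI side can consume
        },
    ).json()

    # Asterisk exposes its own RTP address on the channel; that's where we
    # send audio back in.
    asterisk_rtp = (
        get_var(em["id"], "UNICASTRTP_LOCAL_ADDRESS"),
        int(get_var(em["id"], "UNICASTRTP_LOCAL_PORT")),
    )

    # Put the caller and the external media channel in a mixing bridge so
    # the caller's audio flows to our socket and our RTP reaches the caller.
    bridge = requests.post(
        f"{ARI}/bridges", auth=AUTH, params={"type": "mixing"}
    ).json()
    requests.post(
        f"{ARI}/bridges/{bridge['id']}/addChannel",
        auth=AUTH,
        params={"channel": f"{caller_id},{em['id']}"},
    )

    # Pump RTP: strip the 12-byte header on receive, relay the payload to
    # the Realtime API, and send its audio back to Asterisk, no files needed.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((RTP_HOST, RTP_PORT))
    while True:
        packet, _ = sock.recvfrom(2048)
        caller_audio = packet[12:]  # raw ulaw payload from the caller
        # send_to_openai(caller_audio)                        # hypothetical stub
        # reply = next_openai_audio_chunk()                   # hypothetical stub
        # sock.sendto(build_rtp_packet(reply), asterisk_rtp)  # hypothetical stub
```

The format you pick for the external media channel (ulaw in this sketch) determines what lands on the socket, so choose one you can feed to the Realtime API with minimal transcoding.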