I’m working on a voice assistant system using Asterisk with ARI and OpenAI’s realtime API. I need help with the audio capture part.
Current setup:
Dialplan handling calls to extension 680
Using ARI to create a bridge with two channels:
Main channel (from PJSIP call)
External-media channel for RTP streaming
astervoice*CLI> bridge show all
Bridge-ID Chans Type Technology Duration
028ae525-f4c1-4fca-bea9-f9ac74d08861 2 stasis simple_bridge 00:00:09
Got RTP packet from 10.7.1.3:4090 (type 96, seq 012064, ts 156160, len 000640)
Dialplan:
[from-internal]
exten => 680,1,NoOp(Starting AI Voice Assistant)
same => n,Answer()
same => n,Stasis(voicebot)
same => n,Hangup()
[stream-audio]
exten => s,1,NoOp(Starting external media stream)
same => n,Set(JITTERBUFFER(adaptive)=default)
same => n,Set(AUDIO_BUFFER_POLICY=strict)
same => n,Set(AUDIO_BUFFER_SIZE=128)
same => n,Set(RTP_PORT=${CHANNEL(rtpport)})
same => n,ExternalMedia(rtp,10.7.1.2:${RTP_PORT}/${MATH(${RTP_PORT}+1,int)},slin16)
same => n,Hangup()
The log file:
Call Initialization and External Media Channel Creation:
{"level":"info","message":"New call from 680 to extension 680","timestamp":"2024-12-09T16:10:50.124Z"}
{"level":"info","message":"Starting voicebot handler for channel 1733760650.4","timestamp":"2024-12-09T16:10:50.125Z"}
{"level":"info","message":"Created external media channel external_1733760650.4","timestamp":"2024-12-09T16:10:50.138Z"}
A new call is initiated from extension 680. The system starts a voicebot handler for the channel 1733760650.4.
An external media channel external_1733760650.4 is created, which is crucial for handling audio streams between Asterisk and external systems like OpenAI.
Bridge Creation and Channel Addition:
{"level":"info","message":"Created mixing bridge 7028ae525-f4c1-4fca-bea9-f9ac74d08861","timestamp":"2024-12-09T16:10:50.146Z"}
{"level":"info","message":"Added channels to bridge","timestamp":"2024-12-09T16:10:50.659Z"}
A mixing bridge with ID 028ae525-f4c1-4fca-bea9-f9ac74d08861 is created. This bridge is used to mix audio streams from different channels.
Channels are added to this bridge, allowing for the integration of audio from the external media channel and other sources.
Connection to OpenAI Realtime API:
{"level":"info","message":"Connecting to OpenAI Realtime API...","timestamp":"2024-12-09T16:10:50.660Z"}
{"level":"info","message":"WebSocket connection established","timestamp":"2024-12-09T16:10:51.487Z"}
The system connects to the OpenAI Realtime API, establishing a WebSocket connection for real-time communication.
Session Initialization and Welcome Message:
The session is initialized with specific configurations, including audio format and instructions for the virtual assistant.
A welcome message is generated and received. This confirms that the system is correctly receiving and processing the welcome audio from OpenAI.
The system starts playback of the audio file chunk_1733760652759.wav, which contains the welcome message.
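For reference, in this flow the welcome audio typically arrives over the WebSocket as a stream of base64-encoded PCM16 chunks. Assuming the standard Realtime API event names (`response.audio.delta` for each chunk, `response.audio.done` at the end — an assumption about the API version in use), the chunks can be collected into one buffer before being written out for Asterisk playback:

```javascript
// Collect base64 PCM16 audio deltas from the OpenAI Realtime API into a
// single Buffer that can be wrapped in a WAV header and played back.
// Event names assume the 2024 Realtime API; verify against your API version.
function makeAudioCollector() {
  const chunks = [];
  return {
    // Call once per WebSocket message; returns true when the audio is complete.
    handleEvent(event) {
      if (event.type === 'response.audio.delta') {
        chunks.push(Buffer.from(event.delta, 'base64'));
        return false;
      }
      return event.type === 'response.audio.done';
    },
    pcm() { return Buffer.concat(chunks); },
  };
}
```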
Conclusion:
The logs indicate that the system is correctly handling the audio streams between Asterisk and OpenAI. The external media channel and mixing bridge are set up properly, and the welcome message from OpenAI is received and played back successfully. This confirms that the audio capture and transmission to OpenAI are functioning as expected.
Issue:
I’m trying to capture the caller’s audio for debugging purposes and send it to my Node.js app. The bridge is created successfully and the channels are connected, but I’m not receiving the caller’s audio in my application.
You should be able to take a packet capture and verify that the RTP stream to the external-media port is set up correctly and contains the expected audio.
I’m assuming your application is set up to receive the RTP stream and proxy it to OpenAI?
The caller’s audio coming from ExternalMedia() is actually an RTP stream. You haven’t mentioned how you are trying to send it to the Node.js app, but the app would have to be listening on the same server as Asterisk. You could send it to a separate server, but keep in mind you can’t send pure RTP straight through a TCP pipe unless it is wrapped in an additional framing layer.
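To illustrate that last point: if the Node.js app lives on a different server and you tunnel the datagrams over TCP, each packet needs its own framing so the byte stream can be split back into packets. A simple 2-byte length prefix works (this scheme is my own choice for the sketch, not something Asterisk provides):

```javascript
// Length-prefixed framing for carrying discrete RTP datagrams over a TCP
// byte stream: 2-byte big-endian length, then the packet bytes.
function frame(packet) {
  const header = Buffer.alloc(2);
  header.writeUInt16BE(packet.length, 0);
  return Buffer.concat([header, packet]);
}

// Incremental deframer: feed it TCP chunks as they arrive,
// get back only the complete packets.
function makeDeframer() {
  let pending = Buffer.alloc(0);
  return function feed(chunk) {
    pending = Buffer.concat([pending, chunk]);
    const packets = [];
    while (pending.length >= 2) {
      const len = pending.readUInt16BE(0);
      if (pending.length < 2 + len) break; // wait for the rest of this packet
      packets.push(pending.subarray(2, 2 + len));
      pending = pending.subarray(2 + len);
    }
    return packets;
  };
}
```

The deframer deliberately tolerates TCP delivering data in arbitrary chunk sizes, which is exactly why raw RTP can't go over TCP unframed.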
I’m working with a softmix bridge in Asterisk and have configured it to use external media. While the bridge itself seems to be functioning correctly:
astervoice*CLI> bridge show all
Bridge-ID Chans Type Technology Duration
voicebot_1733943994.0 3 stasis softmix 00:00:05
I’m struggling to capture the caller’s audio. My goal is to forward the caller’s audio through the external media endpoint to my Node.js app.
Questions:
How can I configure Asterisk to properly send the caller’s audio to the external media in this setup?
Are there specific bridge settings, channel options, or external media parameters I might be overlooking?
So you’re not using external media as provided by ARI; you’re using a Local channel instead. From the given code it doesn’t appear you are calling “dial” on the created channel (assuming “create” is actually using the ARI create route), so I would not expect anything to happen until that is done.
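To spell out the create-vs-dial distinction: `POST /channels/create` only makes the channel (it sits in the Down state), and a separate `POST /channels/{channelId}/dial` is needed to start it, unlike `originate`, which dials immediately. A sketch of the two requests against ARI's REST interface (base URL, credentials, endpoint and app name are placeholders):

```javascript
// The two ARI requests needed when using /channels/create:
// "create" only makes the channel; a separate "dial" actually starts it.
// Base URL is a placeholder for this sketch.
const ARI = 'http://localhost:8088/ari';

function createRequest(endpoint, app) {
  const q = new URLSearchParams({ endpoint, app });
  return { method: 'POST', url: `${ARI}/channels/create?${q}` };
}

function dialRequest(channelId) {
  return { method: 'POST', url: `${ARI}/channels/${channelId}/dial` };
}

// Usage against a running Asterisk with ARI enabled (add Basic auth headers):
//   const { url, method } = createRequest('Local/s@stream-audio', 'voicebot');
//   const channel = await (await fetch(url, { method })).json();
//   await fetch(dialRequest(channel.id).url, { method: 'POST' });
```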