Hi. We are looking for guidance on how to properly separate audio streams in Asterisk for our use case. Below we outline our goals, current implementation, challenges, and the specific help we need.
General Goal
Our goal is to integrate Asterisk with OpenAI’s real-time API to create an AI-powered voicebot that interacts with users over the phone. The core functionality involves:
- Receiving audio from the caller and sending it to OpenAI for processing.
- Receiving audio responses from OpenAI and sending them to the caller.
- Recording the audio streams in separate files:
  - One file containing only the audio sent by OpenAI (AI-to-caller audio).
  - Another file containing only the audio from the caller (caller-to-AI audio).
It is crucial that the caller’s audio and OpenAI’s audio remain completely isolated to prevent OpenAI from hearing its own audio (feedback loop) while still providing the caller with the generated responses in real time.
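Schematically, the audio topology we are aiming for looks like this:

```
Caller (PJSIP/680) --- caller audio ---> externalMedia channel ---> OpenAI
Caller (PJSIP/680) <--- AI audio ------- externalMedia channel <--- OpenAI

Recording 1: caller audio only  -> external_bridge_*.wav
Recording 2: OpenAI audio only  -> user_bridge_*.wav
```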
Specific Implementation Goals
- Separate the caller’s audio and OpenAI’s audio into two distinct streams.
- Only send the caller’s audio to OpenAI.
- Play back only OpenAI’s audio to the caller.
- Use Asterisk’s `externalMedia` functionality to handle the OpenAI API integration and RTP streams.
- Record the two audio streams in separate files:
  - Caller-to-AI stream (`external_bridge_*`).
  - AI-to-caller stream (`user_bridge_*`).
Current Implementation
We currently use two holding bridges in Asterisk to manage the separation of audio streams:
- User Bridge (`user_bridge`):
  - Contains the caller’s channel (`PJSIP/680`).
  - Plays the audio received from OpenAI to the caller.
  - Records OpenAI’s responses to the caller in a separate file (`user_bridge_*.wav`).
- External Bridge (`external_bridge`):
  - Contains the external media channel (`UnicastRTP`) connected to OpenAI.
  - Is supposed to record only the audio from the caller that is sent to OpenAI.
The RTP stream from the caller is routed to the `external_bridge`, and the audio responses from OpenAI are routed back to the `user_bridge`. However, while the OpenAI-to-caller recording works as expected, the caller-to-AI recording is empty (0 seconds). This indicates that the `external_bridge` is not correctly receiving or handling the caller’s audio.
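For quick reference, this is the wiring in ARI terms, condensed from the full handler at the end of this post:

```javascript
// user_bridge: holds the caller; OpenAI's audio is played into this bridge.
const userBridge = await ari.bridges.create({
  type: 'holding',
  bridgeId: `user_bridge_${channel.id}`
});
await userBridge.addChannel({ channel: channel.id });

// externalMedia channel: the RTP leg towards our OpenAI middleware on port 12050.
const externalMedia = await ari.channels.externalMedia({
  app: 'voicebot',
  external_host: '0.0.0.0:12050',
  format: 'slin16'
});

// external_bridge: holds the externalMedia channel; should carry caller audio only.
const externalBridge = await ari.bridges.create({
  type: 'holding',
  bridgeId: `external_bridge_${channel.id}`
});
await externalBridge.addChannel({ channel: externalMedia.id });
```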
Challenges
- Empty Audio Recording in `external_bridge`:
  The file generated for the `external_bridge` is always empty (0 seconds), even though the caller’s channel (`PJSIP/680`) is added to the `external_bridge` along with the `externalMedia` channel. We suspect that the caller’s audio is not being routed correctly to the `external_bridge`.
- Isolating Streams:
  We need to ensure that:
  - The `external_bridge` receives only the caller’s audio.
  - The `user_bridge` plays only OpenAI’s audio to the caller.
  - Neither bridge mixes the two streams.
- Bidirectional RTP Synchronization:
  `externalMedia` uses a dedicated RTP port to communicate with OpenAI. We see RTP packets flowing correctly between OpenAI and Asterisk, but we are unsure whether the caller’s audio stream is actually being routed to this channel for processing.
- Avoiding Feedback Loops:
  If the streams are not isolated correctly, OpenAI could hear its own audio responses, which would disrupt the voicebot’s behavior and produce undesired results.
Specific Questions
- How can we ensure the caller’s audio is correctly routed to the `external_bridge` for both processing by OpenAI and recording?
  - Are there specific configurations required for the `externalMedia` channel to capture the caller’s audio properly?
  - Does `externalMedia` need to be configured differently when used with holding bridges?
- Is our approach of using two separate holding bridges (`user_bridge` and `external_bridge`) appropriate for separating the streams?
  - If not, what is the recommended way to isolate the audio streams for our use case?
  - Is there an alternative to holding bridges that still allows real-time playback and recording? (One idea we are considering is sketched after this list.)
- How can we debug or confirm the audio routing between the caller channel (`PJSIP/680`), `externalMedia`, and the bridges? (A runtime check we have in mind is also sketched after this list.)
  - Are there specific tools or logs we can enable to see where the audio stream is being dropped?
  - Would enabling `rtp set debug on` or other debugging methods help in this scenario?
- What is the best way to configure the `record()` call on a bridge so that the correct audio streams are captured?
  - For example, we currently pass `options: 'b(IN)b(OUT)'`. Is this correct for recording only the audio received or transmitted by a bridge?
- Is there a better way to manage the `externalMedia` channel and its integration with bridges for real-time processing by OpenAI?
  - For example, should we consider an alternative architecture or bridge type (e.g., `mixing`) for more control over the audio streams?
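To make the debugging question concrete, this is the kind of runtime check we have in mind: querying ARI for the channels that are actually in each bridge (bridge IDs as in our code):

```javascript
// Runtime sanity check: list the channels actually present in each bridge.
const extBridge = await ari.bridges.get({ bridgeId: `external_bridge_${channel.id}` });
console.log('external_bridge members:', extBridge.channels);

const usrBridge = await ari.bridges.get({ bridgeId: `user_bridge_${channel.id}` });
console.log('user_bridge members:', usrBridge.channels);
```

And for the question about alternatives, one idea we have not yet tested is to drop bridge recording entirely and capture each direction with a Snoop channel. A minimal, unverified sketch using ARI’s `snoopChannel` operation, where `spy: 'in'` should tap only the audio the caller sends into Asterisk:

```javascript
// Unverified sketch: per-direction capture via a Snoop channel.
// spy: 'in' taps only the caller -> Asterisk direction, so OpenAI's
// playback should never end up in this recording.
const snoop = await ari.channels.snoopChannel({
  channelId: channel.id,
  app: 'voicebot',
  spy: 'in',
  snoopId: `snoop_in_${channel.id}`
});
await snoop.record({
  name: `caller_only_${channel.id}`,
  format: 'wav',
  ifExists: 'overwrite'
});
```

Would this be a reasonable direction, or is bridge recording still the intended mechanism here?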
Logs and Observations
Here are some key observations from our logs:
- RTP packets from the caller are received successfully:
```
Got RTP from 10.7.1.2:4084 (type 96, seq 005518, ts 166400, len 000640)
```
- RTP packets are sent to OpenAI via the `externalMedia` channel:
```
Sent RTP to 127.0.0.1:12050 (type 118, seq 064810, ts 166400, len 000640)
```
- OpenAI’s audio responses are played correctly to the caller:
```
Playing response on user bridge: sound:chunk_123456
```
- The file for the caller-to-AI stream (`external_bridge_*`) is always empty:
```
/var/spool/asterisk/recording/external_bridge_*.wav (0 seconds)
```
Despite receiving RTP packets from the caller, the audio is not being recorded or processed correctly in the `external_bridge`.
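To rule out a transport problem, we are also considering a throwaway diagnostic: binding a UDP socket on the externalMedia port (with our middleware stopped, so the port is free) and checking whether the caller’s RTP actually arrives there. A minimal Node.js sketch, assuming port 12050 as in our configuration:

```javascript
// Throwaway diagnostic: listen on the externalMedia RTP port and log arrivals.
// Run with our OpenAI middleware stopped so the port is free to bind.
const dgram = require('dgram');

const sock = dgram.createSocket('udp4');
sock.on('listening', () => console.log('Listening on 0.0.0.0:12050'));
sock.on('message', (msg, rinfo) => {
  console.log(`RTP: ${msg.length} bytes from ${rinfo.address}:${rinfo.port}`);
});
sock.bind(12050, '0.0.0.0');
```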
Expected Outcome
We need to achieve the following:
- Correctly route the caller’s audio to the `external_bridge` for processing and recording.
- Ensure that the caller’s audio stream and OpenAI’s audio stream are isolated and do not mix.
- Maintain real-time playback for the caller, with OpenAI’s responses being audible as expected.
- Generate two separate audio files:
  - One containing the caller’s audio only.
  - One containing OpenAI’s responses only.
Thank you for taking the time to review this request. Any help in getting this setup working would be greatly appreciated. For reference, the current handler implementation follows; we are happy to provide additional code or configuration files if needed.
```javascript
async start() {
  try {
    logger.info(`Starting voicebot handler for channel ${this.channel.id}`);
    this.isActive = true;

    // Ensure the directory for storing audio files exists
    await this.ensureAudioDir();

    // Skip initialization if this is an external media channel
    if (this.isExternalMediaChannel) {
      logger.info('External media channel detected, skipping initialization');
      return;
    }

    // Answer the incoming channel
    await this.channel.answer();
    logger.info(`Channel ${this.channel.id} answered`);

    // Create the first holding bridge for the caller
    this.userBridge = await this.ari.bridges.create({
      type: 'holding', // Use a holding bridge
      name: `user_bridge_${this.channel.id}`,
      bridgeId: `user_bridge_${this.channel.id}`
    });
    logger.info(`[BRIDGE] Created user holding bridge ${this.userBridge.id}`);

    // Add the caller's channel to the user bridge
    await this.userBridge.addChannel({
      channel: this.channel.id
    });
    logger.info(`[BRIDGE] Added caller channel ${this.channel.id} to user holding bridge`);

    // Create an external media channel to communicate with OpenAI
    this.externalMedia = await this.ari.channels.externalMedia({
      app: 'voicebot',
      external_host: '0.0.0.0:12050', // Dedicated RTP port for external media
      format: 'slin16',
      channelId: `external_${this.channel.id}`,
      variables: {
        JITTERBUFFER: 'adaptive',
        AUDIO_BUFFER_POLICY: 'strict',
        AUDIO_BUFFER_SIZE: '128'
      }
    });
    logger.info(`[MEDIA] Created external media channel ${this.externalMedia.id}`);

    // Create the second holding bridge for the external media channel
    this.externalBridge = await this.ari.bridges.create({
      type: 'holding', // Separate holding bridge
      name: `external_bridge_${this.channel.id}`,
      bridgeId: `external_bridge_${this.channel.id}`
    });
    logger.info(`[BRIDGE] Created external holding bridge ${this.externalBridge.id}`);

    // Add the external media channel to the external bridge
    await this.externalBridge.addChannel({
      channel: this.externalMedia.id
    });
    logger.info(`[BRIDGE] Added external media channel ${this.externalMedia.id} to external holding bridge`);

    // Configure recording for the user bridge (records OpenAI responses)
    const userRecording = await this.userBridge.record({
      name: `user_bridge_${this.channel.id}`,
      format: 'wav',
      beep: false,
      maxDurationSeconds: 3600,
      ifExists: 'overwrite',
      options: 'b(IN)b(OUT)' // Record both incoming and outgoing streams
    });

    // Handle events for the user bridge recording
    userRecording.once('RecordingStarted', () => {
      logger.info(`[MONITOR] Recording started on user bridge ${this.userBridge.id}`);
    });
    userRecording.once('RecordingFailed', (event) => {
      logger.error(`[MONITOR] Recording failed on user bridge: ${event.error}`);
    });
    userRecording.once('RecordingFinished', () => {
      logger.info(`[MONITOR] Recording completed on user bridge ${this.userBridge.id}`);
    });
    logger.info(`[MONITOR] Configured recording on user bridge ${this.userBridge.id}`);

    // Configure recording for the external bridge (records caller's audio)
    const externalRecording = await this.externalBridge.record({
      name: `external_bridge_${this.channel.id}`,
      format: 'wav',
      beep: false,
      maxDurationSeconds: 3600,
      ifExists: 'overwrite',
      options: 'b(IN)b(OUT)' // Record both incoming and outgoing streams
    });

    // Handle events for the external bridge recording
    externalRecording.once('RecordingStarted', () => {
      logger.info(`[MONITOR] Recording started on external bridge ${this.externalBridge.id}`);
    });
    externalRecording.once('RecordingFailed', (event) => {
      logger.error(`[MONITOR] Recording failed on external bridge: ${event.error}`);
    });
    externalRecording.once('RecordingFinished', () => {
      logger.info(`[MONITOR] Recording completed on external bridge ${this.externalBridge.id}`);
    });
    logger.info(`[MONITOR] Configured recording on external bridge ${this.externalBridge.id}`);

    // Connect to OpenAI's real-time API for processing audio
    logger.info('[OPENAI] Connecting to OpenAI Realtime API...');
    await this.realtimeHandler.connect();

    // Set up event handlers for the voicebot functionality
    await this._setupEventHandlers();

    // Mark the handler as initialized
    this.isInitialized = true;
    logger.info('VoicebotHandler initialization completed');

    // Add event handlers for cleanup when the channel is destroyed or leaves Stasis
    this.channel.once('ChannelDestroyed', async () => {
      logger.info(`[EVENT] Channel ${this.channel.id} destroyed, starting cleanup`);
      await this.cleanup();
    });
    this.channel.once('StasisEnd', async () => {
      logger.info(`[EVENT] Channel ${this.channel.id} left Stasis, starting cleanup`);
      await this.cleanup();
    });

    // Handle unexpected hangup events
    this.channel.once('ChannelHangupRequest', async () => {
      logger.info(`[EVENT] Hangup requested for channel ${this.channel.id}`);
      await this.cleanup();
    });
  } catch (error) {
    logger.error('[START] Error in VoicebotHandler:', error);
    await this.cleanup();
    throw error;
  }
}
```