external media get caller audio

I’m working on a voice assistant system using Asterisk with ARI and OpenAI’s realtime API. I need help with the audio capture part.

Current situation:

  1. Setup:
  • Dialplan handling calls to extension 680
  • Using ARI to create a bridge with two channels:
    • Main channel (from PJSIP call)
    • External-media channel for RTP streaming (a receiver sketch follows the dialplan below)

astervoice*CLI> bridge show all
Bridge-ID Chans Type Technology Duration
028ae525-f4c1-4fca-bea9-f9ac74d08861 2 stasis simple_bridge 00:00:09
Got RTP packet from 10.7.1.3:4090 (type 96, seq 012064, ts 156160, len 000640)

  • Dialplan:
    [from-internal]
    exten => 680,1,NoOp(Starting AI Voice Assistant)
    same => n,Answer()
    same => n,Stasis(voicebot)
    same => n,Hangup()

    [stream-audio]
    exten => s,1,NoOp(Starting external media stream)
    same => n,Set(JITTERBUFFER(adaptive)=default)
    same => n,Set(AUDIO_BUFFER_POLICY=strict)
    same => n,Set(AUDIO_BUFFER_SIZE=128)
    same => n,Set(RTP_PORT=${CHANNEL(rtpport)})
    same => n,ExternalMedia(rtp,10.7.1.2:${RTP_PORT}/${MATH(${RTP_PORT} + 1)},slin16)
    same => n,Hangup()
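
On the receiving side (10.7.1.2 in the dialplan above), the caller’s audio arrives as plain RTP over UDP. A minimal receiver sketch in Node.js, assuming the app listens on the port passed to ExternalMedia() (the port below is hypothetical):

    const dgram = require('dgram');

    const RTP_HEADER_BYTES = 12; // fixed RTP header; assumes no CSRC list or extensions
    const socket = dgram.createSocket('udp4');

    socket.on('message', (packet) => {
        if (packet.length <= RTP_HEADER_BYTES) return;
        // Strip the RTP header; what remains is raw slin16 audio
        // (16 kHz signed linear PCM; verify byte order against a capture).
        const pcm = packet.subarray(RTP_HEADER_BYTES);
        // ... buffer or forward `pcm` to the OpenAI handler here ...
    });

    socket.bind(4000, '10.7.1.2', () => console.log('Listening for RTP'));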

The log file:

  1. Call Initialization and External Media Channel Creation:

    {"level":"info","message":"New call from 680 to extension 680","timestamp":"2024-12-09T16:10:50.124Z"}
    {"level":"info","message":"Starting voicebot handler for channel 1733760650.4","timestamp":"2024-12-09T16:10:50.125Z"}
    {"level":"info","message":"Created external media channel external_1733760650.4","timestamp":"2024-12-09T16:10:50.138Z"}
    
    • A new call is initiated from extension 680. The system starts a voicebot handler for the channel 1733760650.4.
    • An external media channel external_1733760650.4 is created, which is crucial for handling audio streams between Asterisk and external systems like OpenAI.
  2. Bridge Creation and Channel Addition:

    {"level":"info","message":"Created mixing bridge 7028ae525-f4c1-4fca-bea9-f9ac74d08861","timestamp":"2024-12-09T16:10:50.146Z"}
    {"level":"info","message":"Added channels to bridge","timestamp":"2024-12-09T16:10:50.659Z"}
    
    • A mixing bridge with ID 028ae525-f4c1-4fca-bea9-f9ac74d08861 is created. This bridge is used to mix audio streams from different channels.
    • Channels are added to this bridge, allowing for the integration of audio from the external media channel and other sources.
  3. Connection to OpenAI Realtime API:

    {"level":"info","message":"Connecting to OpenAI Realtime API...","timestamp":"2024-12-09T16:10:50.660Z"}
    {"level":"info","message":"WebSocket connection established","timestamp":"2024-12-09T16:10:51.487Z"}
    
    • The system connects to the OpenAI Realtime API, establishing a WebSocket connection for real-time communication.
  4. Session Initialization and Welcome Message:

    • The session is initialized with specific configurations, including audio format and instructions for the virtual assistant (see the sketch after this list).
    • A welcome message is generated and received. This confirms that the system is correctly receiving and processing the welcome audio from OpenAI.
    • The system starts playback of the audio file chunk_1733760652759.wav, which contains the welcome message.
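
For reference, the session initialization described in step 4 might look roughly like the following with the ws package; the model name and session fields follow OpenAI’s Realtime API docs but should be treated as assumptions to verify:

    const WebSocket = require('ws');

    const ws = new WebSocket(
        'wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview',
        {
            headers: {
                Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
                'OpenAI-Beta': 'realtime=v1'
            }
        }
    );

    ws.on('open', () => {
        // Declare the audio format so it matches what is proxied from Asterisk.
        ws.send(JSON.stringify({
            type: 'session.update',
            session: {
                instructions: 'You are a helpful voice assistant.',
                input_audio_format: 'pcm16',
                output_audio_format: 'pcm16'
            }
        }));
    });

Note that the Realtime API’s pcm16 format is 24 kHz while slin16 is 16 kHz, so resampling is typically needed in both directions.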

Conclusion

The logs indicate that the system is correctly handling the audio streams between Asterisk and OpenAI. The external media channel and mixing bridge are set up properly, and the welcome message from OpenAI is received and played back successfully. This confirms that the audio capture and transmission to OpenAI are functioning as expected.

Issue:
I’m trying to capture the caller’s audio for debugging purposes and send it to my Node.js app. The bridge is created successfully and the channels are connected, but I’m not receiving the caller’s audio in my application.

Any suggestions would be greatly appreciated.

How is your application bridging the external media stream to OpenAI? Is the audio from OpenAI being sent to a file that is then played back?

Yes. The received audio chunks from OpenAI are saved as WAV files and played back to the user through Asterisk.
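
A minimal sketch of that chunk-playback approach, assuming the ari-client promise API (the sounds directory and sample rate are assumptions): each PCM chunk gets a standard 44-byte WAV header before ARI plays it.

    const fs = require('fs');
    const path = require('path');

    const SOUNDS_DIR = '/var/lib/asterisk/sounds'; // assumed; must be readable by Asterisk

    // Wrap raw 16-bit mono PCM in a 44-byte WAV header.
    function writeWav(pcm, sampleRate, filePath) {
        const header = Buffer.alloc(44);
        header.write('RIFF', 0);
        header.writeUInt32LE(36 + pcm.length, 4);
        header.write('WAVE', 8);
        header.write('fmt ', 12);
        header.writeUInt32LE(16, 16);             // fmt chunk size
        header.writeUInt16LE(1, 20);              // audio format: PCM
        header.writeUInt16LE(1, 22);              // channels: mono
        header.writeUInt32LE(sampleRate, 24);
        header.writeUInt32LE(sampleRate * 2, 28); // byte rate (mono, 16-bit)
        header.writeUInt16LE(2, 32);              // block align
        header.writeUInt16LE(16, 34);             // bits per sample
        header.write('data', 36);
        header.writeUInt32LE(pcm.length, 40);
        fs.writeFileSync(filePath, Buffer.concat([header, pcm]));
    }

    async function playChunk(channel, pcm) {
        const name = `chunk_${Date.now()}`;
        // Asterisk's bundled WAV support commonly expects 8 kHz;
        // resample first if the chunks arrive at another rate.
        writeWav(pcm, 8000, path.join(SOUNDS_DIR, `${name}.wav`));
        await channel.play({ media: `sound:${name}` });
    }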

You should be able to take a packet capture and verify that the RTP stream to the external media port is set up correctly and contains the expected audio.
I’m assuming your application is set up to receive the RTP stream and proxy it to OpenAI?
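
For the capture itself, something like tcpdump -i any -w extmedia.pcap udp port 4000 (substituting your external media port) records the stream, and Wireshark can then decode it as RTP and play the audio back.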

The caller’s audio coming from ExternalMedia() is actually an RTP stream. You don’t mention how you are trying to send it to the Node.js app, but the app would have to be listening on the same server as Asterisk. You could send it to a separate server instead, but keep in mind that you can’t send raw RTP straight through a TCP pipe unless it is wrapped in an additional framing layer.
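
To illustrate that last point, here is a hypothetical relay (host and ports are made up) that receives the RTP on UDP and forwards it over TCP with a two-byte length prefix, since datagram boundaries disappear inside a TCP byte stream:

    const dgram = require('dgram');
    const net = require('net');

    const tcp = net.connect(9000, 'app.example.com'); // remote Node.js app
    const udp = dgram.createSocket('udp4');

    udp.on('message', (packet) => {
        // Length-prefix each RTP packet so the receiver can re-split the stream.
        const prefix = Buffer.alloc(2);
        prefix.writeUInt16BE(packet.length, 0);
        tcp.write(Buffer.concat([prefix, packet]));
    });

    udp.bind(4000); // the port given to ExternalMedia()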

I’m working with a softmix bridge in Asterisk and have configured it to use external media. While the bridge itself seems to be functioning:

astervoice*CLI> bridge show all
Bridge-ID Chans Type Technology Duration
voicebot_1733943994.0 3 stasis softmix 00:00:05

I’m struggling to capture the caller’s audio. My goal is to forward the caller’s audio through the external media endpoint to my Node app.

Questions:

  • How can I configure Asterisk to properly send the caller’s audio to the external media in this setup?
  • Are there specific bridge settings, channel options, or external media parameters I might be overlooking?

Setup Details:

  • Asterisk Version: 20
  • Channels in Use: PJSIP
  • External Media Endpoint Details:

async start() {
    try {
        logger.info(`Starting voicebot handler for channel ${this.channel.id}`);
        this.isActive = true;

        await this.ensureAudioDir();

        // If this is the external-media channel, do nothing
        if (this.isExternalMediaChannel) {
            logger.info('External media channel detected, skipping initialization');
            return;
        }

        await this.channel.answer();
        logger.info(`Channel ${this.channel.id} answered`);

        // Create external media channel to force softmix
        this.externalMedia = await this.ari.channels.create({
            endpoint: 'Local/s@stream-audio',
            app: 'voicebot',
            channelId: `external_${this.channel.id}`,
            variables: {
                AUDIO_FORMAT: 'slin16'
            }
        });
        logger.info(`Created external media channel ${this.externalMedia.id}`);

        // Before creating the bridge, set the channel variables
        logger.debug('[CHANNEL CONFIG] Setting audio variables');
        await this.channel.setChannelVar({
            variable: 'MIXMONITOR_EVENTS',
            value: 'yes'
        });

        await this.channel.setChannelVar({
            variable: 'AUDIO_EVENTS',
            value: 'yes'
        });

        // Create the mixing bridge explicitly specifying softmix type
        this.bridge = await this.ari.bridges.create({ 
            type: 'mixing,softmix',  // Explicitly force softmix
            name: `voicebot_${this.channel.id}`,
            bridgeId: `voicebot_${this.channel.id}`
        });
        logger.info(`Created softmix bridge ${this.bridge.id}`);

        // Add the channel to the bridge with all necessary options
        await this.bridge.addChannel({
            channel: this.channel.id,
            options: {
                EVENTS: 'all',
                AUDIO_EVENTS: 'yes',
                BRIDGE_TYPE: 'softmix'  // Force softmix here too
            }
        });
        logger.info(`Added channel ${this.channel.id} to bridge ${this.bridge.id}`);

        // Immediately add the external media channel to the bridge
        await this.bridge.addChannel({
            channel: this.externalMedia.id,
            options: {
                EVENTS: 'all',
                AUDIO_EVENTS: 'yes',
                BRIDGE_TYPE: 'softmix'
            }
        });
        logger.info(`Added external media channel ${this.externalMedia.id} to bridge`);

        // Add monitoring of audio events
        this.bridge.once('BridgeMixingStart', () => {
            logger.info('[BRIDGE] Softmix bridge started, monitoring audio events');
        });

        this.bridge.on('ChannelTalkingStarted', () => {
            logger.info('[BRIDGE] Channel started talking');
        });

        this.bridge.on('ChannelTalkingFinished', () => {
            logger.info('[BRIDGE] Channel stopped talking');
        });

        // Add monitoring of audio events
        this.bridge.on('AudioFrameReceived', async (event) => {
            logger.debug(`[BRIDGE] Received audio frame of size: ${event.frame.length} bytes`);
            await this._handleAudioFrame(event.frame);
        });

        // Connect to OpenAI and configure audio
        logger.info("Connecting to OpenAI Realtime API...");
        await this.realtimeHandler.connect();
        
        await this._setupEventHandlers();
        
        this.isInitialized = true;
        logger.info('VoicebotHandler initialization completed with softmix bridge');

    } catch (error) {
        logger.error('Error in VoicebotHandler:', error);
        await this.cleanup();
        throw error;
    }
}

So you’re not using external media as provided by ARI; you’re using a Local channel instead. From the given code it doesn’t appear you are calling “dial” on the created channel, and if “create” is actually using the create ARI route, I would not expect anything to happen until that is done.
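
For comparison, the ARI-native external media route creates the channel and starts the RTP stream in one call, with no dial step. A sketch using the thread’s addresses (the method and parameter names come from the ARI externalMedia route; verify them against your ari-client version):

    // Replaces the channels.create(...) + Local channel approach above.
    this.externalMedia = await this.ari.channels.externalMedia({
        app: 'voicebot',
        external_host: '10.7.1.2:4000', // where the Node app listens for RTP
        format: 'slin16'
    });
    await this.bridge.addChannel({ channel: this.externalMedia.id });

If you stay with channels.create() on a Local channel instead, the created channel still needs a dial (POST /channels/{channelId}/dial, i.e. channel.dial() in ari-client) before it comes up.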
