Asterisk to OpenAI Realtime: The Definitive MVP (In Progress)

InfinitoCloud · February 25, 2025, 6:41am

What is this?

I want to share with the community how I’m building a Minimum Viable Product (MVP) to enable calls from Asterisk FreePBX to OpenAI Realtime models with the lowest possible latency. This will be an easy-to-configure and ready-to-use solution.

OpenAI Realtime Reference:

https://platform.openai.com/docs/guides/realtime

What is the goal, and how do I reach it?

I believe the most user-friendly experience will come from using FreePBX, which provides easy access to features like extensions, trunks, reports, and IVRs. My plan is to build a FreePBX module that includes:

An ARI ExternalMedia App
OpenAI APIs integration
A Control Panel in FreePBX
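
For the first item, here is a minimal sketch of the parameters an ARI ExternalMedia request takes. The helper name `buildExternalMediaParams` is hypothetical; the `app`, `external_host`, and `format` fields are the ones the ARI channels resource accepts. The default host and port match the RTP receiver address seen in the logs posted in this thread, and the `ulaw` format is an assumption, not something confirmed by the project.

```javascript
// Hypothetical helper: builds the parameter object for an ARI
// "externalMedia" request (POST /channels/externalMedia).
// Defaults are assumptions chosen to match the logs in this thread.
function buildExternalMediaParams({ app, host = '127.0.0.1', port = 12000, format = 'ulaw' }) {
  if (!app) {
    throw new Error('An ARI application name is required');
  }
  return {
    app,                               // Stasis application that will own the channel
    external_host: `${host}:${port}`,  // where Asterisk should send the RTP stream
    format,                            // codec Asterisk uses on this channel
  };
}

// With the ari-client package, the object would typically be passed as:
//   const channel = await client.channels.externalMedia(params);
// (not executed here, since it needs a running Asterisk instance)
const exampleParams = buildExternalMediaParams({ app: 'stasis_app' });
console.log(exampleParams);
```

The resulting ExternalMedia channel is then added to the same bridge as the caller's channel, so audio flows in both directions between the call and the external RTP endpoint.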

The Goal:

Enable users to call extension 3000, where an OpenAI Realtime model answers with its voice. The model will listen to your commands, understand them, and respond appropriately with low latency.

Why?

To demonstrate how an Asterisk ARI app can integrate with OpenAI’s Realtime provider.
To create a foundational project for integrating other AI providers (e.g., Grok 3 Voice) into a single app.
To meet new customers who need ARI customizations and professional services.

Who am I?

I’m a Cloud Architect with 15 years of experience working with servers, networks, cloud, and AI. I’ve leveraged AI tools to complete other ARI (Asterisk REST Interface) implementations with AI bots, such as Google DialogFlow and AWS Bedrock. This project is an exciting challenge for me, and I’d love to share it with the community.

Requirements

OpenAI API keys and credits, with access to Realtime models.
FreePBX installed.

Outcomes/Roadmap

Enable seamless, low-latency communication with OpenAI Realtime models via Asterisk.
Develop a FreePBX Addon.
Build an ARI app using ExternalMedia.
Prepare the solution for scalability.

Tested On

Debian 12 with FreePBX 17

To-Do List

8.1. Research OpenAI Realtime capabilities and requirements. Done

8.2. Build a backend Node.js app replicating the OpenAI Realtime Playground to validate all required steps. Done

It works! I can stream voice to and from OpenAI over WebSockets. Here are the mini-apps that help me test everything:

8.3. Extract all necessary fields for implementation. In Progress

8.4. Build the first ARI app version 0.1.

8.5. Test and debug.

8.6. Create the FreePBX Module. In Progress

8.7. Test and debug.

8.8. Build the second ARI app version 0.2.

8.9. Test and debug.

8.10. Integrate the most popular requested features.

What can you do?

Subscribe to this post for updates until the MVP is complete.
Help me test it and report bugs or issues to the GitHub repo once it’s ready.
Suggest functionalities! I’ll prioritize the most relevant ones for future versions.

MuhireE · February 25, 2025, 9:26am

Hi @InfinitoCloud ,

I'm interested in your project. How can we link up for further discussion?

Thanks,

InfinitoCloud · February 27, 2025, 1:48am

OK, continuing with the development of the module, step 8.3:

I have just completed an ARI application that integrates an RTP server. The application answers the call, receives the RTP audio, records the incoming audio for 10 seconds, saves it as a WAV file, and plays it back by injecting it through RTP toward Asterisk. For people asking how to get ARI to work with RTP:
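
As an illustration of the receiving side only, here is a minimal sketch of how an RTP packet can be split into its header fields and audio payload in Node.js. This is not the project's code; `parseRtpPacket` is a hypothetical name, and the layout follows the fixed RTP header defined in RFC 3550.

```javascript
// Hypothetical helper: parses one RTP packet (RFC 3550) from a Buffer
// and returns its main header fields plus the audio payload.
function parseRtpPacket(packet) {
  if (packet.length < 12) {
    throw new Error('RTP packet is shorter than the 12-byte fixed header');
  }
  const version = packet[0] >> 6;
  if (version !== 2) {
    throw new Error(`Unsupported RTP version: ${version}`);
  }
  const hasPadding = (packet[0] & 0x20) !== 0;
  const hasExtension = (packet[0] & 0x10) !== 0;
  const csrcCount = packet[0] & 0x0f;

  // The payload starts after the fixed header and any CSRC identifiers.
  let offset = 12 + csrcCount * 4;
  if (hasExtension) {
    if (packet.length < offset + 4) {
      throw new Error('Truncated RTP header extension');
    }
    // Extension header: 16-bit profile id, then length in 32-bit words.
    offset += 4 + packet.readUInt16BE(offset + 2) * 4;
  }
  // When the padding bit is set, the last byte holds the padding length.
  const end = hasPadding ? packet.length - packet[packet.length - 1] : packet.length;
  if (offset > end) {
    throw new Error('Malformed RTP packet');
  }
  return {
    marker: (packet[1] & 0x80) !== 0,
    payloadType: packet[1] & 0x7f,      // 0 = PCMU (G.711 mu-law)
    sequenceNumber: packet.readUInt16BE(2),
    timestamp: packet.readUInt32BE(4),
    ssrc: packet.readUInt32BE(8),
    payload: packet.subarray(offset, end),
  };
}

// Example: a synthetic packet with a 12-byte header and 160 payload bytes,
// i.e. 20 ms of 8 kHz mu-law audio.
const sampleHeader = Buffer.from([0x80, 0x00, 0x00, 0x01, 0, 0, 0, 0xa0, 0x12, 0x34, 0x56, 0x78]);
const samplePacket = Buffer.concat([sampleHeader, Buffer.alloc(160, 0xff)]);
console.log(parseRtpPacket(samplePacket).payload.length);
```

In a real receiver, a function like this would be called from the `message` handler of a `dgram` UDP socket bound to the address given to ExternalMedia.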

agolovaciuc · February 28, 2025, 6:13pm

Hello everyone,

I see you’re discussing the possibility of integrating Asterisk with OpenAI Realtime.

Could you please tell me if it’s possible to use Node.js with ARI to send a media stream to OpenAI so that it can process the call in real-time?

InfinitoCloud · March 1, 2025, 6:31am

Hello, that's exactly what I'm doing. Today I was working on the ARI application, and for the first time I could hear the OpenAI Realtime model on an Asterisk call. It still fails when playing the subsequent OpenAI Realtime audio responses, but I've made quite a bit of progress and I'm close. Follow the updates on this post; at the end I will share the working ARI application. Greetings.
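
To make the forwarding step concrete, here is a minimal sketch of the JSON events a Node.js client might send over the OpenAI Realtime WebSocket. The helper names are hypothetical, and this is not the project's code. The event types `session.update` and `input_audio_buffer.append` and the field names follow the Realtime API event reference as it stood in early 2025, so they may have changed since. Choosing `g711_ulaw` is an assumption that would let RTP payloads be forwarded without resampling. The VAD defaults are the values reported in a log posted in this thread.

```javascript
// Hypothetical helper: builds a session.update event that enables
// server-side voice activity detection (VAD) and G.711 mu-law audio.
function buildSessionUpdate({ threshold = 0.8, prefixPaddingMs = 300, silenceDurationMs = 700 } = {}) {
  return {
    type: 'session.update',
    session: {
      input_audio_format: 'g711_ulaw',   // assumption: forward telephony audio as-is
      output_audio_format: 'g711_ulaw',
      turn_detection: {
        type: 'server_vad',
        threshold,
        prefix_padding_ms: prefixPaddingMs,
        silence_duration_ms: silenceDurationMs,
      },
    },
  };
}

// Hypothetical helper: wraps raw audio bytes (a Buffer) in an
// input_audio_buffer.append event. The API expects base64 audio.
function buildAudioAppend(audioBytes) {
  return {
    type: 'input_audio_buffer.append',
    audio: audioBytes.toString('base64'),
  };
}

// On an open WebSocket `ws`, each event would be sent as a JSON string:
//   ws.send(JSON.stringify(buildSessionUpdate()));
//   ws.send(JSON.stringify(buildAudioAppend(rtpPayload)));
console.log(JSON.stringify(buildAudioAppend(Buffer.from([0xff, 0x7f]))));
```

Audio coming back from the model arrives as base64 in server events and has to be decoded and packetized into RTP again before Asterisk can play it.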

deduzzo · March 2, 2025, 9:46pm

Fantastic job! This was also my area of research, and you got ahead of me! My research was oriented toward including an n8n flow. I'll follow your work and will be available to extend it to create something awesome.

InfinitoCloud · March 4, 2025, 7:08pm

Hi, nice idea, it could be integrated with n8n. I'm close to completing version 0.1 of the ARI app, so stay tuned.

egorky · March 6, 2025, 4:31am

Will this app allow a realtime conversation between a user on the telephone and an OpenAI bot?

InfinitoCloud · March 6, 2025, 4:11pm

Yes exactly, this ARI application and the subsequent FreePBX module will connect Asterisk with OpenAI Realtime services.

InfinitoCloud · March 23, 2025, 7:46pm

UPDATES!

8.4. Build the first ARI app version 0.1 - DONE!

You can see the ARI app working here:

Also supports interruptions:

boblp · March 26, 2025, 8:55pm

Do you have the latest code for this? I see you haven't pushed any changes to your repo in a bit.

love the work <3, nice demo

InfinitoCloud · March 27, 2025, 3:09pm

Hi, yes, the repo is updated now:

vim · March 28, 2025, 6:04am

Thanks for your demo

TedM · April 2, 2025, 4:21am

Hmm, slightly interesting. Now what would make it scads more interesting would be a demo app that, instead of just talking back to the user with general AI responses, would also return a copy of the conversation text transcript using the speech-to-text demo model that OpenAI has online.

InfinitoCloud · April 2, 2025, 2:34pm

Of course, anything is possible. This is a basic application, but it required a lot of features just to start interacting properly with OpenAI RT. From here, you can create AI agents that query external data, AI-powered interactive voice response (IVR) systems, RAG-powered customer service agents, and more.

VinayakVasuMV · April 4, 2025, 12:24pm

No audio is heard or created

asterisk_to_openai_rt# node asterisk_to_openai_rt.js
(node:31353) [DEP0040] DeprecationWarning: The punycode module is deprecated. Please use a userland alternative instead.
(Use node --trace-deprecation ... to show where the warning was created)
[
‘This API is using a deprecated version of Swagger! Please see Home · swagger-api/swagger-core Wiki · GitHub for more info’
]
N/A | 2025-04-04T12:23:40.885Z [INFO] Connected to ARI at http://127.0.0.1:8088
N/A | 2025-04-04T12:23:40.888Z [INFO] ARI application “stasis_app” started
N/A | 2025-04-04T12:23:40.890Z [INFO] RTP Receiver listening on 127.0.0.1:12000
N/A | 2025-04-04T12:23:44.708Z [INFO] StasisStart event received for channel 1743769424.9, name: SIP/5000-00000003
N/A | 2025-04-04T12:23:44.709Z [INFO] SIP channel started: 1743769424.9
N/A | 2025-04-04T12:23:44.716Z [INFO] Channel 1743769424.9 answered
N/A | 2025-04-04T12:23:44.718Z [INFO] ExternalMedia channel 1743769424.10 created and mapped to bridge e7014e22-d699-4e5c-825d-1bbd81e07e79
N/A | 2025-04-04T12:23:44.719Z [INFO] Attempting to start OpenAI WebSocket for channel 1743769424.9
N/A | 2025-04-04T12:23:44.752Z [INFO] StasisStart event received for channel 1743769424.10, name: UnicastRTP/127.0.0.1:12000-0x7f3280010100
N/A | 2025-04-04T12:23:44.753Z [INFO] ExternalMedia channel started: 1743769424.10
N/A | 2025-04-04T12:23:44.756Z [INFO] ExternalMedia channel 1743769424.10 added to bridge e7014e22-d699-4e5c-825d-1bbd81e07e79
N/A | 2025-04-04T12:23:45.035Z [INFO] RTP Source assigned for channel 1743769424.9: 127.0.0.1:11430
N/A | 2025-04-04T12:23:45.076Z [INFO] Audio processed for channel 1743769424.9 | RMS: 0.000 | Max sample before: 16, after: 16
C-0000 | 2025-04-04T12:23:45.770Z [INFO] [Client] OpenAI WebSocket connection established for channel 1743769424.9
N/A | 2025-04-04T12:23:45.771Z [INFO] Initializing RTP stream to 127.0.0.1:11430 for channel 1743769424.9
S-0000 | 2025-04-04T12:23:45.776Z [INFO] [Server] First event received for channel 1743769424.9 | Type: session.created | Duration: N/As | Status: Received
S-0001 | 2025-04-04T12:23:45.776Z [INFO] [Server] Session created for channel 1743769424.9 | Duration: N/As | Status: Received
N/A | 2025-04-04T12:23:45.870Z [INFO] RTP stream fully initialized for channel 1743769424.9
N/A | 2025-04-04T12:23:45.871Z [INFO] StreamHandler initialized for channel 1743769424.9 | Ready: true
C-0001 | 2025-04-04T12:23:45.872Z [INFO] [Client] Session updated with VAD settings for channel 1743769424.9 | Threshold: 0.8, Prefix: 300ms, Silence: 700ms
C-0002 | 2025-04-04T12:23:45.877Z [INFO] [Client] Sending audio chunk to OpenAI for channel 1743769424.9 | Size: 9.38 KB | RMS: 0.000
N/A | 2025-04-04T12:23:47.065Z [INFO] Received 100 RTP packets from 127.0.0.1:11430, total bytes: 17200, rate: 16.19 packets/s
N/A | 2025-04-04T12:23:47.085Z [INFO] Audio processed for channel 1743769424.9 | RMS: 0.000 | Max sample before: 24, after: 24
C-0003 | 2025-04-04T12:23:47.877Z [INFO] [Client] Sending audio chunk to OpenAI for channel 1743769424.9 | Size: 9.38 KB | RMS: 0.284
S-0004 | 2025-04-04T12:23:48.169Z [INFO] [Server] Speech started detected for channel 1743769424.9 | Duration: N/As | Status: Received
N/A | 2025-04-04T12:23:49.064Z [INFO] Received 100 RTP packets from 127.0.0.1:11430, total bytes: 17200, rate: 50.03 packets/s
N/A | 2025-04-04T12:23:49.086Z [INFO] Audio processed for channel 1743769424.9 | RMS: 0.288 | Max sample before: 26496, after: 26496
C-0004 | 2025-04-04T12:23:49.878Z [INFO] [Client] Sending audio chunk to OpenAI for channel 1743769424.9 | Size: 9.38 KB | RMS: 0.000
S-0006 | 2025-04-04T12:23:50.356Z [INFO] [Server] Speech stopped detected for channel 1743769424.9 | Duration: N/As | Status: Received
N/A | 2025-04-04T12:23:50.356Z [INFO] RTP stream stopped for channel 1743769424.9
N/A | 2025-04-04T12:23:50.356Z [INFO] Stopped RTP stream due to user speech for channel 1743769424.9
N/A | 2025-04-04T12:23:50.356Z [INFO] Initializing RTP stream to 127.0.0.1:11430 for channel 1743769424.9
N/A | 2025-04-04T12:23:50.357Z [INFO] Finished RTP stream for channel 1743769424.9 | Total duration: 4.59s | Total bytes sent: 3680 | Total packets: 10
S-0011 | 2025-04-04T12:23:50.409Z [INFO] [Server] Response completed for channel 1743769424.9 | Duration: N/As | Audio Fragments: 0 | Text Fragments: 0 | RTP Packets: 5 | RTP Bytes: 1200
C-0005 | 2025-04-04T12:23:50.410Z [INFO] [Client] Cleared OpenAI audio buffer for channel 1743769424.9
N/A | 2025-04-04T12:23:50.456Z [INFO] RTP stream fully initialized for channel 1743769424.9
N/A | 2025-04-04T12:23:50.456Z [INFO] StreamHandler initialized for channel 1743769424.9 | Ready: true
N/A | 2025-04-04T12:23:51.065Z [INFO] Received 100 RTP packets from 127.0.0.1:11430, total bytes: 17200, rate: 49.98 packets/s
N/A | 2025-04-04T12:23:51.106Z [INFO] Audio processed for channel 1743769424.9 | RMS: 0.000 | Max sample before: 16, after: 16
N/A | 2025-04-04T12:23:51.699Z [INFO] Channel 1743769424.9 removed from sipMap at start of StasisEnd
N/A | 2025-04-04T12:23:51.699Z [INFO] Send timeout cleared for channel 1743769424.9
N/A | 2025-04-04T12:23:51.699Z [INFO] RTP stream stopped for channel 1743769424.9
N/A | 2025-04-04T12:23:51.699Z [INFO] StreamHandler stopped for channel 1743769424.9 in StasisEnd
N/A | 2025-04-04T12:23:51.699Z [INFO] Channel 1743769424.9 hung up, checking playback status before cleanup
N/A | 2025-04-04T12:23:51.699Z [INFO] Finished RTP stream for channel 1743769424.9 | Total duration: 1.34s | Total bytes sent: 3680 | Total packets: 10
N/A | 2025-04-04T12:23:51.801Z [INFO] WebSocket closed for channel 1743769424.9 in StasisEnd
N/A | 2025-04-04T12:23:51.809Z [INFO] Bridge e7014e22-d699-4e5c-825d-1bbd81e07e79 destroyed
N/A | 2025-04-04T12:23:51.809Z [INFO] Channel ended: 1743769424.9
N/A | 2025-04-04T12:23:52.207Z [INFO] RTP stream stopped for channel 1743769424.9
C-0006 | 2025-04-04T12:23:52.207Z [INFO] [Client] OpenAI WebSocket connection closed for channel 1743769424.9 | Status: Finished

No audio is heard or created

asterisk_to_openai_rt/test-tools# node 4_app_voicechat_rt.js
2025-04-04T12:11:15.515Z - Starting connection…
2025-04-04T12:11:15.516Z - Output file initialized: ./output_audio_2025-04-04T12-11-15-514Z.pcm
2025-04-04T12:11:16.923Z - Connection established
2025-04-04T12:11:16.923Z - Audio loaded, total size: 80400 bytes
2025-04-04T12:11:16.926Z - Sent chunk: 4800 bytes
2025-04-04T12:11:17.027Z - Sent chunk: 4800 bytes
2025-04-04T12:11:17.128Z - Sent chunk: 4800 bytes
2025-04-04T12:11:17.220Z - Voice detection started
2025-04-04T12:11:17.229Z - Sent chunk: 4800 bytes
2025-04-04T12:11:17.330Z - Sent chunk: 4800 bytes
2025-04-04T12:11:17.432Z - Sent chunk: 4800 bytes
2025-04-04T12:11:17.533Z - Sent chunk: 4800 bytes
2025-04-04T12:11:17.634Z - Sent chunk: 4800 bytes
2025-04-04T12:11:17.735Z - Sent chunk: 4800 bytes
2025-04-04T12:11:17.836Z - Sent chunk: 4800 bytes
2025-04-04T12:11:17.937Z - Sent chunk: 4800 bytes
2025-04-04T12:11:18.037Z - Sent chunk: 4800 bytes
2025-04-04T12:11:18.137Z - Sent chunk: 4800 bytes
2025-04-04T12:11:18.238Z - Sent chunk: 4800 bytes
2025-04-04T12:11:18.338Z - Sent chunk: 4800 bytes
2025-04-04T12:11:18.438Z - Sent chunk: 4800 bytes
2025-04-04T12:11:18.538Z - Sent chunk: 3600 bytes
2025-04-04T12:11:18.632Z - Audio transcribed (optional): No transcription
2025-04-04T12:11:18.640Z - Audio confirmed
2025-04-04T12:11:18.640Z - Response request sent
2025-04-04T12:11:18.849Z - Audio saved to: ./output_audio_2025-04-04T12-11-15-514Z.pcm
2025-04-04T12:11:19.173Z - Error: Error committing input audio buffer: buffer too small. Expected at least 100ms of audio, but buffer only has 0.00ms of audio.
2025-04-04T12:11:19.173Z - Connection closed
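
The final error says the commit found 0.00 ms of audio where at least 100 ms was expected. A minimal sketch of the arithmetic behind that limit follows, assuming the test tool sends 16-bit mono PCM at 24 kHz. That assumption is consistent with the 4800-byte chunks sent roughly every 100 ms above, but it is not confirmed in the thread, and the helper names are hypothetical.

```javascript
// Hypothetical helper: duration in milliseconds of raw PCM audio.
// Defaults assume 16-bit (2-byte) mono samples at 24 kHz.
function audioDurationMs(byteLength, sampleRate = 24000, bytesPerSample = 2) {
  return (byteLength * 1000) / (sampleRate * bytesPerSample);
}

// Hypothetical guard: only commit when the locally tracked, uncommitted
// audio is at least as long as the minimum quoted in the error message.
function canCommit(uncommittedBytes, minimumMs = 100) {
  return audioDurationMs(uncommittedBytes) >= minimumMs;
}

console.log(audioDurationMs(4800));   // one chunk from the log above
console.log(audioDurationMs(80400));  // the whole file from the log above
console.log(canCommit(0));            // an empty buffer should not be committed
```

One possible reading of the log, not confirmed in the thread: if server-side VAD is enabled, the server commits the buffer on its own when it detects the end of speech, so a manual commit sent afterwards finds an empty buffer. Tracking uncommitted bytes locally and skipping the manual commit when the guard fails would avoid this particular error.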

hzapa · April 6, 2025, 6:51pm

Hi, I have the same problem here. Any clues? Thanks.

oreilya · April 6, 2025, 9:20pm

Hi. I have a problem.

asterisk@VM-9d8a061d-0352-4e8a-97c3-2e541c08a880:/asterisk_to_openai_rt$ nodejs asterisk_to_openai_rt.js
[
‘This API is using a deprecated version of Swagger! Please see Home · swagger-api/swagger-core Wiki · GitHub for more info’
]
N/A | 2025-04-06T19:22:35.191Z [INFO] Connected to ARI at http://127.0.0.1:8088
N/A | 2025-04-06T19:22:35.204Z [INFO] ARI application “stasis_app” started
N/A | 2025-04-06T19:22:35.210Z [INFO] RTP Receiver listening on 127.0.0.1:12000
N/A | 2025-04-06T19:22:40.171Z [INFO] StasisStart event received for channel 1743967360.75, name: PJSIP/4000-00000030
N/A | 2025-04-06T19:22:40.171Z [INFO] SIP channel started: 1743967360.75
N/A | 2025-04-06T19:22:40.178Z [ERROR] Error in SIP channel 1743967360.75: {
“message”: “Write access denied”
}
N/A | 2025-04-06T19:22:42.004Z [INFO] Channel ended: 1743967360.75
N/A | 2025-04-06T19:28:43.739Z [INFO] StasisStart event received for channel 1743967723.80, name: PJSIP/4000-00000033
N/A | 2025-04-06T19:28:43.739Z [INFO] SIP channel started: 1743967723.80
N/A | 2025-04-06T19:28:43.742Z [ERROR] Error in SIP channel 1743967723.80: {
“message”: “Write access denied”
}
N/A | 2025-04-06T19:28:57.793Z [INFO] Channel ended: 1743967723.80

hiteshjb · April 14, 2025, 2:01pm

I have the same issue: no errors and no audio either. Has anyone figured out what the issue was, or are there any solutions to try?

hiteshjb · April 14, 2025, 2:03pm

Did you find any solution to this issue?
