Is it possible to integrate OpenAI Whisper with Asterisk?

Hi everyone,

I wanted to ask if there’s a way to use OpenAI Whisper with Asterisk, similar to how it works with the aeap-speech-to-text module.

Has anyone tried this or have any ideas on how Whisper could be integrated with Asterisk in a similar manner? If there are any existing projects or approaches, I’d appreciate any pointers!

Thanks in advance!

I’m not aware of any out-of-the-box, “here you go” implementation. The fundamentals to do it exist in Asterisk, but the integration itself still has to be written.

Thanks for the encouraging news.
I think the Python script is the least of the problems, but how can I send the live audio stream to my Python script from the dialplan?

You can either write an AEAP implementation using its protocol if all you want is text to speech, or write an ARI application using external media if you want to do more.
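For the ARI/external media route, the shape of it looks roughly like this (an untested sketch: the ARI credentials, the Stasis app name “speech”, and the ports are placeholders, and in a real application you would also bridge the external media channel with the caller inside your Stasis app):

```python
# Ask Asterisk (via ARI) to fork the call audio to us as RTP, then read it
# from a local UDP socket.
import socket

import requests

ARI = "http://localhost:8088/ari"
AUTH = ("ari", "secret")  # placeholder ARI credentials

# Create an external media channel: Asterisk sends 8 kHz signed-linear
# audio as RTP to external_host.
resp = requests.post(
    f"{ARI}/channels/externalMedia",
    auth=AUTH,
    params={"app": "speech", "external_host": "127.0.0.1:10000", "format": "slin"},
)
resp.raise_for_status()

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("127.0.0.1", 10000))
while True:
    packet, _ = sock.recvfrom(2048)
    audio = packet[12:]  # strip the fixed 12-byte RTP header
    # ...feed `audio` to your speech-to-text engine here
```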

I don’t want TTS, I want STT :grimacing:

Sorry, I meant speech to text. AEAP only does speech to text currently.

Oh dear, I can’t do C++, and I believe AEAP is written in C++. There are already Python scripts for OpenAI Whisper for live audio, but I need to figure out how to make Asterisk send the live audio to the script. Maybe I’ll find a way to adapt the existing AEAP script, as you already suggested. I’ll ask in the OpenAI forum, and once I have a solution, I will of course share it in this post. I wish you a pleasant day. Best regards from Germany!

AEAP isn’t “written in C++”. AEAP is a defined protocol[1] between Asterisk and an outside application, which can be in any language. The example we provide is in JavaScript.

[1] Asterisk External Application Protocol (AEAP) - Asterisk Documentation
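If it helps to see the shape of it, the transport side of such an external application in Python could start like this (a transport-only sketch: the `websockets` package, the port, and `feed_recognizer` are placeholders of mine — the actual JSON message exchange is defined in the documentation above):

```python
# Skeleton of an AEAP-style server: JSON control messages arrive as text
# frames, audio arrives as binary frames.
import asyncio
import json

import websockets

def feed_recognizer(audio_bytes):
    # Stand-in: wire this up to your actual STT engine (Vosk, WhisperLive, ...).
    pass

async def handle(ws, path=None):
    async for message in ws:
        if isinstance(message, bytes):
            feed_recognizer(message)       # binary frame: raw audio from Asterisk
        else:
            request = json.loads(message)  # text frame: AEAP control message
            print("AEAP message:", request)

async def main():
    async with websockets.serve(handle, "0.0.0.0", 9099):
        await asyncio.Future()  # run forever

asyncio.run(main())
```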

Thank you for the correction. I just checked, and I think provider.js needs to be adapted for WhisperLive. However, I’ll need to dig deeper into it to figure it out. I wanted to use Vosk, but it requires a huge amount of resources, and my server’s RAM isn’t sufficient for it. Google STT is all well and good, but I’m looking for a local solution that doesn’t rely on an external provider.

If you cache the rendered audio from the external TTS provider, your dependence and costs may be tolerable.

We use Polly (AWS) and Watson (IBM). When we receive the TTS audio, I store it as /tts-cache/<provider>/<language>/<voice>/<md5sum-of-text>.wav.
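Computing that cache key is cheap; a sketch of the naming scheme (the provider/language/voice values are just examples):

```python
# Build the cache path described above; hashlib and os are standard library.
import hashlib
import os

def tts_cache_path(provider: str, language: str, voice: str, text: str) -> str:
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    return os.path.join("/tts-cache", provider, language, voice, digest + ".wav")

# e.g. /tts-cache/polly/en-US/Joanna/900150983cd24fb0d6963f7d28e17f72.wav
print(tts_cache_path("polly", "en-US", "Joanna", "abc"))
```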

For some of our clients, we can extract the text and variables from their script and pre-render the text.

You can reduce Vosk models to fit your memory; for example, if you remove the rnnlm and rescore folders from the model, it will probably fit in 2 GB.
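For example (a sketch; the model path is an example, the folder names are the ones mentioned above):

```python
# Delete the optional rescoring data from a Vosk model to cut its memory use.
import shutil

MODEL = "/opt/vosk-model-en-us-0.22"  # example path to a big Vosk model
for folder in ("rnnlm", "rescore"):
    shutil.rmtree(f"{MODEL}/{folder}", ignore_errors=True)
```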

As for Whisper, it needs much more compute (a GPU card) and isn’t really real-time.

Try using AudioSocket.

The problem is that I only have 1 GB of memory, and Asterisk is already running on the server. Only short sentences need to be recognized, which should not be longer than 10-15 seconds.

Can you provide me with an example of a dialplan using AudioSocket and Vosk?
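A minimal sketch of what that could look like (untested; the extension, UUID, port, and model path are placeholders you’d adjust). The dialplan side just answers the call and hands the audio to a TCP listener via the AudioSocket application:

```
[from-internal]
exten => 100,1,Answer()
 same => n,AudioSocket(40325ec2-5efd-4bd3-805f-53576e581d13,127.0.0.1:9092)
 same => n,Hangup()
```

On the other end, a small Python server can parse the AudioSocket framing (1 byte type, 2 bytes big-endian length, then the payload; type 0x01 carries the call UUID, 0x10 carries 16-bit/8 kHz signed-linear audio, 0x00 signals the end of the call) and feed the audio frames to Vosk:

```python
# Minimal AudioSocket server feeding a (small) Vosk model.
import socket
import struct

from vosk import KaldiRecognizer, Model

model = Model("/opt/vosk-model-small-en-us-0.15")  # example small model

def read_exact(conn, n):
    """Read exactly n bytes from the TCP connection."""
    data = b""
    while len(data) < n:
        chunk = conn.recv(n - len(data))
        if not chunk:
            raise ConnectionError("socket closed")
        data += chunk
    return data

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("127.0.0.1", 9092))
server.listen(1)

while True:
    conn, _ = server.accept()
    rec = KaldiRecognizer(model, 8000)  # AudioSocket audio is 8 kHz slin
    try:
        while True:
            kind, length = struct.unpack(">BH", read_exact(conn, 3))
            payload = read_exact(conn, length) if length else b""
            if kind == 0x00:          # Asterisk ended the call
                break
            if kind == 0x10:          # audio frame
                if rec.AcceptWaveform(payload):
                    print(rec.Result())
        print(rec.FinalResult())
    finally:
        conn.close()
```

With one of the small Vosk models this should stay well within a modest memory budget, but you’d want to verify that on your 1 GB server.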

OpenAI Whisper does not natively support streaming audio input. So you have to send OpenAI a recording file.

I have done this type of thing three different ways, and they all work, although one of them did not use OpenAI because of the lack of streaming capability. How you do it depends on your specific application.

If you want to stream the audio in real time, OpenAI is probably not the best choice. You can chunk the audio into small segments and send them sequentially, but there are better services for real-time streaming. If you go that route, I’d suggest using ARI with an external media channel to get the audio into your app; I believe that’s the best method, because you can then also stream TTS back to Asterisk.

For a simple application, though, any script that sends the recording file to OpenAI will work. One use case I have done is transcribing voicemail: I send the file for transcription via an AGI script called in a hangup handler.
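For that simple recording-file case, the core really is just a few lines (a sketch, not production code; the voicemail path is an example, and it assumes the openai Python package with an API key in OPENAI_API_KEY):

```python
# Transcribe a finished recording with OpenAI's hosted Whisper model.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Example path; in an AGI hangup handler you'd get this from the channel/recording.
path = "/var/spool/asterisk/voicemail/default/100/INBOX/msg0000.wav"

with open(path, "rb") as f:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

print(transcript.text)
```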

That sounds great. Would you share your code here?

Sorry, no I can’t.

And per PM?

What are you asking?

I asked if you could send me the code in a private message.

“I can’t” really means I can’t. It wasn’t just an objection to sharing it publicly in the forum. I don’t own the code, since I wrote it as an employee. Besides, it’s in Go, not Python.

Sorry. Not everyone can share their code. My point in responding was that you don’t have to overcomplicate it. Depending on your specific needs, it could be very simple, as in about 5-6 lines of code. If you need to stream the audio, it is much more complicated, and in that scenario you have to work around the fact that OpenAI doesn’t accept audio streams.

Good luck.