I wanted to ask if there’s a way to use OpenAI Whisper with Asterisk, similar to how it works with the aeap-speech-to-text module.
Has anyone tried this or have any ideas on how Whisper could be integrated with Asterisk in a similar manner? If there are any existing projects or approaches, I’d appreciate any pointers!
Thanks for the encouraging news.
I think the Python script itself is the least of the problems; what I can't figure out is how to send the live audio stream to my Python script from the dialplan.
You can either write an AEAP implementation using its protocol if all you want is speech to text, or write an ARI application using external media if you want to do more.
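For the ARI route, a minimal sketch of the request that creates an external media channel might look like this. The `/ari/channels/externalMedia` endpoint and its `app`/`external_host`/`format` parameters are part of the ARI REST interface (Asterisk 16.6+); the host, port, application name, and format below are placeholders, not anyone's actual configuration:

```python
# Sketch: ask Asterisk (via ARI) to originate an external media channel
# that streams the call audio to your own process. Hosts/ports/app name
# are illustrative; authentication is omitted for brevity.
import urllib.parse
import urllib.request

def external_media_request(base="http://127.0.0.1:8088",
                           app="speech-app",
                           external_host="127.0.0.1:9092",
                           fmt="slin16"):
    """Build the POST request that creates an ARI external media channel."""
    params = urllib.parse.urlencode({
        "app": app,                      # Stasis application that owns the channel
        "external_host": external_host,  # where Asterisk sends the audio (RTP)
        "format": fmt,                   # format of the outgoing stream
    })
    return urllib.request.Request(
        f"{base}/ari/channels/externalMedia?{params}", method="POST")
```

Your application then listens on `external_host` for the audio and feeds it to the recognizer.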
Oh dear, I can't do C++, and I believe AEAP is written in C++. There are already Python scripts that run OpenAI Whisper on live audio, but I need to figure out how to make Asterisk send the live audio to the script. Maybe I'll find a way to rewrite the existing AEAP script, as you already suggested. I'll ask in the OpenAI forum, and once I have a solution I will of course share it in this post. I wish you a pleasant day. Best regards from Germany!
AEAP isn’t “written in C++”. AEAP is a defined protocol[1] between Asterisk and an outside application, which can be written in any language. The example we provide is in JavaScript.
Thank you for the correction. I just checked, and I think provider.js would need to be adapted for WhisperLive; I'll have to dig deeper into it to figure that out. I wanted to use VOSK, but it requires a huge amount of resources and my server's RAM isn't sufficient for it. Google STT is all well and good, but I'm looking for a local solution that doesn't rely on an external provider.
The problem is that I only have 1GB of memory, and Asterisk is already running on the server. Only short sentences need to be recognized, which should not be longer than 10-15 seconds.
Can you provide me with an example of a Dialplan using AudioSockets and VOSK?
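Not an answer from the thread, but a minimal dialplan sketch of the AudioSocket approach might look like the following. The UUID and the listening address (where a VOSK or Whisper wrapper would accept the stream) are placeholders:

```
[speech-test]
exten => 1234,1,Answer()
 same => n,AudioSocket(40325ec2-5efd-4bd3-805f-53576e581d13,127.0.0.1:9092)
 same => n,Hangup()
```

`AudioSocket()` streams the call audio as 16-bit, 8 kHz signed linear frames over TCP to the given address; the server on that port has to speak the AudioSocket framing protocol and pass the audio on to the recognizer.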
OpenAI Whisper does not natively support streaming audio input. So you have to send OpenAI a recording file.
I have done this type of thing three different ways, and they all work, although one of them did not use OpenAI because of its lack of streaming support. How you do it just depends on your specific application.

If you want to stream the audio in real time, OpenAI is probably not the best choice for your situation. You can chunk the audio into small segments and send them sequentially, but there are better services for real-time streaming. If you go that route, I'd suggest using ARI with an external media channel to get the audio into your app; I believe that is the best method because you can then also stream TTS back to Asterisk.

For a simple application, though, any script that sends the recording file to OpenAI will work. One use case I have done is transcribing voicemail: I send the file for transcription via an AGI script called in a hangup handler.
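The chunking idea mentioned above can be sketched in a few lines. This assumes Asterisk's signed-linear format (8 kHz, 16-bit mono, as emitted over AudioSocket); each chunk could then be wrapped in a WAV header and sent to the transcription service in sequence:

```python
# Sketch: split raw signed-linear PCM audio into fixed-length chunks so
# each segment can be sent to a (non-streaming) transcription service.
# Format constants match Asterisk's slin: 8 kHz, 16-bit mono.
SAMPLE_RATE = 8000      # samples per second
BYTES_PER_SAMPLE = 2    # 16-bit PCM

def chunk_pcm(pcm: bytes, seconds: float = 10.0) -> list[bytes]:
    """Split raw PCM into chunks of at most `seconds` each."""
    step = int(SAMPLE_RATE * BYTES_PER_SAMPLE * seconds)
    step -= step % BYTES_PER_SAMPLE  # keep chunk boundaries on whole samples
    return [pcm[i:i + step] for i in range(0, len(pcm), step)]
```

Note that cutting on a fixed clock can split a word in half; smarter implementations cut on silence, which is one reason a service with native streaming support is the better fit for real-time use.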
“I can’t” really means I can’t. It wasn’t just an objection to sharing it publicly in the forum; I don’t own the code, since I wrote it as an employee. Besides, it’s in Go, not Python.
Sorry, not everyone can share their code. My point in responding was that you don't have to overcomplicate it. Depending on your specific needs it could be very simple, about 5-6 lines of code. If you need to stream the audio it is much more complicated, because you then have to work around the fact that OpenAI doesn't accept audio streams.
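To illustrate how simple the non-streaming case can be, here is a sketch that posts a finished recording to OpenAI's transcription endpoint using only the standard library. The endpoint URL and the `whisper-1` model name are OpenAI's documented values; the file paths and environment-variable handling are illustrative (an official client library would make the call itself even shorter):

```python
# Sketch: send a finished recording file to OpenAI's transcription API.
# Uses only the standard library; paths and env handling are illustrative.
import json
import os
import urllib.request
import uuid

API_URL = "https://api.openai.com/v1/audio/transcriptions"

def build_multipart(filename: str, audio: bytes, model: str = "whisper-1"):
    """Build a multipart/form-data body with `model` and `file` fields."""
    boundary = uuid.uuid4().hex
    parts = [
        (f'--{boundary}\r\nContent-Disposition: form-data; '
         f'name="model"\r\n\r\n{model}\r\n').encode(),
        (f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
         f'filename="{filename}"\r\nContent-Type: audio/wav\r\n\r\n').encode()
        + audio + b"\r\n",
        f"--{boundary}--\r\n".encode(),
    ]
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"

def transcribe(path: str) -> str:
    """POST the recording and return the transcribed text."""
    with open(path, "rb") as f:
        body, content_type = build_multipart(os.path.basename(path), f.read())
    req = urllib.request.Request(API_URL, data=body, headers={
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "Content-Type": content_type,
    })
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["text"]
```

For the voicemail use case described above, an AGI script in a hangup handler would essentially just call `transcribe()` on the recorded file and store or mail the result.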