I’m Giuseppe Careri, a developer, solution architect, and Asterisk enthusiast for over 15 years.
Earlier this year I had the pleasure of presenting this solution at Astricon 2025, where I showcased how to build a fully automated AI voicebot using Asterisk with AudioSocket, integrating real-time STT/LLM/TTS pipelines.
If you’re working on real-time voice interactions using Asterisk, and you’re looking to build an AI-powered voicebot using Speech-to-Text (STT), Large Language Models (LLM), and Text-to-Speech (TTS) — I wanted to share a project that might save you weeks of setup and experimentation.
What this does
We’ve built an open source, production-ready infrastructure that allows you to:
Receive or place calls through Asterisk
Forward live audio using AudioSocket (TCP-based audio streaming)
Process it through a modular AI pipeline:
STT (e.g. Deepgram, Whisper)
LLM (e.g. OpenAI GPT-4, local models)
TTS (e.g. ElevenLabs, Coqui)
Or Speech-to-Speech (e.g. OpenAI Realtime)
The system responds in real time, enabling AI agents to fully replace or assist live operators.
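To give you an idea of the Asterisk side, here is a minimal dialplan sketch that streams the caller's audio to an AudioSocket server such as the AVR core. The context name, host, port, and UUID below are placeholders; the dialplan shipped with our images may look different.

```
; extensions.conf sketch: stream the caller's audio to an AudioSocket server
; (for example the AVR core). Host, port and the fixed UUID are placeholders.
[demo]
exten => 5001,1,Answer()
 same => n,AudioSocket(40325ec2-5efd-4bd3-805f-53576e581d13,avr-core:5001)
 same => n,Hangup()

; or, using the AudioSocket channel driver:
; exten => 5002,1,Dial(AudioSocket/avr-core:5001/40325ec2-5efd-4bd3-805f-53576e581d13)
```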
Thanks for sharing this, Giuseppe — and really appreciate the Docker images and the detailed explanation!
Would it be possible for you to share the full codebase or main application logic as well? That would help us understand the flow more deeply, especially around the VAD implementation, buffer sizing, and audio stream handling.
We’re exploring ways to tweak the system — such as adjusting VAD sensitivity, optimizing buffer sizes for latency, and potentially integrating a noise reduction module to improve transcription quality.
Thanks a lot for your message and the kind words — really appreciated!
For the VAD implementation, I used Silero VAD, which I found to be very performant. You can check out some benchmarks and comparisons with other libraries here:
To integrate it into AVR, I built a custom Node.js wrapper library called avr-vad.
You can find the source code on GitHub, where I also explain how to configure it via environment variables — things like thresholds, frame size, and more.
If you’re looking to tweak VAD sensitivity, buffer sizes for latency, or integrate noise reduction modules — you’re definitely on the right path.
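Just to illustrate the kind of knobs involved (frame size, activation threshold, silence hang-over), here is a tiny Node.js sketch of a frame-based gate. The energy check is only a stand-in for the Silero model, and the environment variable names are examples, not the ones avr-vad actually reads.

```js
// Minimal illustration of frame-based VAD gating (not the actual avr-vad code).
// AudioSocket delivers 16-bit signed PCM at 8 kHz mono; one "frame" here is
// FRAME_MS worth of that audio as a Node.js Buffer.

const SAMPLE_RATE = 8000;
const FRAME_MS = Number(process.env.VAD_FRAME_MS || 20);             // frame size knob
const THRESHOLD = Number(process.env.VAD_THRESHOLD || 0.02);         // sensitivity knob
const SILENCE_FRAMES = Number(process.env.VAD_SILENCE_FRAMES || 25); // hang-over knob

const bytesPerFrame = (SAMPLE_RATE * FRAME_MS * 2) / 1000; // 2 bytes per sample

// Stand-in for the Silero model: normalized RMS energy of one frame.
function frameScore(frame) {
  let sum = 0;
  for (let i = 0; i < frame.length; i += 2) {
    const s = frame.readInt16LE(i) / 32768;
    sum += s * s;
  }
  return Math.sqrt(sum / (frame.length / 2));
}

let speaking = false;
let silentFrames = 0;

// Feed consecutive frames of bytesPerFrame bytes; returns 'start', 'end' or null.
function pushFrame(frame) {
  if (frameScore(frame) > THRESHOLD) {
    silentFrames = 0;
    if (!speaking) { speaking = true; return 'start'; }
  } else if (speaking && ++silentFrames >= SILENCE_FRAMES) {
    speaking = false;
    return 'end'; // a good point to flush the buffered utterance to the STT
  }
  return null;
}

module.exports = { pushFrame, bytesPerFrame };
```

Raising THRESHOLD or SILENCE_FRAMES makes the gate less sensitive and adds latency; lowering them does the opposite, which is exactly the trade-off you will be tuning with the real avr-vad settings.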
Let me know if that answered your question!
If you’re interested in diving deeper into the core, feel free to reach out to me directly on Discord — it’s quicker and easier to share details there.
Looking forward to hearing more from you — and thanks again!
Hi, I just reviewed the system and it works well. How do I change the language to Hindi? Also, a little improvement is needed on latency; sometimes I see one-way audio. Let me review more and I'll share feedback. Thanks.
Hi, thank you for your feedback and for testing the system!
To change the language to Hindi, could you please let me know which module you’re referring to — is it the ASR (speech-to-text), the LLM (language model), or the TTS (text-to-speech)? The setup supports multiple providers, and language configuration may vary depending on which one you’re using.
If you’re using the docker-compose-anthropic.yml example, that means the system is currently using Deepgram for ASR and TTS, and Anthropic for the LLM.
Please note: as far as I know, Deepgram does not support Hindi, so for Hindi conversations I recommend switching to Google as the ASR and TTS provider — it has solid support for Hindi.
One of the advantages of AVR is flexibility: if you want to change provider, you just need to replace the ASR, LLM, or TTS service containers and set the correct environment variables. For example:
• If you want to use Vosk for ASR, simply replace the Deepgram ASR container with the Vosk one.
• If you want to use Google TTS, replace the Deepgram TTS container accordingly.
You’ll find an example setup with Google in this file: docker-compose-google.yml
And for Vosk, I’ve shared working configurations on Discord.
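As a rough idea of what the swap looks like in practice, here is a sketch of the ASR service switched to Google with a Hindi language setting. The image and environment variable names below are illustrative; please use the ones documented in docker-compose-google.yml.

```yaml
# Illustrative snippet only: check docker-compose-google.yml for the real
# image and environment variable names used by the Google ASR service.
services:
  avr-asr:
    # image: agentvoiceresponse/avr-asr-deepgram   # illustrative "before"
    image: agentvoiceresponse/avr-asr-google        # swap in the Google ASR container
    environment:
      - GOOGLE_APPLICATION_CREDENTIALS=/app/google.json
      - SPEECH_RECOGNITION_LANGUAGE=hi-IN           # Hindi
    volumes:
      - ./google.json:/app/google.json
```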
Hi Giuseppe, thanks for sharing this — truly exciting stuff!
We’ve been exploring similar integrations for voice automation projects and totally agree that bridging Asterisk with real-time AI pipelines is the way forward. The modularity of your setup, especially with AudioSocket and the flexibility to plug in various STT/LLM/TTS providers, is super impressive.
Definitely keen to dive deeper into how latency is handled across the pipeline — and how scalable the setup is for concurrent sessions. Also love the idea of replacing IVRs with more conversational agents.
We’re also working with Asterisk-based voice solutions and would love to explore synergies or contribute where possible. Looking forward to checking out the GitHub repo and the community!
Thanks a lot for your kind words — really appreciated!
I’m glad to hear you’re working on similar voice automation projects. Totally agree: the shift from rigid IVRs to dynamic, real-time conversational agents is definitely the direction we’re moving toward.
In terms of latency, we’ve spent quite some time optimizing the pipeline — handling audio chunking efficiently and using streaming APIs where possible to keep the interaction as natural as we can. Scalability is also something we’ve designed for from day one: the microservices architecture allows us to spin up sessions independently and scale horizontally when needed.
We’d love to exchange ideas and explore ways to collaborate — especially if you’re already working with Asterisk. Feel free to take a look at the GitHub repositories, and if you haven’t yet, join the community on Discord — it’s a great space for technical discussions and sharing progress.
Looking forward to hearing more about your work too!
Hello,
I have reviewed your project and was planning to test it on my existing Asterisk setup. However, I noticed that the avr-core repository on GitHub is not public, and the avr-core code loaded via Docker is also encrypted. While the project is introduced as open source and free, it seems that the source code is not actually provided. Would it be possible for you to share it or provide access?
Just to clarify — the idea and architecture of Agent Voice Response are completely open and documented for anyone to explore. The avr-core code, however, is still private for now. This isn’t because I want to keep it closed, but simply because at the moment there’s no dedicated funding or support team to maintain a fully public release.
By keeping it private temporarily, I can ensure the system stays stable, secure, and easy to test for everyone through the free Docker image.
The goal is to make the avr-core repository open in the future once we have the right resources in place.
In the meantime, I’d love to have you in our Discord community — we share updates, test new features, and discuss improvements together.
Hope to see you in the Discord — your feedback and ideas would be truly valuable for the project.
Of course, you can replace the ASR, LLM, and TTS services with the ones you prefer.
If you launch the provided docker-compose, you’ll also get an avr-asterisk service with a default pjsip configuration:
user: 1000
password: 1000
transport: tcp
You can then connect with any softphone and call extension 5001 (the default); your voicebot (AVR) will run with ElevenLabs for STT and TTS and, in this example, Anthropic as the LLM.
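For reference, the defaults above roughly correspond to a pjsip.conf along these lines. This is a sketch only: the exact configuration shipped in the avr-asterisk image may differ, and the context name is a placeholder.

```
; Sketch only: the exact pjsip.conf shipped in the avr-asterisk image may differ.
[transport-tcp]
type=transport
protocol=tcp
bind=0.0.0.0            ; TCP transport on the default port 5060

[1000]
type=endpoint
context=demo            ; placeholder: whatever context routes 5001 to AVR
disallow=all
allow=ulaw
auth=1000
aors=1000

[1000]
type=auth
auth_type=userpass
username=1000
password=1000

[1000]
type=aor
max_contacts=1
```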
If you’d like more details or want to connect with other users, join our Discord community.
We’ve just released the integration with ElevenLabs.
You don’t need to compile anything — just run the containers.
If you already have your own Asterisk, FreePBX, or any Asterisk-based solution, our integration guide explains how to connect it with AVR.
The main advantage is flexibility: you can choose which provider to use not only based on costs but also on performance. ElevenLabs is undoubtedly one of the best, but I also recommend trying OpenAI Realtime and Gemini. Both of these support function calls, which are extremely useful for custom integrations.
Yes, OpenAI’s new SIP integration works well and is very easy to get started with — but there are some trade-offs worth mentioning. If you want a quick, cloud-based setup, it’s definitely a good option.
On the other hand, projects like AgentVoiceResponse (AVR) give you the same OpenAI Realtime integration while keeping much more flexibility:
You can integrate not only OpenAI, but also Gemini, ElevenLabs, Ultravox, Vosk, Kokoro, etc.
Full support for Asterisk/FreePBX/VitalPBX using AudioSocket or Dial(AudioSocket/).
Custom function calls/tools (e.g. avr_transfer, avr_hangup, or your own business logic; see the sketch after this list).
Deployable on-prem or in your own cloud, which is important for companies with GDPR/privacy requirements.
WebSocket events, transcripts, and webhook integrations to your own DBs.
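To give you an idea of what a custom tool looks like on the OpenAI Realtime side, here is a small Node.js sketch of a session.update payload declaring an avr_transfer-style function. The tool name and parameters are illustrative; how AVR actually registers its built-in tools and wires the result back to Asterisk may differ.

```js
// Sketch of declaring a custom tool on an OpenAI Realtime session.
// The tool name and parameters are illustrative; AVR's built-in
// avr_transfer / avr_hangup tools may be defined and handled differently.
const sessionUpdate = {
  type: 'session.update',
  session: {
    tools: [
      {
        type: 'function',
        name: 'avr_transfer',
        description: 'Transfer the caller to a human agent or extension.',
        parameters: {
          type: 'object',
          properties: {
            extension: { type: 'string', description: 'Target extension, e.g. 6001' },
          },
          required: ['extension'],
        },
      },
    ],
    tool_choice: 'auto',
  },
};

// Send it over your existing Realtime WebSocket connection, then watch the
// function-call events (e.g. response.function_call_arguments.done) and
// execute the transfer in your own code:
// ws.send(JSON.stringify(sessionUpdate));
```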
So in short:
OpenAI SIP → simple and direct, but closed.
AVR + OpenAI Realtime → more complex, but much more powerful and customizable.