I’m Giuseppe Careri, a developer, solution architect, and Asterisk enthusiast for over 15 years.
Earlier this year I had the pleasure of presenting this solution at Astricon 2025, where I showcased how to build a fully automated AI voicebot using Asterisk with AudioSocket, integrating real-time STT/LLM/TTS pipelines.
If you’re working on real-time voice interactions using Asterisk, and you’re looking to build an AI-powered voicebot using Speech-to-Text (STT), Large Language Models (LLM), and Text-to-Speech (TTS) — I wanted to share a project that might save you weeks of setup and experimentation.
What this does
We’ve built an open source, production-ready infrastructure that allows you to:
Receive or place calls through Asterisk
Forward live audio using AudioSocket (TCP-based audio streaming)
Process it through a modular AI pipeline:
STT (e.g. Deepgram, Whisper)
LLM (e.g. OpenAI GPT-4, local models)
TTS (e.g. ElevenLabs, Coqui)
Or Speech-to-Speech (e.g. OpenAI Realtime)
The system responds in real time, enabling AI agents to fully replace or assist live operators.
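To give you an idea of the Asterisk side, here is a minimal dialplan sketch that streams the caller's audio to an AudioSocket server such as the AVR core. The context name, host, port, and UUID below are placeholders; the dialplan shipped with our images may look different.

```
; extensions.conf sketch: stream the caller's audio to an AudioSocket server
; (for example the AVR core). Host, port and the fixed UUID are placeholders.
[demo]
exten => 5001,1,Answer()
 same => n,AudioSocket(40325ec2-5efd-4bd3-805f-53576e581d13,avr-core:5001)
 same => n,Hangup()

; or, using the AudioSocket channel driver:
; exten => 5002,1,Dial(AudioSocket/avr-core:5001/40325ec2-5efd-4bd3-805f-53576e581d13)
```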
Thanks for sharing this, Giuseppe — and really appreciate the Docker images and the detailed explanation!
Would it be possible for you to share the full codebase or main application logic as well? That would help us understand the flow more deeply, especially around the VAD implementation, buffer sizing, and audio stream handling.
We’re exploring ways to tweak the system — such as adjusting VAD sensitivity, optimizing buffer sizes for latency, and potentially integrating a noise reduction module to improve transcription quality.
Thanks a lot for your message and the kind words — really appreciated!
For the VAD implementation, I used Silero VAD, which I found to be very performant. You can check out some benchmarks and comparisons with other libraries here:
To integrate it into AVR, I built a custom Node.js wrapper library called avr-vad.
You can find the source code on GitHub, where I also explain how to configure it via environment variables — things like thresholds, frame size, and more.
If you’re looking to tweak VAD sensitivity, buffer sizes for latency, or integrate noise reduction modules — you’re definitely on the right path.
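Just to illustrate the kind of knobs involved (frame size, activation threshold, silence hang-over), here is a tiny Node.js sketch of a frame-based gate. The energy check is only a stand-in for the Silero model, and the environment variable names are examples, not the ones avr-vad actually reads.

```js
// Minimal illustration of frame-based VAD gating (not the actual avr-vad code).
// AudioSocket delivers 16-bit signed PCM at 8 kHz mono; one "frame" here is
// FRAME_MS worth of that audio as a Node.js Buffer.

const SAMPLE_RATE = 8000;
const FRAME_MS = Number(process.env.VAD_FRAME_MS || 20);             // frame size knob
const THRESHOLD = Number(process.env.VAD_THRESHOLD || 0.02);         // sensitivity knob
const SILENCE_FRAMES = Number(process.env.VAD_SILENCE_FRAMES || 25); // hang-over knob

const bytesPerFrame = (SAMPLE_RATE * FRAME_MS * 2) / 1000; // 2 bytes per sample

// Stand-in for the Silero model: normalized RMS energy of one frame.
function frameScore(frame) {
  let sum = 0;
  for (let i = 0; i < frame.length; i += 2) {
    const s = frame.readInt16LE(i) / 32768;
    sum += s * s;
  }
  return Math.sqrt(sum / (frame.length / 2));
}

let speaking = false;
let silentFrames = 0;

// Feed consecutive frames of bytesPerFrame bytes; returns 'start', 'end' or null.
function pushFrame(frame) {
  if (frameScore(frame) > THRESHOLD) {
    silentFrames = 0;
    if (!speaking) { speaking = true; return 'start'; }
  } else if (speaking && ++silentFrames >= SILENCE_FRAMES) {
    speaking = false;
    return 'end'; // a good point to flush the buffered utterance to the STT
  }
  return null;
}

module.exports = { pushFrame, bytesPerFrame };
```

Raising THRESHOLD or SILENCE_FRAMES makes the gate less sensitive and adds latency; lowering them does the opposite, which is exactly the trade-off you will be tuning with the real avr-vad settings.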
Let me know if that answered your question!
If you’re interested in diving deeper into the core, feel free to reach out to me directly on Discord — it’s quicker and easier to share details there.
Looking forward to hearing more from you — and thanks again!
Hi, I just reviewed the system and it works well. How do I change the language to Hindi? Also, a little improvement is needed on latency; sometimes I see one-way audio. Let me review more and I'll share feedback. Thanks.
Hi, thank you for your feedback and for testing the system!
To change the language to Hindi, could you please let me know which module you’re referring to — is it the ASR (speech-to-text), the LLM (language model), or the TTS (text-to-speech)? The setup supports multiple providers, and language configuration may vary depending on which one you’re using.
If you’re using the docker-compose-anthropic.yml example, that means the system is currently using Deepgram for ASR and TTS, and Anthropic for the LLM.
Please note: as far as I know, Deepgram does not support Hindi, so for Hindi conversations I recommend switching to Google as the ASR and TTS provider — it has solid support for Hindi.
One of the advantages of AVR is flexibility: if you want to change provider, you just need to replace the ASR, LLM, or TTS service containers and set the correct environment variables. For example:
• If you want to use Vosk for ASR, simply replace the Deepgram ASR container with the Vosk one.
• If you want to use Google TTS, replace the Deepgram TTS container accordingly.
You’ll find an example setup with Google in this file: docker-compose-google.yml
And for Vosk, I’ve shared working configurations on Discord.
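As a rough idea of what the swap looks like in practice, here is a sketch of the ASR service switched to Google with a Hindi language setting. The image and environment variable names below are illustrative; please use the ones documented in docker-compose-google.yml.

```yaml
# Illustrative snippet only: check docker-compose-google.yml for the real
# image and environment variable names used by the Google ASR service.
services:
  avr-asr:
    # image: agentvoiceresponse/avr-asr-deepgram   # illustrative "before"
    image: agentvoiceresponse/avr-asr-google        # swap in the Google ASR container
    environment:
      - GOOGLE_APPLICATION_CREDENTIALS=/app/google.json
      - SPEECH_RECOGNITION_LANGUAGE=hi-IN           # Hindi
    volumes:
      - ./google.json:/app/google.json
```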
Hi Giuseppe, thanks for sharing this — truly exciting stuff!
We’ve been exploring similar integrations for voice automation projects and totally agree that bridging Asterisk with real-time AI pipelines is the way forward. The modularity of your setup, especially with AudioSocket and the flexibility to plug in various STT/LLM/TTS providers, is super impressive.
Definitely keen to dive deeper into how latency is handled across the pipeline — and how scalable the setup is for concurrent sessions. Also love the idea of replacing IVRs with more conversational agents.
We’re also working with Asterisk-based voice solutions and would love to explore synergies or contribute where possible. Looking forward to checking out the GitHub repo and the community!
Thanks a lot for your kind words — really appreciated!
I’m glad to hear you’re working on similar voice automation projects. Totally agree: the shift from rigid IVRs to dynamic, real-time conversational agents is definitely the direction we’re moving toward.
In terms of latency, we’ve spent quite some time optimizing the pipeline — handling audio chunking efficiently and using streaming APIs where possible to keep the interaction as natural as we can. Scalability is also something we’ve designed for from day one: the microservices architecture allows us to spin up sessions independently and scale horizontally when needed.
We’d love to exchange ideas and explore ways to collaborate — especially if you’re already working with Asterisk. Feel free to take a look at the GitHub repositories, and if you haven’t yet, join the community on Discord — it’s a great space for technical discussions and sharing progress.
Looking forward to hearing more about your work too!
Hello,
I have reviewed your project and was planning to test it on my existing Asterisk setup. However, I noticed that the avr-core repository on GitHub is not public, and the avr-core code loaded via Docker is also encrypted. While the project is introduced as open source and free, it seems that the source code is not actually provided. Would it be possible for you to share it or provide access?
Just to clarify — the idea and architecture of Agent Voice Response are completely open and documented for anyone to explore. The avr-core code, however, is still private for now. This isn’t because I want to keep it closed, but simply because at the moment there’s no dedicated funding or support team to maintain a fully public release.
By keeping it private temporarily, I can ensure the system stays stable, secure, and easy to test for everyone through the free Docker image.
The goal is to make the avr-core repository open in the future once we have the right resources in place.
In the meantime, I’d love to have you in our Discord community — we share updates, test new features, and discuss improvements together.
Hope to see you in the Discord — your feedback and ideas would be truly valuable for the project.
Of course, you can replace the ASR, LLM, and TTS services with the ones you prefer.
If you launch the provided docker-compose, you’ll also get an avr-asterisk service with a default pjsip configuration:
user: 1000
password: 1000
transport: tcp
You can then connect with any softphone and call extension 5001 (the default); your voicebot (AVR) will run with ElevenLabs for STT and TTS and, in this example, Anthropic as the LLM.
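For reference, the defaults above roughly correspond to a pjsip.conf along these lines. This is a sketch only: the exact configuration shipped in the avr-asterisk image may differ, and the context name is a placeholder.

```
; Sketch only: the exact pjsip.conf shipped in the avr-asterisk image may differ.
[transport-tcp]
type=transport
protocol=tcp
bind=0.0.0.0            ; TCP transport on the default port 5060

[1000]
type=endpoint
context=demo            ; placeholder: whatever context routes 5001 to AVR
disallow=all
allow=ulaw
auth=1000
aors=1000

[1000]
type=auth
auth_type=userpass
username=1000
password=1000

[1000]
type=aor
max_contacts=1
```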
If you’d like more details or want to connect with other users, join our Discord community.
We’ve just released the integration with ElevenLabs.
You don’t need to compile anything — just run the containers.
If you already have your own Asterisk, FreePBX, or any Asterisk-based solution, our integration guide explains how to connect it with AVR.
The main advantage is flexibility: you can choose which provider to use not only based on costs but also on performance. ElevenLabs is undoubtedly one of the best, but I also recommend trying OpenAI Realtime and Gemini. Both of these support function calls, which are extremely useful for custom integrations.
Yes, OpenAI’s new SIP integration works well and is very easy to get started with — but there are some trade-offs worth mentioning. If you want a quick, cloud-based setup, it’s definitely a good option.
On the other hand, projects like AgentVoiceResponse (AVR) give you the same OpenAI Realtime integration while keeping much more flexibility:
You can integrate not only OpenAI, but also Gemini, ElevenLabs, Ultravox, Vosk, Kokoro, etc.
Full support for Asterisk/FreePBX/VitalPBX using AudioSocket or Dial(AudioSocket/).
Custom function calls/tools (e.g. avr_transfer, avr_hangup, or your own business logic; see the sketch after this list).
Deployable on-prem or in your own cloud, which is important for companies with GDPR/privacy requirements.
WebSocket events, transcripts, and webhook integrations to your own DBs.
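To give you an idea of what a custom tool looks like on the OpenAI Realtime side, here is a small Node.js sketch of a session.update payload declaring an avr_transfer-style function. The tool name and parameters are illustrative; how AVR actually registers its built-in tools and wires the result back to Asterisk may differ.

```js
// Sketch of declaring a custom tool on an OpenAI Realtime session.
// The tool name and parameters are illustrative; AVR's built-in
// avr_transfer / avr_hangup tools may be defined and handled differently.
const sessionUpdate = {
  type: 'session.update',
  session: {
    tools: [
      {
        type: 'function',
        name: 'avr_transfer',
        description: 'Transfer the caller to a human agent or extension.',
        parameters: {
          type: 'object',
          properties: {
            extension: { type: 'string', description: 'Target extension, e.g. 6001' },
          },
          required: ['extension'],
        },
      },
    ],
    tool_choice: 'auto',
  },
};

// Send it over your existing Realtime WebSocket connection, then watch the
// function-call events (e.g. response.function_call_arguments.done) and
// execute the transfer in your own code:
// ws.send(JSON.stringify(sessionUpdate));
```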
So in short:
OpenAI SIP → simple and direct, but closed.
AVR + OpenAI Realtime → more complex, but much more powerful and customizable.