Connecting Asterisk with Pipecat

Hello all,

I have spent the last two full weeks trying to get Asterisk AudioSocket to work with Pipecat. I’m trying to build an agentic voicebot, so my flow looks something like this:

[Voice from User] -> STT -> LLM (with tools) -> TTS -> [Voice to User]

The problem is that there is no clear or documented way to transmit the real-time voice from Asterisk AudioSocket to Pipecat that I could find. Even after downsampling/upsampling to the proper rates, it did not work: I can't hear anything at all.
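For context, the AudioSocket wire format itself is small: each TCP packet is a 3-byte header (1-byte kind, 2-byte big-endian payload length) followed by the payload, where kind 0x10 carries 16-bit signed linear PCM (8 kHz mono by default), 0x01 carries the call UUID, and 0x00 signals hangup. A minimal parser sketch (the function name is my own, not from any library):

```python
import struct

# AudioSocket packet kinds, per Asterisk's AudioSocket protocol
KIND_HANGUP = 0x00
KIND_UUID = 0x01
KIND_AUDIO = 0x10

def parse_packet(data: bytes):
    """Split one AudioSocket packet into (kind, payload, remaining bytes).

    Returns None if fewer bytes than one complete packet are available.
    """
    if len(data) < 3:
        return None
    kind = data[0]
    (length,) = struct.unpack(">H", data[1:3])
    if len(data) < 3 + length:
        return None
    return kind, data[3:3 + length], data[3 + length:]
```

Anything a bridge forwards to the rest of the pipeline should be the kind-0x10 payloads only, never the raw stream with headers still in it; forwarding headers as if they were audio is one classic cause of "I can't hear anything".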

This is what I have done so far:

## **Complete Summary of All Attempts and Results**

### **Initial Setup**

- **What we had**: AudioSocket (Asterisk) → Bridge → WebSocket → Pipecat Server

- **Initial problem**: Frame processing errors, no audio flow

### **Attempt 1: Custom Transport Classes**

**What we tried**: Created `WebSocketAudioTransport` extending `BaseTransport`

```python
class WebSocketAudioTransport(BaseTransport):
    def __init__(self, websocket: WebSocket, params: TransportParams):
        super().__init__(params)
```

**Result**: :cross_mark: Failed with `TypeError: BaseTransport.__init__() takes 1 positional argument but 2 were given`

### **Attempt 2: Fixed Transport Methods**

**What we tried**: Changed from `input_processor()`/`output_processor()` to `input()`/`output()`

**Result**: :cross_mark: Failed with `TypeError: Can't instantiate abstract class WebSocketAudioTransport with abstract methods input, output`

### **Attempt 3: Simplified Without Custom Transport**

**What we tried**: Removed custom transport, used simple `FrameProcessor` classes

```python
class AudioInputProcessor(FrameProcessor):
    async def run_input_loop(self):
        # Process queued audio
        ...
```

**Result**: :cross_mark: Failed with `AttributeError: 'AudioRawFrame' object has no attribute 'id'`

### **Attempt 4: Fixed Frame Types**

**What we tried**: Changed to `InputAudioRawFrame` with proper ID attributes

```python
audio_frame = InputAudioRawFrame(
    audio=audio_data,
    sample_rate=16000,
    num_channels=1,
)
audio_frame.id = str(uuid.uuid4())
```

**Result**: :white_check_mark: Greeting plays, :cross_mark: No STT transcription

### **Attempt 5: Direct Frame Queuing**

**What we tried**: Queue frames directly to task instead of through processors

```python
await task.queue_frame(audio_frame)
```

**Result**: :white_check_mark: Greeting plays, :cross_mark: No STT transcription

### **Attempt 6: Fixed Sample Rates**

**What we tried**:

- Set `audio_out_sample_rate=24000` for OpenAI TTS requirement

- Updated bridge to downsample from 24kHz to 8kHz

**Result**: :white_check_mark: Greeting plays, :cross_mark: No STT transcription
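For what it's worth, the 24 kHz to 8 kHz leg is an exact 3:1 ratio, so a bridge can decimate by keeping every third sample. This is only a sketch under that assumption (the function name is my own); a real implementation should low-pass filter first to avoid aliasing, e.g. with a proper resampling library:

```python
import array

def downsample_24k_to_8k(pcm: bytes) -> bytes:
    """Naive 3:1 decimation of 16-bit little-endian mono PCM.

    Keeps every third sample; without a low-pass filter this aliases,
    so treat it as a debugging aid, not production code.
    """
    samples = array.array("h")  # signed 16-bit samples
    samples.frombytes(pcm)
    return array.array("h", samples[::3]).tobytes()
```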

### **Attempt 7: PipelineRunner Signal Handler Fix**

**What we tried**: Added `handle_sigint=False` to fix Windows signal handler error

```python
runner = PipelineRunner(handle_sigint=False)
```

**Result**: :white_check_mark: No more signal handler error, :white_check_mark: Greeting plays, :cross_mark: No STT transcription

### **Attempt 8: Service Validation**

**What we tried**: Added startup validation to test each OpenAI service

```python
async def validate_openai_services(api_key: str):
    # Test STT, LLM, TTS with actual API calls
    ...
```

**Result**: :white_check_mark: All services validated successfully, :white_check_mark: Greeting plays, :cross_mark: No STT transcription

### **Attempt 9: Pipeline Order Fix**

**What we tried**: Moved audio output handler after TTS in pipeline

```python
pipeline = Pipeline([
    stt,
    context_aggregator.user(),
    llm,
    tts,
    output_handler,  # Moved here
    context_aggregator.assistant(),
])
```

**Result**: :white_check_mark: Greeting plays, :cross_mark: No STT transcription

### **Attempt 10: Disabled VAD and Added Transcription Logging**

**What we tried**:

- Set `vad_enabled=False` to ensure audio isn’t filtered

- Added `TranscriptionLogger` to log all transcription events

**Result**: :white_check_mark: Greeting plays, :cross_mark: No transcriptions logged at all

### **Attempt 11: WebSocketInputProcessor with Separate Async Task**

**What we tried**: Created `WebSocketInputProcessor` with `_audio_receiver_loop()` running as separate async task

```python
class WebSocketInputProcessor(FrameProcessor):
    async def _audio_receiver_loop(self):
        # Receive audio and push frames
        await self.push_frame(audio_frame)
```

**Result**: :cross_mark: Frames created in disconnected async task didn’t reach STT service

### **Attempt 12: Direct Frame Queuing with audio_receiver Function**

**What we tried**: Separate `audio_receiver()` function with direct `task.queue_frame()`

```python
async def audio_receiver(websocket: WebSocket, task: PipelineTask):
    await task.queue_frame(audio_frame)
```

**Result**: :white_check_mark: Greeting plays, :white_check_mark: Audio received (300+ chunks), :cross_mark: No STT transcription

### **Attempt 13: Audio Accumulation for Whisper (500ms chunks)**

**What we tried**: Accumulated audio into 8000-byte chunks (500ms at 16kHz) for Whisper

```python
BUFFER_SIZE = 8000  # 500ms at 16kHz
audio_buffer.extend(audio_data)
while len(audio_buffer) >= BUFFER_SIZE:
    # Send accumulated chunk
    ...
```

**Result**: :white_check_mark: Greeting plays, :white_check_mark: 30 frames sent to STT, :cross_mark: No transcriptions
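As a side note, 8000 *bytes* of 16-bit mono audio at 16 kHz is only 250 ms; 500 ms would be 8000 *samples*, i.e. 16,000 bytes, so it is worth being explicit about which unit the buffer counts. A self-contained accumulator sketch (the class name is my own):

```python
class AudioChunker:
    """Accumulate raw PCM bytes and emit fixed-size chunks."""

    def __init__(self, chunk_bytes: int = 16000):  # 500 ms of 16-bit/16 kHz mono
        self.chunk_bytes = chunk_bytes
        self._buf = bytearray()

    def push(self, data: bytes) -> list:
        """Add incoming bytes; return any complete chunks, keep the remainder."""
        self._buf.extend(data)
        chunks = []
        while len(self._buf) >= self.chunk_bytes:
            chunks.append(bytes(self._buf[:self.chunk_bytes]))
            del self._buf[:self.chunk_bytes]
        return chunks
```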

### **Attempt 14: VAD Enabled with Wrong Parameters**

**What we tried**: Tried `SileroVADAnalyzer` with non-existent parameters

```python
vad_analyzer = SileroVADAnalyzer(
    min_speech_duration=0.3,  # Doesn't exist
    max_silence_duration=0.5,  # Doesn't exist
)
```

**Result**: :cross_mark: Failed with `TypeError: unexpected keyword argument 'min_speech_duration'`

### **Attempt 15: VAD with Correct VADParams**

**What we tried**: Used proper `VADParams` class with correct parameters

```python
vad_params = VADParams(
    confidence=0.7,
    start_secs=0.3,
    stop_secs=0.5,
    min_volume=0.6,
)
vad_analyzer = SileroVADAnalyzer(sample_rate=16000, params=vad_params)
```

**Result**: :white_check_mark: Greeting plays, :white_check_mark: Audio received (400+ chunks), :cross_mark: Still no STT transcription

### **Attempt 16: Deepgram Streaming STT with Audio Accumulation**

**What we tried**: Suggested using Deepgram for true streaming STT with 200ms accumulation

```python
if DEEPGRAM_AVAILABLE and os.getenv("DEEPGRAM_API_KEY"):
    stt = DeepgramSTTService(
        api_key=os.getenv("DEEPGRAM_API_KEY"),
        model="nova-2",
        interim_results=True,
    )
```

**Result**: :warning: Not tested - user doesn’t have Deepgram API key

### **Attempt 17: Debug Processor with Comprehensive Logging**

**What we tried**: Added `DebugProcessor` to log all frames and audio levels

```python
class DebugProcessor(FrameProcessor):
    def __init__(self, label: str):
        super().__init__()
        self.label = label
        # Log RMS, MAX amplitude for each audio frame
```

**Result**: :white_check_mark: Confirmed audio flowing (RMS=5961, MAX=13948), :cross_mark: No VAD events, :cross_mark: No transcriptions
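The RMS/MAX figures in that log can be reproduced with a tiny helper over 16-bit PCM, which is handy for checking whether silence (or header garbage) is flowing instead of speech. A sketch, with a helper name of my own:

```python
import array
import math

def pcm_stats(pcm: bytes):
    """Return (RMS, peak absolute amplitude) for 16-bit LE mono PCM."""
    samples = array.array("h")
    samples.frombytes(pcm)
    if not samples:
        return 0.0, 0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    peak = max(abs(s) for s in samples)
    return rms, peak
```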

### **Attempt 18: Manual VAD Processing**

**What we tried**: Manually call `vad_analyzer.analyze()` on audio chunks

```python
vad_result = await vad_analyzer.analyze(audio_frame)
if vad_result.speech_probability > 0.5:
    # Accumulate speech
    ...
```

**Result**: :cross_mark: Failed with `'SileroVADAnalyzer' object has no attribute 'analyze'`

### **Attempt 19: Fixed StartFrame Timing Issues**

**What we tried**: Removed `AudioInputProcessor` from pipeline, used direct frame queuing with proper timing

```python
# Audio receiver waits 0.5s before starting
await asyncio.sleep(0.5)

# Pipeline waits 1.0s before greeting
await asyncio.sleep(1.0)
```

**Result**: :white_check_mark: Greeting plays, :white_check_mark: 700+ chunks received, :cross_mark: No VAD events, :cross_mark: No transcriptions

### **Attempt 20: FastAPIWebsocketTransport with Custom Serializer**

**What we tried**: Created `RawAudioSerializer` with proper `type` property

```python
class RawAudioSerializer(FrameSerializer):
    @property
    def type(self) -> str:
        return "raw_audio"
```

**Result**: :cross_mark: Failed with `exception receiving data: KeyError ('text')` - transport expects text messages

### **Attempt 21: WebsocketServerTransport**

**What we tried**: Used `WebsocketServerTransport` instead of `FastAPIWebsocketTransport`

```python
transport = WebsocketServerTransport(
    websocket=websocket,
    params=WebsocketServerParams(...),
)
```

**Result**: :cross_mark: Failed with `unexpected keyword argument 'websocket'` - this transport creates its own server

### **Attempt 22: CustomWebSocketWrapper with FastAPIWebsocketTransport**

**What we tried**: Created wrapper to handle mixed text/binary WebSocket messages

```python
class CustomWebSocketWrapper:
    def __init__(self, websocket: WebSocket):
        self.websocket = websocket
        self.config_handled = False
```

**Result**: :cross_mark: Failed with `'CustomWebSocketWrapper' object has no attribute 'client_state'`

### **Attempt 23: Simple Solution with AudioAccumulator**

**What we tried**: Created `AudioAccumulator` processor to accumulate chunks

```python
class AudioAccumulator(FrameProcessor):
    def __init__(self, chunk_size: int = 16000):
        # Accumulate 1-second chunks
        ...
```

**Result**: :white_check_mark: Audio received (500+ chunks), :cross_mark: AudioAccumulator never processed frames (bypassed by `task.queue_frame()`)

### **Attempt 24: Final Solution with WebSocketInputProcessor in Pipeline**

**What we tried**: Put `WebSocketInputProcessor` IN the pipeline with its own receive loop

```python
class WebSocketInputProcessor(FrameProcessor):
    async def _receive_loop(self):
        # First, send StartFrame
        await self.push_frame(StartFrame())
        # Then receive and push audio frames
```

I am not looking for a theoretical fix, because I have tried almost all of the theoretical fixes. I am looking for help in the form of a concrete, minimal code example that connects Asterisk AudioSocket with Pipecat and would then let me swap out parts of the Pipecat pipeline with ease.

I don’t know anything about Pipecat, but I suspect trying to implement a custom asyncio Transport may not be the simplest way. If you do want to persevere, I have some tips on Transports and Protocols in this Jupyter notebook.

I also have an AudioSocket example in this collection which, while it uses asyncio, does not have any custom Transport or Protocol classes.
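In that spirit, a bare-bones AudioSocket handler needs nothing beyond plain asyncio streams and no custom Transport/Protocol classes. The sketch below just echoes the caller's audio back (the port, constants, and helper names are my own; the packet layout is the 3-byte AudioSocket header described in Asterisk's protocol docs):

```python
import asyncio
import struct

# AudioSocket packet kinds
KIND_HANGUP = 0x00
KIND_UUID = 0x01
KIND_AUDIO = 0x10

def pack_audio(payload: bytes) -> bytes:
    """Wrap a PCM payload in an AudioSocket audio packet."""
    return struct.pack(">BH", KIND_AUDIO, len(payload)) + payload

async def handle_call(reader: asyncio.StreamReader, writer: asyncio.StreamWriter):
    try:
        while True:
            header = await reader.readexactly(3)
            kind = header[0]
            (length,) = struct.unpack(">H", header[1:3])
            payload = await reader.readexactly(length) if length else b""
            if kind == KIND_HANGUP:
                break
            if kind == KIND_AUDIO:
                # Echo the caller's audio straight back; a voicebot would
                # hand the payload to its STT -> LLM -> TTS pipeline here.
                writer.write(pack_audio(payload))
                await writer.drain()
    except asyncio.IncompleteReadError:
        pass  # peer closed the socket mid-packet
    finally:
        writer.close()

async def main():
    server = await asyncio.start_server(handle_call, "0.0.0.0", 9999)
    async with server:
        await server.serve_forever()
```

The point of the shape is that `handle_call` is an ordinary coroutine, so the "swap out pipeline parts" requirement reduces to replacing the echo line with calls into whatever framework you choose.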

Hey :waving_hand: I feel you — I also spent days trying to get Pipecat working with Asterisk AudioSocket, running into the same kind of issues you’re describing.

What you’re trying to build is exactly what Agent Voice Response (AVR) already does — and it’s free and open source. :rocket:

AVR is designed to integrate Asterisk AudioSocket with any STT, LLM, or TTS provider. If you prefer, you can even connect directly with speech-to-speech models like OpenAI Realtime, UltravoxAI, or Deepgram, without having to manually handle resampling, buffering, or stream management.

A couple of key points that might convince you to take a look:

  • :counterclockwise_arrows_button: AVR takes care of the entire real-time audio pipeline (Asterisk ↔ AI provider) so you don’t need to reinvent the wheel.

  • :gear: You can mix and match providers (e.g. Deepgram STT + Anthropic LLM + Coqui TTS) or use ready-made speech-to-speech.

  • :package: There are ready-to-go docker-compose examples here: avr-infra so you can spin up a working setup in minutes.

  • :open_book: Detailed docs explain how it works: AVR Wiki.

  • :speech_balloon: And we’ve got an active community on Discord where people share configs, troubleshooting, and ideas: join us here.

If your goal is a real-time voicebot with Asterisk, AVR already solves most of the pain points you’re hitting with Pipecat. I’d love to see what you build with it!


Hello,

Thank you for bringing this up. I actually came across it several days ago, but unfortunately it lacked the flexibility I'm looking for. I don't want to be limited to only the services provided through the Docker files. I want to be able to build the pipeline using a combination of tools (local or cloud); for example, I might want to use my own custom-made ASR, then connect it to a local LLM of my choice, then send the result to any cloud TTS.

As far as I'm aware, that is still a restriction of the AVR project: it focuses a lot on the cloud side but doesn't really have much flexibility for local setups.

I’d appreciate your input on this

Hi there,

Thanks for your message! Actually, one of the main strengths of AVR is exactly its flexibility — you’re not forced into an all-cloud setup. You can run everything in the cloud, fully local, or in a hybrid mode (some modules local, some in the cloud).

For example:

  • ASR → you can use Deepgram in the cloud, or Vosk locally via our AVR integration.

  • TTS → we’ve integrated community-driven local engines like Kokoro and CoquiTTS, but you can also use cloud providers such as ElevenLabs.

  • LLMs → if you’re interested in local setups, we’re currently working on Ollama integration, and we’d love to evaluate vLLM in the future.

In one of the latest versions we’ve implemented an integration with n8n :rocket: — this allows you to build real workflows connecting your voicebot not only with AI modules, but also with CRMs, calendars, Google Sheets, ticketing systems, task managers, and much more.

So, you can really mix and match the pieces depending on your needs. If you have experience with local LLMs and would like to help us shape this part, we’d be more than happy to collaborate!

You’re very welcome to join our community — we can share more details and brainstorm together. :rocket:

:backhand_index_pointing_right: Here are some links to get you started:

Looking forward to your thoughts — and hopefully your contributions :slightly_smiling_face:

Thanks for your work!

What about the source code of avr-core?

You’re welcome!

You can find all the repositories here:

:backhand_index_pointing_right: agentvoiceresponse repositories · GitHub

Also, feel free to join us on Discord for support and more details :rocket:

I mean the source code of the "agentvoiceresponse/avr-core" image, the Dockerfile and the sources, if that's possible.

Hi @m7mdcc , at the moment, the source code is not publicly available. The main reason is that we were not in a position to properly manage pull requests and ongoing development contributions. We are currently working on structuring the project so that we can provide proper support on the code base, and once this process is in place, our plan is to open the code.

In the meantime, please note that you can use avr-core completely free of charge.

Thank you for your patience and understanding.

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.