AI Voice Agent + Grandstream HT813 + Landline

AudioSocket Bidirectional Audio Problem - Technical Summary

Problem Overview

I’m implementing a real-time AI voice agent using Asterisk’s AudioSocket application for bidirectional audio streaming. The issue is that audio only flows in ONE direction (from phone → Asterisk → AudioSocket server), but NOT in the reverse direction (AudioSocket server → Asterisk → phone).

What Works

  1. AudioSocket Connection: Stable TCP connection established between Asterisk and my Node.js AudioSocket server

  2. Speech-to-Text (STT): Audio from the phone is captured and transcribed perfectly (the user saying “Hello” or “Do you hear me?” is transcribed correctly)

  3. Protocol Implementation:

    • Correct UUID handshake (NOT echoed back, as per protocol)

    • Sending silence frames with proper 3-byte headers: 0x10 (audio type) + 0x01 0x40 (payload length 320, big-endian) + 320 bytes of PCM (see the frame-building sketch after this list)

    • Sending TTS audio frames in the same format: 170 frames over 3.4 seconds at 20 ms intervals

  4. TCP Settings: TCP_NODELAY enabled for low latency
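
For reference, a minimal sketch of how the outbound frames are built (buildAudioFrame and sendSilence are illustrative names, not the exact code from my server):

  // Build one AudioSocket audio frame: kind byte 0x10, 2-byte big-endian
  // payload length, then raw signed 16-bit / 8 kHz / mono PCM.
  function buildAudioFrame(pcm) {
    const header = Buffer.alloc(3);
    header.writeUInt8(0x10, 0);            // kind: audio
    header.writeUInt16BE(pcm.length, 1);   // 0x0140 = 320 bytes for 20 ms at 8 kHz
    return Buffer.concat([header, pcm]);
  }

  // 20 ms of silence: 160 samples * 2 bytes, all zeros.
  const SILENCE = Buffer.alloc(320);

  function sendSilence(socket) {
    return socket.write(buildAudioFrame(SILENCE));
  }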

What Doesn’t Work

  1. Text-to-Speech (TTS) Playback: The user hears NOTHING when the AudioSocket server sends audio frames back to Asterisk

  2. Unidirectional Audio: Only receiving audio FROM Asterisk, not successfully sending audio TO Asterisk for playback

Technical Details

Current Setup

Asterisk Dialplan (extensions.conf):

[direct-outbound]
exten => _NXXXXXXXXX,1,NoOp(=== Outbound Call ===)
 same => n,Set(CALL_ID=${CALL_ID})
 same => n,Set(MODE=${MODE})
 same => n,GotoIf($["${MODE}" = "audiosocket"]?audiosocket_dial:normal_dial)

 same => n(audiosocket_dial),NoOp(=== AudioSocket Mode ===)
 same => n,Dial(PJSIP/${EXTEN}@fxo-line,60,tT)
 same => n,Hangup()

[voice-agent-audiosocket]
exten => s,1,NoOp(=== Voice Agent AudioSocket ===)
 same => n,Set(AUDIOSOCKET_UUID=${CALL_ID})
 same => n,AudioSocket(${AUDIOSOCKET_UUID},asterisk-api:9092)
 same => n,Hangup()

Call Flow:

  1. AMI Originate creates a Local/${destination}@direct-outbound channel (a raw AMI sketch follows this list)

  2. Context specified as voice-agent-audiosocket, extension s

  3. This should create:

    • ;1 leg → Executes AudioSocket() application in voice-agent-audiosocket context

    • ;2 leg → Dials PJSIP/${destination}@fxo-line in direct-outbound context

  4. Both legs should be automatically bridged by Asterisk
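
For completeness, the Originate is issued roughly like this over raw AMI (a simplified sketch of what my code does; host, credentials, destination number, and UUID are placeholders):

  const net = require('net');

  // Raw AMI session: log in, then originate the Local channel. The ;2 leg
  // runs [direct-outbound] and dials the FXO line; the ;1 leg is sent to
  // voice-agent-audiosocket,s,1 where AudioSocket() runs.
  const ami = net.connect(5038, 'asterisk-host');

  function send(lines) {
    ami.write(lines.join('\r\n') + '\r\n\r\n');
  }

  ami.once('connect', () => {
    send(['Action: Login', 'Username: manager-user', 'Secret: manager-secret']);
    send([
      'Action: Originate',
      'Channel: Local/15551234567@direct-outbound',
      'Context: voice-agent-audiosocket',
      'Exten: s',
      'Priority: 1',
      'Async: true',
      'Variable: CALL_ID=0f1e2d3c-4b5a-6978-8796-a5b4c3d2e1f0',
      'Variable: MODE=audiosocket',
    ]);
  });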

AudioSocket Server (Node.js):

  • Receives UUID from Asterisk (19 bytes: 3-byte header + 16-byte UUID)

  • Does NOT echo UUID back (just starts sending audio)

  • Sends silence frames immediately to keep connection alive

  • When TTS audio arrives, stops silence and sends 170 audio frames:

    • Each frame: 3-byte header (0x10 0x01 0x40) + 320 bytes PCM audio

    • Sent at 20ms intervals (real-time rate for 8kHz audio)

    • Format: signed 16-bit PCM, 8kHz, mono, little-endian

  • Resumes silence after TTS completes (a simplified send-loop sketch follows this list)
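
A simplified version of that send loop (streamTTS is an illustrative name; frames is an array of 320-byte slin buffers produced from the TTS output):

  // Pace outbound TTS frames at 20 ms so playback stays at real-time rate,
  // then hand control back so the silence keepalive can resume.
  function streamTTS(socket, frames, onDone) {
    let i = 0;
    const timer = setInterval(() => {
      if (i >= frames.length) {
        clearInterval(timer);
        onDone();                       // resume silence keepalive
        return;
      }
      const header = Buffer.alloc(3);
      header.writeUInt8(0x10, 0);       // kind: audio
      header.writeUInt16BE(320, 1);     // payload length = 320 bytes
      socket.write(Buffer.concat([header, frames[i++]]));
    }, 20);
  }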

Logs Show

AudioSocket Server:

AudioSocket connected
Streaming 170 audio frames at 20ms intervals (3.4s)
Streamed 50/170 frames
Streamed 100/170 frames
Streamed 150/170 frames
Finished streaming 170 frames

All socket.write() calls return true (not blocked).

Asterisk:

  • No errors in logs

  • No “Failed to receive frame” messages

  • AudioSocket() application appears to be running

  • Channel shows sendrecv topology for audio

Call Behavior:

  • Phone rings (works)

  • User answers (works)

  • User’s voice is captured and transcribed perfectly (works)

  • User hears NOTHING (no TTS audio) (DOESN’T WORK)

Questions for Community

  1. Is AudioSocket actually bidirectional by default? Or does it require special configuration to send audio TO Asterisk?

  2. Does Asterisk automatically READ from the AudioSocket and play to the channel? Or do I need to explicitly tell it to read/playback?

  3. Is my Local channel setup correct for bidirectional audio? Should both legs be bridged automatically, or do I need to use ARI/Stasis to create the bridge manually?

  4. Is there a way to verify that Asterisk is actually READING audio frames from the AudioSocket? The logs show no errors, but also no indication it’s reading anything.

  5. Should I be using a different dialplan approach? Some examples show using Dial() with options like b() (before-answer) or U() (after-answer) to run AudioSocket, but I’m not sure if that’s necessary.

Environment

  • Asterisk 22 (latest)

  • AudioSocket protocol v1

  • Node.js 18 AudioSocket server

  • Call flow: SIP phone → Asterisk → FXO gateway → PSTN

  • Using Local channels with AMI Originate

Asterisk runs in a Docker container on a server that is on the same network as the HT813.

Any insights into why audio only flows one direction would be greatly appreciated!

Hi, since Asterisk runs in Docker, have you configured Docker’s NAT networking and Asterisk’s pjsip.conf?
For development and testing, it would be advisable to install Asterisk directly on your operating system and run your tests there.
Do calls between extensions work? Do calls from your PSTN line to the extensions work?

Greetings!

I recently did a similar project and had successful audio using AudioSocket. Take a look at this, it may provide some useful insights.

Hello!!

Yes, calls work. I get the call on my cellphone, but I don’t hear the TTS, even though the system hears me!

Hello Hkjarral, I found your repo while doing my research. I think you use ARI and not AudioSocket; will that work for my case with the HT813?

I built this as a ground for experimentation. It uses ARI with AudioSocket, and external media where AudioSocket is not available. I had limited test cases I could cover in my setup, so feel free to use it or try it out and see how it works for your application.