AudioSocket Bidirectional Audio Problem - Technical Summary
Problem Overview
I’m implementing a real-time AI voice agent using Asterisk’s AudioSocket application for bidirectional audio streaming. The issue is that audio only flows in ONE direction (from phone → Asterisk → AudioSocket server), but NOT in the reverse direction (AudioSocket server → Asterisk → phone).
What Works
- AudioSocket Connection: Stable TCP connection established between Asterisk and my Node.js AudioSocket server
- Speech-to-Text (STT): Audio from the phone is perfectly captured and transcribed (user saying “Hello”, “Do you hear me?” is transcribed correctly)
- Protocol Implementation:
  - Correct UUID handshake (NOT echoed back, as per protocol)
  - Sending silence frames with proper 3-byte headers: 0x10 (audio type) + 0x01 0x40 (320 bytes length in big-endian) + 320 bytes PCM
  - Sending TTS audio frames with same format, 170 frames over 3.4 seconds at 20ms intervals
- TCP Settings: TCP_NODELAY enabled for low latency
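To make the frame layout above concrete, here is a minimal Node.js sketch of how such a frame can be assembled. It is an illustration only, not the actual server code: `buildFrame` and the constant names are hypothetical, and the 320-byte payload size is derived from the 8kHz, 16-bit, 20ms figures given in this post.

```javascript
// Sketch of the AudioSocket frame layout described above.
// buildFrame and the constant names are illustrative, not the real server code.
const TYPE_AUDIO = 0x10; // audio payload: signed 16-bit PCM, 8 kHz, mono

const SAMPLE_RATE_HZ = 8000;
const FRAME_MS = 20;
const BYTES_PER_SAMPLE = 2;
// 8000 samples/s * 0.020 s = 160 samples; 160 * 2 bytes = 320 bytes per frame
const FRAME_BYTES = (SAMPLE_RATE_HZ * FRAME_MS / 1000) * BYTES_PER_SAMPLE;

// 3-byte header: 1 byte type, then the payload length as a big-endian uint16.
function buildFrame(type, payload) {
  const header = Buffer.alloc(3);
  header.writeUInt8(type, 0);
  header.writeUInt16BE(payload.length, 1);
  return Buffer.concat([header, payload]);
}

// A zero-filled payload is digital silence in signed linear PCM.
const silenceFrame = buildFrame(TYPE_AUDIO, Buffer.alloc(FRAME_BYTES));

console.log(silenceFrame.subarray(0, 3).toString('hex')); // 100140
console.log(silenceFrame.length); // 323
```

The header bytes printed here match the 0x10 0x01 0x40 sequence listed above, which at least rules out a length or byte-order mistake in the header itself.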
What Doesn’t Work
- Text-to-Speech (TTS) Playback: The user hears NOTHING when the AudioSocket server sends audio frames back to Asterisk
- Unidirectional Audio: Only receiving audio FROM Asterisk, not successfully sending audio TO Asterisk for playback
Technical Details
Current Setup
Asterisk Dialplan (extensions.conf):
[direct-outbound]
exten => _NXXXXXXXXX,1,NoOp(=== Outbound Call ===)
 same => n,Set(CALL_ID=${CALL_ID})
 same => n,Set(MODE=${MODE})
 same => n,GotoIf($["${MODE}" = "audiosocket"]?audiosocket_dial:normal_dial)
 same => n(audiosocket_dial),NoOp(=== AudioSocket Mode ===)
 same => n,Dial(PJSIP/${EXTEN}@fxo-line,60,tT)
 same => n,Hangup()

[voice-agent-audiosocket]
exten => s,1,NoOp(=== Voice Agent AudioSocket ===)
 same => n,Set(AUDIOSOCKET_UUID=${CALL_ID})
 same => n,AudioSocket(${AUDIOSOCKET_UUID},asterisk-api:9092)
 same => n,Hangup()
Call Flow:
- AMI Originate creates Local/${destination}@direct-outbound channel
- Context specified as voice-agent-audiosocket, extension s
- This should create:
  - ;1 leg → Executes AudioSocket() application in voice-agent-audiosocket context
  - ;2 leg → Dials PJSIP/${destination}@fxo-line in direct-outbound context
- Both legs should be automatically bridged by Asterisk
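For readers who want to reproduce the call flow, the Originate described above would look roughly like the action built below. This is a sketch under assumptions: the exact action sent is not shown in this post, `buildOriginateAction` is a hypothetical helper, the `Async` and `Priority` values are guesses, and the destination and call ID are placeholders.

```javascript
// Hypothetical helper: builds the text of an AMI Originate action matching
// the call flow described above. Field values are assumptions, not a copy
// of the action actually sent.
function buildOriginateAction({ destination, callId, mode }) {
  const lines = [
    'Action: Originate',
    // Local channel whose ;2 leg runs the direct-outbound context
    `Channel: Local/${destination}@direct-outbound`,
    // Where the ;1 leg is sent once the Local channel is answered
    'Context: voice-agent-audiosocket',
    'Exten: s',
    'Priority: 1',
    'Async: true',
    // Channel variables read by the dialplan
    `Variable: CALL_ID=${callId}`,
    `Variable: MODE=${mode}`,
  ];
  // AMI messages are CRLF-delimited and terminated by an empty line.
  return lines.join('\r\n') + '\r\n\r\n';
}

const action = buildOriginateAction({
  destination: '5551234567', // placeholder number matching _NXXXXXXXXX
  callId: '00000000-0000-0000-0000-000000000000', // placeholder UUID
  mode: 'audiosocket',
});
console.log(action);
```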
AudioSocket Server (Node.js):
- Receives UUID from Asterisk (19 bytes: 3-byte header + 16-byte UUID)
- Does NOT echo UUID back (just starts sending audio)
- Sends silence frames immediately to keep connection alive
- When TTS audio arrives, stops silence and sends 170 audio frames:
  - Each frame: 3-byte header (0x10 0x01 0x40) + 320 bytes PCM audio
  - Sent at 20ms intervals (real-time rate for 8kHz audio)
  - Format: signed 16-bit PCM, 8kHz, mono, little-endian
- Resumes silence after TTS completes
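The TTS streaming step above can be sketched as follows. This is not the actual server code: `splitIntoFrames` and `streamFrames` are hypothetical names, and the drift-compensated timer is just one way to hold a 20ms cadence in Node.js (a plain `setInterval` would also work but can drift under load).

```javascript
// Sketch of the TTS streaming step described above (illustrative names).
const FRAME_BYTES = 320; // 20 ms of 8 kHz, 16-bit, mono PCM
const FRAME_MS = 20;
const AUDIO_HEADER = Buffer.from([0x10, 0x01, 0x40]); // type 0x10, length 320 (big-endian)

// Cut raw PCM into fixed-size frames; the last frame is padded with silence.
function splitIntoFrames(pcm) {
  const frames = [];
  for (let offset = 0; offset < pcm.length; offset += FRAME_BYTES) {
    const payload = Buffer.alloc(FRAME_BYTES); // zero-filled, so padding is silence
    pcm.copy(payload, 0, offset, Math.min(offset + FRAME_BYTES, pcm.length));
    frames.push(Buffer.concat([AUDIO_HEADER, payload]));
  }
  return frames;
}

// Write one frame every FRAME_MS, scheduling each write against the start
// time so that timer jitter does not accumulate over the stream.
function streamFrames(socket, frames, onDone = () => {}) {
  const start = process.hrtime.bigint();
  let index = 0;
  const tick = () => {
    if (index >= frames.length || socket.destroyed) return onDone(index);
    socket.write(frames[index]);
    index += 1;
    const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
    setTimeout(tick, Math.max(0, index * FRAME_MS - elapsedMs));
  };
  tick();
}

// 3.4 s of audio = 170 frames * 320 bytes = 54400 bytes
const frames = splitIntoFrames(Buffer.alloc(54400));
console.log(frames.length); // 170
console.log(frames[0].length); // 323
```

Here `socket` is assumed to be the `net.Socket` accepted from Asterisk; calling `streamFrames(socket, frames)` would take about 3.4 seconds for the 170 frames.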
Logs Show
AudioSocket Server:
AudioSocket connected
Streaming 170 audio frames at 20ms intervals (3.4s)
Streamed 50/170 frames
Streamed 100/170 frames
Streamed 150/170 frames
Finished streaming 170 frames
All socket.write() calls return true (not blocked)
Asterisk:
- No errors in logs
- No “Failed to receive frame” messages
- AudioSocket() application appears to be running
- Channel shows sendrecv topology for audio
Call Behavior:
- Phone rings (works)
- User answers (works)
- User’s voice is captured and transcribed perfectly (works)
- User hears NOTHING (no TTS audio) (DOESN’T WORK)
Questions for Community
- Is AudioSocket actually bidirectional by default? Or does it require special configuration to send audio TO Asterisk?
- Does Asterisk automatically READ from the AudioSocket and play to the channel? Or do I need to explicitly tell it to read/playback?
- Is my Local channel setup correct for bidirectional audio? Should both legs be bridged automatically, or do I need to use ARI/Stasis to create the bridge manually?
- Is there a way to verify that Asterisk is actually READING audio frames from the AudioSocket? The logs show no errors, but also no indication it’s reading anything.
- Should I be using a different dialplan approach? Some examples show using Dial() with options like b() (before-answer) or U() (after-answer) to run AudioSocket, but I’m not sure if that’s necessary.
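On the verification question, one check that is possible from the server side is to decode every frame Asterisk sends and log its type, so that any error (0xff) or hangup (0x00) frame becomes visible. The sketch below assumes the frame type codes as I understand them from the AudioSocket protocol description (0x00 hangup, 0x01 UUID, 0x03 DTMF, 0x10 audio, 0xff error); `createFrameParser` and `TYPE_NAMES` are hypothetical names, not part of any library.

```javascript
// Sketch: incremental parser for frames arriving from Asterisk.
// Type codes are assumptions based on the AudioSocket protocol description.
const TYPE_NAMES = { 0x00: 'hangup', 0x01: 'uuid', 0x03: 'dtmf', 0x10: 'audio', 0xff: 'error' };

// Returns a function to call with each TCP chunk; onFrame(type, payload)
// fires once per complete frame, even when a frame spans several chunks.
function createFrameParser(onFrame) {
  let pending = Buffer.alloc(0);
  return (chunk) => {
    pending = Buffer.concat([pending, chunk]);
    while (pending.length >= 3) {
      const type = pending.readUInt8(0);
      const length = pending.readUInt16BE(1);
      if (pending.length < 3 + length) break; // wait for the rest of the payload
      onFrame(type, pending.subarray(3, 3 + length));
      pending = pending.subarray(3 + length);
    }
  };
}

// Demo: a 19-byte UUID frame split across two chunks, then a 1-byte error frame.
const seen = [];
const parse = createFrameParser((type, payload) => {
  seen.push(`${TYPE_NAMES[type] ?? 'unknown'}:${payload.length}`);
});
const uuidFrame = Buffer.concat([Buffer.from([0x01, 0x00, 0x10]), Buffer.alloc(16)]);
parse(uuidFrame.subarray(0, 5));
parse(uuidFrame.subarray(5));
parse(Buffer.from([0xff, 0x00, 0x01, 0x01]));
console.log(seen); // [ 'uuid:16', 'error:1' ]
```

In a real server this would be wired up as `socket.on('data', parse)`. It only shows what Asterisk sends, so it cannot confirm that Asterisk is consuming the frames written to it.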
Environment
- Asterisk 22 (latest)
- AudioSocket protocol v1
- Node.js 18 AudioSocket server
- Call flow: SIP phone → Asterisk → FXO gateway → PSTN
- Using Local channels with AMI Originate
Asterisk runs in a Docker container on a server that is on the same network as the HT813.
Any insights into why audio only flows one direction would be greatly appreciated!