How to prevent G.722 (16kHz) from transcoding to 8kHz SLIN before EAGI script?

Hello everyone,

I’m working on a voicebot project using Asterisk, a Python EAGI script, and the Google Cloud Speech-to-Text API. My goal is to maintain a full 16kHz audio pipeline from an incoming call using the G.722 codec all the way to my script for high-quality transcription.

Despite my configuration, I’m seeing Asterisk transcode the audio down to 8kHz, which is causing problems for my speech recognition setup. I’ve hit a wall trying to debug this and would greatly appreciate any insights from the community.

The Core Problem

An incoming call is correctly established using G.722 (16kHz). However, when I check the active channel, Asterisk is actively transcoding the audio before it even gets to my EAGI application.

Here is the output from core show channel [channel-id], which is the key evidence of the issue:

 -- General --
           Name: SIP/provider-trunk-0000000b
           Type: PJSIP
       UniqueID: 1750859102.22
       LinkedID: 1750859102.22
      Caller ID: [REDACTED_CALLER_ID]
      ...
  NativeFormats: (g722)
    WriteFormat: g722
     ReadFormat: slin
 WriteTranscode: No
  ReadTranscode: Yes (g722@16000)->(slin@8000)  <-- THE PROBLEM IS HERE
 Time to Hangup: 0
   Elapsed Time: 0h0m18s
      Bridge ID: (Not bridged)
 --   PBX   --
        Context: from-internal
      Extension: [REDACTED_DID]
       Priority: 24
    Application: EAGI
           Data: /var/lib/asterisk/agi-bin/voicebox/voicebox.py

As you can see, the ReadTranscode line clearly shows the incoming 16kHz G.722 audio is being converted to 8kHz SLIN (slin@8000). My Python script is configured for 16kHz LINEAR16 audio, so this transcoding is the root of my problem.
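
For anyone who wants to check this automatically rather than by eye, here is a minimal Python sketch that pulls the transcode path out of saved `core show channel` output. It assumes the line layout shown above; the function names `read_transcode_path` and `downsamples` are hypothetical helpers, not part of Asterisk.

```python
import re

# Matches one hop such as "(g722@16000)" and captures the format name and rate.
HOP_RE = re.compile(r"\(([\w.-]+)@(\d+)\)")


def read_transcode_path(channel_dump):
    """Return the ReadTranscode hops as [(format, rate_hz), ...], or [] if none."""
    for line in channel_dump.splitlines():
        label, _, value = line.partition(":")
        if label.strip() == "ReadTranscode":
            return [(name, int(rate)) for name, rate in HOP_RE.findall(value)]
    return []


def downsamples(path):
    """True if any hop in the path lowers the sample rate."""
    return any(later[1] < earlier[1] for earlier, later in zip(path, path[1:]))
```

Feeding it the output above would report a path from g722 at 16000 Hz to slin at 8000 Hz, which `downsamples` flags.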

My Configuration

I am using pjsip for the trunk. The user provided sip.conf, but here is the equivalent modern pjsip.conf configuration.

/etc/asterisk/pjsip.conf

[transport-udp]
type=transport
protocol=udp
bind=0.0.0.0

[provider-trunk]
type=endpoint
context=from-internal
disallow=all
allow=g722,ulaw,alaw  ; g722 is preferred
aors=provider-trunk
direct_media=no
rtp_symmetric=yes
force_rport=yes

[provider-trunk]
type=aor
contact=sip:[PROVIDER_IP] ; The static IP of my provider

[provider-trunk]
type=identify
endpoint=provider-trunk
match=[PROVIDER_IP] ; The IP the calls come from

/etc/asterisk/extensions.conf

This is my dialplan that handles the call. It plays a welcome message and then enters a loop to interact with my EAGI script.

[general]
autofallthrough=yes

[from-internal]
exten => _.,1(start),Set(CHANNEL(format)=slin16)
exten => _.,n,Answer()
exten => _.,n,Verbose(1, ReadFormat=${CHANNEL(readformat)})
exten => _.,n,Set(callstart_time=${EPOCH})
exten => _.,n,Ringing
exten => _.,n,Wait(2)
exten => _.,n,Set(i=1)
exten => _.,n,Set(VBDIR=/var/lib/asterisk/agi-bin/voicebox)
exten => _.,n,Set(voicedir=${VBDIR}/incoming/voice-${UNIQUEID})
exten => _.,n,Set(voicefile=${voicedir}/${i}.wav)
exten => _.,n,MixMonitor(${VBDIR}/output.wav,r(${voicefile}))

; Play a welcome message
exten => _.,n,Set(audio=${VBDIR}/short_welcome)
exten => _.,n,Playback(${audio})

; Loop for conversation with the bot
exten => _.,n(loop),While($[${i} < 24])
    exten => _.,n,GotoIf($["${hangup}" = "True"]?hangup,1)
    exten => _.,n,eagi(${VBDIR}/voicebox.py)
    exten => _.,n,StopMixMonitor()
    exten => _.,n,Set(i=$[${i} + 1])
    exten => _.,n,Set(voicefile=${voicedir}/${i}.wav)
    exten => _.,n,MixMonitor(${VBDIR}/output.wav,r(${voicefile}))
    exten => _.,n,Background(${audio})
exten => _.,n,EndWhile

exten => _.,n,hangup()

; Hangup logic
exten => h,1,agi(${VBDIR}/upload_audio.py)
exten => h,n,System(rm -rf ${voicedir})
exten => h,n,agi(${VBDIR}/post_call.py)
exten => h,n,hangup()

Python EAGI Script Summary

  • voicebox.py: This is the main script called by eagi() in the dialplan. Its primary job is to manage the call flow and invoke the speech recognition script.
  • speech.py: This script uses the Google Cloud Speech library. The critical configuration within this script is:
    # speech.py snippet
    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code='sv_SE'
    )
    
    This confirms the script is correctly set up to process 16kHz audio.

What I’ve Tried

  1. Codec Preference: Ensuring g722 is the first and preferred codec in my pjsip.conf endpoint.
  2. Forcing Channel Format: I added Set(CHANNEL(format)=slin16) at the very beginning of my dialplan, but this seems to be ignored or overridden, as the transcode still happens.
  3. The Playback Hypothesis: I have been advised that the Playback(${audio}) command could be the culprit. If my short_welcome.wav file is an 8kHz file, Asterisk might be forcing the entire channel to 8kHz to match it. I am in the process of verifying all my .wav files are saved as 16kHz, 16-bit mono.
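
For the file check in item 3, a small sketch using Python's standard `wave` module can report the channel count, bit depth, and sample rate of each prompt. The helper names are hypothetical, and this only covers plain PCM WAV files.

```python
import wave


def describe_wav(path):
    """Return (channels, bits_per_sample, sample_rate_hz) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels(), wav.getsampwidth() * 8, wav.getframerate()


def is_16k_mono_16bit(path):
    """True if the file is mono, 16-bit, 16 kHz."""
    return describe_wav(path) == (1, 16, 16000)
```

Running `is_16k_mono_16bit()` over every file in the prompt directory would flag any file that is still 8kHz.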

My Questions for the Community

  1. Is the Playback() application the most likely reason for forcing this transcode, even when I’ve set the channel format manually?
  2. What is the most robust and reliable way to configure my dialplan to guarantee a 16kHz audio path from the endpoint to the EAGI script?
  3. Are there any other global settings (in asterisk.conf or elsewhere) that could be influencing this behavior and forcing a default 8kHz path?

Thank you in advance for any help or suggestions you can provide!

This doesn’t exist. If you’re using AI, it has lied to you.

To set the format of audio passed to EAGI you set the EAGI_AUDIO_FORMAT dialplan variable[1]. If not set it defaults to signed linear at 8kHz.

[1] asterisk/res/res_agi.c at master · asterisk/asterisk · GitHub
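
On the script side, EAGI hands the caller's audio to the process on file descriptor 3, while the usual AGI environment arrives on stdin. Here is a minimal Python sketch of reading both, assuming the dialplan has set EAGI_AUDIO_FORMAT=slin16 so the stream is 16-bit signed linear at 16kHz. The function names and the 100 ms chunk size are illustrative choices, not anything Asterisk requires.

```python
import os
import sys

EAGI_AUDIO_FD = 3     # EAGI delivers the caller's audio on file descriptor 3
SAMPLE_RATE = 16000   # assumption: EAGI_AUDIO_FORMAT=slin16 is set in the dialplan
BYTES_PER_SAMPLE = 2  # signed linear is 16-bit


def read_agi_env(stream):
    """Read the 'agi_key: value' lines Asterisk sends on stdin, up to the blank line."""
    env = {}
    for line in stream:
        line = line.strip()
        if not line:
            break
        key, _, value = line.partition(":")
        env[key.strip()] = value.strip()
    return env


def audio_chunks(stream, chunk_ms=100):
    """Yield raw PCM chunks of roughly chunk_ms milliseconds until the stream ends."""
    chunk_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * chunk_ms // 1000
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:
            break
        yield chunk


def main():
    env = read_agi_env(sys.stdin)
    with os.fdopen(EAGI_AUDIO_FD, "rb") as audio:
        for chunk in audio_chunks(audio):
            pass  # hand each chunk to the recognizer here
```

Calling `main()` at the bottom of the EAGI script would start the loop. The recognizer's `sample_rate_hertz` then has to match `SAMPLE_RATE`.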

You can also get a full-duplex 16-bit audio connection with Asterisk via AudioSocket. This is done FastAGI-style, where your running process accepts multiple incoming connections instead of a separate process being launched by Asterisk for every call.
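
As a rough illustration of the server side, here is a Python sketch of a frame reader for an AudioSocket connection. It assumes the framing as I understand it from the AudioSocket documentation: a 1-byte message kind, a 2-byte big-endian payload length, then the payload. The kind values below are assumptions to verify against your Asterisk version, and the sample rate of the audio payload is not assumed here.

```python
import struct

# Message kinds as I understand the protocol; verify against your Asterisk version.
KIND_HANGUP = 0x00
KIND_UUID = 0x01
KIND_AUDIO = 0x10
KIND_ERROR = 0xFF


def read_exact(stream, n):
    """Read exactly n bytes, or return None if the stream ends first."""
    data = b""
    while len(data) < n:
        part = stream.read(n - len(data))
        if not part:
            return None
        data += part
    return data


def read_frames(stream):
    """Yield (kind, payload) tuples until hangup or end of stream."""
    while True:
        header = read_exact(stream, 3)
        if header is None:
            return
        kind, length = struct.unpack(">BH", header)
        payload = read_exact(stream, length) if length else b""
        if payload is None:
            return
        yield kind, payload
        if kind == KIND_HANGUP:
            return
```

With a real connection, `stream` could be `conn.makefile("rb")` on a socket returned by `accept()`, with one such reader per incoming call.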

Thank you @jcolp for the clarification on EAGI_AUDIO_FORMAT - that was exactly what I needed! I’ve updated my configuration and wanted to share my complete solution for achieving consistent 16kHz audio delivery to EAGI scripts.

Current Configuration (Updated)

SIP Configuration (sip.conf):

[1002]
dtmfmode=rfc2833
deny=0.0.0.0/0.0.0.0
trustrpid=yes
sendrpid=yes
permit=xxx.xxx.xxx.xxx
host=xxx.xxx.xxx.xxx
type=friend
context=from-internal
host=dynamic
direct_media=no
disallow=all
allow=g722,g726,alaw,ulaw,gsm
nat=force_rport,comedia

Dialplan (extensions.conf):

[general]
autofallthrough=yes

[from-internal]
exten => _.,1(start),Answer
exten => _.,n,Set(EAGI_AUDIO_FORMAT=slin16)  ; Thanks to @jcolp's correction!
exten => _.,n,Verbose(1, ReadFormat=${CHANNEL(readformat)})
exten => _.,n,Verbose(1, WriteFormat=${CHANNEL(writeformat)})
exten => _.,n,Set(call_start_time=${EPOCH})
exten => _.,n,Set(export_conversation="False")
exten => _.,n,Ringing
exten => _.,n,Wait(2)
exten => _.,n,Set(VOLUME(RX)=4)
exten => _.,n,Set(VOLUME(TX)=1)
exten => _.,n,Set(i=1)
exten => _.,n,Set(VB_DIR=/var/lib/asterisk/agi-bin/voicebox)
exten => _.,n,Set(voicedir=${VB_DIR}/incoming/voice-${UNIQUEID})
exten => _.,n,Set(voicefile=${voicedir}/${i}.wav16)
exten => _.,n,MixMonitor(${VB_DIR}/output.wav16,r(${voicefile}))
exten => _.,n,Set(intent_start_time=${EPOCH})
exten => _.,n,Set(audio=${VB_DIR}/welcome_16k)
exten => _.,n,Playback(${audio})

exten => _.,n(loop),While($[${i} < 24])
exten => _.,n,GotoIf($["${hangup}" = "True"]?hangup,1)
exten => _.,n,eagi(${VB_DIR}/voicebox.py)
exten => _.,n,StopMixMonitor()
exten => _.,n,Set(i=$[${i} + 1])
exten => _.,n,Set(voicefile=${voicedir}/${i}.wav16)
exten => _.,n,MixMonitor(${VB_DIR}/output.wav16,r(${voicefile}))
exten => _.,n,Set(intent_start_time=${EPOCH})
exten => _.,n,Background(${audio})
exten => _.,n,EndWhile

exten => _.,n,hangup()

exten => hangup,1,Set(TIMEOUT(absolute)=1)
exten => hangup,n,hangup()

exten => h,1,agi(${VB_DIR}/upload_audio.py)
exten => h,2,System(rm -rf ${voicedir})
exten => h,3,agi(${VB_DIR}/post_call.py)
exten => h,4,hangup()

My Specific Challenge

I’m working with a diverse customer base calling via mobile phones through an ISP SIP service that supports:

  • AMR (8kHz)
  • AMR-WB (16kHz)
  • G.722 (16kHz)
  • G.726-32 (16kHz)
  • PCMA/PCMU (8kHz)
  • telephone-event/8000 and telephone-event/16000

My Google Cloud Speech-to-Text API requires consistent 16kHz LINEAR16 audio for optimal accuracy. The challenge is ensuring all incoming calls, regardless of the codec used, deliver 16kHz audio to my EAGI script.

Questions for the Community

  1. Codec Negotiation Strategy: Given the mixed codec support from mobile carriers, what’s the best approach to prioritize 16kHz codecs (G.722, G.726, AMR-WB) while gracefully handling 8kHz fallbacks?
  2. Audio File Impact: I suspect my Playback() and Background() audio files might be forcing transcoding. All my audio files are now converted to 16kHz WAV format. Is this sufficient, or are there other audio-related considerations?
  3. AMR-WB Support: Does standard Asterisk support AMR-WB codec out of the box, or do I need additional modules/licensing for wideband AMR?
  4. 8kHz Upsampling: For calls that must use 8kHz codecs (PCMA/PCMU), will setting EAGI_AUDIO_FORMAT=slin16 automatically upsample the audio to 16kHz, or do I need additional processing?
  5. Quality Verification: What’s the best way to verify that my EAGI script is actually receiving 16kHz audio? I’m currently logging the format in my Python script, but I’d like to confirm the actual sample rate.
  6. Audio File Format Impact: If my dialplan contains just one Playback() file at 8kHz (e.g., welcome message), will this force the entire channel to remain at 8kHz for all subsequent EAGI calls and audio processing? Or is it mandatory to convert every single audio file to 16kHz to maintain the 16kHz pipeline?

@ldo - Thank you for mentioning AudioSocket! I’m curious about the performance differences. For a system handling 200-300 concurrent calls, would AudioSocket provide better audio quality consistency compared to EAGI?

If you set the dialplan variable, then that’s the format your AGI will get. Asterisk will automatically transcode as needed without intervention.
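
One way to confirm the rate from inside the script is to count the bytes arriving over a timed window, since the audio is delivered in real time: 16-bit audio at 16kHz should arrive at about 32000 bytes per second, and at 8kHz about 16000. A minimal Python sketch of that arithmetic follows; the helper names and the 10% tolerance are arbitrary choices.

```python
BYTES_PER_SAMPLE = 2  # signed linear is 16-bit


def estimate_sample_rate(total_bytes, elapsed_seconds):
    """Samples per second implied by a byte count received over a time window."""
    return total_bytes / BYTES_PER_SAMPLE / elapsed_seconds


def nearest_standard_rate(rate, candidates=(8000, 16000), tolerance=0.1):
    """Return the closest candidate rate, or None if it is off by more than tolerance."""
    best = min(candidates, key=lambda c: abs(c - rate))
    return best if abs(best - rate) <= tolerance * best else None
```

In the EAGI script, the byte count would come from summing the lengths of the chunks read from the audio descriptor, and the elapsed time from `time.monotonic()` taken before and after.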

I think you’re overthinking this. If you can use g722 with an upstream then just use that and let Asterisk do the rest.

You can use “core show channel” to see what the channel has negotiated, and what it is being provided for playback. The best option is to use the same format that is needed, so if g722 is negotiated then g722 files.

Asterisk does not include AMR-WB support or provide it.

If you set that, then that’s what you get. Always.

It doesn’t force 8kHz for the lifetime of the channel if a single file is 8kHz. A channel has a negotiated format that generally doesn’t change unless the other side re-negotiates, or you re-negotiate. The channel can then be fed media in any format, or read in any format, and it will transcode as needed.

Has Asterisk been changed to allow that? I thought the nearest you could get to that was slin16, which has no metadata.

It’s actually slin16. 🙏