Transcoding and audio normalization

Hi all!

As an introduction, I’d like to point out that I’m really not an expert in telephony or Asterisk, so please excuse me if I’m not precise enough or if I misuse technical terms. I looked at various forum topics about audio codecs, transcoding, and normalization but couldn’t find an answer.

Here is the issue I’m facing (to be honest, I’m not even sure it comes from Asterisk, but I’m checking every step of my pipeline to troubleshoot it):

I’m streaming an 8 kHz, 16-bit, mono WAV file, toto.wav, with the Playback application: Playback(/opt/s3/audio/toto)

The file is played back fine and I get the following log lines:

asterisk-1 | [Aug 20 14:15:11] -- Executing [10@call-file-test:3] Playback("Local/123@callbackout-00000004;1", "/opt/s3/audio/h2b-api-tests/files_numbers/file_number10_22729230965") in new stack
asterisk-1 | [Aug 20 14:15:11] -- <Local/123@callbackout-00000004;1> Playing '/opt/s3/audio/h2b-api-tests/files_numbers/file_number10_22729230965.slin' (language 'en')

The problem is that the audio streamed by Asterisk has a lower gain than the original toto.wav file.

I can see this gain loss when I compare the original audio with the audio streamed by Asterisk.

It’s as if the gain was changed when Asterisk transcoded the file from WAV to slin and played it (I’m not sure Asterisk actually transcoded).

Could this gain loss be the result of audio normalization when transcoding to .slin? If so, how can I avoid it? Or is there an Asterisk option I missed that automatically decreases the input gain (Set(VOLUME,…) is not used anywhere)?

Any help would be greatly appreciated.

Thanks in advance!

How are you measuring?

What codec is actually in use on the channel?

How are you measuring?

No precise measurement, but the waveform of the streamed audio shows a clear drop in level.
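
A rough way to put numbers on this, rather than eyeballing waveforms, is to compare the peak and RMS levels of the two files. The sketch below is only an illustration: it assumes both files are 16-bit PCM WAV, and the function names and the paths in the closing comment are hypothetical, not something from this thread.

```python
import math
import struct
import wave


def levels_dbfs(samples):
    """Return (peak, rms) of signed 16-bit samples, in dB relative to full scale."""
    full_scale = 32768.0
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))

    def to_db(value):
        return 20 * math.log10(value / full_scale) if value > 0 else float("-inf")

    return to_db(peak), to_db(rms)


def wav_levels(path):
    """Read a 16-bit PCM WAV file and return its (peak, rms) levels in dBFS."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        raw = wf.readframes(wf.getnframes())
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return levels_dbfs(samples)


# Hypothetical usage: compare the source file with a recording of the call.
# print(wav_levels("toto.wav"))
# print(wav_levels("/tmp/recording.wav"))
```

If the two RMS figures differ by several dB, the level really changed somewhere in the chain; a difference of a fraction of a dB is more likely codec quantization.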

What codec is actually in use on the channel?

I’ll need your help to check this. Where should I look? Or which config file should I attach to the post?

The output of “core show channel” on the channel would show it.

docker exec -it my_container asterisk -rx "core show channels"

returns

Channel Location State Application(Data)
0 active channels
0 active calls
15 calls processed

The asterisk server is used to stream audio to UniMRCP.

A channel has to actually be up playing back audio in order to examine it.

There is no inherent audio normalization in Asterisk itself, and no volume stuff is enabled by default. There’s just the translation implementation for the specific translation path and any effect it has on things.

Here is the core show channel result when streaming audio:

Channel Location State Application(Data)
Local/123@callbackou 123@callbackout:6 Up Dial(PJSIP/200@local,1,b(predi
Local/123@callbackou 10@call-file-test:3 Up Playback(/opt/s3/audio/h2b-api
PJSIP/local-0000000b 200@from-local:8 Up MRCPRecog(builtin:speech/spell
PJSIP/local-0000000a (None) Up AppDial((Outgoing Line))
4 active channels
3 active calls
18 calls processed

That is “core show channels”, not “core show channel”. The “core show channel” CLI command has to have a specific channel name (that can be tab completed) to show the details of a channel.

Sorry for the typo in my previous post:

9ef1ed832d53*CLI> core show channels

returns

Channel Location State Application(Data)
PJSIP/local-0000000f 200@from-local:8 Up MRCPRecog(builtin:speech/spell
PJSIP/local-0000000e (None) Up AppDial((Outgoing Line))
Local/123@callbackou 10@call-file-test:3 Up Playback(/opt/s3/audio/h2b-api
Local/123@callbackou 123@callbackout:6 Up Dial(PJSIP/200@local,1,b(predi
4 active channels
3 active calls
24 calls processed

Would the following log line be useful?

[Aug 20 15:39:50] NOTICE[84]: app_mrcprecog.c:282 speech_on_channel_add: (ASR-14) Channel ready codec=PCMA, sample rate=8000

OK, I misunderstood your comment about “core show channel”. Sorry.

I’ll try tomorrow with “core show channel” tab completed for each active channel.

MRCP stuff is not part of the Asterisk project, I know nothing of its details.

That isn’t a .wav file, although the actual media part of it would be the same as the .wav file you described. The .slin file contains no metadata.

It is up to the device that does the initial A to D conversion to ensure that the full dynamic range of the digital part is used. There should be no further level changes relative to the maximum representable value.

Note that the codecs used on the PSTN transmit 8 bit data, with about a 12 bit dynamic range, using non-linear encoding. If you look at the numerical values transmitted, without re-linearising, you will see much lower values.
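
To illustrate the point about non-linear encoding, here is a minimal sketch of a G.711 A-law round trip in pure Python. It is not Asterisk's codec code, just the standard segment-based algorithm written out for illustration, and the test tone and helper names are assumptions of this example. The 8-bit codes are small numbers on their own; only after decoding back to linear do the levels become comparable to the original.

```python
import math

# Upper bounds of the eight A-law segments, on the 13-bit magnitude scale.
SEG_END = (0x1F, 0x3F, 0x7F, 0xFF, 0x1FF, 0x3FF, 0x7FF, 0xFFF)


def linear_to_alaw(sample):
    """Encode one signed 16-bit sample as an 8-bit G.711 A-law code."""
    value = sample >> 3  # A-law only uses the top 13 bits
    if value >= 0:
        mask = 0xD5  # sign bit set, plus the even-bit inversion
    else:
        mask = 0x55
        value = -value - 1
    seg = next((i for i, end in enumerate(SEG_END) if value <= end), 8)
    if seg == 8:  # out of range: clip to the largest code
        return 0x7F ^ mask
    shift = 1 if seg < 2 else seg
    return ((seg << 4) | ((value >> shift) & 0x0F)) ^ mask


def alaw_to_linear(code):
    """Decode an 8-bit A-law code back to a signed 16-bit linear sample."""
    code ^= 0x55
    t = (code & 0x0F) << 4
    seg = (code & 0x70) >> 4
    if seg == 0:
        t += 8
    else:
        t += 0x108
        t <<= seg - 1
    return t if code & 0x80 else -t


def rms_dbfs(samples):
    """RMS level of signed 16-bit samples in dB relative to full scale."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / 32768.0)


# One second of a 440 Hz tone at 8 kHz, peaking at about -4.3 dBFS.
tone = [round(20000 * math.sin(2 * math.pi * 440 * n / 8000)) for n in range(8000)]
codes = [linear_to_alaw(s) for s in tone]       # what goes on the wire (0..255)
decoded = [alaw_to_linear(c) for c in codes]    # re-linearised samples
print("level change after A-law round trip: %.3f dB"
      % (rms_dbfs(decoded) - rms_dbfs(tone)))
```

For a tone like this one the round trip should change the RMS level by well under 1 dB, so A-law quantization alone would not explain a clearly visible drop. Plotting the raw code bytes instead of the decoded samples, on the other hand, would give a very different-looking waveform.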

It is one of the advantages of digital that peak amplitude is preserved.

Asterisk would assume that a .WAV file was GSM encoded, not linear PCM. For it to treat a file as having the characteristics you describe, it would have to be a .wav file. In practice, it has used a .slin file, and hasn’t touched either .WAV or .wav files. It looks like you had a pre-existing .slin file.

As others have said, there is no volume normalization or modification of volume within Asterisk. That’s not to say the inherent nature of ulaw/alaw won’t keep some values from being mapped perfectly, though: it’s not a lossless conversion. By default, ulaw/alaw are usually the enabled and preferred codecs within Asterisk, I think; it’s been forever since I just let a default config run.

The files are usually converted in real time as needed. I can tell you from converting .wav directly to ulaw: there will be level changes.

I don’t know enough about how you’re measuring this to know if the problem is elsewhere. If you’re using a softphone and just looking at the levels of its output in a DAW, there may be other level adjustment going on within that.

The only way to know for sure is to have a piece of software make a SIP connection and write the raw audio it receives. Otherwise, there are too many pieces at play to say specifically where it is.

Thanks for all your contributions.

@david551: I’m sure we don’t have any .slin files in our Asterisk container. The streamed file is an 8 kHz mono s16le .wav, encoded using ffmpeg from either WAV or PCM sources.

As suggested by @jcolp, here is the result of “core show channel” command using CLI when streaming the “problematic” audio file:

9ef1ed832d53*CLI> core show channel
Local/123@callbackout-00000011;1 Local/123@callbackout-00000011;2 PJSIP/local-00000022 PJSIP/local-00000023
9ef1ed832d53*CLI> core show channel Local/123@callbackout-00000011;1
-- General --
Name: Local/123@callbackout-00000011;1
Type: Local
UniqueID: 1755778430.119
LinkedID: 1755778430.119
Caller ID: 123
Caller ID Name: (N/A)
Connected Line ID: 124
Connected Line ID Name: (N/A)
Eff. Connected Line ID: 124
Eff. Connected Line ID Name: (N/A)
DNID Digits: (N/A)
Language: en
State: Up (6)
NativeFormats: (slin192)
WriteFormat: slin
ReadFormat: slin192
WriteTranscode: Yes (slin@8000)->(slin@192000)
ReadTranscode: No
Time to Hangup: 0
Elapsed Time: 0h0m7s
Bridge ID: (Not bridged)
-- PBX --
Context: call-file-test
Extension: 10
Priority: 3
Call Group: 0
Pickup Group: 0
Application: Playback
Data: /opt/s3/audio/h2b-api-tests/files_numbers/file_number10_22729230965
Call Identifer: [C-00000035]
Variables:
mrcp_options=f=beep&p=uni2&uer=1&nit=40000&t=30000&sct=1000&sint=3000&hmind=50&hmaxd=30000&ct=0.5&sl=0.5&spl=fr&rm=normal
dbid=f28219b1-ff79-49d3-be8d-9dc2ad28148f
audiofile=h2b-api-tests/files_numbers/file_number10_22729230965
grammar=builtin:speech/spelling/digits?regex=[1-3][0-9](490|493|606|723|729)[0-9]{6}
CDR Variables:
level 1: clid="" <123>
level 1: src=123
level 1: dst=10
level 1: dcontext=call-file-test
level 1: channel=Local/123@callbackout-00000011;1
level 1: lastapp=Playback
level 1: lastdata=/opt/s3/audio/h2b-api-tests/files_numbers/file_number10_22729230965
level 1: start=1755778430.766762
level 1: answer=1755778430.767201
level 1: end=0.000000
level 1: duration=6
level 1: billsec=6
level 1: disposition=8
level 1: amaflags=3
level 1: uniqueid=1755778430.119
level 1: linkedid=1755778430.119
level 1: sequence=68
-- Streams --
Name: audio-0
Type: audio
State: sendrecv
Group: -1
Formats: (slin192)
Metadata:

9ef1ed832d53*CLI> core show channel
Local/123@callbackout-00000011;1 Local/123@callbackout-00000011;2 PJSIP/local-00000022 PJSIP/local-00000023
9ef1ed832d53*CLI> core show channel Local/123@callbackout-00000011;2
-- General --
Name: Local/123@callbackout-00000011;2
Type: Local
UniqueID: 1755778430.120
LinkedID: 1755778430.119
Caller ID: 124
Caller ID Name: (N/A)
Connected Line ID: 123
Connected Line ID Name: (N/A)
Eff. Connected Line ID: 123
Eff. Connected Line ID Name: (N/A)
DNID Digits: (N/A)
Language: en
State: Up (6)
NativeFormats: (slin192)
WriteFormat: slin192
ReadFormat: slin192
WriteTranscode: No
ReadTranscode: No
Time to Hangup: 0
Elapsed Time: 0h0m14s
Bridge ID: a5021558-ad83-4912-a6cf-aa4faaad492b
-- PBX --
Context: callbackout
Extension: 123
Priority: 6
Call Group: 0
Pickup Group: 0
Application: Dial
Data: PJSIP/200@local,1,b(predialhandler^addheader^1)
Call Identifer: [C-00000034]
Variables:
BRIDGEPVTCALLID=af48a75e-9e74-4d33-80e4-224298dbe5b7
BRIDGEPEER=PJSIP/local-00000022
DIALEDPEERNUMBER=200@local
DIALEDPEERNAME=PJSIP/local-00000022
DIALSTATUS=ANSWER
PROGRESSTIME_MS=
PROGRESSTIME=
RINGTIME_MS=
RINGTIME=
DIALEDTIME_MS=
DIALEDTIME=
ANSWEREDTIME_MS=
ANSWEREDTIME=
MRCP_OPTIONS=f=beep&p=uni2&uer=1&nit=40000&t=30000&sct=1000&sint=3000&hmind=50&hmaxd=30000&ct=0.5&sl=0.5&spl=fr&rm=normal
GRAMMAR=builtin:speech/spelling/digits?regex=[1-3][0-9](490|493|606|723|729)[0-9]{6}
DBID=f28219b1-ff79-49d3-be8d-9dc2ad28148f
AUDIOFILENAME=h2b-api-tests/files_numbers/file_number10_22729230965
mrcp_options=f=beep&p=uni2&uer=1&nit=40000&t=30000&sct=1000&sint=3000&hmind=50&hmaxd=30000&ct=0.5&sl=0.5&spl=fr&rm=normal
dbid=f28219b1-ff79-49d3-be8d-9dc2ad28148f
audiofile=h2b-api-tests/files_numbers/file_number10_22729230965
grammar=builtin:speech/spelling/digits?regex=[1-3][0-9](490|493|606|723|729)[0-9]{6}
CDR Variables:
level 1: clid="" <124>
level 1: src=124
level 1: dst=123
level 1: dcontext=callbackout
level 1: channel=Local/123@callbackout-00000011;2
level 1: dstchannel=PJSIP/local-00000022
level 1: lastapp=Dial
level 1: lastdata=PJSIP/200@local,1,b(predialhandler^addheader^1)
level 1: start=1755778430.766871
level 1: answer=1755778430.767516
level 1: end=0.000000
level 1: duration=14
level 1: billsec=14
level 1: disposition=8
level 1: amaflags=3
level 1: uniqueid=1755778430.120
level 1: linkedid=1755778430.119
level 1: sequence=69
-- Streams --
Name: audio-0
Type: audio
State: sendrecv
Group: -1
Formats: (slin192)
Metadata:

9ef1ed832d53*CLI> core show channel
Local/123@callbackout-00000011;1 Local/123@callbackout-00000011;2 PJSIP/local-00000022 PJSIP/local-00000023
9ef1ed832d53*CLI> core show channel PJSIP/local-00000022
-- General --
Name: PJSIP/local-00000022
Type: PJSIP
UniqueID: 1755778431.121
LinkedID: 1755778430.119
Caller ID: 123
Caller ID Name: (N/A)
Connected Line ID: 124
Connected Line ID Name: (N/A)
Eff. Connected Line ID: 124
Eff. Connected Line ID Name: (N/A)
DNID Digits: (N/A)
Language: fr
State: Up (6)
NativeFormats: (alaw)
WriteFormat: slin192
ReadFormat: slin192
WriteTranscode: Yes (slin@192000)->(slin@8000)->(alaw@8000)
ReadTranscode: Yes (alaw@8000)->(slin@8000)->(slin@192000)
Time to Hangup: 0
Elapsed Time: 0h0m21s
Bridge ID: a5021558-ad83-4912-a6cf-aa4faaad492b
-- PBX --
Context: from-local
Extension:
Priority: 1
Call Group: 0
Pickup Group: 0
Application: AppDial
Data: (Outgoing Line)
Call Identifer: [C-00000034]
Variables:
BRIDGEPEER=Local/123@callbackout-00000011;2
GOSUB_RETVAL=
DIALEDPEERNUMBER=200@local
MRCP_OPTIONS=f=beep&p=uni2&uer=1&nit=40000&t=30000&sct=1000&sint=3000&hmind=50&hmaxd=30000&ct=0.5&sl=0.5&spl=fr&rm=normal
GRAMMAR=builtin:speech/spelling/digits?regex=[1-3][0-9](490|493|606|723|729)[0-9]{6}
DBID=f28219b1-ff79-49d3-be8d-9dc2ad28148f
AUDIOFILENAME=h2b-api-tests/files_numbers/file_number10_22729230965
CDR Variables:
level 1: clid="" <123>
level 1: src=123
level 1: dcontext=from-local
level 1: channel=PJSIP/local-00000022
level 1: lastapp=AppDial
level 1: lastdata=(Outgoing Line)
level 1: start=1755778431.268525
level 1: answer=1755778431.280846
level 1: end=1755778431.281094
level 1: duration=0
level 1: billsec=0
level 1: disposition=8
level 1: amaflags=3
level 1: uniqueid=1755778431.121
level 1: linkedid=1755778430.119
level 1: sequence=70
-- Streams --
Name: audio-0
Type: audio
State: sendrecv
Group: -1
Formats: (alaw)
Metadata:

9ef1ed832d53*CLI> core show channel
Local/123@callbackout-00000011;1 Local/123@callbackout-00000011;2 PJSIP/local-00000022 PJSIP/local-00000023
9ef1ed832d53*CLI> core show channel PJSIP/local-00000023
-- General --
Name: PJSIP/local-00000023
Type: PJSIP
UniqueID: 1755778431.122
LinkedID: 1755778431.122
Caller ID: 124
Caller ID Name: (N/A)
Connected Line ID: (N/A)
Connected Line ID Name: (N/A)
Eff. Connected Line ID: (N/A)
Eff. Connected Line ID Name: (N/A)
DNID Digits: 200
Language: fr
State: Up (6)
NativeFormats: (alaw)
WriteFormat: alaw
ReadFormat: alaw
WriteTranscode: No
ReadTranscode: No
Time to Hangup: 0
Elapsed Time: 0h0m27s
Bridge ID: (Not bridged)
-- PBX --
Context: from-local
Extension: 200
Priority: 8
Call Group: 0
Pickup Group: 0
Application: MRCPRecog
Data: builtin:speech/spelling/digits?regex=[1-3][0-9](490|493|606|723|729)[0-9]{6}, f=beep&p=uni2&uer=1&nit=40000&t=30000&sct=1000&sint=3000&hmind=50&hmaxd=30000&ct=0.5&sl=0.5&spl=fr&rm=normal
Call Identifer: [C-00000036]
Variables:
count_channel=1
mrcp_options=f=beep&p=uni2&uer=1&nit=40000&t=30000&sct=1000&sint=3000&hmind=50&hmaxd=30000&ct=0.5&sl=0.5&spl=fr&rm=normal
grammar=builtin:speech/spelling/digits?regex=[1-3][0-9](490|493|606|723|729)[0-9]{6}
dbid=f28219b1-ff79-49d3-be8d-9dc2ad28148f
audiofilename=h2b-api-tests/files_numbers/file_number10_22729230965
SIPDOMAIN=127.0.0.1
CDR Variables:
level 1: dnid=200
level 1: clid="" <124>
level 1: src=124
level 1: dst=200
level 1: dcontext=from-local
level 1: channel=PJSIP/local-00000023
level 1: lastapp=MRCPRecog
level 1: lastdata=builtin:speech/spelling/digits?regex=[1-3][0-9](490|493|606|723|729)[0-9]{6}, f=beep&p=uni2&uer=1&nit=40000&t=30000&sct=1000&sint=3000&hmind=50&hmaxd=30000&ct=0.5&sl=0.5&spl=fr&rm=normal
level 1: start=1755778431.277987
level 1: answer=1755778431.278607
level 1: end=0.000000
level 1: duration=26
level 1: billsec=26
level 1: disposition=1
level 1: amaflags=3
level 1: uniqueid=1755778431.122
level 1: linkedid=1755778431.122
level 1: sequence=71
-- Streams --
Name: audio-0
Type: audio
State: sendrecv
Group: -1
Formats: (alaw)
Metadata:

Sorry, the add details messed up the log visualization

OK. I used the Monitor application in my dialplan, and the original file (the wav file) is clearly louder than the one that Asterisk outputs.

Using the VOLUME function with an extreme value of 10 makes the output file much louder.

It’s going through a resample from 8kHz to 192kHz, presumably because you’ve created a Local channel without informing it of the format to use. How is the channel being created/initiated?
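
For anyone reading channel dumps like the one above, the resampling hops can be picked out of the WriteTranscode and ReadTranscode lines mechanically. This is just a small sketch that assumes the "(format@rate)->(format@rate)" layout seen in this thread; the helper names are invented for the example and are not part of Asterisk.

```python
import re

# Matches one "(format@rate)" step, e.g. "(slin@8000)".
STEP = re.compile(r"\((\w+)@(\d+)\)")


def transcode_steps(line):
    """Return [(format, rate_hz), ...] parsed from a transcode line."""
    return [(fmt, int(rate)) for fmt, rate in STEP.findall(line)]


def resampling_hops(line):
    """Return the consecutive step pairs whose sample rates differ."""
    steps = transcode_steps(line)
    return [(a, b) for a, b in zip(steps, steps[1:]) if a[1] != b[1]]


# Lines copied from the channel details posted earlier in the thread.
for line in (
    "WriteTranscode: Yes (slin@8000)->(slin@192000)",
    "WriteTranscode: Yes (slin@192000)->(slin@8000)->(alaw@8000)",
    "ReadTranscode: No",
):
    print(line, "=>", resampling_hops(line))
```

On the posted output this flags the 8 kHz to 192 kHz hop on the Local channel and the 192 kHz to 8 kHz hop on the PJSIP channel, which is the round trip described in the reply above.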

Thanks for the answer.

How is the channel being created/initiated ?

As I told you, I’m really not an expert in Asterisk (this is a part of the pipeline I’m working on). So I might

I attached the conf files I have access to, hoping they are relevant to the question:

extensions.conf (dialplan)

[global]

[default]
;Empty for security

[call-file-test]
exten => _X.,1,Verbose(1,${CONTEXT} - ${STRFTIME(${EPOCH},%C%y-%m-%d %H:%M:%S)} - ${CALLERID(ALL)} - ${EXTEN} - ${remoteid})
same => n,Wait(1)
;same => n,Set(VOLUME(tx)=10)
same => n,Monitor(wav,/tmp/${CDR(uniqueid)})
same => n,Playback(/opt/s3/audio/${audiofile})
same => n,Wait(4)
same => n,Hangup()

[callbackout]
exten => _123,1,Answer()
same => n,Set(__AUDIOFILENAME=${audiofile})
same => n,Set(__DBID=${dbid})
same => n,Set(__GRAMMAR=${grammar})
same => n,Set(__MRCP_OPTIONS=${mrcp_options})
;same => n,Set(GROUP()=ast1)
;same => n,Set(__COUNT_CHANNEL=${GROUP_COUNT(ast1)})
same => n,Dial(PJSIP/200@local,1,b(predialhandler^addheader^1))

[from-local]

exten => _200,1,Verbose(1,${CONTEXT} - ${STRFTIME(${EPOCH},%C%y-%m-%d %H:%M:%S)} - ${CALLERID(ALL)} - ${EXTEN} - ${remoteid})
;; Asterisk Manual
; asterisk-unimrcp/app-unimrcp/app_mrcprecog.c at master · unispeech/asterisk-unimrcp · GitHub
same => n,Set(__audiofilename=${PJSIP_HEADER(read,X-AudioName)})
same => n,Set(__dbid=${PJSIP_HEADER(read,X-DbId)})
same => n,Set(__grammar=${PJSIP_HEADER(read,X-Grammar)})
same => n,Set(__mrcp_options=${PJSIP_HEADER(read,X-mrcp-options)})
same => n,Set(GROUP()=ast1)
same => n,Set(__count_channel=${GROUP_COUNT(ast1)})
same => n,MRCPRecog(${grammar}, ${mrcp_options})
same => n,Hangup

exten => h,1,AGI(/opt/scripts/send_data.py)

[predialhandler]
exten => addheader,1,Set(PJSIP_HEADER(add,X-AudioName)=${AUDIOFILENAME})
same => n,Set(PJSIP_HEADER(add,X-DbId)=${DBID})
same => n,Set(PJSIP_HEADER(add,X-Grammar)=${GRAMMAR})
same => n,Set(PJSIP_HEADER(add,X-mrcp-options)=${MRCP_OPTIONS})
same => n,return()

mrcp.conf

[general]
; Default ASR and TTS profiles.
default-asr-profile = speech-nuance5-mrcp2
default-tts-profile = speech-nuance5-mrcp2
; UniMRCP logging level to appear in Asterisk logs. Options are:
; EMERGENCY|ALERT|CRITICAL|ERROR|WARNING|NOTICE|INFO|DEBUG -->
log-level = DEBUG
max-connection-count = 100
max-shared-count = 100
offer-new-connection = 1
; rx-buffer-size = 1024
; tx-buffer-size = 1024
; request-timeout = 5000
; speech-channel-timeout = 30000

;
; Profile for Nuance Speech Server MRCPv1
;
[speech-nuance5-mrcp1]
; MRCP version.
version = 1

; === RTSP settings ===
; Must be set to the IP address of the MRCP server.
server-ip = XXX.XXX.XXX.XXX
; RTSP port on the MRCP server.
server-port = 4900
; force-destination = 1
resource-location = media
speechsynth = speechsynthesizer
speechrecog = speechrecognizer

; === RTP factory ===
; rtp-ip = 0.0.0.0
; Must be set to the IP address of the MRCP client.
rtp-ip = XXX.XXX.XXX.XXX
; rtp-ext-ip = auto
; RTP port range on the MRCP client.
rtp-port-min = 4000
rtp-port-max = 5000

; === Jitter buffer settings ===
playout-delay = 50
; min-playout-delay = 20
max-playout-delay = 200

; === RTP settings ===
ptime = 20
codecs = L16/96/8000 telephone-event/101/8000

; === RTCP settings ===
rtcp = 1
rtcp-bye = 2
rtcp-tx-interval = 5000
rtcp-rx-resolution = 1000

;
; Profile for Nuance Speech Server MRCPv2
;
[uni2]
; MRCP version.
version = 2

; === SIP settings ===
; Must be set to the IP address of the MRCP server.
; PREPROD
server-ip = XXX.XXX.XXX.XXX
; PROD
;server-ip = XXX.XXX.XXX.XXX
; SIP port on the MRCP server.
server-port = 8060
; server-username = test
force-destination = 0

; === SIP agent ===
client-ip = 0.0.0.0
; Must be set to the IP address of the MRCP client.
; client-ext-ip = auto
; SIP port on the MRCP client.
client-port = 5093
; SIP transport either UDP or TCP.
sip-transport = tcp
; ua-name = Asterisk
sdp-origin = stresstester
; sip-t1 = 500
; sip-t2 = 4000
; sip-t4 = 4000
; sip-t1x64 = 32000
; sip-timer-c = 185000

; === RTP factory ===
rtp-ip = 0.0.0.0
; Must be set to the IP address of the MRCP client.
; rtp-ext-ip = auto
; RTP port range on the MRCP client.
rtp-port-min = 4000
rtp-port-max = 5000

; === Jitter buffer settings ===
playout-delay = 50
; min-playout-delay = 20
max-playout-delay = 200

; === RTP settings ===
ptime = 20
codecs = L16/96/8000 telephone-event/101/8000

; === RTCP settings ===
rtcp = 1
rtcp-bye = 2
rtcp-tx-interval = 5000
rtcp-rx-resolution = 1000

pjsip.conf

; PJSIP Configuration
;[global]

[general]
bindport = 5060

[transport-udp]
type = transport
protocol = udp
bind = 0.0.0.0:5060

[local]
type = aor
contact = sip:127.0.0.1
qualify_frequency = 60
default_expiration = 1800

[local]
type = identify
endpoint = local
match = 127.0.0.1

[local]
type = endpoint
context = from-local
dtmf_mode = rfc4733
disallow = all
allow = alaw
direct_media = no
language = fr
aors = local