Handling of 100 ms Speex data from a device

A device is transmitting Speex audio data in 100 ms packets. What steps should be taken to convert this into 20 ms audio frames sent to a softphone at 20 ms intervals?

We are using an Asterisk server to establish audio communication between two sets of devices. The server runs Asterisk 16.10.0. Audio calls work when a device transmits packets using the Speex codec, with each packet carrying 20 ms of audio. The situation has become trickier with the addition of another device that can only send audio in packets of 100 ms duration. The 100 ms packets are rejected by the devices/PSTN phones at the other end, which makes it more challenging.
The solution we have thought of is to break each 100 ms packet into 5 packets (each carrying 20 ms of audio) at the Asterisk server. Can this be achieved through configuration on the Asterisk server, or do we need to change the code?
Any help in this regard will be appreciated.

I suspect you are going to have to try it. If the device works with the Hello world and echo examples, it means that Asterisk is prepared to accept that packetisation. If it does that, it will repacketise to whatever is negotiated on the other side. I suppose it is just possible that you will need to force something other than Speex on the other side, to force transcoding. I don’t know if the frame structure is represented in Speex itself, or just in the RTP. If it is represented in Speex, Asterisk will need to transcode out and then transcode back.

Splitting frames is standard for Asterisk. The risks here are that it doesn’t cope with such huge frames for Speex, or that it tries to optimise Speex to Speex cases.
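
To make the frame arithmetic concrete, here is a minimal sketch of splitting one 100 ms block into five 20 ms frames. It is not Asterisk code, and it works on already-decoded audio, assuming 8 kHz, 16-bit signed linear PCM. Whether an encoded Speex payload can be split without transcoding is exactly the open question above. The names frame_bytes and split_frame are made up for illustration.

```python
SAMPLE_RATE_HZ = 8000   # assumption: narrowband telephony sample rate
BYTES_PER_SAMPLE = 2    # assumption: 16-bit signed linear PCM


def frame_bytes(duration_ms: int) -> int:
    """Number of PCM bytes in a frame of the given duration."""
    return SAMPLE_RATE_HZ * duration_ms // 1000 * BYTES_PER_SAMPLE


def split_frame(pcm: bytes, out_ms: int = 20) -> list:
    """Split one decoded PCM buffer into consecutive out_ms chunks."""
    size = frame_bytes(out_ms)
    if len(pcm) % size != 0:
        raise ValueError("buffer is not a whole number of output frames")
    return [pcm[i:i + size] for i in range(0, len(pcm), size)]


big = bytes(frame_bytes(100))   # 100 ms of silence
small = split_frame(big)
print(len(big), len(small), len(small[0]))   # 1600 5 320
```

Each 20 ms chunk would then be re-encoded and sent on its own 20 ms tick; that pacing and encoding step is left out here.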

I would suggest that 100ms is far too high for acceptable latency and therefore echo behaviour. What device should we avoid buying?

Thanks, David, for your input. The 100 ms setting is at one end, where an existing device is used that is difficult to change. As you say, splitting frames is standard for Asterisk, and we thought so too, but it is not happening. We tried a few other things:

  • changing “maximum_ms” in codec_builtin.c to 20 ms

  • setting the codec order to alaw:20,ulaw:20 in the pjsip configuration for the outbound trunk

None of these worked. Any suggestions for code changes would be greatly appreciated.

Please note that we are using Asterisk v16.10.0.

You are using an unsupported version with known bugs, including security vulnerabilities.

What debugging information do you get for the codec negotiation? I would expect the large ptime to be accepted or rejected and, if accepted, to be transcoded when going to non-Speex destinations.

Thanks, David. I think we used the version that was available in the AWS Marketplace. Let me collect the requested logs and get back to you within a day. Which version do you suggest we use?

The debugging information we get for the codec negotiation is as follows:

Information from log file:

[2024-04-15 11:36:51] VERBOSE[243352] chan_iax2.c: Accepting AUTHENTICATED call from 106.213.30.121:49152:
> requested format = speex,
> requested prefs = (),
> actual format = speex,
> host prefs = (g726|g726aal2|adpcm|gsm|ilbc|speex|lpc10|g729|g723),
> priority = mine
[2024-04-15 11:36:51] ERROR[243333] res_pjsip_header_funcs.c: No headers had been previously added to this session.

Information from Wireshark log:

Frame 52731: 1119 bytes on wire (8952 bits), 1119 bytes captured (8952 bits) on interface eth0, id 0
Ethernet II, Src: 02:ea:c1:10:07:09 (02:ea:c1:10:07:09), Dst: 02:fe:75:12:ba:b0 (02:fe:75:12:ba:b0)
Internet Protocol Version 4, Src: 10.0.1.5, Dst: 3.80.16.189
User Datagram Protocol, Src Port: 5060, Dst Port: 5060
Session Initiation Protocol (INVITE)
Request-Line: INVITE sip:+447721074835@xxxxxxxxxxxxxxxxxxxxxx.voiceconnector.chime.aws SIP/2.0
Message Header
Message Body
Session Description Protocol
Session Description Protocol Version (v): 0
Owner/Creator, Session Id (o): - 1201466070 1201466070 IN IP4 xx.xxx.xxx.xx
Session Name (s): Asterisk
Connection Information (c): IN IP4 xx.xxx.xxx.xx
Time Description, active time (t): 0 0
Media Description, name and address (m): audio 11156 RTP/AVP 8 10 0 101
Media Attribute (a): rtpmap:8 PCMA/8000
Media Attribute (a): rtpmap:10 L16/8000
Media Attribute (a): rtpmap:0 PCMU/8000
Media Attribute (a): rtpmap:101 telephone-event/8000
Media Attribute (a): fmtp:101 0-16
Media Attribute (a): ptime:20
Media Attribute (a): maxptime:20
Media Attribute (a): sendrecv
[Generated Call-ID: 44037c2f-449a-40b1-a838-b15376c285d3]

Frame 64082: 877 bytes on wire (7016 bits), 877 bytes captured (7016 bits) on interface eth0, id 0
Ethernet II, Src: 02:fe:75:12:ba:b0 (02:fe:75:12:ba:b0), Dst: 02:ea:c1:10:07:09 (02:ea:c1:10:07:09)
Internet Protocol Version 4, Src: 3.80.16.189, Dst: 10.0.1.5
User Datagram Protocol, Src Port: 5060, Dst Port: 5060
Session Initiation Protocol (183)
Status-Line: SIP/2.0 183 Session Progress
Message Header
Message Body
Session Description Protocol
Session Description Protocol Version (v): 0
Owner/Creator, Session Id (o): Sonus_UAC 816352 77608 IN IP4 3.80.17.57
Session Name (s): SIP Media Capabilities
Connection Information (c): IN IP4 3.80.17.57
Time Description, active time (t): 0 0
Media Description, name and address (m): audio 43584 RTP/AVP 0 101
Media Attribute (a): rtpmap:0 PCMU/8000
Media Attribute (a): rtpmap:101 telephone-event/8000
Media Attribute (a): fmtp:101 0-15
Media Attribute (a): sendrecv
Media Attribute (a): rtcp:43585
Media Attribute (a): ptime:20
[Generated Call-ID: 44037c2f-449a-40b1-a838-b15376c285d3]

Frame 65705: 897 bytes on wire (7176 bits), 897 bytes captured (7176 bits) on interface eth0, id 0
Ethernet II, Src: 02:fe:75:12:ba:b0 (02:fe:75:12:ba:b0), Dst: 02:ea:c1:10:07:09 (02:ea:c1:10:07:09)
Internet Protocol Version 4, Src: 3.80.16.189, Dst: 10.0.1.5
User Datagram Protocol, Src Port: 5060, Dst Port: 5060
Session Initiation Protocol (200)
Status-Line: SIP/2.0 200 OK
Message Header
Message Body
Session Description Protocol
Session Description Protocol Version (v): 0
Owner/Creator, Session Id (o): Sonus_UAC 816352 77608 IN IP4 3.80.17.57
Session Name (s): SIP Media Capabilities
Connection Information (c): IN IP4 3.80.17.57
Time Description, active time (t): 0 0
Media Description, name and address (m): audio 43584 RTP/AVP 0 101
Media Attribute (a): rtpmap:0 PCMU/8000
Media Attribute (a): rtpmap:101 telephone-event/8000
Media Attribute (a): fmtp:101 0-15
Media Attribute (a): sendrecv
Media Attribute (a): rtcp:43585
Media Attribute (a): ptime:20
[Generated Call-ID: 44037c2f-449a-40b1-a838-b15376c285d3]

You’ve negotiated 20ms packets and no use of Speex at all in your SIP trace. You end up with just µ-Law.
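
As a rough way to check what was actually agreed, the sketch below pulls the payload types, codec names, ptime and maxptime out of an SDP body. The function name summarize_sdp is hypothetical, and the sample text is the 200 OK answer above rewritten as raw SDP lines, which is an assumption about how the dissected Wireshark fields map back to the wire format.

```python
def summarize_sdp(sdp: str) -> dict:
    """Collect payload types, rtpmap names, ptime and maxptime from SDP text."""
    info = {"payloads": [], "codecs": {}, "ptime": None, "maxptime": None}
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m=audio"):
            # m=audio <port> <proto> <payload type> <payload type> ...
            info["payloads"] = line.split()[3:]
        elif line.startswith("a=rtpmap:"):
            pt, name = line[len("a=rtpmap:"):].split(" ", 1)
            info["codecs"][pt] = name
        elif line.startswith("a=ptime:"):
            info["ptime"] = int(line[len("a=ptime:"):])
        elif line.startswith("a=maxptime:"):
            info["maxptime"] = int(line[len("a=maxptime:"):])
    return info


# Assumed raw form of the 200 OK answer shown in the trace.
ANSWER = """\
m=audio 43584 RTP/AVP 0 101
a=rtpmap:0 PCMU/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15
a=sendrecv
a=rtcp:43585
a=ptime:20
"""

answer = summarize_sdp(ANSWER)
print(answer["codecs"], answer["ptime"], answer["maxptime"])
```

For this answer the only audio codec left is PCMU with ptime 20 and no maxptime, which matches the reading of the trace above.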

You didn’t say you were using IAX2. I don’t think much has changed in its internals for about a decade, and it may well predate Speex. I don’t know how it handles alternative packetisation sizes.

Thank you for your insightful reply. Apologies for not mentioning earlier that the Asterisk server is utilizing IAX2. It’s important to note that Asterisk employs IAX2 solely for communication with the external device. When Asterisk communicates with AWS Chime, it utilizes PJSIP instead of IAX2.

The previously provided log reflects changes made to the “maximum_ms” parameter in codec_builtin.c, set to 20 ms.

Below are the original logs without this modification:

Example 1:
This log demonstrates the use of Speex and its successful operation for a UK number. When an external device initiates a call to a UK number with a VoIP audio frame size of 60 ms, the Wireshark log is as follows:

==============================================================================================================
Frame 35527: 1266 bytes on wire (10128 bits), 1266 bytes captured (10128 bits) on interface eth0, id 0
Ethernet II, Src: 02:ea:c1:10:07:09 (02:ea:c1:10:07:09), Dst: 02:fe:75:12:ba:b0 (02:fe:75:12:ba:b0)
Internet Protocol Version 4, Src: 10.0.1.5, Dst: 3.80.16.120
User Datagram Protocol, Src Port: 5060, Dst Port: 5060
Session Initiation Protocol (INVITE)
Request-Line: INVITE sip:+447440396811@xxxxxxxxxxxxxxxxxxxxxx.voiceconnector.chime.aws SIP/2.0
Message Header
Message Body
Session Description Protocol
Session Description Protocol Version (v): 0
Owner/Creator, Session Id (o): - 1664509304 1664509304 IN IP4 xx.xxx.xxx.xx
Session Name (s): Asterisk
Connection Information (c): IN IP4 xx.xxx.xxx.xx
Time Description, active time (t): 0 0
Media Description, name and address (m): audio 10964 RTP/AVP 110 117 119 10 0 8 4 107 101
Media Attribute (a): rtpmap:110 speex/8000
Media Attribute (a): rtpmap:117 speex/16000
Media Attribute (a): rtpmap:119 speex/32000
Media Attribute (a): rtpmap:10 L16/8000
Media Attribute (a): rtpmap:0 PCMU/8000
Media Attribute (a): rtpmap:8 PCMA/8000
Media Attribute (a): rtpmap:4 G723/8000
Media Attribute (a): rtpmap:107 opus/48000/2
Media Attribute (a): rtpmap:101 telephone-event/8000
Media Attribute (a): fmtp:101 0-16
Media Attribute (a): ptime:20
Media Attribute (a): maxptime:60
Media Attribute (a): sendrecv
[Generated Call-ID: bfe9715b-c91a-4b96-b8ba-2e55e1b14e2e]

Frame 36210: 880 bytes on wire (7040 bits), 880 bytes captured (7040 bits) on interface eth0, id 0
Ethernet II, Src: 02:fe:75:12:ba:b0 (02:fe:75:12:ba:b0), Dst: 02:ea:c1:10:07:09 (02:ea:c1:10:07:09)
Internet Protocol Version 4, Src: 3.80.16.120, Dst: 10.0.1.5
User Datagram Protocol, Src Port: 5060, Dst Port: 5060
Session Initiation Protocol (183)
Status-Line: SIP/2.0 183 Session Progress
Message Header
Message Body
Session Description Protocol
Session Description Protocol Version (v): 0
Owner/Creator, Session Id (o): Sonus_UAC 450856 699679 IN IP4 3.80.17.191
Session Name (s): SIP Media Capabilities
Connection Information (c): IN IP4 3.80.17.191
Time Description, active time (t): 0 0
Media Description, name and address (m): audio 29098 RTP/AVP 0 101
Media Attribute (a): rtpmap:0 PCMU/8000
Media Attribute (a): rtpmap:101 telephone-event/8000
Media Attribute (a): fmtp:101 0-15
Media Attribute (a): sendrecv
Media Attribute (a): rtcp:29099
Media Attribute (a): ptime:20
[Generated Call-ID: bfe9715b-c91a-4b96-b8ba-2e55e1b14e2e]

Example 2:
Here is another log without the use of Speex, working for India. When an external device initiates a call to an Indian number with a VoIP audio frame size of 100 ms, the Wireshark log is as follows:

Frame 18705: 1167 bytes on wire (9336 bits), 1167 bytes captured (9336 bits) on interface eth0, id 0
Ethernet II, Src: 02:ea:c1:10:07:09 (02:ea:c1:10:07:09), Dst: 02:fe:75:12:ba:b0 (02:fe:75:12:ba:b0)
Internet Protocol Version 4, Src: 10.0.1.5, Dst: 3.80.16.4
User Datagram Protocol, Src Port: 5060, Dst Port: 5060
Session Initiation Protocol (INVITE)
Request-Line: INVITE sip:+919433878492@xxxxxxxxxxxxxxxxxxxxxx.voiceconnector.chime.aws SIP/2.0
Message Header
Message Body
Session Description Protocol
Session Description Protocol Version (v): 0
Owner/Creator, Session Id (o): - 628517922 628517922 IN IP4 xx.xxx.xxx.xx
Session Name (s): Asterisk
Connection Information (c): IN IP4 xx.xxx.xxx.xx
Time Description, active time (t): 0 0
Media Description, name and address (m): audio 10970 RTP/AVP 18 8 10 0 101
Media Attribute (a): rtpmap:18 G729/8000
Media Attribute (a): fmtp:18 annexb=no
Media Attribute (a): rtpmap:8 PCMA/8000
Media Attribute (a): rtpmap:10 L16/8000
Media Attribute (a): rtpmap:0 PCMU/8000
Media Attribute (a): rtpmap:101 telephone-event/8000
Media Attribute (a): fmtp:101 0-16
Media Attribute (a): ptime:20
Media Attribute (a): maxptime:70
Media Attribute (a): sendrecv
[Generated Call-ID: 7eab4154-4766-476b-99d7-80590d5f2dff]

Frame 19480: 887 bytes on wire (7096 bits), 887 bytes captured (7096 bits) on interface eth0, id 0
Ethernet II, Src: 02:fe:75:12:ba:b0 (02:fe:75:12:ba:b0), Dst: 02:ea:c1:10:07:09 (02:ea:c1:10:07:09)
Internet Protocol Version 4, Src: 3.80.16.4, Dst: 10.0.1.5
User Datagram Protocol, Src Port: 5060, Dst Port: 5060
Session Initiation Protocol (183)
Status-Line: SIP/2.0 183 Session progress
Message Header
Message Body
Session Description Protocol
Session Description Protocol Version (v): 0
Owner/Creator, Session Id (o): root 2678167 2678167 IN IP4 3.80.17.52
Session Name (s): Twilio Media Gateway
Connection Information (c): IN IP4 3.80.17.52
Time Description, active time (t): 0 0
Media Description, name and address (m): audio 57228 RTP/AVP 0 101
Media Attribute (a): maxptime:150
Media Attribute (a): rtpmap:0 PCMU/8000
Media Attribute (a): rtpmap:101 telephone-event/8000
Media Attribute (a): fmtp:101 0-16
Media Attribute (a): sendrecv
Media Attribute (a): rtcp:57229
Media Attribute (a): ptime:20
[Generated Call-ID: 7eab4154-4766-476b-99d7-80590d5f2dff]

However, when an external device calls a UK number with a VoIP audio frame size of 100 ms, audio only works one way (the external device can hear the UK end, but the UK end cannot hear the device). The log resembles the Indian log, except that the maxptime attribute is missing from the SIP/SDP response from the AWS Chime server. Different destination numbers may have different maxptime values in their responses. Is any special handling required for this on the Asterisk side? Please advise.
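
To spell out the suspicion about maxptime, here is a small sketch of one conservative reading of an SDP answer: treat maxptime as the upper bound on packet duration when it is present, and fall back to ptime when it is not. This only restates the question in code. It does not describe how Asterisk or the far end actually treat a missing maxptime, and the helper names and the 20 ms default are assumptions.

```python
from typing import Optional

DEFAULT_PTIME_MS = 20  # assumption: fallback when the answer carries no hint


def max_allowed_ms(ptime: Optional[int], maxptime: Optional[int]) -> int:
    """Upper bound on packet duration under a conservative reading of the SDP."""
    if maxptime is not None:
        return maxptime
    if ptime is not None:
        return ptime
    return DEFAULT_PTIME_MS


def fits(frame_ms: int, ptime: Optional[int], maxptime: Optional[int]) -> bool:
    """True if a packet of frame_ms would be within the bound."""
    return frame_ms <= max_allowed_ms(ptime, maxptime)


print(fits(100, ptime=20, maxptime=150))   # answer for the Indian number
print(fits(100, ptime=20, maxptime=None))  # answer with no maxptime
```

Under this reading, 100 ms packets pass for the answer that carries maxptime:150 and fail for the one that carries only ptime:20, so they would need to be repacketised to 20 ms before being sent on.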