Handling 100 ms Speex data from a device

A device transmits Speex audio in 100 ms packets. What steps are needed to convert this into 20 ms packets sent to a softphone at 20 ms intervals?

We are using an Asterisk server (version 16.10.0) to establish audio communication between two sets of devices. Audio calls work when a device transmits Speex packets that each carry 20 ms of audio. The situation has become trickier with the addition of another device that can only send audio in packets of 100 ms duration. The 100 ms packets are rejected by the devices/PSTN phones at the other end.
The solution we have thought of is to break each 100 ms packet into 5 packets (each carrying 20 ms of audio) at the Asterisk server. Can this be achieved through configuration on the Asterisk server, or do we need to make changes in the code?
Any help in this regard will be appreciated.

I suspect you are going to have to try it. If the device works with the Hello World and echo examples, it means that Asterisk is prepared to accept that packetisation. If it does that, it will repacketise to whatever is negotiated on the other side. I suppose it is just possible that you will need to force something other than Speex on the other side, to force transcoding. I don’t know if the frame structure is represented in Speex itself, or just in the RTP. If it is represented in Speex, Asterisk will need to transcode out and then transcode back.

Splitting frames is standard for Asterisk; the risks here are that it doesn’t cope with such huge frames for Speex, or that it tries to optimise Speex to Speex cases.
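To make the arithmetic of the split concrete, here is a minimal sketch in Python. It is not Asterisk code. It assumes the 100 ms payload has already been decoded to 16-bit linear PCM at 8000 Hz (narrowband), and the names `frame_bytes` and `split_pcm` are hypothetical helpers invented for illustration. Cutting the encoded Speex bitstream itself is a different problem and is not shown.

```python
from typing import List

SAMPLE_RATE_HZ = 8000   # assumption: narrowband audio
BYTES_PER_SAMPLE = 2    # assumption: 16-bit signed linear PCM after decoding


def frame_bytes(duration_ms: int) -> int:
    """Number of PCM bytes needed to hold duration_ms of audio."""
    return SAMPLE_RATE_HZ * duration_ms // 1000 * BYTES_PER_SAMPLE


def split_pcm(payload: bytes, out_ms: int = 20) -> List[bytes]:
    """Cut a decoded PCM payload into consecutive frames of out_ms each."""
    size = frame_bytes(out_ms)
    if len(payload) % size != 0:
        raise ValueError("payload is not a whole number of output frames")
    return [payload[i:i + size] for i in range(0, len(payload), size)]


# A silent 100 ms packet: 800 samples, 1600 bytes.
packet_100ms = bytes(frame_bytes(100))
frames = split_pcm(packet_100ms)
print(len(frames), len(frames[0]))  # 5 320
```

Each of the five 320-byte frames holds 160 samples, which is what a 20 ms frame carries at 8000 Hz.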

I would suggest that 100ms is far too high for acceptable latency and therefore echo behaviour. What device should we avoid buying?


Thanks, David, for your input. The 100 ms setting is at one end, where an existing device is used that is difficult to change. As you say, splitting frames is standard for Asterisk, and we thought so too, but in practice it is not happening. We tried a few other things:

  • changing “maximum_ms” in codec_builtin.c to 20 ms

  • setting the codec order to alaw:20,ulaw:20 in the pjsip configuration for the outbound trunk

None of these worked. Any suggestions for code changes would be greatly appreciated.

Please note that we are using Asterisk v16.10.0.

You are using an unsupported version with known bugs, including security vulnerabilities.

What debugging information do you get for the codec negotiation? I would expect the large ptime to be either accepted or rejected and, if accepted, the audio to be transcoded when going to non-Speex destinations.

Thanks, David. I think we used the version that was available in the AWS Marketplace. Let me collect the requested logs and get back to you within a day. Which version do you suggest we use?

The debugging information we get for the codec negotiation is as follows:

Information from log file:

[2024-04-15 11:36:51] VERBOSE[243352] chan_iax2.c: Accepting AUTHENTICATED call from 106.213.30.121:49152:
> requested format = speex,
> requested prefs = (),
> actual format = speex,
> host prefs = (g726|g726aal2|adpcm|gsm|ilbc|speex|lpc10|g729|g723),
> priority = mine
[2024-04-15 11:36:51] ERROR[243333] res_pjsip_header_funcs.c: No headers had been previously added to this session.

Information from Wireshark log:

Frame 52731: 1119 bytes on wire (8952 bits), 1119 bytes captured (8952 bits) on interface eth0, id 0
Ethernet II, Src: 02:ea:c1:10:07:09 (02:ea:c1:10:07:09), Dst: 02:fe:75:12:ba:b0 (02:fe:75:12:ba:b0)
Internet Protocol Version 4, Src: 10.0.1.5, Dst: 3.80.16.189
User Datagram Protocol, Src Port: 5060, Dst Port: 5060
Session Initiation Protocol (INVITE)
Request-Line: INVITE sip:+447721074835@xxxxxxxxxxxxxxxxxxxxxx.voiceconnector.chime.aws SIP/2.0
Message Header
Message Body
Session Description Protocol
Session Description Protocol Version (v): 0
Owner/Creator, Session Id (o): - 1201466070 1201466070 IN IP4 xx.xxx.xxx.xx
Session Name (s): Asterisk
Connection Information (c): IN IP4 xx.xxx.xxx.xx
Time Description, active time (t): 0 0
Media Description, name and address (m): audio 11156 RTP/AVP 8 10 0 101
Media Attribute (a): rtpmap:8 PCMA/8000
Media Attribute (a): rtpmap:10 L16/8000
Media Attribute (a): rtpmap:0 PCMU/8000
Media Attribute (a): rtpmap:101 telephone-event/8000
Media Attribute (a): fmtp:101 0-16
Media Attribute (a): ptime:20
Media Attribute (a): maxptime:20
Media Attribute (a): sendrecv
[Generated Call-ID: 44037c2f-449a-40b1-a838-b15376c285d3]

Frame 64082: 877 bytes on wire (7016 bits), 877 bytes captured (7016 bits) on interface eth0, id 0
Ethernet II, Src: 02:fe:75:12:ba:b0 (02:fe:75:12:ba:b0), Dst: 02:ea:c1:10:07:09 (02:ea:c1:10:07:09)
Internet Protocol Version 4, Src: 3.80.16.189, Dst: 10.0.1.5
User Datagram Protocol, Src Port: 5060, Dst Port: 5060
Session Initiation Protocol (183)
Status-Line: SIP/2.0 183 Session Progress
Message Header
Message Body
Session Description Protocol
Session Description Protocol Version (v): 0
Owner/Creator, Session Id (o): Sonus_UAC 816352 77608 IN IP4 3.80.17.57
Session Name (s): SIP Media Capabilities
Connection Information (c): IN IP4 3.80.17.57
Time Description, active time (t): 0 0
Media Description, name and address (m): audio 43584 RTP/AVP 0 101
Media Attribute (a): rtpmap:0 PCMU/8000
Media Attribute (a): rtpmap:101 telephone-event/8000
Media Attribute (a): fmtp:101 0-15
Media Attribute (a): sendrecv
Media Attribute (a): rtcp:43585
Media Attribute (a): ptime:20
[Generated Call-ID: 44037c2f-449a-40b1-a838-b15376c285d3]

Frame 65705: 897 bytes on wire (7176 bits), 897 bytes captured (7176 bits) on interface eth0, id 0
Ethernet II, Src: 02:fe:75:12:ba:b0 (02:fe:75:12:ba:b0), Dst: 02:ea:c1:10:07:09 (02:ea:c1:10:07:09)
Internet Protocol Version 4, Src: 3.80.16.189, Dst: 10.0.1.5
User Datagram Protocol, Src Port: 5060, Dst Port: 5060
Session Initiation Protocol (200)
Status-Line: SIP/2.0 200 OK
Message Header
Message Body
Session Description Protocol
Session Description Protocol Version (v): 0
Owner/Creator, Session Id (o): Sonus_UAC 816352 77608 IN IP4 3.80.17.57
Session Name (s): SIP Media Capabilities
Connection Information (c): IN IP4 3.80.17.57
Time Description, active time (t): 0 0
Media Description, name and address (m): audio 43584 RTP/AVP 0 101
Media Attribute (a): rtpmap:0 PCMU/8000
Media Attribute (a): rtpmap:101 telephone-event/8000
Media Attribute (a): fmtp:101 0-15
Media Attribute (a): sendrecv
Media Attribute (a): rtcp:43585
Media Attribute (a): ptime:20
[Generated Call-ID: 44037c2f-449a-40b1-a838-b15376c285d3]

You’ve negotiated 20 ms packets and no use of Speex at all in your SIP trace. You end up with just µ-law.
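As a rough way to check this kind of result without reading the decode by eye, here is a small Python sketch that summarises the audio section of an SDP answer. The SDP text is reconstructed in raw `m=`/`a=` form from the media lines of the 200 OK in the Wireshark decode above, and `summarize_audio` is a hypothetical helper written for illustration, not part of Asterisk or Wireshark.

```python
# Raw SDP lines reconstructed from the 200 OK decode (media section only).
SDP_ANSWER = """\
m=audio 43584 RTP/AVP 0 101
a=rtpmap:0 PCMU/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15
a=sendrecv
a=rtcp:43585
a=ptime:20
"""


def summarize_audio(sdp: str) -> dict:
    """Return the codec names and ptime found in an SDP audio section."""
    payload_types = []
    names = {}
    ptime = None
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m=audio "):
            # m=audio <port> <proto> <payload type> ...
            payload_types = line.split()[3:]
        elif line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            names[pt] = encoding.split("/")[0]
        elif line.startswith("a=ptime:"):
            ptime = int(line[len("a=ptime:"):])
    codecs = [names.get(pt, pt) for pt in payload_types]
    return {"codecs": codecs, "ptime_ms": ptime}


summary = summarize_audio(SDP_ANSWER)
print(summary)  # {'codecs': ['PCMU', 'telephone-event'], 'ptime_ms': 20}
```

The only audio codec in the answer is PCMU at a ptime of 20 ms, so Speex never appears on the SIP leg.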

You didn’t say you were using IAX2. I don’t think much has changed in its internals for about a decade, and it may well predate Speex. I don’t know how it handles alternative packetisation sizes.