Hello,
I’m trying to create an application connected to Asterisk using External Media. The application follows this scheme:
The difference is that I’d like to process the transcription, run the response through a cloud text-to-speech service, and stream the resulting audio back to the channel over RTP.
On the Asterisk side, phparia (wormling/phparia on GitHub, a framework for creating ARI (Asterisk REST Interface) applications) is used to create the External Media channel, which sends the audio over RTP to my application. In my application, I wrote a simple jitter buffer and forward the media stream to STT. After a few more steps, I take the text response, generate an audio file, and stream it back over RTP. The RTP transmitter is also written by hand: it simply cuts the audio file into chunks, wraps each chunk in an RTP packet, and sends the packets to Asterisk.
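To make the transmitter side concrete, here is a minimal sketch of that kind of packetizer. It is not my actual code, and it rests on several assumptions: the input is raw, headerless A-law (no WAV header left in the data), frames are 20 ms long, and packets are paced from a monotonic clock. The names `packetize_alaw` and `send_paced` are hypothetical.

```python
import socket
import struct
import time

SAMPLE_RATE = 8000                              # Hz, as used in this setup
FRAME_MS = 20                                   # assumed packetization interval
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000    # 160: A-law is one byte per sample
PAYLOAD_TYPE_PCMA = 8                           # static RTP payload type for A-law (RFC 3551)


def packetize_alaw(audio: bytes, ssrc: int, seq: int = 0, timestamp: int = 0):
    """Yield RTP packets for raw A-law audio, one frame per packet."""
    for offset in range(0, len(audio), FRAME_BYTES):
        frame = audio[offset:offset + FRAME_BYTES]
        header = struct.pack(
            "!BBHII",
            0x80,                      # version 2, no padding, no extension, no CSRCs
            PAYLOAD_TYPE_PCMA,         # marker bit left clear
            seq & 0xFFFF,              # 16-bit sequence number, wraps around
            timestamp & 0xFFFFFFFF,    # 32-bit timestamp in sample units
            ssrc & 0xFFFFFFFF,
        )
        yield header + frame
        seq += 1
        timestamp += len(frame)        # advance by samples sent, not by milliseconds


def send_paced(packets, address):
    """Send packets over UDP, one every FRAME_MS, without accumulating drift."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        next_send = time.monotonic()
        for packet in packets:
            sock.sendto(packet, address)
            next_send += FRAME_MS / 1000
            time.sleep(max(0.0, next_send - time.monotonic()))
    finally:
        sock.close()
```

With 8 kHz A-law, each 20 ms frame carries 160 bytes, so the RTP timestamp advances by 160 per packet while the sequence number advances by 1.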
It almost works. Asterisk prints this to the output:
[Feb 1 16:04:21.626] VERBOSE[3431966] dial.c: Called 10.10.7.125:42573
[Feb 1 16:04:21.627] VERBOSE[3431966] dial.c: UnicastRTP/10.10.7.125:42573-0x14ebe8e3a040 answered
[Feb 1 16:04:21.627] VERBOSE[3431966] ari/resource_channels.c: Launching Stasis(filter_1,mediaresend,{\"callName\":\"65bbb2f54a7899.39571233\"}) on UnicastRTP/10.10.7.125:42573-0x14ebe8e3a040
[Feb 1 16:04:21.780] VERBOSE[3431965][C-00000001] res_rtp_asterisk.c: 0x14ebe64c6000 -- Strict RTP qualifying stream type: audio
[Feb 1 16:04:21.827] VERBOSE[3431967] bridge_channel.c: Channel UnicastRTP/10.10.7.125:42573-0x14ebe8e3a040 joined 'simple_bridge' stasis-bridge <65bbb2f54a7899.39571233>
[Feb 1 16:04:21.834] VERBOSE[3431965][C-00000001] res_rtp_asterisk.c: 0x14ebe64c6000 -- Strict RTP switching source address to 172.23.254.27:4010
[Feb 1 16:04:22.401] VERBOSE[3431967] res_rtp_asterisk.c: 0x14ebe8e3d000 -- Strict RTP qualifying stream type: <unknown>
[Feb 1 16:04:22.571] VERBOSE[3431967] res_rtp_asterisk.c: 0x14ebe8e3d000 -- Strict RTP qualifying stream type: <unknown>
[Feb 1 16:04:22.741] VERBOSE[3431967] res_rtp_asterisk.c: 0x14ebe8e3d000 -- Strict RTP qualifying stream type: <unknown>
[Feb 1 16:04:22.911] VERBOSE[3431967] res_rtp_asterisk.c: 0x14ebe8e3d000 -- Strict RTP qualifying stream type: <unknown>
[Feb 1 16:04:22.911] VERBOSE[3431967] res_rtp_asterisk.c: 0x14ebe8e3d000 -- Strict RTP switching source address to 10.10.7.125:50370
The only problem is that the sound is corrupted. The speech is intelligible, but it is noisy.
I captured the network traffic and played the RTP stream back in Wireshark, and there it sounds better:
I get the same result, clear sound, when streaming the RTP data into ffmpeg.
I tried to understand the RTP receiver in Asterisk (res/res_rtp_asterisk.c), but it’s been a struggle so far.
I use 8-bit, 8 kHz A-law encoding everywhere. The audio coming from Asterisk to my application is fine; only the other direction is corrupted.
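One way to double-check that the outgoing stream really matches this format is to parse the RTP headers of the packets being sent. The sketch below is only an illustration under stated assumptions: packets carry no header extension and no padding, the payload is A-law (payload type 8, one byte per sample), and the function names `parse_rtp_header` and `check_alaw_stream` are hypothetical.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed 12-byte RTP header (assumes no extension or padding)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    header_len = 12 + 4 * (b0 & 0x0F)      # skip any CSRC entries
    return {
        "version": b0 >> 6,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload_len": len(packet) - header_len,
    }


def check_alaw_stream(packets) -> list:
    """Return a list of human-readable inconsistencies found in the stream."""
    problems = []
    prev = None
    for i, packet in enumerate(packets):
        hdr = parse_rtp_header(packet)
        if hdr["payload_type"] != 8:
            problems.append(f"packet {i}: payload type {hdr['payload_type']}, expected 8")
        if prev is not None:
            if hdr["sequence"] != (prev["sequence"] + 1) & 0xFFFF:
                problems.append(f"packet {i}: sequence jumped to {hdr['sequence']}")
            # For A-law, one payload byte is one sample, so the timestamp
            # should advance by the previous packet's payload length.
            expected_ts = (prev["timestamp"] + prev["payload_len"]) & 0xFFFFFFFF
            if hdr["timestamp"] != expected_ts:
                problems.append(f"packet {i}: timestamp {hdr['timestamp']}, expected {expected_ts}")
        prev = hdr
    return problems
```

An empty list means the payload type, sequence numbers, and timestamps are at least self-consistent; it says nothing about packet timing on the wire.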
Can I expect Asterisk to restore the audio signal received over UDP/RTP? Is there a jitter buffer on the input, do I need to configure one, or do I have to deliver the UDP packets in order myself? Or am I missing something?
I hope the explanation makes sense, and thanks for any ideas.
Martin