I’ve got a web server that does text to speech conversions and returns the audio using the ‘chunked’ transfer-encoding, so the audio arrives a little at a time as it gets converted.
I need to play this audio on a channel via ARI, however I’m not seeing a way to call Playback with an in-memory audio buffer. Do I have to write the audio data to temporary files in order to play it on a channel, or is there some way to play it directly from memory?
Hmm… Maybe I’m supposed to use the new external media functionality for this? Should I write the incoming data packets to a UDP socket and then connect to that socket from an externalMedia channel?
The docs say RTP is the only encapsulation supported by externalMedia right now, and I’m not quite sure how to encapsulate the audio data as RTP in my node app. I’m receiving it from my TTS server in Ogg/Opus.
If I have to use RTP for external media, it might make more sense to enable my TTS server (in C++) to write RTP directly and give the externalMedia channel the address:port of my TTS server. However, I’m not very familiar with RTP. I’m using George Joseph’s asterisk_external_media example to help me figure all this out, but that example reads RTP from the externalMedia channel, and I need to write it. According to George’s source, reading RTP seems to be as simple as stripping off the 12-byte RTP header. I assume writing it should be as simple as adding some sort of 12-byte header to each block of audio.
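For anyone following along, here is a minimal sketch of what that 12-byte header looks like according to RFC 3550. It is written in Python only to keep it short; it is not taken from George's example or from my server. The function name `build_rtp_packet` and the SSRC value are made up for illustration, and it assumes the payload is already 8 kHz ulaw (static payload type 0), not the Ogg/Opus my TTS server was producing at this point.

```python
import struct

RTP_VERSION = 2
PT_PCMU = 0  # static payload type for G.711 ulaw (RFC 3551)


def build_rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int,
                     payload_type: int = PT_PCMU, marker: bool = False) -> bytes:
    """Prepend a fixed 12-byte RTP header (no CSRCs, no extension) to payload.

    Hypothetical helper for illustration; assumes the payload is already
    encoded in the format the receiver expects.
    """
    first = RTP_VERSION << 6  # V=2, P=0, X=0, CC=0
    second = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack(
        "!BBHII",                 # network byte order: 1+1+2+4+4 = 12 bytes
        first,
        second,
        seq & 0xFFFF,             # sequence number wraps at 16 bits
        timestamp & 0xFFFFFFFF,   # timestamp wraps at 32 bits
        ssrc & 0xFFFFFFFF,        # arbitrary stream identifier
    )
    return header + payload


# 160 bytes of ulaw is 20 ms at 8 kHz, so the timestamp advances by 160
# and the sequence number by 1 for every packet.
packet = build_rtp_packet(b"\xff" * 160, seq=1, timestamp=160, ssrc=0x1234ABCD)
```

So the header itself really is simple. The sequence number and timestamp have to advance consistently from one packet to the next, though, which is where most of the bookkeeping ends up.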
However, I’m looking at RFC3550 for RTP and it says I need an even-numbered port for RTP and the next higher odd-numbered port for RTCP. But I don’t see anything about RTCP in George’s example code, so maybe that isn’t necessary?
Anyway, do I need to use external media for this situation? And, if so, could I please get a few hints about encapsulating my audio as RTP?
If there is a simpler solution to writing audio directly from memory to the channel, I would still love to hear it. But I’m moving forward with the external media solution.
I updated my text to speech server to emit the audio in ulaw over RTP (thanks to some help from the pjproject library), but I’m not hearing anything. Is there some way to debug the RTP data coming into a channel so that I can figure out why I’m not hearing anything?
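One thing that can help here is decoding the fixed header of a captured packet and checking that the version, payload type, sequence number and timestamp look sane. The sketch below does that in Python. `parse_rtp_header` is a hypothetical helper, not part of Asterisk or pjproject, and it assumes there is no header extension. The Asterisk CLI command `rtp set debug on` should also log RTP packets as Asterisk sees them, which is worth trying first.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed RTP header fields of one packet (RFC 3550).

    Hypothetical debugging helper; assumes no header extension (X=0).
    """
    if len(packet) < 12:
        raise ValueError("packet is shorter than the 12-byte RTP header")
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = first & 0x0F
    header_len = 12 + 4 * csrc_count  # each CSRC entry adds 4 bytes
    return {
        "version": first >> 6,            # should be 2
        "marker": bool(second >> 7),
        "payload_type": second & 0x7F,    # 0 means ulaw (PCMU)
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload_len": len(packet) - header_len,
    }


# A hand-built sample packet: version 2, payload type 0, 160 bytes of payload.
sample = struct.pack("!BBHII", 0x80, 0x00, 7, 1120, 0xDEADBEEF) + b"\xff" * 160
info = parse_rtp_header(sample)
```

If the version is not 2, the payload type does not match the format requested for the externalMedia channel, or the sequence and timestamp do not advance steadily between packets, that is a likely place to start looking.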
If I call channel.originate instead of channel.externalMedia, my new channel joins the bridge just fine, so it looks like that’s all working correctly.
I eventually got this figured out. I ended up using the new External Media functionality, and I updated my Text to Speech server to write the audio stream as RTP. George Joseph has an excellent external media example at https://github.com/asterisk/asterisk-external-media, but it only receives RTP from Asterisk; it doesn’t send anything back. As it turns out, sending RTP is much, much harder than receiving it. The difficulty for newcomers is the timing. You can’t just blast all your data at the client at once; you have to send it in packets of about 160 bytes (20 milliseconds of ulaw audio at 8 kHz), carefully timed to arrive about every 20 milliseconds.
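To make the timing part concrete, here is a rough sketch of the pacing idea in Python. My actual server is C++ and uses pjproject, so this is not its code. `frames` and `send_paced` are hypothetical names, and `send` stands in for whatever wraps each frame in an RTP header and writes it to the UDP socket.

```python
import time
from typing import Callable, Iterator

FRAME_BYTES = 160      # 20 ms of 8 kHz ulaw, one byte per sample
FRAME_SECONDS = 0.020  # one frame every 20 ms


def frames(audio: bytes, size: int = FRAME_BYTES) -> Iterator[bytes]:
    """Split a ulaw buffer into frame-sized chunks (the last may be shorter)."""
    for offset in range(0, len(audio), size):
        yield audio[offset:offset + size]


def send_paced(audio: bytes, send: Callable[[bytes], None],
               interval: float = FRAME_SECONDS) -> int:
    """Call send() once per frame, spaced roughly `interval` seconds apart.

    Each frame is scheduled against the start time rather than sleeping a
    fixed amount after the previous send, so small delays do not accumulate.
    """
    start = time.monotonic()
    count = 0
    for index, frame in enumerate(frames(audio)):
        delay = start + index * interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        send(frame)
        count += 1
    return count


# Usage: collect frames in a list instead of writing them to a socket.
sent = []
count = send_paced(b"\xff" * 400, sent.append)
```

A real sender would probably pad the short final frame with ulaw silence and keep the RTP sequence number and timestamp advancing with each frame; both are left out here to keep the sketch focused on the timing.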
Anyway, it’s all working well now, and I’m glad I got to try out the new external media interface and learn a bit more about RTP.