Using custom TTS from external server with ARI

We have a custom TTS engine (SMI) which runs on a Windows server, and we need to use it with our ARI application (Asterisk 13). Currently I create a PCM file and make Asterisk play it. The problem is that this takes too much time, since I need to wait for the TTS server to fully synthesize the whole sentence before passing it to Asterisk. I thought of 2 options to solve this, but have no idea how to do either with Asterisk/ARI:

  1. Make Asterisk start playing the file as soon as its first chunk is ready, while I continue writing the rest of the speech to it. I tried that, but it seems Asterisk just uses whatever is in the file at the moment the play command is issued, and ignores the rest.
  2. Stream the results to Asterisk. Could not find anything about how to do that.

Do you have any idea how to implement one of these approaches, or, better yet, another idea?

Eyal Hasson.

I tried playing the stream over HTTP on Asterisk 16 (the TTS engine provides an HTTP stream into which it writes the synthesized audio). It seems Asterisk waits for the whole stream to complete before starting playback, so there is no time saving here. Is there a way to force Asterisk to start playback immediately?

Hi @eyalhasson

Seeing as you’re already using ARI, could you use the new External Media support in ARI? It allows you to write audio into Asterisk instead of playing back file after file. I used it and wrote about it here - and in the next couple of weeks I’ll be demoing tying Dialogflow up to Asterisk using it. You’ll want the AudioServer part of it, which lives on GitHub - you probably wouldn’t even need to change your TTS application at all; just have this audioserver call the URI instead, and use the ability in whatever language you use (it doesn’t have to be Node) to read the response before it’s finished downloading and push it straight back into Asterisk.
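To make the suggestion concrete: creating an External Media channel is a single ARI REST call (the feature landed in Asterisk 16.6, so it isn’t available on 13). A minimal sketch in Python, where the Stasis app name, ARI address, and the `ip:port` of the media server are all placeholder assumptions:

```python
from urllib.parse import urlencode

ARI_BASE = "http://localhost:8088/ari"  # assumption: default ARI HTTP bind


def external_media_params(app, external_host, audio_format="slin16"):
    """Build the query parameters for POST /ari/channels/externalMedia.

    Asterisk will send the channel's audio as RTP to external_host, and it
    accepts RTP back from that peer, so a TTS server can push audio
    straight into the call.
    """
    return {
        "app": app,                      # our Stasis application name
        "external_host": external_host,  # "ip:port" of our UDP media server
        "format": audio_format,          # slin16 = 16-bit signed linear PCM
    }


# Example request URL (authentication omitted; send it with any HTTP client):
params = external_media_params("my-tts-app", "10.0.0.5:5554")
url = ARI_BASE + "/channels/externalMedia?" + urlencode(params)
```

The resulting channel can then be bridged with the caller’s channel like any other ARI channel, and everything written to that UDP socket is heard on the call.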


Hello @danjenkins,

Thanks - I was not aware of this, and it is really something that has been needed for a long time. Does your AudioServer also support streaming into Asterisk?

The code currently isn’t there to do so, but it will be in about 2 weeks’ time (I’ll be giving that talk at ITExpo about using Dialogflow). But if you wanted to do it, you absolutely can: where I read from the stream and then write out to Google’s Speech To Text engine, you’d want to write to it instead - it’s as simple as that, really. Sure, you’d want to do some buffering on the way out too, so that you don’t chuck a load of media down that UDP socket, but in essence it should be that simple. Get your media via HTTP, then pipe the stream from the HTTP response into the audio stream I’m reading from :slight_smile:
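The buffering step mentioned above matters because RTP audio is normally sent in fixed 20 ms frames, while HTTP chunks arrive in arbitrary sizes. A sketch of re-slicing an incoming chunk stream into frame-sized pieces (8 kHz 16-bit mono is assumed; in real use you would also sleep ~20 ms between sends so the UDP socket isn’t flooded):

```python
def frames_20ms(chunks, sample_rate=8000, sample_width=2):
    """Re-slice an arbitrary byte stream into fixed 20 ms audio frames.

    A 20 ms frame of 8 kHz 16-bit mono PCM is 320 bytes; we buffer the
    incoming HTTP chunks and yield exact frame-sized pieces, one per
    future RTP packet.
    """
    frame_bytes = sample_rate * sample_width * 20 // 1000  # 320 at 8 kHz/16-bit
    buf = b""
    for chunk in chunks:
        buf += chunk
        while len(buf) >= frame_bytes:
            yield buf[:frame_bytes]
            buf = buf[frame_bytes:]
    if buf:  # pad the trailing partial frame with silence
        yield buf + b"\x00" * (frame_bytes - len(buf))
```

Feeding this generator from an HTTP response’s chunk iterator gives a steady sequence of 320-byte frames, which is exactly the shape the External Media socket expects.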

Hello @danjenkins,

I started playing with the External Media support as you suggested, using a UDP server written in C#. I am able to send back my audio, but it is not heard on the channel. I think that is because I have to wrap it in the RTP protocol. Am I correct? If so, can you direct me to an example of how to do that?
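You are correct that raw PCM on the socket won’t be heard - External Media speaks RTP, so each frame needs a 12-byte RTP header (RFC 3550) in front of it. A minimal Python sketch (the payload type 118 is an assumption - use whatever Asterisk actually negotiated for the channel’s format; the same idea translates directly to C#):

```python
import struct


def rtp_packet(payload, seq, timestamp, ssrc, payload_type=118):
    """Wrap one audio frame in a minimal 12-byte RTP header (RFC 3550).

    seq increments by 1 per packet; timestamp increments by the number of
    samples per frame (160 for 20 ms at 8 kHz); ssrc is any constant
    32-bit stream identifier chosen by the sender.
    """
    header = struct.pack(
        "!BBHII",
        0x80,                  # version=2, no padding/extension/CSRC
        payload_type & 0x7F,   # marker bit clear, 7-bit payload type
        seq & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )
    return header + payload
```

In use, each 20 ms frame would be wrapped with an incrementing sequence number and timestamp and sent via UDP to the address and port Asterisk is sending its RTP from.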

Hey @eyalhasson, I’m literally just about to write something that does this in Node for my demo at ITExpo next week, so I’ll get back to you soon (or nudge me in a couple of days if I haven’t gotten back to you).

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.