AGI for Speech Synthesis

Hello,

I have come across some APIs like AEAP for Speech-to-Text development in Asterisk. Since AEAP likely doesn’t include Text-to-Speech capabilities, could I use AGI to implement TTS alongside AEAP’s STT? If that is feasible, what complications might arise from combining these APIs, and what are the primary drawbacks of opting for AGI over AEAP for TTS/STT? Is there a better option than these two?

Thanks

The most you could do in AGI is play back a generated file, which works perfectly fine for many people.
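A minimal sketch of that file-based approach over AGI. The `synthesize_to_wav()` helper is a hypothetical stand-in for a real TTS engine (the stub below just writes silence so the script runs end to end), and the sounds directory is assumed, not taken from the thread:

```python
#!/usr/bin/env python3
# AGI sketch of the file-based approach: synthesize the whole utterance
# to a sound file first, then play it back with STREAM FILE.

import sys
import wave

SOUND_DIR = "/var/lib/asterisk/sounds"  # adjust to your installation

def synthesize_to_wav(text, path):
    # Hypothetical stand-in for a real TTS engine: writes one second of
    # silence as 8 kHz 16-bit mono PCM so the sketch runs end to end.
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(8000)
        w.writeframes(b"\x00" * 16000)

def agi_command(cmd):
    # Send one AGI command on stdout, read Asterisk's reply on stdin.
    sys.stdout.write(cmd + "\n")
    sys.stdout.flush()
    return sys.stdin.readline().strip()

# Drain the AGI environment header Asterisk sends first (ends with a blank line).
while sys.stdin.readline().strip():
    pass

synthesize_to_wav("Hello, caller!", f"{SOUND_DIR}/tts-reply.wav")

# STREAM FILE takes the path without the file extension.
agi_command('STREAM FILE tts-reply ""')
```

The obvious drawback is latency: the caller hears nothing until the whole file has been synthesized.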

If you can generate the audio fast enough, you could feed it into a named pipe and tell Asterisk to play it back in real time from there.
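For concreteness, here is a rough sketch of that named-pipe approach. It assumes a hypothetical streaming TTS generator yielding raw 8 kHz 16-bit signed-linear chunks (the stub below just yields silence), and that Asterisk can read files in /tmp:

```python
#!/usr/bin/env python3
# AGI sketch: stream TTS audio to Asterisk through a named pipe so playback
# starts before the full utterance has been synthesized.

import os
import sys
import threading

FIFO = "/tmp/tts-stream.sln"  # .sln = raw 8 kHz 16-bit signed linear

def hypothetical_tts_stream(text):
    # Placeholder for a real streaming TTS engine: yields two seconds of
    # silence in 20 ms chunks so the sketch is runnable as-is.
    for _ in range(100):
        yield b"\x00" * 320

def agi_command(cmd):
    sys.stdout.write(cmd + "\n")
    sys.stdout.flush()
    return sys.stdin.readline().strip()

def feed_pipe(chunks):
    # open() blocks here until Asterisk opens the FIFO for reading.
    with open(FIFO, "wb") as pipe:
        for chunk in chunks:
            pipe.write(chunk)

# Drain the AGI environment header.
while sys.stdin.readline().strip():
    pass

os.mkfifo(FIFO)
try:
    writer = threading.Thread(
        target=feed_pipe, args=(hypothetical_tts_stream("Hello!"),))
    writer.start()
    # STREAM FILE takes the path without the extension; Asterisk reads the
    # FIFO as it fills, so the caller hears audio while it is generated.
    agi_command('STREAM FILE /tmp/tts-stream ""')
    writer.join()
finally:
    os.remove(FIFO)
```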

It depends on how ambitious you are. If you are considering advanced use cases with a human-like bot, you will likely run into many of AGI's limitations. A few examples:

  1. Modern LLMs generate responses pretty slowly, on the order of 3-5 tokens per second. If you want to work with an LLM, you likely need a streaming TTS, not a file-based TTS. This means the simple Playback application is not going to work (the named-pipe approach sketched above is one workaround).

  2. You likely need to implement barge-in to interrupt TTS properly, and intelligent barge-in at that, so a simple cough won't stop the TTS. That requires tight integration between TTS and ASR; see the sketch after this list.

  3. AEAP is also quite limited and doesn't cover important use cases like in-conference assistance.
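On point 2, AGI has no clean way to interrupt an in-progress playback from outside the call, so a barge-in sketch is easier to show with ARI, which can stop a playback by id (DELETE /playbacks/{playbackId}). This assumes a local Asterisk with ARI enabled, placeholder credentials, and a hypothetical `asr_events` iterable yielding recognition results with text and a confidence score:

```python
import requests

ARI = "http://localhost:8088/ari"
AUTH = ("user", "secret")  # placeholder ari.conf credentials

def barge_in(playback_id, asr_events):
    # "Intelligent" barge-in: ignore coughs and short noises by requiring
    # a minimum confidence and at least one recognized word before
    # stopping the TTS playback via ARI.
    for event in asr_events:
        if event["confidence"] > 0.8 and event["text"].split():
            requests.delete(f"{ARI}/playbacks/{playback_id}", auth=AUTH)
            return event["text"]
    return None
```

The confidence threshold is exactly the kind of TTS/ASR coupling point 2 refers to: the ASR decides when the TTS stops.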

That is why many advanced dialog systems have ended up with a custom module for ASR/TTS instead of the existing UniMRCP, AGI, ARI, or AEAP implementations. But for simple systems they are perfectly fine.

Thanks for your explanation @nshmyrev !
Setting aside that UniMRCP, AGI, ARI, and AEAP implementations all lack real-time translation support, what are the primary advantages of choosing a UniMRCP implementation over, for example, AGI or ARI? Is it the ease of implementation, given that there's no need to build the system from scratch, or the capacity to efficiently handle a large number of calls?

Thank you.
