Hello Asterisk community.
We’re working on a product that guides call center agents in real time during the conversation.
We run speech-to-text and various ML models on top of the voice traffic (we need WAV, but RTP with any voice codec is fine), and we have to identify the particular conversation, agent, and caller, as well as call start and call end (basically the signaling).
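To show what "RTP with any voice codec" means on our side, here is a minimal sketch of how we would locate the audio payload in a single RTP packet. It assumes the fixed 12-byte header layout from RFC 3550; the names `RtpHeader` and `parse_rtp_header` are ours, and padding handling and codec decoding are left out.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    payload_type: int   # e.g. 0 = PCMU (G.711 mu-law) in the static RTP/AVP table
    sequence: int
    timestamp: int
    ssrc: int
    payload_offset: int  # index of the first payload byte in the packet


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the fixed RTP header and skip CSRC entries and any header extension."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    version = b0 >> 6
    has_extension = bool(b0 & 0x10)
    csrc_count = b0 & 0x0F
    offset = 12 + 4 * csrc_count
    if has_extension:
        if len(packet) < offset + 4:
            raise ValueError("truncated RTP header extension")
        # Extension header: 16-bit profile id, then 16-bit length in 32-bit words.
        (ext_words,) = struct.unpack("!H", packet[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words
    return RtpHeader(version, b1 & 0x7F, seq, ts, ssrc, offset)


if __name__ == "__main__":
    # Synthetic packet: version 2, payload type 0, 160 bytes of dummy audio.
    demo = struct.pack("!BBHII", 0x80, 0, 1, 160, 0xDEADBEEF) + b"\xff" * 160
    header = parse_rtp_header(demo)
    print(header, len(demo[header.payload_offset:]))
```

The payload bytes after `payload_offset` would still have to be decoded according to the negotiated codec before we could write WAV.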
We want to be able to integrate with as many Asterisk appliances as possible in the least intrusive and most sane way.
What is the best way to achieve this?
So far, looking at the documentation, we have come up with several approaches, each with its own drawbacks:
- A C module seems to give us what we need, but a bug in our code could crash the client’s installation, and updates become much harder.
- ARI seems like the ideal solution (is it possible to stream voice/media data over ARI?), but the API is fairly new, and some of our clients may not have it.
- AMI looks like the way to go, but I couldn’t find how to get the voice/media data using AMI (as far as I understand, additional integration with AGI would be required in our case to get the voice).
- I’m not sure yet whether it’s possible to integrate with AGI only, but the drawback in that case is that a bug or delay in our application could again make some of the calls unstable, and we’re trying to avoid that.
- We’ve made a proof of concept with tcpdump, capturing raw SIP/RTP traffic. It works for us, but unfortunately we may also capture the client’s sensitive data; we don’t want that, and some of our clients won’t allow it.
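To make the signaling requirement concrete, here is a rough sketch of how we imagine picking channel start and end out of an AMI event stream. It is only an illustration: it assumes the standard `Newchannel` and `Hangup` events with their `Uniqueid` and `Channel` headers, the function names and the sample data are made up, and connecting and logging in to the manager interface is skipped.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_ami_messages(raw: str) -> Iterator[Dict[str, str]]:
    """Split raw AMI text into messages: "Key: Value" lines, blank line between messages."""
    for block in raw.split("\r\n\r\n"):
        message: Dict[str, str] = {}
        for line in block.split("\r\n"):
            key, sep, value = line.partition(":")
            if sep:  # ignore lines that are not "Key: Value"
                message[key.strip()] = value.strip()
        if message:
            yield message


def channel_boundaries(
    messages: Iterable[Dict[str, str]],
) -> Iterator[Tuple[str, str, str]]:
    """Yield ("start" | "end", uniqueid, channel) for Newchannel / Hangup events."""
    for msg in messages:
        event = msg.get("Event")
        if event == "Newchannel":
            yield ("start", msg.get("Uniqueid", ""), msg.get("Channel", ""))
        elif event == "Hangup":
            yield ("end", msg.get("Uniqueid", ""), msg.get("Channel", ""))


# Made-up sample data in AMI wire format, for illustration only.
SAMPLE = (
    "Event: Newchannel\r\n"
    "Channel: PJSIP/agent1-00000001\r\n"
    "Uniqueid: 1700000000.1\r\n"
    "CallerIDNum: 1001\r\n"
    "\r\n"
    "Event: Hangup\r\n"
    "Channel: PJSIP/agent1-00000001\r\n"
    "Uniqueid: 1700000000.1\r\n"
    "Cause: 16\r\n"
    "\r\n"
)

if __name__ == "__main__":
    for boundary in channel_boundaries(parse_ami_messages(SAMPLE)):
        print(boundary)
```

These events are per channel rather than per call, so we would still need to group channels into conversations (possibly via the `Linkedid` header), and this gives us no audio at all.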
Please correct me where I’m wrong, as my knowledge of Asterisk is purely theoretical, and guide me in the right direction.
Thanks in advance
Iurii