Hello everyone.
Hope you are doing well.
I'm planning to build a conversational AI agent using ARI, but I'm not sure how to build one from scratch.
Any help or advice would be appreciated.
Read the many many previous threads on this topic.
On Thursday 16 January 2025 at 17:33:39, techdev via Asterisk Community wrote:
Currently, I am going to build a Conversational AI Agent using ARI.
But not sure how to build it from scratch.
Have you tried a Google search to see whether anyone has discussed this previously, and possibly provided some clues to how they did it?
Other search engines may provide even more results.
Antony.
This topic will be covered at Astricon this year. Hopefully we will see you there!
I would ask this question:
Do you have a few hundred spare GPUs with a terabyte of VRAM?
I can’t run 14B-size LLMs on my laptop; you need approximately 32 GB of VRAM for that at full precision, and 14B is considered a small model. If the model doesn’t fit in VRAM, you can’t run it entirely on the GPU, and that’s going to be a problem.
Speech-to-text, which will be needed for the LLM to do anything, is also going to be computationally expensive in real time.
I’ve seen demos of AI agents running on big cloud providers dedicated to this type of development, and there’s still 2 or 3 seconds of lag between responses. It’s not quite ready for prime time. That’s not to mention that, with the lousy audio quality of phone calls, AI agents get very confused when the calls aren’t clear. We are currently using AI at my job to do call transcription, and it’s been a lot of trouble getting it to work right. Even then, it’s not right.
All the demos are being done with, no surprise, excellent audio going into the thing. The real world is going to be a problem.
Thanks for all of your many quick replies.
Yes, I’ve already seen several topics. Here is what I have learned so far from this one: How to get real time audio streams of both Calling party and called party independently - #5 by shamnusln.
Regarding step 1, should I get the bridge ID from the bridge list using setInterval and then go to the next step from there?
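Rather than polling the bridge list on a timer, ARI pushes a BridgeCreated event over its events WebSocket when a bridge appears, so you can react to it directly. Here is a minimal Python sketch of the event-filtering side; the event shape follows the ARI events documentation, but the field values are made up for illustration:

```python
import json

def extract_bridge_id(event):
    """Return the bridge id if this ARI event announces a new bridge, else None.

    Listening for BridgeCreated on the /ari/events WebSocket avoids
    polling the bridge list with setInterval.
    """
    if event.get("type") == "BridgeCreated":
        return event.get("bridge", {}).get("id")
    return None

# Example ARI event payload (values are illustrative, not from a real call).
sample = json.loads(
    '{"type": "BridgeCreated", "bridge": {"id": "bridge-1234", '
    '"technology": "simple_bridge"}}'
)
print(extract_bridge_id(sample))  # prints the bridge id from the event
```

In practice you would feed each JSON message from the ARI WebSocket through a filter like this and move to your next step as soon as it returns an id.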
Hey there,
That’s a good question! I have a few suggestions that might help.
If you’re working with limited resources, I recommend checking out something like Ollama. It allows you to switch models in the future when you have access to better resources without requiring significant changes to your code.
Additionally, I’ve built an AI agent with excellent response times, leveraging two GPUs with a total of 32 GB VRAM, along with local STT and TTS capabilities.
If you’d like to see a demo, you can explore bland.ai or check out Vocode, an open-source project.
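To make the Ollama suggestion concrete, here is a small stdlib-only sketch against Ollama's default local HTTP endpoint (`/api/generate` on port 11434). The model name is just an example; pull whatever fits your VRAM:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_generate_request(model, prompt):
    """Build the JSON body for Ollama's /api/generate endpoint.

    stream=False returns one JSON object instead of a stream of chunks,
    which is the simplest thing to start with.
    """
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model, prompt):
    """Send the request; this only works if an Ollama server is running locally."""
    body = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

# "llama3.2:3b" is only an example model tag; swap in a larger one later
# without changing any of this code.
payload = build_generate_request("llama3.2:3b", "Say hello to the caller.")
print(json.dumps(payload))
```

Because the model name is just a string in the request, upgrading to a bigger model later really is a one-line change, which is the point being made above.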
I’m planning to put a Kamailio + rtpengine instance in front of Asterisk so that the caller’s audio includes a background audio file with the classic noise of a standard office: hissing, typing on the keyboard, and similar things. The latency in this case does not give that sense of anguish that silence does.
On Friday 17 January 2025 at 10:22:39, simone686 via Asterisk Community wrote:
I’m planning to put an instance of Kamailio + rtpengine in front of Asterisk
Understood. Do you have a question?
Antony.
Thanks for so many valuable replies for my question.
I read all of your opinions carefully, and I think I can implement a conversational AI agent in two ways: via ARI or via AudioSocket.
Now I am not sure which is the best way for a conversational AI agent.
I did some more research yesterday and found that while there was a lot of material on ARI, there wasn’t enough material on Asterisk AudioSocket.
Given the topic that will be covered at Astricon next month, I am personally more interested in Asterisk AudioSocket.
I’d appreciate it if you could give me any advice about this.
Thank you.
That talk and usage is through ARI and External Media, FYI. We don’t use AudioSocket at Sangoma; it’s a community-supported part of Asterisk.
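For reference, the External Media path mentioned here is driven by a single ARI request: `POST /channels/externalMedia` asks Asterisk to fork call audio over RTP to an external host (the STT/LLM side of the agent). A minimal sketch of building that request follows; the base URL, app name, and host/port are example values, not anything from the talk:

```python
from urllib.parse import urlencode

def external_media_url(base, app, external_host, fmt="slin16"):
    """Build the URL for ARI's POST /channels/externalMedia request.

    Asterisk will send the channel's audio as RTP to external_host and
    hand the channel to the named Stasis app. "slin16" is 16 kHz signed
    linear audio, a common choice for feeding speech-to-text.
    """
    params = {"app": app, "external_host": external_host, "format": fmt}
    return f"{base}/channels/externalMedia?{urlencode(params)}"

# Example values only: a local ARI instance and a local RTP listener.
url = external_media_url("http://127.0.0.1:8088/ari", "ai-agent", "127.0.0.1:9999")
print(url)
```

You would POST to this URL with your ARI credentials and then read the RTP stream arriving on the external host; the reverse direction (TTS audio back into the call) uses the same External Media channel.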
For the conversational AI, what about taking a third-party SIP library, using it to connect to the AI agent, and then just registering it as a SIP extension?
Not sure whether it will work or not.
Any help would be appreciated.