I want to record both parties, like agent (callee) and customer (caller), but in separate files, and I also want to detect who is talking right now, whether it is agent or customer, and I want to take separate actions on the recorded files, like changing the agent’s audio to text, making its audio file, and playing this audio in the customer’s language, and vice versa. How can I do that? Please help me if anyone can. and the speech-to-text and text-to-speech are not problems. The problem is that when someone presses, let’s say, 1 from the IVR menu recording, he should be connected to an agent, and both agent and customer have different languages, so we have to record separate files for each. and also want to detect who is talking right now.
Hmm, you have two options:
- Use AEAP or AGI, but then you have to create your own handling.
- Alternatively, use Monitor/MixMonitor on each channel, which will probably also handle a transferred call:
exten => _X!,1,Monitor(wav,${UNIQUEID}-caller)           ; records the caller leg as -in/-out files
 same => n,Dial(PJSIP/${EXTEN},,b(AnswerChannel^s^1))    ; b() runs a pre-dial Gosub on the callee channel

[AnswerChannel]
exten => s,1,Monitor(wav,${UNIQUEID}-agent)              ; records the agent (callee) leg
 same => n,Return()                                      ; pre-dial handlers must end with Return()
This seems to involve real-time audio transcription. If that's the case, recording the call legs doesn't make sense, unless you are using the recordings as a way of handling voice-note chunks, similar to WhatsApp voice notes. To answer your two questions: these tasks can be managed by separate apps, so you need to develop your dialplan logic to consolidate everything within a single context:
- Record the call legs in separate files using the Monitor application.
- Detect who is speaking by sending the call participants into a ConfBridge; this will allow you to utilize AMI events.
ConfbridgeTalking
This event is triggered when the conference detects that a user has either started or stopped talking.
Start talking Example
Event: ConfbridgeTalking
Privilege: call, all
Channel: SIP/mypeer-00000001
Uniqueid: 1303308745.0
Conference: 1111
TalkingStatus: on
Stop talking Example
Event: ConfbridgeTalking
Privilege: call, all
Channel: SIP/mypeer-00000001
Uniqueid: 1303308745.0
Conference: 1111
TalkingStatus: off
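As a rough illustration of consuming those events, here is a minimal sketch of an AMI client in Python. The host, port, and the "monitor"/"secret" credentials are assumptions for this example; configure your own AMI user in manager.conf.

```python
# Minimal AMI client sketch for watching ConfbridgeTalking events.
# Assumptions (adjust for your setup): AMI enabled in manager.conf,
# reachable at localhost:5038, with a user "monitor" / "secret".
import socket

def parse_event(block: str) -> dict:
    """Parse one AMI message ("Key: Value" lines) into a dict."""
    event = {}
    for line in block.splitlines():
        if ": " in line:
            key, value = line.split(": ", 1)
            event[key] = value
    return event

def watch_talkers(host="localhost", port=5038, user="monitor", secret="secret"):
    with socket.create_connection((host, port)) as conn:
        conn.sendall(
            f"Action: Login\r\nUsername: {user}\r\nSecret: {secret}\r\n\r\n".encode()
        )
        buf = b""
        while True:
            buf += conn.recv(4096)
            # AMI separates messages with a blank line (\r\n\r\n)
            while b"\r\n\r\n" in buf:
                block, buf = buf.split(b"\r\n\r\n", 1)
                event = parse_event(block.decode(errors="replace"))
                if event.get("Event") == "ConfbridgeTalking":
                    print(f"{event.get('Channel')} talking={event.get('TalkingStatus')}")
```

From the TalkingStatus field you can decide which leg's recording to transcribe next.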
@ambiorixg12 Thanks for the reply.
But I have a further query: do we have any mechanism for real-time transcription, for example when the agent is talking in English but the customer only understands German?
Alternatively, I have an idea in mind: we could record the agent first, transcribe and play that to the other channel (the customer), and vice versa, and perhaps manage the state of who is talking. I'm not convinced this is the best approach, so I'd welcome suggestions, as I'm not coming up with anything better.
In theory, this is the process: use ARI's external media capability to send the media to an external server. On that server, convert the stream into text and translate it into the language you need. Then convert the processed text back into audio and play it to the channel using TTS. Google's APIs along with Node.js can handle this job effectively.
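To make the first step concrete, here is a sketch of kicking off an external media channel via ARI's `POST /channels/externalMedia` endpoint (available since Asterisk 16.6). The ARI credentials ("ari"/"secret"), the Stasis app name "translator", and the media server address 10.0.0.5:4000 are all assumptions for this example.

```python
# Sketch: start an ARI externalMedia channel that forks call audio
# to an external RTP server for transcription/translation.
# Assumptions: ARI enabled in ari.conf and http.conf, ARI user
# "ari" / "secret", Stasis app "translator", media server 10.0.0.5:4000.
import base64
import urllib.parse
import urllib.request

def external_media_url(base, app, external_host, fmt="slin16"):
    """Build the POST URL for /channels/externalMedia."""
    query = urllib.parse.urlencode(
        {"app": app, "external_host": external_host, "format": fmt}
    )
    return f"{base}/channels/externalMedia?{query}"

def start_external_media():
    url = external_media_url(
        "http://localhost:8088/ari", "translator", "10.0.0.5:4000"
    )
    req = urllib.request.Request(url, method="POST")
    token = base64.b64encode(b"ari:secret").decode()  # HTTP basic auth
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req) as resp:
        return resp.read()  # JSON description of the new channel
```

The returned channel can then be bridged with the caller or agent leg from your Stasis app, and the external server receives the raw RTP stream to feed into STT.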
I think you need an AI agent, not machine translation. Machine translation won’t know your business and will have to lag several seconds behind the agent to try and gather enough context for a reasonably accurate translation.
I’d also note that, to provide an accurate translation, the machine will probably need to hear both sides of the conversation.