Trying to transcribe a telephone conversation into text

I have a project in which I have to transcribe a telephone conversation between two SIP accounts to text.

I am able to convert speech to text using Google’s speech recognition engine.

But I need to record and transcribe a telephone conversation into text.

Can someone help me?

Hire a palantype operator. Speaker-independent, continuous voice recognition is, in reality, still a research topic, let alone with multiple speakers degraded by telephone bandwidth.

I don’t understand your answer.
With this code, I can record a conversation:
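```
exten => _8.,1,Set(CALLFILENAME=${EXTEN:1}-${STRFTIME(${EPOCH},,%Y%m%d-%H%M%S)})
exten => _8.,n,MixMonitor(${CALLFILENAME}.wav)   ; record both directions of the call into one file
exten => _8.,n,Dial(DAHDI/g1/${EXTEN:1})         ; dial out on group 1, dropping the leading 8
exten => _8.,n,Congestion()
```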

With this code, I can get back what I say as text:
```
exten => 1235,1,Answer()
exten => 1235,n,agi(googletts.agi,"Say something in English, when done press the pound key.",en)
exten => 1235,n(record),agi(speech-recog.agi,en-US)
exten => 1235,n,Verbose(1,Script returned: ${confidence} , ${utterance})
```

Now I just need to transcribe a full telephone conversation to text.

Could someone please help me?

The answer was that what you are trying to do is not yet realistically possible by any process that does not use a human brain to interpret the speech.

Telephones make things worse: narrowband telephone audio (roughly 300 to 3400 Hz) destroys the distinctions between s, sh, and similar sounds.

(Actually, if you look at subtitles on live TV, you will see that it is not possible to do this well even with a human brain in the circuit to capture the phonemes. You need the ability of a human brain to interpret whole passages of speech to establish context.)


But it is possible. There are many services that interpret speech.
For example, I am using Google speech recognition, which can recognize what you say and return it exactly as text (if the speech is clear).
I’ve already done that.
But I don’t know how to do it for a conversation.

If you still want to try, I would suggest that you code directly to the Google speech API, using the recorded conversation, and ignore the AGI application, which is really there for IVR use.
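A minimal sketch of that suggestion in Python, using only the standard library: it reads a recording made by the dialplan and builds the JSON body for the Google Cloud Speech-to-Text v1 REST API. It assumes the file is Asterisk's lowercase `wav` format (16-bit linear PCM, 8 kHz, mono); the `phone_call` model and the two-speaker diarization settings are assumptions about what suits a two-party call, and `build_recognize_request` is a hypothetical helper name. Authentication and the HTTP call itself are left out.

```python
import base64
import wave


def build_recognize_request(wav_path, language_code="en-US", speakers=2):
    """Build a Speech-to-Text v1 request body for a recorded call (sketch).

    Assumes 16-bit linear PCM WAV input, as Asterisk's "wav" format produces.
    """
    # Read the header so the config matches the actual recording.
    with wave.open(wav_path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
    # The whole file (header included) is sent inline, base64-encoded.
    with open(wav_path, "rb") as f:
        content = base64.b64encode(f.read()).decode("ascii")
    config = {
        "encoding": "LINEAR16",
        "sampleRateHertz": sample_rate,
        "audioChannelCount": channels,
        "languageCode": language_code,
        "model": "phone_call",  # assumption: model aimed at telephone audio
        "diarizationConfig": {  # assumption: exactly two parties on the call
            "enableSpeakerDiarization": True,
            "minSpeakerCount": speakers,
            "maxSpeakerCount": speakers,
        },
    }
    return {"config": config, "audio": {"content": content}}
```

The resulting body, serialized with `json.dumps`, would be POSTed to the API. The synchronous `speech:recognize` method only accepts about a minute of audio, so most calls would need the asynchronous `speech:longrunningrecognize` method (and, for large files, a Cloud Storage URI instead of inline content).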

OK, thank you, david551.

Another question: I am also configuring WebRTC so that I can make calls from an Asterisk server to a WebRTC client.
I have tried, but when I make a call, it says it is busy.
Can I get some help with it?

Please start a new thread.

WebRTC is not for the faint-hearted, and you have not supplied nearly enough information for someone to realistically help you.

Look at this:

Speaker labels let you identify which individuals spoke which words in a multi-participant exchange. You can use the information to develop a person-by-person transcript of an audio stream, such as contact to a call center, or to animate an exchange with a conversational robot or avatar. The feature works best for audio files of telephone conversations that involve two people in an extended conversation. For best performance, the audio should be at least a minute in length. (Labelling who spoke and when is sometimes referred to as speaker diarization.)

The feature is optimized for two-speaker scenarios. It can handle up to six speakers, but more than two speakers can result in variable performance. Two-person exchanges are typically conducted over narrowband media, but the feature is supported for the following models:

en-US_NarrowbandModel and en-US_BroadbandModel
es-ES_NarrowbandModel and es-ES_BroadbandModel
ja-JP_NarrowbandModel and ja-JP_BroadbandModel
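The quoted passage appears to come from the IBM Watson Speech to Text documentation (the model names match). There, speaker labels are requested with the `speaker_labels=true` query parameter together with a `model` such as `en-US_NarrowbandModel`. Below is a minimal Python sketch of the "person-by-person transcript" step, assuming the response layout described in the docstring; `speaker_turns` is a hypothetical helper, not part of any SDK.

```python
from itertools import groupby


def speaker_turns(response):
    """Group recognized words into (speaker, text) turns.

    Assumes the response shape returned with speaker_labels=true: word
    timestamps in results[].alternatives[0].timestamps as [word, start, end],
    and a top-level speaker_labels list with "from", "to" and "speaker".
    """
    # Index speakers by the start time of the word each label covers.
    speaker_at = {
        round(label["from"], 2): label["speaker"]
        for label in response.get("speaker_labels", [])
    }
    words = []
    for result in response.get("results", []):
        best = result["alternatives"][0]
        for word, start, _end in best.get("timestamps", []):
            # Words with no matching label get speaker None.
            words.append((speaker_at.get(round(start, 2)), word))
    # Merge consecutive words from the same speaker into one turn.
    return [
        (speaker, " ".join(w for _, w in group))
        for speaker, group in groupby(words, key=lambda item: item[0])
    ]


if __name__ == "__main__":
    sample = {
        "results": [{"alternatives": [{"timestamps": [
            ["hello", 0.5, 0.9], ["how", 1.0, 1.2], ["are", 1.2, 1.3],
            ["you", 1.3, 1.5], ["fine", 2.1, 2.4], ["thanks", 2.4, 2.8],
        ]}]}],
        "speaker_labels": [
            {"from": 0.5, "to": 0.9, "speaker": 0},
            {"from": 1.0, "to": 1.2, "speaker": 0},
            {"from": 1.2, "to": 1.3, "speaker": 0},
            {"from": 1.3, "to": 1.5, "speaker": 0},
            {"from": 2.1, "to": 2.4, "speaker": 1},
            {"from": 2.4, "to": 2.8, "speaker": 1},
        ],
    }
    for speaker, text in speaker_turns(sample):
        print(f"Speaker {speaker}: {text}")  # Speaker 0: hello how are you / Speaker 1: fine thanks
```

Each label covers one word, so matching on start time is enough; rounding guards against small floating-point differences between the two lists.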