Speech-to-text transcription

I need to transcribe a telephone conversation into text. Can someone help?

This really needs to be done by a human.

I’m using IBM Speech to Text; it is very easy to implement with Asterisk.

Jersonjunior, are you by any chance taking dictation from a single, cooperative speaker? That’s rather different from transcribing an uncontrolled, two-speaker conversation.

Speaker labels let you identify which individuals spoke which words in a multi-participant exchange. You can use the information to develop a person-by-person transcript of an audio stream, such as contact to a call center, or to animate an exchange with a conversational robot or avatar. The feature works best for audio files of telephone conversations that involve two people in an extended conversation. For best performance, the audio should be at least a minute in length. (Labelling who spoke and when is sometimes referred to as speaker diarization.)

The feature is optimized for two-speaker scenarios. It can handle up to six speakers, but more than two speakers can result in variable performance. Two-person exchanges are typically conducted over narrowband media, but the feature is supported for the following models:

en-US_NarrowbandModel and en-US_BroadbandModel
es-ES_NarrowbandModel and es-ES_BroadbandModel
ja-JP_NarrowbandModel and ja-JP_BroadbandModel

https://www.ibm.com/watson/developercloud/doc/speech-to-text/output.html#speaker_labels
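
To make the "person-by-person transcript" idea concrete, here is a minimal sketch of merging word timestamps with speaker labels into speaker turns. It assumes the response carries word timestamps as [word, start, end] triples and speaker labels as objects with "from", "to" and "speaker" fields, matched on identical start/end times; check the linked documentation for the exact response shape. The function name speaker_turns is hypothetical, and the sample data is made up for illustration.

```python
from itertools import groupby


def speaker_turns(timestamps, speaker_labels):
    """Group consecutive words by speaker into (speaker, text) turns.

    Assumed shapes (not verified against a live service):
      timestamps:     [[word, start, end], ...]
      speaker_labels: [{"from": start, "to": end, "speaker": id}, ...]
    """
    # Look up the speaker for each word by its (start, end) time pair.
    by_time = {(lab["from"], lab["to"]): lab["speaker"] for lab in speaker_labels}
    # Words without a matching label get speaker None.
    labelled = [(by_time.get((start, end)), word) for word, start, end in timestamps]
    # Collapse runs of words from the same speaker into one turn.
    return [
        (speaker, " ".join(word for _, word in run))
        for speaker, run in groupby(labelled, key=lambda pair: pair[0])
    ]


# Made-up sample data, for illustration only.
words = [["hello", 0.0, 0.4], ["hi", 0.5, 0.7], ["there", 0.7, 1.0], ["thanks", 1.2, 1.6]]
labels = [
    {"from": 0.0, "to": 0.4, "speaker": 0},
    {"from": 0.5, "to": 0.7, "speaker": 1},
    {"from": 0.7, "to": 1.0, "speaker": 1},
    {"from": 1.2, "to": 1.6, "speaker": 0},
]

for speaker, text in speaker_turns(words, labels):
    print(f"Speaker {speaker}: {text}")
```

If every label comes back with the same speaker id, this collapses the whole call into a single turn, which is a quick way to spot that diarization did not separate the speakers.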

Thank you, jersonjunior. I’ll try to implement IBM Speech and I’ll let you know.

It works, but I have to do it on the stream.
I don’t want to attach an audio file that is transcribed later.
When I am speaking with someone on the telephone, I need to retrieve the text during the call.

That’s why people explained to you earlier that it would be difficult. Whoever is requesting this from you (usually a government) should pay a fair amount for you to do it and to purchase a software license. Believe me, my friend, you will never get that for free unless you write it yourself and open-source the API.

Thanks for the reply.
Yes, it is my company that is asking me to do this for a project, and they gave me a license for Google’s speech recognition.
But I can’t find how to return text results in real time with Google.

As far as I know, you cannot do it in real time, at least with the Google API.

OK, thanks man. I also think it is not possible; I just have to be 100% sure so that I can tell my manager :slight_smile:

But you can try to inject the audio as a post-call task, so it is not real time, but you can have it (sort of).

Whilst I still think that the ability to do this at all is over-hyped, to the extent it can be done, you need to train the recognizer on the conversation, so there is going to be a start up delay of at least the time to do the training. Almost certainly you will get the best results if you train on the whole conversation.

Even the commercial product that someone was promoting says you need a certain length of speech for it to work well.

IBM Voice Gateway works in real time according to the manufacturer, but I believe it will not suit your situation:

https://www.ibm.com/ms-en/marketplace/voice-gateway

What I need to do is similar to Case 2 with the Cognitive Agent Assistant.
Thanks again, guys.

Hi guys,
I have a question on the same topic.
I’m also starting to use the Watson API with Asterisk, and I could not find any good tutorial to start with. Based on the discussion, you have already tested the integration with Asterisk.
Could you please point me to a good tutorial? I’m not that familiar with Asterisk, and my job is to integrate it with Watson :pensive:
Thank you

Hello guys,
I have a question about speaker labels.
The speaker labels are not correct: every speaker label is 0.

FOSDEM 2018 had a presentation and demo on this subject.

I was there; a transcript of a conference call was made in real time, so it is possible.

See this link: https://fosdem.org/2018/schedule/event/jitsi/

There is also a video recording of the presentation; I think you can go on from there…

Good luck.

Hello,

Live (streaming) speech recognition is available with the Google Cloud Speech API, but I don’t think it offers speaker labels:

https://cloud.google.com/speech/docs/streaming-recognize
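
Streaming recognition generally means feeding the service small pieces of audio as the call progresses instead of one finished file. As a rough sketch of that step only, the generator below slices raw PCM into fixed-duration chunks. It assumes 8 kHz, 16-bit mono audio (typical for narrowband telephony) and 100 ms chunks; the name pcm_chunks and these defaults are my own assumptions, and the actual call that sends each chunk to a speech API is deliberately left out.

```python
def pcm_chunks(pcm, sample_rate=8000, sample_width=2, chunk_ms=100):
    """Yield successive fixed-duration slices of raw mono PCM audio.

    Assumed defaults: 8 kHz, 16-bit (2 bytes per sample), 100 ms per chunk.
    The final chunk may be shorter than the others.
    """
    bytes_per_chunk = sample_rate * sample_width * chunk_ms // 1000
    for offset in range(0, len(pcm), bytes_per_chunk):
        yield pcm[offset:offset + bytes_per_chunk]


# One second of silence stands in for audio captured from a live call.
one_second = bytes(8000 * 2)
chunks = list(pcm_chunks(one_second))
print(len(chunks), "chunks of", len(chunks[0]), "bytes")
```

In a real integration each chunk would be handed to the streaming client as it arrives, and interim results would be read back while the call is still in progress.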

Best Regards

I’ve made an integration with the Nuance Transcription Engine. If you’re interested in the documentation and source code, please contact me.
