Speech-to-text transcription

gabriel123 · June 2, 2017, 4:41pm

I need to transcribe a telephone conversation into text, can someone help?

david551 · June 2, 2017, 5:41pm

This really needs to be done by a human.

jersonjunior · June 5, 2017, 11:36pm

I’m using IBM Speech to Text. It is very easy to implement with Asterisk.

david551 · June 6, 2017, 12:25am

Jersonjunior, are you by any chance taking dictation from a single, cooperative speaker? That’s rather different from transcribing an uncontrolled, two-speaker conversation.

jersonjunior · June 6, 2017, 5:42pm

Speaker labels let you identify which individuals spoke which words in a multi-participant exchange. You can use the information to develop a person-by-person transcript of an audio stream, such as contact to a call center, or to animate an exchange with a conversational robot or avatar. The feature works best for audio files of telephone conversations that involve two people in an extended conversation. For best performance, the audio should be at least a minute in length. (Labelling who spoke and when is sometimes referred to as speaker diarization.)

The feature is optimized for two-speaker scenarios. It can handle up to six speakers, but more than two speakers can result in variable performance. Two-person exchanges are typically conducted over narrowband media, but the feature is supported for the following models:

en-US_NarrowbandModel and en-US_BroadbandModel
es-ES_NarrowbandModel and es-ES_BroadbandModel
ja-JP_NarrowbandModel and ja-JP_BroadbandModel

https://www.ibm.com/watson/developercloud/doc/speech-to-text/output.html#speaker_labels
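To get the person-by-person transcript described above, the word timings and the speaker labels in the response have to be joined. Here is a minimal sketch of that step, assuming the response carries `results[].alternatives[0].timestamps` entries of the form `[word, start, end]` and `speaker_labels` entries with `from`, `to` and `speaker` fields. The `sample` data and the `label_words` helper are made up for illustration and are not part of any IBM SDK.

```python
def label_words(response):
    """Group consecutive words by speaker, matching on word start/end times."""
    # Map each (start, end) time pair to the speaker reported for that word.
    speaker_at = {
        (label["from"], label["to"]): label["speaker"]
        for label in response.get("speaker_labels", [])
    }
    turns = []
    for result in response.get("results", []):
        for word, start, end in result["alternatives"][0].get("timestamps", []):
            speaker = speaker_at.get((start, end))  # None if no label matched
            if turns and turns[-1][0] == speaker:
                turns[-1][1].append(word)  # same speaker keeps talking
            else:
                turns.append((speaker, [word]))  # a new turn starts
    return [(speaker, " ".join(words)) for speaker, words in turns]


# Hypothetical response fragment, shaped like the assumed format above.
sample = {
    "results": [{"alternatives": [{"timestamps": [
        ["hello", 0.0, 0.4], ["there", 0.4, 0.8], ["hi", 1.1, 1.3],
    ]}]}],
    "speaker_labels": [
        {"from": 0.0, "to": 0.4, "speaker": 0},
        {"from": 0.4, "to": 0.8, "speaker": 0},
        {"from": 1.1, "to": 1.3, "speaker": 1},
    ],
}

print(label_words(sample))  # [(0, 'hello there'), (1, 'hi')]
```

If every word comes back with the same speaker, the output collapses into a single turn, which is a quick way to see whether the labels are actually separating the two parties.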

gabriel123 · June 8, 2017, 9:14am

Thank you, jersonjunior. I’ll try to implement IBM Speech to Text and I’ll let you know.

gabriel123 · June 8, 2017, 2:54pm

It works, but I have to do it on the live stream.
I don’t want to attach an audio file that is transcribed later.
When I am speaking with someone on the telephone, I need to retrieve the text as we talk.

navaismo · June 8, 2017, 2:59pm

That’s why people explained to you earlier that it would be difficult. Whoever is requesting this from you (usually a government) should pay a fair amount for you to do it and to purchase a software license. Believe me, my friend, you will never get that for free unless you write it yourself and open-source the API.

gabriel123 · June 8, 2017, 3:14pm

Thanks for the reply.
Yes, it is my company that is asking me to do this for a project, and they gave me a license for Google’s speech recognition.
But I can’t find out how to return text results in real time with Google.

navaismo · June 8, 2017, 4:24pm

As far as I know, you cannot do it in real time, at least with the Google API.

gabriel123 · June 8, 2017, 4:28pm

OK, thanks man. I also think it is not possible. I just have to be 100% sure so that I can tell my manager.

navaismo · June 8, 2017, 4:42pm

But you can try to submit the audio as a post-call task. That is not real time, but you can still have the transcript (sort of).
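A rough sketch of what such a post-call task could look like, assuming the call was recorded to a WAV file and that some script is run once the recording is finished. The `recognize` callable stands in for whichever speech service you use. It is hypothetical, as are the function names, and the in-memory WAV exists only so the example runs without a real recording.

```python
import io
import wave


def load_recording(source):
    """Return (sample_rate, channels, raw PCM bytes) for a WAV path or file object."""
    with wave.open(source, "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.readframes(wav.getnframes())


def transcribe_recording(source, recognize):
    """Hand a finished recording to a caller-supplied recognizer function."""
    rate, channels, pcm = load_recording(source)
    return recognize(pcm, rate, channels)


# Build one second of silent 8 kHz, 16-bit mono audio in memory as a stand-in
# for a real call recording.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(8000)
    wav.writeframes(b"\x00\x00" * 8000)
buffer.seek(0)


def fake_recognize(pcm, rate, channels):
    # Placeholder: a real recognizer would send the audio to a speech service.
    return f"{len(pcm)} bytes at {rate} Hz"


result = transcribe_recording(buffer, fake_recognize)
print(result)  # 16000 bytes at 8000 Hz
```

In practice `source` would be the path of the recording, and `fake_recognize` would be replaced by a call to the service of your choice.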

david551 · June 8, 2017, 7:26pm

Whilst I still think that the ability to do this at all is over-hyped, to the extent it can be done, you need to train the recognizer on the conversation, so there is going to be a start up delay of at least the time to do the training. Almost certainly you will get the best results if you train on the whole conversation.

Even the commercial product that someone was promoting says you need a certain length of speech for it to work well.

jersonjunior · June 8, 2017, 10:07pm

IBM Voice Gateway works in real time according to the manufacturer, but I believe it will not suit your situation:

https://www.ibm.com/ms-en/marketplace/voice-gateway

gabriel123 · June 9, 2017, 9:34am

What I need to do is similar to Case 2 with the Cognitive Agent Assistant.
Thanks again, guys.

sinen · October 17, 2017, 4:40pm

Hi guys,
I have a question on the same topic.
I’m also starting to use the Watson API with Asterisk, and I could not find any good tutorial to start with. Based on the discussion, you guys have already tested the integration with Asterisk.
Could you please point me to a good tutorial? I’m not that familiar with Asterisk, and my job is to integrate it with Watson.
Thank you

minkoko · April 5, 2018, 4:16am

Hello guys,
I have a question about speaker labels.
The speaker labels are not correct. All of the speaker labels are 0.

meightee · April 5, 2018, 7:19am

Fosdem 2018 had a presentation and demo on this subject.

I was there. A transcript of a conference call was made in real time, so it is possible.

See this link: https://fosdem.org/2018/schedule/event/jitsi/

There is also a video recording of the presentation. I think you can go on from there.

Good luck.

dose · April 5, 2018, 8:47pm

Hello,

The Google Cloud Speech API offers live (streaming) speech recognition, but I don’t think it provides speaker labels:

https://cloud.google.com/speech/docs/streaming-recognize
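Whatever service is used, streaming recognition means feeding the audio in small pieces while the call is still going on, instead of uploading one finished file. Below is a sketch of the chunking side only, assuming 8 kHz, 16-bit mono PCM and 100 ms chunks. The `audio_chunks` name and the default values are my own assumptions, and the actual call to the speech client is left out.

```python
import io


def audio_chunks(stream, sample_rate=8000, sample_width=2, chunk_ms=100):
    """Yield fixed-duration chunks of raw PCM read from a file-like stream."""
    # Bytes per chunk = samples per second * bytes per sample * seconds per chunk.
    chunk_bytes = sample_rate * sample_width * chunk_ms // 1000
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:
            break  # end of the audio stream
        yield chunk


# Stand-in for live call audio: 4000 bytes of silence (a quarter of a second).
fake_call = io.BytesIO(b"\x00" * 4000)
sizes = [len(chunk) for chunk in audio_chunks(fake_call)]
print(sizes)  # [1600, 1600, 800]
```

Each yielded chunk would then be wrapped in a streaming request for the recognizer, and partial results read back as they arrive.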

Best Regards

juju3301 · April 9, 2018, 9:50pm

I’ve made an integration with the Nuance Transcription Engine. If you’re interested in the documentation and source code, please contact me.
