Speech to text ON SITE

I know there is some Google Voice feature you can use to transcribe voice mails into text, etc.

I frankly find that process to be absurd. Sure, it’s probably easy because you don’t have to have any of that software on the PBX or in your network. But I “WANT” that done in house.

First, there are security concerns. I just do not trust Google Voice to keep things secret. Some of our voice mails are extremely confidential and really cannot be accessible to third parties.

Second, while I am sure Google will offer this Google Voice service for years to come at no cost, if they ever change their mind or have a service outage, my transcription service will fail. I cannot have that. It has to work or not work at MY discretion, not Google’s.

Third, I may wish to do a great deal more of this than Google permits for free via Google Voice. I am not sure they would be comfortable with me using their systems to transcribe hundreds of messages daily, for example. On my own systems I can expand hardware to meet demand. But Google is not getting paid to provide this service and so will likely limit my access at some point.

To this end, I need to do it in house. I have an Asterisk pro who is setting up my PBX. Great guy. But he says that he is unaware of any onsite Speech to Text system that can be integrated with Asterisk.

That is absurd. My cellphone is capable of speech to text. And no, I do not mean through Google Now or something. It literally can decode offline in my hand. So if an Android phone can do it… what excuse does the full PC running Asterisk have here? I don’t even need the transcriptions done in real time.

In fact, I don’t even need the transcriptions done by the literal PBX machine itself. It just needs to happen SOMEWHERE in our offices. So if we have a transcription server or transcription servers that are fed transcription tasks in series and then feed the output back to the PBX or the email server or a web server then fine.

Anyway, are there ideas out there for this? I don’t need a line by line instruction set to do this as I have a lot of bright people that can connect the dots. But none of them is aware of a software package that could do this.

Here are the two ideas I have and I hope someone can validate ONE of them.

  1. There might just be a package or plugin for Asterisk that can handle onsite Speech to Text conversion.

  2. Asterisk might be able to OUTPUT voice mails to a shared directory or FTP or something, and ANOTHER system that monitors that network share or FTP could grab the file and do a speech to text conversion. It would then forward the output to some other system that takes the text as input and sorts it appropriately, so that it registers as a transcribed phone call that came from X phone number at time T.

Frankly I could almost program option 2 myself. The only missing piece would be a speech to text engine that takes command line arguments. If I had that, I could script something.
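To make option 2 concrete, here is a rough sketch of the kind of script I have in mind. It makes several assumptions: that each voice mail lands in a directory as a .wav file with a matching INI-style .txt file holding `callerid` and `origtime` under a `[message]` section, and that some command line engine exists that takes the audio file path as its last argument and prints the transcript to stdout. The engine command is purely hypothetical, since that is exactly the piece I am missing, and the function names are made up for illustration.

```python
import configparser
import subprocess
from pathlib import Path


def read_metadata(txt_path):
    """Read caller ID and timestamp from the message's metadata file.

    Assumes an INI-style file with a [message] section; falls back to
    "unknown" when the file or the fields are missing.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read(txt_path)  # a missing file is silently ignored
    if not parser.has_section("message"):
        return {"callerid": "unknown", "origtime": "unknown"}
    section = parser["message"]
    return {
        "callerid": section.get("callerid", "unknown"),
        "origtime": section.get("origtime", "unknown"),
    }


def transcribe(wav_path, engine_cmd):
    """Run the (hypothetical) engine; its stdout is taken as the transcript."""
    result = subprocess.run(
        [*engine_cmd, str(wav_path)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def process_new_messages(spool_dir, out_dir, engine_cmd):
    """Transcribe every .wav in spool_dir that has no transcript in out_dir yet."""
    spool_dir, out_dir = Path(spool_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for wav in sorted(spool_dir.glob("*.wav")):
        target = out_dir / (wav.stem + ".transcript.txt")
        if target.exists():
            continue  # already handled on an earlier pass
        meta = read_metadata(wav.with_suffix(".txt"))
        text = transcribe(wav, engine_cmd)
        target.write_text(
            f"From: {meta['callerid']}\nTime: {meta['origtime']}\n\n{text}\n"
        )
        written.append(target)
    return written
```

Something like this could run from cron every few minutes against the voice mail directory, with `engine_cmd` set to whatever engine turns up, for example `["some-stt-engine"]` (a placeholder, not a real program). The output files could then be picked up by the email server or a web server.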

Anyway… thoughts?

Continuous speech to text is a very difficult problem. A few years ago, some services claiming to do it mechanically were actually using humans to do some or all of the work. I still think this is unlikely to be a practical project for open source development, so I would expect it to be available only as commercial products.

Also, people like Google may be the only ones with the resources to throw at the problem.

Even cleaning up speech in a noisy environment so that a human can do the deciphering is still a very difficult problem (ask the hearing aid industry).

I haven’t used it, but you can try LumenVox (commercial) (lumenvox.com/products/speech_engine/).
Feed it the voicemail audio file and get text in response.
They also have something to offer for Asterisk (lumenvox.com/partners/digium/Asterisk.aspx).

–Satish Barot