We have installed Asterisk (version 13) on our local Ubuntu 16.04 box. We are able to make calls between two softphones (Ekiga). We are trying to integrate our speech recognition engine with Asterisk, so that both callers get real-time transcription. We are planning to write an external program (in Java or Python) which listens on an IP address and port such as 10.100.99.22:5060. On the Asterisk server we will configure the SIP server and give the IP address of the Java/Python program in the register line, so that the Java/Python program will get the RTP stream and we can pass it to our real-time speech recognition engine. I would like to know whether what we are trying to achieve is actually feasible. Please correct us if we are doing something wrong. The following is the sample configuration for the SIP trunk. Could you please guide us on this?
register => <<>>
It seems a lot of people want to do real-time transcription, but I don’t think current technology is capable of this without a high error rate. You need to train the recognizer on a considerable amount of speech from a speaker to get good recognition, and you need to look ahead to gain more context to properly establish the likely words.
I assume you mean the Asterisk daemon, as you actually seem to want to use Asterisk as a SIP client here.
With your current design, you will have to use a SIP stack in your Java program, because SIP registration in no way sets up a media connection.
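To illustrate the point about registration: a REGISTER only records where future calls should be sent. The address and port for the audio are negotiated later, in the SDP body exchanged when a call is set up, which is why a SIP stack is needed on the receiving side. The following is only a rough sketch of pulling the audio address, port and payload types out of an SDP body; the function name and the sample values are hypothetical, and a real SIP stack would do this for you.

```python
def parse_sdp_audio(sdp):
    """Return (ip, port, payload_types) for the first audio stream, or None.

    Hypothetical helper for illustration only; it handles just the
    c= and m= lines and ignores everything else in the SDP body.
    """
    session_ip = None      # c= line before any m= line applies to all media
    media_ip = None        # c= line inside the audio section overrides it
    port = None
    payload_types = []
    section = "session"

    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            fields = line[2:].split()   # e.g. audio 40000 RTP/AVP 0 8
            if fields[0] == "audio" and port is None:
                section = "audio"
                port = int(fields[1])
                payload_types = [int(pt) for pt in fields[3:]]
            else:
                section = "other"
        elif line.startswith("c="):
            address = line[2:].split()[2]   # e.g. IN IP4 10.100.99.22
            if section == "session":
                session_ip = address
            elif section == "audio":
                media_ip = address

    if port is None:
        return None
    return (media_ip or session_ip, port, payload_types)
```

The RTP stream for the call would then arrive on the returned address and port, not on the SIP signalling port (5060).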
There is a chan_rtp that I believe will send a raw RTP stream. Not many people will have used that.
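If you do end up with a raw RTP stream arriving at your program, the receiving side is fairly small. Below is a minimal sketch in Python of parsing the fixed 12-byte RTP header (RFC 3550) and extracting the audio payload. The names `RtpPacket`, `parse_rtp`, `receive_rtp` and `handle_audio` are made up for illustration, and the bind address and port are whatever you configure on the sending side. It does no reordering or jitter buffering.

```python
import socket
import struct
from collections import namedtuple

# Hypothetical container for the fields a recognizer front end would need.
RtpPacket = namedtuple("RtpPacket", "payload_type sequence timestamp ssrc payload")


def parse_rtp(datagram):
    """Parse one RTP packet (RFC 3550 fixed header) and return an RtpPacket."""
    if len(datagram) < 12:
        raise ValueError("datagram shorter than the RTP fixed header")
    b0, b1, sequence, timestamp, ssrc = struct.unpack("!BBHII", datagram[:12])
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")

    csrc_count = b0 & 0x0F
    has_extension = bool(b0 & 0x10)
    has_padding = bool(b0 & 0x20)

    # Skip the optional CSRC list (4 bytes per entry).
    offset = 12 + 4 * csrc_count
    if has_extension:
        if len(datagram) < offset + 4:
            raise ValueError("truncated header extension")
        # Extension length is counted in 32-bit words, excluding its own header.
        ext_words = struct.unpack("!H", datagram[offset + 2:offset + 4])[0]
        offset += 4 + 4 * ext_words

    end = len(datagram)
    if has_padding:
        end -= datagram[-1]   # last byte holds the padding length
    if offset > end:
        raise ValueError("malformed RTP packet")

    return RtpPacket(b1 & 0x7F, sequence, timestamp, ssrc, datagram[offset:end])


def receive_rtp(bind_ip, port, handle_audio):
    """Receive UDP datagrams forever and pass parsed packets to handle_audio."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_ip, port))
    while True:
        datagram, _sender = sock.recvfrom(2048)
        try:
            packet = parse_rtp(datagram)
        except ValueError:
            continue   # ignore anything that is not valid RTP
        handle_audio(packet)
```

The payload is still encoded in whatever codec was sent (payload type 0 is G.711 mu-law, 8 is A-law), so it will normally need decoding before it goes to a recognizer.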
As to the rest of the logic for capturing real-time audio, I have given as much detail as I think reasonable for a peer support forum in a recent thread, although EAGI streaming of raw media was used in that case.
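For reference, the EAGI approach looks roughly like the sketch below. With EAGI, Asterisk starts your script, sends the AGI environment on stdin, and makes the caller's audio readable on file descriptor 3. The sketch assumes the usual default of 16-bit signed linear audio at 8 kHz mono, so a 20 ms frame is 320 bytes; check this against your own setup. `recognizer_feed` is a hypothetical callback standing in for your speech engine.

```python
import os
import sys

EAGI_AUDIO_FD = 3     # EAGI exposes the inbound audio on this descriptor
FRAME_BYTES = 320     # 20 ms at 8 kHz, 16-bit mono (assumed format)


def read_agi_environment(stream):
    """Read the 'agi_name: value' lines Asterisk sends, up to the blank line."""
    env = {}
    for line in stream:
        line = line.strip()
        if not line:
            break
        key, _, value = line.partition(":")
        env[key.strip()] = value.strip()
    return env


def iter_frames(audio, frame_bytes=FRAME_BYTES):
    """Yield fixed-size frames from a binary stream until it reaches EOF.

    A trailing partial frame at end of stream is dropped.
    """
    buffer = b""
    while True:
        chunk = audio.read(frame_bytes)
        if not chunk:
            break
        buffer += chunk
        while len(buffer) >= frame_bytes:
            yield buffer[:frame_bytes]
            buffer = buffer[frame_bytes:]


def main(recognizer_feed):
    """Hypothetical entry point: stream caller audio to recognizer_feed."""
    read_agi_environment(sys.stdin)
    with os.fdopen(EAGI_AUDIO_FD, "rb", buffering=0) as audio:
        for frame in iter_frames(audio):
            recognizer_feed(frame)
```

Note that this only gives you the audio coming from the channel that runs the script, so transcribing both callers needs more dialplan work than this shows.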
Your example “SIP trunk” configuration contains several elements of bad practice, and obsolete parameter names. It is the typical sort of configuration created by ITSPs a decade ago and designed to minimise support calls to them, not to be secure. It is also using a SIP channel driver that is no longer supported.
You need to go back to first principles. However, as I said, your actual proposed design wouldn’t use SIP.