I tried to configure the dialplan to produce a recording every 5 seconds, but I only get one file. Can someone help me troubleshoot where the misconfiguration is?
exten => 7200,1,Answer()
same => n,Set(MONITOR_BASE_FILENAME=${STRFTIME(${EPOCH},,%Y%m%d-%H%M%S)})
same => n(start_loop),NoOp(Starting loop iteration)
same => n,Set(MONITOR_FILENAME=${MONITOR_BASE_FILENAME}-${STRFTIME(${EPOCH},,%S)}.wav)
same => n,MixMonitor(/tmp/${MONITOR_FILENAME})
same => n,Dial(PJSIP/7200)
same => n,Wait(5)
same => n,GotoIf($[${DIALSTATUS} = "ANSWER"]?start_loop:end_loop)
same => n(end_loop),Hangup()
Tks Antony. I want to see if I can stream the call audio to an external voice-to-text (V2T) service, but I don't think we can do that with Asterisk. So I am leveraging the recording feature and will have the external service transcribe the call content in real time. That's why I can't wait until the call completes to get the entire recorded file.
Please let me know if you are aware of any solution, as "sampling" seems not to be an optimal approach.
Tks for your support and your time. As I mentioned in my reply to Pooh, my goal is to stream the SIP call to an external service to convert it to text in real time. So I am looking for a solution other than post-processing the recording.
As I am new to Asterisk, I think I have to check ARI to see if I can handle the SIP call while recording (or streaming, which would be ideal).
Do not forget to stream the two channels (inbound/outbound, A/B, or local/remote - call them what you will) independently, so that the speech-to-text system can work with two streams of fairly clear speech (with gaps in them when the other person speaks, usually) and isn't trying to understand a single audio channel of two people in different voices speaking across each other's words.
I'd question the idea of transcribing the two sides separately. Large-language-model-based transcribers don't transcribe word by word, but take in large amounts of surrounding context, and giving them only half the conversation could easily make it impossible for them to establish the correct context. They are going to get better at separating two speakers, and if it is necessary to split the two sides, one really needs a recognizer that takes both sides and decodes them in parallel.
You don't mean sampling, but that's what I originally thought you wanted.
It seems to me that feeding isolated 5-second slices to a transcriber is pretty much the worst thing you could do. Ideally you should feed the whole call after it is complete, but failing that, you need a transcriber that can take a stream and maybe backtrack up to five seconds.
As I suggested in my other reply, what is making AI speech to text increasingly effective is the ability to make use of a large amount of preceding context, but you also need a few seconds of following context.
I think all the standard methods of capturing audio for recognizers capture a stream.
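To illustrate the difference from per-file processing: a stream consumer just reads fixed-size chunks as they arrive and hands each one onward, rather than waiting for a closed file. A minimal Python sketch, assuming 8 kHz, 16-bit signed linear audio (the chunk size and source are illustrative, not anything Asterisk-specific):

```python
import io

def stream_chunks(stream, chunk_bytes=320):
    # Yield fixed-size slices of a raw audio stream as they arrive.
    # 320 bytes is 20 ms of 8 kHz, 16-bit signed linear audio.
    while True:
        data = stream.read(chunk_bytes)
        if not data:
            break
        yield data

# Feeding a (hypothetical) recognizer incrementally instead of per-file:
audio = io.BytesIO(b'\x00\x00' * 8000)  # one second of silence
total = sum(len(chunk) for chunk in stream_chunks(audio))
print(total)  # 16000
```

In a real deployment the `stream` would be whatever hands you the call audio (a pipe, a socket, or the EAGI descriptor discussed below in this thread), and each chunk would be pushed to the recognizer's streaming API.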
Tks a lot for your info and your opinion. There is still a lot to improve on the AI side, I believe. But at the moment, my objective is to obtain the input for that AI.
Regarding your suggestion, i.e. replacing Dial with ARI, I am not sure whether we would be able to handle the call without using SIP. I took some initial steps and then this concern came to mind. Can you elaborate on your last sentence a bit, i.e. "standard methods of capturing audio"?
It seems that there are not many tutorials/examples for streaming a call, even though it is a common topic IMHO. Please correct me if I am wrong.
What disqualifies "EAGI()" from consideration?
It was not a good solution for a recent client because they expanded scope and decided they wanted bi-directional audio, but for an STT application it should suffice.
I think what they are trying to do is to do near real time speech to text by breaking up the audio into short files and feeding them to a recognizer designed to transcribe whole files.
However, you can't read .wav files that are only half written, as they contain metadata that is not filled in until the file is closed. This metadata describes the size and location of chunks. While it could be updated after every write, that would involve seeking backwards and forwards, which isn't otherwise necessary.
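The bookkeeping in question is the RIFF and `data` chunk sizes in the 44-byte PCM WAV header. A small sketch that builds such a header by hand shows the effect: with the sizes filled in, the stdlib `wave` module parses the file; with them left at zero, as in a file whose header has not been finalized yet, it refuses to:

```python
import io
import struct
import wave

def make_wav(frames: bytes, patch_sizes: bool = True) -> bytes:
    # Build a minimal PCM WAV (8 kHz, mono, 16-bit). With patch_sizes
    # False, the RIFF and data chunk sizes stay zero, mimicking a file
    # whose header has not been finalized before closing.
    riff_size = 36 + len(frames) if patch_sizes else 0
    data_size = len(frames) if patch_sizes else 0
    return struct.pack('<4sI4s4sIHHIIHH4sI',
                       b'RIFF', riff_size, b'WAVE',
                       b'fmt ', 16, 1, 1, 8000, 16000, 2, 16,
                       b'data', data_size) + frames

frames = b'\x00\x00' * 8000          # one second of silence
with wave.open(io.BytesIO(make_wav(frames))) as w:
    nframes = w.getnframes()
print(nframes)                        # 8000

try:
    wave.open(io.BytesIO(make_wav(frames, patch_sizes=False)))
    parsed = True
except wave.Error:
    parsed = False
print('parsed:', parsed)              # parsed: False
```

This is why headerless raw formats (e.g. signed linear `.slin`, as EAGI produces) are more convenient for incremental reading: every byte written is immediately usable.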
I saw your reply; sorry for not getting back sooner with my thanks. I avoided this approach at first because I could not imagine how to make it work. Here is what I did:
exten = 7100,1,Answer()
same = n,eagi(/var/lib/asterisk/agi-bin/test.eagi)
same = n,Dial(PJSIP/7100,60)
same = n,Hangup()
I can see the raw file, but I think it does not work because the script runs to completion before Dial executes. So I am not sure how to have both Dial and EAGI running in parallel.
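For reference: EAGI hands the channel's audio to the script on file descriptor 3 only while the AGI session is active, and the dialplan blocks until the script exits. A common pattern is therefore to issue the Dial from inside the script itself, via the AGI `EXEC` command, while a background thread drains fd 3. The sketch below is untested and the paths, extension, and file names are hypothetical, but it shows the shape of such a `test.eagi` script:

```python
#!/usr/bin/env python3
# Hypothetical sketch of an EAGI script that records audio *during* a
# call by running Dial from inside the AGI session. The AGI protocol is
# spoken over stdin/stdout; raw signed-linear audio arrives on fd 3.
import os
import sys
import threading

def read_agi_env(stream=sys.stdin):
    # Consume the agi_* variable block Asterisk sends first,
    # terminated by a blank line.
    env = {}
    for line in stream:
        line = line.strip()
        if not line:
            break
        key, _, val = line.partition(': ')
        env[key] = val
    return env

def pump_audio(out_path, fd=3, chunk=320):
    # Copy raw slin audio from the EAGI descriptor to a file (or a
    # socket feeding a speech-to-text service) until EOF on hangup.
    # 320 bytes is 20 ms of 8 kHz, 16-bit signed linear audio.
    with open(out_path, 'wb') as out:
        while True:
            data = os.read(fd, chunk)
            if not data:
                break
            out.write(data)

def main():
    read_agi_env()
    t = threading.Thread(target=pump_audio,
                         args=('/tmp/call.slin',), daemon=True)
    t.start()
    # EXEC blocks until Dial returns, so audio keeps flowing in the
    # background thread for the duration of the bridged call.
    sys.stdout.write('EXEC Dial PJSIP/7100,60\n')
    sys.stdout.flush()
    sys.stdin.readline()  # AGI response line, e.g. "200 result=0"

# When invoked by Asterisk as an EAGI script, the entry point would be:
#   if __name__ == '__main__':
#       main()
```

With this arrangement, the dialplan would call only `eagi(...)` and drop the separate `Dial` line, since the script performs the Dial itself.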