Hi,
I want to try to create something like what Sean McCord did at AstriCon a few years ago (AstriCon 2019: Audio Pipes - Playing with Real Time Aud...). The idea is to have the user speak, send what they say off to Google, AWS, etc. for speech-to-text analysis, and then look through the transcribed words for specific keywords. I know how to do this by asking the user to record after the beep, taking that sound file, and sending it off to AWS, Google, etc. for analysis; however, I am looking to do it in a more natural way. For example, I want to play a sound file like “Please ask me a question”, start capturing as soon as I detect speech, and as soon as there is no more audio, send the recording off for analysis.
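To make the "start on speech, stop on silence" part concrete, here is a rough Python sketch of the logic I have in mind. It is not Asterisk code and not what Sean showed. It assumes the audio arrives as fixed-size frames of 16-bit signed PCM in the machine's native byte order, and the names `frame_rms` and `capture_utterance`, the energy threshold, and the silence length are all placeholders I made up for illustration.

```python
import array
import math


def frame_rms(frame: bytes) -> float:
    """Root-mean-square level of one frame of 16-bit signed PCM (native byte order)."""
    samples = array.array("h")
    samples.frombytes(frame)  # frame length must be a multiple of 2 bytes
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def capture_utterance(frames, threshold=500.0, max_silent_frames=25):
    """Collect audio from the first loud frame until a run of quiet frames.

    threshold and max_silent_frames are guesses and would need tuning.
    With 20 ms frames, 25 silent frames is about half a second of silence.
    """
    captured = []
    silent_run = 0
    started = False
    for frame in frames:
        loud = frame_rms(frame) >= threshold
        if not started:
            # Ignore everything before the caller starts talking.
            if loud:
                started = True
                captured.append(frame)
            continue
        captured.append(frame)
        silent_run = 0 if loud else silent_run + 1
        if silent_run >= max_silent_frames:
            break  # caller has stopped talking
    if silent_run:
        # Drop the trailing quiet frames before handing the audio off.
        captured = captured[:len(captured) - silent_run]
    return b"".join(captured)
```

The bytes returned by `capture_utterance` are what I would then send to the speech-to-text service. A plain energy threshold like this will probably be fooled by line noise, so I expect a real setup needs something smarter for the speech detection itself.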
TIA and have a great weekend!