I’ve been successful at doing asynchronous semi-realtime transcription with Google Cloud Speech but now I’m trying to make it work with AWS Transcribe, as a way to compare them.
AWS Transcribe only accepts “16-bit Linear PCM encoding” for realtime transcription. I’m assuming that this is what Asterisk defines as the `slin` format, so I’ve called `externalMedia` with both `slin` and `slin16`, to no avail.
I know my stack works because if I change the provider to Google Cloud Speech and specify `ulaw` (which GCS does accept), it works.
BUT, if I change the format to `slin16`, which GCS also accepts, GCS fails to work (the same result as with AWS Transcribe).
This makes me think that either I’m wrong in thinking that “16-bit Linear PCM encoding” is `slin`, or I need to fiddle a bit with the packets (endianness?).
So, is “16-bit Linear PCM encoding” what Asterisk defines as `slin`?
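
For reference, this is roughly the packet fiddling I have in mind. RFC 3551 defines L16 RTP payloads as network byte order (big-endian), so if Asterisk follows that for `slin16` over `externalMedia` (an assumption I have not verified), each 16-bit sample would need its bytes swapped before going to a service that expects little-endian PCM. A minimal sketch; the function names `rtp_payload` and `swap_endianness_16bit` are my own, and the header parsing ignores RTP extensions and padding:

```python
def rtp_payload(packet: bytes) -> bytes:
    """Return the payload of an RTP packet.

    Assumes the fixed 12-byte header plus 4 bytes per CSRC entry,
    and ignores header extensions and padding.
    """
    if len(packet) < 12:
        raise ValueError("packet is shorter than the fixed RTP header")
    csrc_count = packet[0] & 0x0F  # low 4 bits of the first byte
    return packet[12 + 4 * csrc_count:]


def swap_endianness_16bit(samples: bytes) -> bytes:
    """Swap the two bytes of every 16-bit sample (big <-> little endian)."""
    if len(samples) % 2:
        raise ValueError("expected an even number of bytes")
    swapped = bytearray(len(samples))
    swapped[0::2] = samples[1::2]  # second byte of each sample moves first
    swapped[1::2] = samples[0::2]  # first byte of each sample moves second
    return bytes(swapped)
```

The idea would be to run each incoming packet through `swap_endianness_16bit(rtp_payload(packet))` and send the result to the transcription stream instead of the raw payload.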