I’ve been successful at doing asynchronous semi-realtime transcription with Google Cloud Speech but now I’m trying to make it work with AWS Transcribe, as a way to compare them.
AWS Transcribe only accepts “16-bit Linear PCM encoding” for realtime transcriptions. I’m assuming that this is what asterisk defines as the slin codec.
So I’ve called externalMedia with both slin and slin16 to no avail.
I know my stack works because if I change the provider to Google Cloud Speech and specify ulaw (which GCS does accept), it works.
BUT, if I change the format to slin or slin16, which GCS also accepts, GCS fails to work (the same as with AWS Transcribe).
This makes me think that either I’m wrong in thinking that “16-bit Linear PCM encoding” is slin, or that I need to fiddle a bit with the packets (endianness?).
So, is “16-bit Linear PCM encoding” what asterisk defines as slin/slin16?
Both are particular configurations of 16-bit linear PCM. Both further qualify it as signed and mono. slin adds 8 kHz sampling and slin16 adds 16 kHz sampling.
Without checking the code, I don’t know if the external format is little or big endian.
I’m still trying to understand what is going on, so I’ve made a couple of tests:
I recorded the bridge via an ARI request to /bridges/{bridgeId}/record with sln16 as the format and then successfully played the resulting audio with the command below, confirming that sln16 works on my installation:
play -r 16000 -b 16 -e signed-integer -c 1
I also saved the incoming RTP packets to a file (stripping their headers so as to only save the raw data). Playing that file doesn’t work: crackling noises are all I hear, using the same play... command as before.
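Roughly, the header stripping looks like this (a sketch, not my exact script; the port, filename, and function names are mine, and it assumes each packet carries only the fixed 12-byte RTP header, with no CSRC entries or header extension):

```python
import socket

RTP_FIXED_HEADER_LEN = 12  # V/P/X/CC, M/PT, sequence, timestamp, SSRC


def rtp_payload(packet: bytes) -> bytes:
    """Return the audio payload of an RTP packet by dropping the 12-byte
    fixed header. Assumes no CSRC list and no header extension."""
    return packet[RTP_FIXED_HEADER_LEN:]


def capture(port: int, out_path: str) -> None:
    """Listen for the RTP stream from externalMedia and append the raw
    payloads to a file."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))  # the port passed to externalMedia
    with open(out_path, "ab") as out:
        while True:
            packet, _addr = sock.recvfrom(4096)
            out.write(rtp_payload(packet))
```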
It’s worth noting that the POST call to /channels/externalMedia has a format of slin16 (with an “i”).
(I find it a little confusing that the format parameter for externalMedia accepts slin16 but the call to record doesn’t; it expects sln16 (without the “i”). And vice versa: using slin16 on record or sln16 on externalMedia both give an error.)
Anyway, my problem continues: I can’t seem to get slin16 packets via RTP with externalMedia. Or I’m getting them, but in a way I didn’t expect.
OK, upon further investigation, I can play both streams (the one from externalMedia and the one created with record):
The one created with record can be played with:
play -r 16000 -b 16 -e signed-integer -c 1 --endian little audios/bridge-recording-204516.raw
And the one saved from RTP packets like so:
play -r 16000 -b 16 -e signed-integer -c 1 --endian big ./transcription16k.raw
So I am getting the data just fine when using externalMedia, but with an endianness that AWS Transcribe doesn’t expect.
I just created a script to send the raw files to AWS Transcribe, and it gladly accepts the one created from the recording, but not the one saved from the stream.
It seems I need to fix the endianness of the data coming via externalMedia when the format is slin16.
Any idea on what I need to do to the stream to correctly change its endianness?
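One way to do it, assuming the payload really is plain 16-bit samples, is to swap each sample’s two bytes before forwarding the chunk to AWS Transcribe. A minimal Python sketch (the function name is mine; it assumes an even byte count, i.e. whole 16-bit samples):

```python
from array import array


def swap16(data: bytes) -> bytes:
    """Swap the byte order of 16-bit samples, converting big-endian
    network audio to the little-endian PCM AWS Transcribe expects
    (or vice versa). Assumes len(data) is even."""
    samples = array("h")      # signed 16-bit integers
    samples.frombytes(data)
    samples.byteswap()        # in-place per-sample byte swap
    return samples.tobytes()
```

This would run per RTP payload, after stripping the header and before streaming the chunk out.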