Codec selection mismatch(?) for externalMedia and AWS Transcribe

I’ve been successful at doing asynchronous semi-realtime transcription with Google Cloud Speech but now I’m trying to make it work with AWS Transcribe, as a way to compare them.

AWS Transcribe only accepts “16-bit Linear PCM encoding” for realtime transcriptions. I’m assuming that this is what Asterisk defines as the slin codec.

So I’ve called externalMedia with both slin and slin16 to no avail.

I know my stack works because if I change the provider to Google Cloud Speech and specify ulaw (which GCS does accept), it works.

But if I change the format to slin or slin16, which GCS also accepts, GCS fails to work (the same result as with AWS Transcribe).

This makes me think that I’m wrong in thinking that “16-bit Linear PCM encoding” is slin, or that I need to fiddle a bit with the packets (endianness?).

So, is “16-bit Linear PCM encoding” what Asterisk defines as slin/slin16?

Thanks

Both are particular configurations of 16-bit linear PCM. Both further qualify it as signed and mono. slin adds 8 kHz sampling and slin16 adds 16 kHz sampling.
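To make the difference concrete, here is a rough sketch of the arithmetic for the size of one audio frame in each format. The 20 ms frame length is an assumption (a common packetization interval), not something confirmed for this setup, and `pcm_frame_bytes` is just an illustrative helper name.

```python
def pcm_frame_bytes(sample_rate_hz, frame_ms=20, bytes_per_sample=2, channels=1):
    """Size in bytes of one frame of linear PCM audio.

    frame_ms=20 is an assumed packetization interval, not a confirmed value.
    """
    samples_per_frame = sample_rate_hz * frame_ms // 1000
    return samples_per_frame * bytes_per_sample * channels


# slin: signed 16-bit mono at 8 kHz; slin16: the same at 16 kHz
slin_bytes = pcm_frame_bytes(8000)
slin16_bytes = pcm_frame_bytes(16000)
print(slin_bytes, slin16_bytes)
```

If the payloads you receive are not a multiple of 2 bytes, or are half the size you expect, the stream is probably not the format you asked for.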

Without checking the code, I don’t know if the external format is little or big endian.

Well, I gave it a try by changing the endianness of the packets. It didn’t work.

Then I decided to write all the packets to disk, and then try to play the audio like so:

play -r 16000 -b 16 -e signed-integer -c 1 ./transcription.raw

I only heard some weird crackling noise.

I’m sure I have a mismatch somewhere. Or, more likely, I’m not understanding how the audio is encoded.

Any ideas are appreciated.

I’m still trying to understand what is going on, so I’ve made a couple of tests:

I recorded the bridge via an ARI request to /bridges/{bridgeId}/record with sln16 as the format and then successfully played the resulting audio with the command below, confirming that sln16 is working on my installation:

play -r 16000 -b 16 -e signed-integer -c 1

I also saved the incoming RTP packets to a file (stripping their headers so as to only save the raw data). Playing those doesn’t work. Crackling noise is all I hear using the same play... command as before.
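For reference, this is roughly how the header stripping can be done. It is a minimal sketch based on the RTP packet layout in RFC 3550, not the exact code I used; `rtp_payload` is an illustrative name. It also handles CSRC entries, a header extension and padding, which a fixed 12-byte strip would miss.

```python
import struct


def rtp_payload(packet: bytes) -> bytes:
    """Return the payload of a single RTP packet (RFC 3550 layout)."""
    if len(packet) < 12:
        raise ValueError("packet is shorter than the fixed 12-byte RTP header")
    first = packet[0]
    if first >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")

    has_padding = bool(first & 0x20)
    has_extension = bool(first & 0x10)
    csrc_count = first & 0x0F

    # Fixed header, then 4 bytes per CSRC identifier.
    offset = 12 + 4 * csrc_count
    if has_extension:
        # Extension header: 16-bit profile id, then 16-bit length in 32-bit words.
        (ext_words,) = struct.unpack_from("!H", packet, offset + 2)
        offset += 4 + 4 * ext_words

    end = len(packet)
    if has_padding:
        # The last byte holds the number of padding bytes, itself included.
        end -= packet[-1]
    return packet[offset:end]
```

Appending `rtp_payload(data)` for each received datagram to a file gives the raw audio, still in whatever byte order the sender used.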

It’s worth noting that the POST call to /channels/externalMedia has a format of slin16 (with an “i”).

(I find it a little confusing that the format parameter for externalMedia accepts slin16 but the call to record doesn’t; it expects sln16, without the “i”. And vice versa. Using slin16 on record or sln16 on externalMedia both give an error.)

Anyway, my problem still continues: I can’t seem to be getting slin16 packets via RTP with externalMedia. Or I’m getting them but in an unexpected way (to me).

Any help or pointers are appreciated.

OK, upon further investigation, I can play both streams (the one from externalMedia and the one created with record):

The one created with record can be played with:

play -r 16000 -b 16 -e signed-integer -c 1 --endian little audios/bridge-recording-204516.raw

And the one saved from RTP packets like so:

play -r 16000 -b 16 -e signed-integer -c 1 --endian big ./transcription16k.raw

So, as it should, I’m getting the data just fine when using externalMedia but with an endianness that AWS Transcribe doesn’t expect.

I just created a script to send the raw files to AWS Transcribe, and it gladly accepts the one created from the recording, but not the one saved from the stream.

It seems I need to fix the endianness of the data coming via externalMedia when the format is slin16.

Any idea on what I need to do to the stream to correctly change its endianness?

I got it working by byte-swapping, since each sample is 16 bits.
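In case it helps someone else, here is a minimal sketch of that byte swap in Python. It is not necessarily the exact code I ended up with, and `swap16` is just an illustrative name. RTP carries 16-bit linear audio in network (big-endian) byte order, which matches what `play --endian big` showed above, so swapping the two bytes of every sample gives little-endian data.

```python
def swap16(pcm: bytes) -> bytes:
    """Swap the two bytes of every 16-bit sample (big-endian <-> little-endian)."""
    if len(pcm) % 2:
        raise ValueError("16-bit PCM data must have an even number of bytes")
    swapped = bytearray(len(pcm))
    swapped[0::2] = pcm[1::2]  # low bytes move to the even positions
    swapped[1::2] = pcm[0::2]  # high bytes move to the odd positions
    return bytes(swapped)


# Two big-endian samples, 0x0102 and 0x0304, become little-endian.
print(swap16(b"\x01\x02\x03\x04").hex())
```

Apply it to each RTP payload (after stripping the header) before forwarding the audio to the transcription service. Because it works per packet and every payload holds whole samples, no buffering across packets should be needed.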