What is the best output format for converting downloaded Amazon Polly sound files?

Amazon Polly will produce mp3 and ogg, sampled at 8,000 Hz, 16,000 Hz and 22,050 Hz.

Should I start with the highest sample rate in either of these formats, and which format should I convert to with sox? Is sln the best choice of output file format?

Does it make any sense to sample at a higher rate than the input file when using sox? For example, if I used an ogg file downloaded from Amazon Polly at 22,050 Hz as input, could I convert it to 48,000 Hz?

I notice that Asterisk lists the following file format:

slin48 ogg_opus opus

What does this mean? Does it mean I can convert the ogg input file to sln at 48,000 Hz and rename it to sln48?

Should I also convert to other file formats like G729 and ulaw and if so, at what sampling rate? I’d like to produce the highest possible quality.

  1. As @david551 noted in another thread with a similar question, it depends on what your target devices are. If they are PSTN or other 8kHz-only devices, then go ahead and make everything 8kHz. Whenever Asterisk has to up-sample or down-sample for a call, it costs CPU cycles.
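
To make the "match your targets" idea concrete, here is a minimal Python sketch that lists which sample rates are worth producing for a given set of client codecs. The table and the function name `rates_for_codecs` are illustrative assumptions, not anything Asterisk provides; the rates are the native rates of the codecs discussed in this thread.

```python
# Native sample rates (Hz) for codecs discussed in this thread.
# This table is an illustrative assumption; extend it for your own clients.
CODEC_RATE_HZ = {
    "ulaw": 8000,
    "alaw": 8000,
    "g729": 8000,
    "g722": 16000,
    "opus": 48000,
}


def rates_for_codecs(codecs):
    """Return the sorted, de-duplicated sample rates needed for the codecs."""
    unknown = [c for c in codecs if c not in CODEC_RATE_HZ]
    if unknown:
        raise ValueError(f"no rate known for: {', '.join(unknown)}")
    return sorted({CODEC_RATE_HZ[c] for c in codecs})


if __name__ == "__main__":
    print(rates_for_codecs(["ulaw", "g729"]))          # PSTN-style clients only
    print(rates_for_codecs(["ulaw", "g722", "opus"]))  # mixed clients
```

With only 8kHz clients the list collapses to a single rate, which is the case where one 8kHz file per prompt is enough.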

Making a bunch of assumptions…

  1. Starting with a higher-quality source file is better.
  2. Convert using sox to sln. Then rename to the appropriate .slnXY as required by Asterisk’s slin file format support.
  3. slin (slinXY) is a good choice, because Asterisk can easily take slin to another format without having to decode it. It’s just an encode and possibly a resample.
  4. Up-sampling your source file won’t increase its fidelity. If your targets were all 48kHz-capable, it would make sense to up-sample before giving the file to Asterisk, so that Asterisk doesn’t have to up-sample on every playback.
  5. Yes, you could up-sample a 22kHz file to a 48kHz file. I’d only do it if my targets were 48kHz capable devices. Right now, that means your targets must be speaking Opus. Asterisk doesn’t do any other 48kHz encoding.
  6. File formats represent Asterisk’s ability to read and/or write to a type of file. Yes, if you start with an ogg input file I’d convert it to slin. I’d only convert it to an slin that was appropriate for my targets (clients). You don’t want a bunch of G.722 clients where Asterisk has to down-sample 48kHz on every single call to 16kHz, for example.
  7. You should only convert to G.729 if you’re going to have G.729 clients.
  8. You should only convert to ulaw if you’re going to have ulaw clients.
  9. The benefit of having a file in the native format of the client is that encoding doesn’t have to take place.
  10. G.729 and G.711 u-law are 8kHz codecs. You can’t improve their quality by starting from a higher sample rate. G.729 is less than PSTN quality. G.711 u-law is PSTN quality. If those are your clients, make yourself 8kHz G.729 and 8kHz u-law files and be done. It’ll be easiest on your CPU.
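
The sox conversion described above can be sketched in Python as a small command builder. It only assembles the command line; running it assumes sox is installed with support for the input format (Ogg or MP3). The extension table reflects my understanding of the names Asterisk's slin file format uses, and `sox_slin_command` and the file names are hypothetical. Because the output is forced to raw with `-t raw`, the file can be written directly with its final extension instead of being renamed afterwards.

```python
import shlex

# Sample rate (Hz) -> signed-linear file extension, as I understand
# Asterisk's slin naming. Treat this table as an assumption to verify
# against your Asterisk version.
SLIN_EXTENSION = {
    8000: "sln",
    12000: "sln12",
    16000: "sln16",
    24000: "sln24",
    32000: "sln32",
    44100: "sln44",
    48000: "sln48",
    96000: "sln96",
    192000: "sln192",
}


def sox_slin_command(source, rate_hz, out_stem):
    """Build a sox command producing headerless 16-bit mono signed-linear audio."""
    ext = SLIN_EXTENSION[rate_hz]  # KeyError for rates with no slin extension
    return [
        "sox", source,
        "-t", "raw",             # headerless output
        "-r", str(rate_hz),      # target sample rate
        "-e", "signed-integer",  # signed linear PCM
        "-b", "16",              # 16 bits per sample
        "-c", "1",               # mono
        "-L",                    # little-endian byte order
        f"{out_stem}.{ext}",
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = sox_slin_command("greeting.ogg", 16000, "greeting")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Calling the builder once per rate you actually need keeps the set of files matched to your clients, which is the point of items 6 through 10.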

Thank you, your very specific answer has greatly enlightened me. Documentation on these questions is very hard to find in the wild so I felt it useful to ask some specific questions in this more generic post. I apologize if it looks like repetition.
