Click sound at the beginning when playing prompts

When I play the voice prompts that I’ve created, there is always a pop or click at the beginning. I create the prompts in TextAloud and use sox to convert them to ulaw format. I’ve tried a number of things: silence at the beginning, a 1-second fade-in, inserting a 1-second silence prompt before, norm -9. Nothing gets rid of the click. Does anybody have any ideas how to get rid of these pesky things? They’re not audible when I play the files in Audacity.

My guess is that you have recorded it as a RIFF file (.wav) with a mu-law payload, but tried to play it back as .ulaw, which is raw mu-law. I don’t think Asterisk supports mu-law in RIFF, although it does support signed linear 16 bit mono (.wav) and GSM (.WAV or .wav49) wrapped in RIFF.
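A quick way to tell which case you have is to look at the first 12 bytes of the file. Here is a minimal Python sketch of that check; the function names are just illustrative and not part of Asterisk or sox.

```python
def looks_like_riff_wave(header: bytes) -> bool:
    """Return True if the bytes start with a RIFF/WAVE signature."""
    # A RIFF file starts with b"RIFF", then a 4-byte size field,
    # then the form type, which is b"WAVE" for audio files.
    return (
        len(header) >= 12
        and header[:4] == b"RIFF"
        and header[8:12] == b"WAVE"
    )


def file_is_riff_wave(path: str) -> bool:
    """Read the first 12 bytes of a file and test them for a RIFF/WAVE signature."""
    with open(path, "rb") as f:
        return looks_like_riff_wave(f.read(12))
```

If this returns True for a file named .ulaw, the header bytes will be played as if they were audio samples, which would fit the click described above.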


Thanks for your reply, David. You are correct: the TTS program I use saves the file as WAV(PCM) 8k 16 bit. I then use sox to convert to ulaw. I’m doing this so that it doesn’t need to be transcoded at playback.

When I look at the file in a hex editor I can see the RIFF reference. I’m interested in creating as high-quality a prompt as possible, trying to avoid transcoding by Asterisk. If I save as signed linear 16 bit mono (.wav), will Asterisk transcode the file?

Is there a way to remove the RIFF header, or else some other way to achieve what I’m looking for?

Asterisk will transcode, but the transcoding is computationally cheap and doesn’t lose any more information than is intrinsically lost in using mu-law.
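To give a feel for how little work that is, here is a rough Python sketch of the commonly used G.711 mu-law encoding of a single 16-bit sample. This is not Asterisk's own code; the function name and constants follow the widely circulated reference algorithm, and I'm only showing it as an illustration.

```python
BIAS = 0x84   # offset added before locating the segment
CLIP = 32635  # largest magnitude that still fits after adding BIAS


def linear16_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit linear sample as an 8-bit mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS

    # Find the segment (exponent): position of the highest set bit, from bit 14 down.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1

    # Keep the 4 bits just below the leading bit as the mantissa.
    mantissa = (magnitude >> (exponent + 3)) & 0x0F

    # mu-law bytes are stored inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF
```

Each sample costs a handful of shifts and compares, and the 16-bit to 8-bit reduction is the same one that happens when the prompt is converted to mu-law ahead of time.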

Perhaps not the best way, but I found a small program (Linux or Windows) called riffstrip that removes the RIFF header from wav files. I used it to strip the RIFF headers from my ulaw files that were converted by sox from wav/pcm files. Now I no longer have clicks at the beginning of my ulaw prompt files. This might help someone else.

sox, itself, should be able to do this, e.g. see

RIFF is more complex than just a header. Although the files you are using may appear just to have a header, in general, RIFF files may have the audio in more than one chunk, with control information between them, and not even necessarily in the natural order.
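To illustrate the point about chunks, here is a small Python sketch that walks the top-level chunks of a RIFF/WAVE file and keeps only the bytes from `data` chunks, instead of chopping off a fixed-size header. The function name is made up for this example, it does not look inside nested LIST chunks, and it does not check that the `fmt ` chunk actually says mu-law, so treat it as a sketch rather than a replacement for a proper converter.

```python
import struct


def extract_wave_data(riff: bytes) -> bytes:
    """Return the concatenated payload of all top-level 'data' chunks."""
    if len(riff) < 12 or riff[:4] != b"RIFF" or riff[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")

    payload = bytearray()
    pos = 12  # skip b"RIFF", the 4-byte size, and b"WAVE"
    while pos + 8 <= len(riff):
        chunk_id = riff[pos:pos + 4]
        (size,) = struct.unpack_from("<I", riff, pos + 4)  # little-endian length
        if chunk_id == b"data":
            payload += riff[pos + 8:pos + 8 + size]
        # Chunks are padded to an even length, so skip the pad byte if any.
        pos += 8 + size + (size & 1)
    return bytes(payload)
```

With a file that has, say, a LIST chunk sitting between `fmt ` and `data`, stripping a fixed number of bytes would leave that metadata in the audio, whereas this only returns what is inside `data`.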