Asterisk sound quality

Hi,

I have never fully understood audio in Asterisk. I know that the PSTN uses 8kHz and that’s why the quality is never that great (since you are capped at 8kHz). Currently when I convert files over I use something like this:

    for f in *.mp3; do ffmpeg -i "$f" -ar 8000 -ac 1 -sample_fmt s16 -c:a pcm_s16le "${f%.mp3}.wav"; done

The quality is meh, and again I understand, as the PSTN has its limits. What I don’t understand is that if I do:

    for f in *.mp3; do ffmpeg -i "$f" -ar 16000 -ac 1 -sample_fmt s16 -c:a pcm_s16le "${f%.mp3}.wav"; mv "${f%.mp3}.wav" "${f%.mp3}.wav16"; done

(so I am converting it to wav16 as opposed to just wav) the quality is a lot better. Isn’t the PSTN still limited to 8kHz?
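For anyone who prefers to script the conversion, here is a minimal Python sketch of the same two conversions. The function names `ffmpeg_args` and `convert_all` are made up for illustration. It assumes ffmpeg is on the PATH and that Asterisk selects the 8kHz or 16kHz WAV format from the `.wav` / `.wav16` extension, as the commands above imply. Passing `-f wav` forces the WAV muxer, so the file can be written under its final name with no separate `mv` step.

```python
import subprocess
from pathlib import Path

# Extension for each sample rate (assumed naming: .wav for 8 kHz, .wav16 for 16 kHz).
EXTENSIONS = {8000: ".wav", 16000: ".wav16"}

def ffmpeg_args(src: Path, rate: int) -> list:
    """Build an ffmpeg command line converting src to mono 16-bit PCM WAV at `rate` Hz."""
    dst = src.with_suffix(EXTENSIONS[rate])
    return [
        "ffmpeg", "-i", str(src),
        "-ar", str(rate), "-ac", "1",
        "-c:a", "pcm_s16le",
        "-f", "wav",   # force WAV output; ffmpeg cannot guess it from ".wav16"
        str(dst),
    ]

def convert_all(directory: Path, rate: int) -> None:
    """Convert every .mp3 in `directory`; raises if ffmpeg reports an error."""
    for mp3 in sorted(directory.glob("*.mp3")):
        subprocess.run(ffmpeg_args(mp3, rate), check=True)
```

Calling `convert_all(Path("."), 16000)` would then produce one `.wav16` file per `.mp3` in the current directory.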

On Monday 20 October 2025 at 22:17:30, dovi5988 via Asterisk Community wrote:

the quality is a lot better

The quality of what?

How is your 16kHz audio getting sent into the PSTN? What hardware are you
using for this?

Isn’t the PSTN still limited to 8kHz?

I would say no.

A great deal of “the PSTN” has for many years been operating on SIP and/or
similar digital signalling protocols, with associated digital audio.

What bandwidth / sampling rate any given PSTN provider uses is probably pretty
difficult to find out, but I’d be prepared to bet that a lot use something
significantly higher than 8kHz these days.

In many countries it is now impossible to obtain a “legacy” analogue phone
line. You can get a connection from your local telephone service provider,
but it only does DSL, and for telephone service itself, you then get an ATA
adapter built into your DSL modem, and you plug your legacy analogue
telephones into that.

The sampling rate of the audio from your telephones over a few metres of
cable, and then sent as digital audio down the service provider’s DSL line, is
anyone’s guess.

The long and low-quality phone lines from your house to the local central office
(telephone exchange) which imposed the 8kHz sampling limit on historical phone
calls are a thing of the past in most parts of the world.

I’m sure there are not-especially-difficult ways of sending various signals down
a modern phone connection and thereby establishing what sampling rate the
service provider is using, but I am not surprised if you find that using 16kHz
audio sounds better than 8kHz audio over the modern PSTN.

I’m still intrigued to know what hardware you are using to get this audio into
the PSTN, though (first question above).

Antony.


I don’t know, maybe if we all waited then cosmic rays would write all our
software for us. Of course it might take a while.

  • Ron Minnich, Los Alamos National Laboratory

It is not enough to merely downsample the audio; you have to filter it first, to get rid of frequency components above the maximum that can be represented at the new sample rate (the Nyquist limit, which is half the sample rate). If you don’t do this, you get “aliasing”, where those components manifest as spurious unwanted frequencies within the representable limit.

Here is an example

import subprocess

# audio_file holds the path of the input file; it is set elsewhere in the script
child = subprocess.Popen \
  (
    args =
        (
            "ffmpeg", "-i", audio_file,
            "-loglevel", "quiet",
              # quiet messages, unfortunately quiets error reports as well
            "-f", "s16le", "-acodec", "pcm_s16le", "-ar", "8000",
            "-ac", "1", "-af", ",".join(["lowpass=f=3000"] * 16),
            "-y", "/dev/stdout",
        ),
    stdin = subprocess.DEVNULL,
    stdout = subprocess.PIPE,
  )

This is from one of my seaskirt_examples scripts. Note the sixteen cascaded lowpass filters: overkill maybe, but feel free to experiment for yourself to see how much difference it makes.
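To see what aliasing does in numbers, here is a small standard-library Python sketch (not part of the script above; the helper names are made up for illustration). It samples a 5kHz cosine at 8kHz with no filtering and compares it with a 3kHz cosine sampled the same way. The two sets of samples come out the same to within rounding error, which is why anything above the 4kHz Nyquist limit has to be filtered out before the rate is reduced.

```python
import math

def sample_cosine(freq_hz, rate_hz, count):
    """Return `count` samples of a unit cosine at freq_hz, sampled at rate_hz."""
    return [math.cos(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]

def alias_frequency(freq_hz, rate_hz):
    """Frequency at which freq_hz shows up when sampled at rate_hz without filtering."""
    folded = freq_hz % rate_hz
    return min(folded, rate_hz - folded)

RATE = 8000   # target sample rate in Hz
TONE = 5000   # input tone in Hz, above the 4000 Hz Nyquist limit for RATE

unfiltered = sample_cosine(TONE, RATE, 64)
impostor = sample_cosine(alias_frequency(TONE, RATE), RATE, 64)

# Largest difference between the two sample sequences.
worst = max(abs(a - b) for a, b in zip(unfiltered, impostor))
print(alias_frequency(TONE, RATE), worst < 1e-9)
```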

You might want to use “sox” instead of ffmpeg. Its resampler has a built-in configurable low-pass filter that requires quite a bit less CPU than sixteen of ffmpeg’s filter passes.

8kHz is the traditional sampling rate, but the audio bandwidth is less. As others have described, firstly the sampling rate has to be at least twice the audio bandwidth, and secondly, filters to prevent the input frequency exceeding half the sampling rate do not provide an instant cut-off. An instant cut-off wasn’t close to physically realisable in the past, when such filters would have delayed the audio a lot, and unevenly.

The actual traditional audio bearer capability is 3.1kHz audio, those 3.1kHz being between 300Hz and 3.4kHz. That got locked down by the 4kHz channel spacing used in frequency division multiplexed systems, and the practically realisable filters for those. Keeping the same audio bandwidth resulted in 8kHz sampling being a sensible rate.

There is a move to providing a 7kHz audio service (16kHz sampling) as the PSTN moves to VoIP, and mobile phone networks are doing the same, for all but 2G.

Mobile networks further complicate the quality issue, in that they generally don’t provide a 3.1kHz audio service but rather a speech service, which means they use codecs that make assumptions about the human voice, which allows lower bit rates, but limits their ability to carry music, or inband DTMF.

Asterisk is not limited to 8kHz sampling, and nor are most IP phones. If you choose codecs with faster sampling rates, the lowest common denominator will get used by Asterisk.

Traditional digital phone systems also have a limited dynamic range, as they only use 8 bits per sample, but coded somewhat logarithmically, so that they get about 13 bits of effective dynamic range, with an acceptable level of distortion. The traditional codecs need very little processing to convert to linear analogue, but codecs like MP3 require a lot of processing, and also introduce extra delays.
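As a rough illustration of what “coded somewhat logarithmically” buys you, here is a Python sketch using the continuous mu-law companding formula with mu = 255. This is an idealised model only: real G.711 ulaw uses a piecewise-linear approximation with its own bit layout, and the function names here are invented for the example. It shows that quiet samples survive an 8-bit round trip with a few percent of error, whereas plain linear 8-bit coding would round the quietest one to zero.

```python
import math

MU = 255  # companding parameter of mu-law

def mulaw_compress(x):
    """Map a linear sample in [-1, 1] to the companded domain [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y):
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def through_8_bits(x):
    """Compress, quantise to one of 256 levels, then expand again."""
    code = round((mulaw_compress(x) + 1) / 2 * 255)   # integer 0..255
    return mulaw_expand(code / 255 * 2 - 1)

def linear_8_bits(x):
    """Quantise directly to one of 256 evenly spaced levels, for comparison."""
    return round((x + 1) / 2 * 255) / 255 * 2 - 1

for level in (0.001, 0.01, 0.1, 1.0):
    print(level, through_8_bits(level), linear_8_bits(level))
```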

In the early days of VoIP, especially with low wage country call centres, there was a demand for very low bitrate codecs, but now the main demand is for higher audio quality, although there is still a preference to keep the rate down and avoid too much CPU usage.

I believe that sox applies an appropriate anti-aliasing filter, automatically, when down sampling. I would certainly hope that Asterisk does.

Well in this case I’d call it the highest common denominator, as in the one with the highest quality/sample rate/whatever. :wink:

Otherwise, thanks, that was informative.

I think you misunderstood. If one leg is on the traditional PSTN, and the other leg is using slin44, the overall quality will be determined by the PSTN leg. Going one way, it will start at PSTN quality. Going the other way, it will be downsampled to PSTN quality.

That is the mathematical definition of “highest common denominator” — or rather, “highest common factor”.

Also, appreciate the mention of sox. But as we have discussed before, I like the fact that FFmpeg is a toolkit that separates out functions (like audio extraction, filtering and resampling) that you can combine in ways that you choose, rather than having them chosen for you.

That is not the mathematical definition, and I was using the term in the popular sense, not the mathematical one. The mathematical one relates to integer division.

When you resample in ffmpeg, it automatically uses a low-pass filter. Aliasing is not a problem and hasn’t been since the late 90s. The command listed above is fine for resampling.

Also…the phrase is lowest common denominator; and it applies for both here. If the phones don’t do 44 but do 22kHz, then 22kHz gets used; the idiom applies here too. It’s mostly a way of stating they find some kind of common ground to work on; it’s not always literally the absolute lowest.

Actual PSTN networks are limited to 8kHz, period. No ifs, ands, or buts. That is Bellcore standard and is still followed on analog networks to the letter.

The problem is most people don’t have a PSTN connection anymore. I know I don’t. My one line is VoIP and the other is “digital voice over fiber”. My former landlines are still limited to about 4kHz of bandwidth due to analog circuitry. I do get slightly better quality with them…but only because it’s not getting pushed over a full analog network.

No one is going to start breaking Bellcore standards on a real analog network. But once you put VoIP in the mix and the network is literally from an ATA to your phone; then the standards are off and you’re more limited by the ATA and the phone itself.

On Saturday 25 October 2025 at 15:03:39, dewdude via Asterisk Community wrote:

Also…the phrase is lowest common denominator; and it applies for both
here. If the phones don’t do 44 but do 22kHz, then 22kHz gets used; the idiom
applies here too. It’s mostly a way of stating they find some kind of common
ground to work on; it’s not always literally the absolute lowest.

If one side offers 44kHz, 22kHz and 8kHz, and the other side offers 22kHz and
8kHz, then the lowest common denominator is 8kHz.

The highest common denominator is 22kHz.

To avoid mathematical / philosophical / linguistic arguments, I suggest people
say something like “the highest common capability” when talking about two sides
negotiating what quality to use.

Basically, the two sides offer each other the list of rates they support, and
the agreement is the highest which is on both lists.
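Put as a few lines of Python, the “highest common capability” idea looks like this. It is only a sketch of the rule as stated here, with a made-up function name, and is not how any particular SIP stack picks a codec.

```python
from typing import List, Optional

def highest_common_rate(ours: List[int], theirs: List[int]) -> Optional[int]:
    """Return the highest sample rate (Hz) present in both lists, or None if nothing is shared."""
    shared = set(ours) & set(theirs)
    return max(shared) if shared else None

side_a = [44000, 22000, 8000]
side_b = [22000, 8000]
print(highest_common_rate(side_a, side_b))   # the 22kHz case from the example above
```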

Antony.


It may not seem obvious, but (6 x 5 + 5) x 5 - 55 equals 5!

I wasn’t arguing the technical correctness of the idiom; just which one was the idiom. By definition, idioms are usually an incorrect usage of a phrase to start with…so it’s kind of redundant and pedantic to argue about proper usage.

It’s also silly at this point to worry about sample rates. PSTN is 8kHz. That’s it. That’s all it will ever be. Period. They will not be upgrading it to 16kHz, ever. The only way that happens is when the PSTN is entirely dead and there’s absolutely no more legacy installations anywhere on the planet. But as long as ulaw and alaw are out there running between switches, it’s easier to just stick with the universally supported format and stay there. You can’t guarantee the person on the other end will get that good quality. My cell phone…for example…will NOT make any kind of call without noise processing. Even on my own PBX with a ulaw trunk…the phone itself applies the same lousy processing and makes it sound like a cell phone call.

How was the person getting higher quality? Likely they’re not on PSTN anymore. Most people aren’t.

In practice, the rule is actually to use the first one in the list that the SDP responder also supports. With Asterisk, you choose the order, so if you have a legacy network with bandwidth limitations, you might list G.729 first; if you want to be able to do inband fax, you offer G.711 first; and if you want to primarily do speech, but with a wide audio bandwidth, and on a modern network, you might offer Opus first. (Actually you get the choice of either using your preferred order, on the B side, or using the A side’s order, limited to your list for the B side, amongst other variations.)
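The first-match rule can be sketched in Python as below. The function name, the `prefer_local` flag and the codec name strings are invented for the illustration. It only models the two orderings described above and is not Asterisk’s actual implementation or option set.

```python
from typing import List, Optional

def pick_codec(offer: List[str], supported: List[str],
               prefer_local: bool = False) -> Optional[str]:
    """Pick the first codec, in the chosen side's order, that the other side also lists.

    offer        -- codecs in the offerer's preference order
    supported    -- codecs in the responder's preference order
    prefer_local -- if True, walk the responder's order instead of the offerer's
    """
    primary, other = (supported, offer) if prefer_local else (offer, supported)
    allowed = set(other)
    for codec in primary:
        if codec in allowed:
            return codec
    return None   # nothing in common

offer = ["opus", "g722", "ulaw"]
supported = ["ulaw", "g722"]
print(pick_codec(offer, supported))                      # offerer's order wins
print(pick_codec(offer, supported, prefer_local=True))   # responder's order wins
```

With these lists, the offerer’s order selects g722 and the responder’s order selects ulaw, even though both sides support both codecs.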

It’s also not, strictly speaking, a negotiation. The response can include codecs that weren’t in the incoming offer, and you can send with codecs that were in the received offer, even if you didn’t offer them yourself. It is really a statement of what you are prepared to receive.

I’m not sure about this, but I think Asterisk can change the operational codec, if the other side ends up using an acceptable codec, which wasn’t the original working choice, and would do so to avoid an unnecessary transcoding. (If you go through an IVR, you have to make a working choice of codec, before you know the B side’s capabilities. I think Asterisk offers the option of only offering the working first choice, or keeping its offer open.)

See the relevant sections of res_pjsip - Asterisk Documentation.

The only true negotiation would be if you got a Not Acceptable Here response, but I don’t think that is often used in a way from which it could be recovered.

I qualify responder with SDP, because some systems (e.g. Cisco, in certain configurations), don’t include SDP in the INVITE; the offer is in the 200 OK, and the response is in the ACK. A and B sides are reversed, in that case.

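For those not familiar with sampling theory, the 4 and 8 kHz figures are for different things (4kHz is the nominal channel bandwidth, 8kHz is the sampling rate, which has to be twice that), and both correspond to a guaranteed audio bandwidth of about 300Hz to 3.4kHz (total 3.1kHz).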

8kHz is not the lowest. 4kHz would also be a common denominator acceptable to all parties. Also 2kHz etc. That’s why “lowest common denominator” is actually a meaningless term.

Interesting you should mention that, since the sample rates we are talking about are commonly integers.