Call Quality - Dense, Echo, Coarse

I’ve installed vicibox11 on a Virtualbox VM as a testing sandbox with Dinstar GSM Gateway 32port.
8GB RAM 40GB storage and 4Core CPU
The system and VM are connected to the local network on a D Link switch (1210-28p 24 port poe switch) using a D link WiFi USB (DWA 182 Wireless AC1300 MU-MIMO WiFi USB).
The gateway is connected on LAN to the switch.
The latency is minimal at ~4ms from agent system and VM and then ~2ms from VM to Gateway, the jitter is very minimal <5ms too.
I’ve no idea what is causing the quality to be subpar.

Recorded call on Vicidial caller’s side: CallerAudio_Vicidial_DinstarGSM.mp3 - Google Drive
Recorded call on Receiver’s mobile: ReceiverAudio_Vicidial_DinstarGSM.mp3 - Google Drive

I think this is what you can expect if you use the GSM format. The codec uses lossy compression and sampling is limited to 3.1 kHz audio. In this case there is simply no brilliant sound.

Try to repeat the tests with different equipment and use at least one of the G.711 codecs to hear the difference.

G.711 is also 3.1kHz audio. (Technically GSM is speech, which is more restrictive than audio, but it is also based on an 8kHz sample rate, so only covers 3.1kHz.)
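To put numbers on that, here is a minimal Python sketch of the arithmetic. The 300 Hz and 3,400 Hz band edges are the conventional voiceband limits and are an assumption here, not something measured on this setup; the function names are purely illustrative.

```python
# Narrowband telephony arithmetic (illustrative only).
SAMPLE_RATE_HZ = 8000      # sample rate shared by G.711 and GSM full rate
PASSBAND_LOW_HZ = 300      # assumed lower voiceband edge
PASSBAND_HIGH_HZ = 3400    # assumed upper voiceband edge


def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


def passband_width_hz(low_hz, high_hz):
    """Width of the usable audio band."""
    return high_hz - low_hz


if __name__ == "__main__":
    print("Nyquist limit:", nyquist_limit_hz(SAMPLE_RATE_HZ), "Hz")
    print("Usable bandwidth:",
          passband_width_hz(PASSBAND_LOW_HZ, PASSBAND_HIGH_HZ), "Hz")
```

Either codec therefore tops out below 4 kHz; switching between them changes the compression artefacts, not the bandwidth.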

I’d agree that this is what you would expect from GSM codecs. The weak link is the mobile network, and maybe the gateway, if the network supports wider bandwidth speech codecs.

Doesn’t sound subpar to me. Sounds like every other recorded phone call I’ve ever heard. I’ve heard far worse on voice messages left on cellphones.

Remember that:

The frequency response of a Plain Old Telephone Service (POTS) line, also known as a voiceband, is limited to a narrow range of 300–3,300 hertz (Hz).

This is because of 4 reasons:

  1. A “traditional” telephone set with a carbon mic in it has an operating frequency response of approximately 300 to 3000 Hz.

  2. The original copper phone lines were notorious for picking up high frequency crosstalk that could easily be heard by telephone subscribers. As a result the major telephone manufacturers (Bell Telephone’s Western Electric arm in the US, British Telecom in the UK, etc.) quickly discovered that higher fidelity speakers and microphones than those in the old “candlestick” telephones were rejected by consumers. This fact was lost, then rediscovered in the 1980s when, after the Carterfone decision, a flood of cheap telephones manufactured in Asia hit the market - with no filters or limiting on their speakers and mics.

  3. When the PSTN began digitizing voice they had a vested interest in NOT increasing frequency response so that it would not require higher bandwidth on the digitized data stream. Standard uncompressed voice at this frequency response range can fit into 64k. But it requires 160k or more for 1 higher fidelity stereo channel for example. The cell carriers all have an even stronger interest in crunching voice calls down to the smallest amount of bandwidth possible.

  4. Although human hearing range is 20-20,000Hz, this is an optimal range. Most people lose frequency response in their hearing as they age. There’s “intelligible” and there’s “high fidelity”. Telephone systems aim for “intelligible”, not “high fidelity”.
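The 64k figure in point 3 falls straight out of sample rate times bits per sample. A rough Python sketch follows. The G.711 parameters (8 kHz, 8 bits, mono) and the GSM full-rate frame size (260 bits every 20 ms) are the standard values, but the function names are hypothetical and the 16 kHz / 16-bit "wideband" line is only an assumed example of what higher fidelity would cost uncompressed.

```python
# Bit-rate arithmetic for digitized voice (illustrative only).

def pcm_bitrate_bps(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def framed_codec_bitrate_bps(bits_per_frame, frame_ms):
    """Bit rate of a codec that emits fixed-size frames at a fixed interval."""
    frames_per_second = 1000 // frame_ms
    return bits_per_frame * frames_per_second


if __name__ == "__main__":
    # G.711: 8 kHz, 8 bits per sample, one channel.
    print("G.711:", pcm_bitrate_bps(8000, 8), "bit/s")
    # Assumed wideband example: 16 kHz, 16-bit linear PCM, one channel.
    print("Wideband PCM (assumed):", pcm_bitrate_bps(16000, 16), "bit/s")
    # GSM full rate: 260 bits per 20 ms frame.
    print("GSM full rate:", framed_codec_bitrate_bps(260, 20), "bit/s")
```

The gap between the first and last lines is the "crunching" the cell carriers care about.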

This is why voice talent like Allison Smith is not that common. Her voice happens to have a sound profile with a lot of energy in that range, which is why she can record all those voicemail prompts and so on and still sound somewhat “brighter” even though if you took a spectrum analyzer to her recordings you wouldn’t see much activity in the range above 3000Hz. The human brain hears her voice and “knows” it’s supposed to have more high frequencies in it and just substitutes them in your head so she sounds “brighter” even though from a science standpoint the high frequencies don’t exist.
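If you want to do the spectrum-analyzer check yourself, something like the following Python sketch gives the idea. It is a naive DFT over synthetic test tones, not an analysis of any real recording; the function names, the 3,400 Hz cutoff, and the tone frequencies are all assumptions chosen for illustration. For real audio you would decode the file to PCM samples first and probably use an FFT library.

```python
import cmath
import math


def energy_fraction_above(samples, sample_rate_hz, cutoff_hz):
    """Fraction of spectral energy above cutoff_hz (naive DFT, DC excluded)."""
    n = len(samples)
    total = 0.0
    above = 0.0
    for k in range(1, n // 2):
        freq_hz = k * sample_rate_hz / n
        coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(samples))
        power = abs(coeff) ** 2
        total += power
        if freq_hz > cutoff_hz:
            above += power
    return above / total if total else 0.0


def make_tones(sample_rate_hz, duration_s, tones):
    """Sum of sine waves; tones is a list of (frequency_hz, amplitude)."""
    n = int(sample_rate_hz * duration_s)
    return [sum(a * math.sin(2 * math.pi * f * i / sample_rate_hz)
                for f, a in tones)
            for i in range(n)]


if __name__ == "__main__":
    # "Bright" source: a 1 kHz tone plus a weaker 5 kHz tone, sampled at 16 kHz.
    bright = make_tones(16000, 0.02, [(1000, 1.0), (5000, 0.5)])
    # What survives a telephone path: only the 1 kHz tone, sampled at 8 kHz.
    phone = make_tones(8000, 0.02, [(1000, 1.0)])
    print("bright:", round(energy_fraction_above(bright, 16000, 3400), 3))
    print("phone: ", round(energy_fraction_above(phone, 8000, 3400), 3))
```

A recording that has been through the PSTN should look like the second case: essentially nothing above the cutoff, whatever the voice sounded like in the room.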

Typical VoIP phones also follow the same design restrictions so if you really want to test if your phone system can record voice clearly, use a softphone like Linphone on a laptop and a USB mic and headset plugged into it, and you will get a really nice crisp bright recording. Remember that the most uncontrolled part of any recording is the conversion of the sound waves in the air to the digital stream and the digital stream back to the sound waves in the air. You have a mic and a speaker involved in that and modern “pinhole” mics on cell phones, laptops, and so on generally have terrible sound quality. Garbage in, garbage out, as the saying goes.

But, like I said, even if you preserve fidelity inside your gear you don’t have control over the fidelity coming in from the remote call. The PSTN is going to strip out any “brightness” or “crispness” on calls.

What you got there, is pretty much the best you are gonna get. And a codec change isn’t going to do anything to fix it.

Ted

You missed 2a.

Before networks went digital, inter-exchange trunks were carried as single sideband, typically over 48kHz channels (group band), with 4kHz channel spacing. Limiting the audio bandwidth to 3.1kHz allowed enough room for easy-to-realise filters to reach high levels of attenuation before the active part of the next channel was reached. (It’s also the need for realisable anti-aliasing filters that means you need an 8kHz Nyquist sampling frequency, rather than a 6.2kHz one.)
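The filter-room argument can be sketched numerically. This is a back-of-envelope Python illustration using the figures above; the function names are hypothetical, and the 3,400 Hz top edge in the first sampling example is the conventional value, assumed rather than taken from this thread.

```python
# Room left for filters in FDM trunks and in 8 kHz sampling (illustrative only).

def channels_per_group(group_bandwidth_hz, channel_spacing_hz):
    """How many voice channels fit in one FDM group."""
    return group_bandwidth_hz // channel_spacing_hz


def guard_band_hz(channel_spacing_hz, audio_bandwidth_hz):
    """Spectrum left between adjacent channels for filter roll-off."""
    return channel_spacing_hz - audio_bandwidth_hz


def transition_band_hz(sample_rate_hz, top_frequency_hz):
    """Gap between the top of the passband and its first alias image.

    Content at frequency f aliases to (sample_rate - f), so the image of
    the passband starts at (sample_rate - top_frequency).
    """
    return (sample_rate_hz - top_frequency_hz) - top_frequency_hz


if __name__ == "__main__":
    print("Channels per 48 kHz group:", channels_per_group(48000, 4000))
    print("Guard band per channel:", guard_band_hz(4000, 3100), "Hz")
    # 8 kHz sampling with an assumed 3.4 kHz top edge.
    print("Transition band at 8 kHz:", transition_band_hz(8000, 3400), "Hz")
    # Sampling at exactly twice the band, using the 3.1 kHz figure above.
    print("Transition band at 6.2 kHz:", transition_band_hz(6200, 3100), "Hz")
```

A zero-width transition band would need a brick-wall filter, which is why the sample rate sits comfortably above twice the audio band.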

Also, 3.4kHz isn’t actually high enough for really good speech, as there is significant information above that frequency in sibilants (s, f, etc.), and those can be difficult to distinguish over 3.1kHz audio circuits.

Mobile phones use vocal tract models and, as a result, may actually be worse for some languages than others.

I had forgotten about all of the hacks used to “multiplex” analog transmissions like the sideband trick, which is also still used in some lower frequency FM and AM radio bands.

I keep forgetting that at one time there were people whose entire 40 year careers were working this stuff out - much of which was discovered via trial and error - hunched over adjustable potentiometers and coils, tone generators and oscilloscopes, and watched over by chalkboards covered with trig functions and graphs, in analog circuits at Bell Labs and other companies.

And then, in a snap of time in the digital age, it’s almost all forgotten…