This is almost certainly silence suppression with comfort noise generation (CNG) on the provider side, or possibly VAD on your own endpoint. The clue is that it only happens after a 1-2 second pause and hits the leading edge of speech. That's the classic signature of a codec or media gateway restarting the audio stream after it has been suppressing silence frames.
A few things to check and try:
**1. Confirm silence suppression from the trunk provider**
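Capture on the inbound leg specifically and look for CN (Comfort Noise) packets. At 8 kHz these use the static RTP payload type 13 (RFC 3389). With wideband codecs, CN is negotiated as a dynamic payload type instead, so also check the SDP for an `a=rtpmap:<pt> CN/16000` line.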
```
tcpdump -i eth0 -w /tmp/inbound.pcap 'port 5060 or portrange 10000-20000'
```
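Open the capture in Wireshark and filter on `rtp.p_type == 13`. If the media shows up only as plain UDP, make sure the SIP signalling is in the same capture or enable the `rtp_udp` heuristic (Analyze > Enabled Protocols). If you see CN packets, the provider is running VAD/CNG. A provider can also suppress silence without sending CN at all and simply stop sending RTP, so filter on `rtp.marker == 1` as well: the marker bit is set on the first packet of each talkspurt after a silence period, and marker packets lining up with your clipped utterances point the same way. Some providers let you disable suppression, some don't; it's worth a support ticket to ask.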
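If you'd rather script the check than click through Wireshark, here is a minimal stdlib-only Python sketch that counts RTP, CN (payload type 13) and marker-bit packets per source address and SSRC. It assumes a classic libpcap file (what `tcpdump -w` writes by default) captured on an Ethernet interface such as `eth0` (not `-i any`), IPv4 without VLAN tags, and it skips RTCP with the usual second-byte range check. `iter_udp_payloads` and `summarise` are hypothetical helper names, not part of any library.

```python
"""Count RTP, comfort-noise and marker-bit packets per stream in a pcap file.

Sketch only: classic libpcap (not pcapng), Ethernet link type, IPv4, UDP.
"""
import struct
import sys
from collections import defaultdict

CN_PAYLOAD_TYPE = 13  # static RTP payload type for Comfort Noise (RFC 3389)


def iter_udp_payloads(data: bytes):
    """Yield (source "ip:port", UDP payload) for each Ethernet/IPv4/UDP packet."""
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"  # little-endian file (microsecond or nanosecond variant)
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    (linktype,) = struct.unpack(endian + "I", data[20:24])
    if linktype != 1:  # 1 = Ethernet
        raise ValueError(f"unsupported link type {linktype}")
    offset = 24  # size of the pcap global header
    while offset + 16 <= len(data):
        _, _, incl_len, _ = struct.unpack(endian + "IIII", data[offset:offset + 16])
        frame = data[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":  # too short or not IPv4
            continue
        ip = frame[14:]
        ihl = (ip[0] & 0x0F) * 4
        (total_len,) = struct.unpack("!H", ip[2:4])
        ip = ip[:total_len]  # drop any Ethernet padding
        if ihl < 20 or len(ip) < ihl + 8 or ip[9] != 17:  # 17 = UDP
            continue
        (src_port,) = struct.unpack("!H", ip[ihl:ihl + 2])
        src = ".".join(str(b) for b in ip[12:16]) + f":{src_port}"
        yield src, ip[ihl + 8:]


def summarise(data: bytes) -> dict:
    """Map (source, SSRC) -> counts of RTP packets, CN packets and marker-bit packets."""
    stats = defaultdict(lambda: {"packets": 0, "cn": 0, "marker": 0})
    for src, payload in iter_udp_payloads(data):
        if len(payload) < 12 or payload[0] >> 6 != 2:  # not RTP v2 (e.g. SIP text)
            continue
        if 192 <= payload[1] <= 223:  # RTCP packet types, not RTP
            continue
        (ssrc,) = struct.unpack("!I", payload[8:12])
        counts = stats[(src, ssrc)]
        counts["packets"] += 1
        counts["cn"] += int((payload[1] & 0x7F) == CN_PAYLOAD_TYPE)
        counts["marker"] += payload[1] >> 7  # set on the first packet of a talkspurt
    return dict(stats)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        results = summarise(fh.read())
    for (src, ssrc), c in sorted(results.items(), key=lambda kv: kv[0]):
        print(f"{src} SSRC=0x{ssrc:08x}: {c['packets']} RTP, "
              f"{c['cn']} CN, {c['marker']} marker-bit")
```

Run it as `python3 rtp_summary.py /tmp/inbound.pcap`. A stream from the provider's address with a non-zero CN count, or with roughly one marker-bit packet per utterance, suggests suppression is happening on their side.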
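**2. Rule out VAD on your own PJSIP endpoint**
Asterisk's PJSIP endpoints don't have a VAD or silence-suppression switch to turn off. The timeout options below don't control VAD either; they only set how long Asterisk waits without receiving any RTP before hanging up, so keep them generous in case the provider stops sending packets during long silences:
```
[your-endpoint]
type=endpoint
rtp_timeout=120          ; hang up an active call after 120 s with no RTP received
rtp_timeout_hold=300     ; same check while the call is on hold
```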
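And in `rtp.conf` (also unrelated to VAD, but keep the port range in line with the capture filter from step 1):
```
[general]
strictrtp=yes     ; drop RTP that doesn't come from the learned remote address
rtpstart=10000    ; first UDP port Asterisk uses for RTP
rtpend=20000      ; last UDP port Asterisk uses for RTP
```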
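Asterisk doesn't enable VAD by default, but confirm you don't have `silencethreshold` or `silencesuppression=yes` anywhere in your configs, for example with `grep -riE 'silencethreshold|silencesuppression' /etc/asterisk/` (the default config directory).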
**3. The jitterbuffer angle**
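Since you're using AudioSocket (an external process), jitterbuffer behaviour on the Asterisk side matters a lot. If a fixed jitterbuffer is in the path, it can introduce exactly this symptom: it holds the first frames after a silence while it refills. Try switching to adaptive. Note that the `[general]` options below are the generic jitterbuffer settings read by channel drivers such as chan_sip (`sip.conf`); chan_pjsip doesn't read them, so on a PJSIP channel apply the equivalent in the dialplan before handing the call to AudioSocket, e.g. `Set(JITTERBUFFER(adaptive)=200,1000,40)` (max size, resync threshold and target extra, all in ms):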
```
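[general]
jbenable=yes              ; turn the jitterbuffer on
jbforce=yes               ; use it even when the bridged peer could accept jittered audio
jbimpl=adaptive           ; adaptive implementation instead of fixed
jbmaxsize=200             ; maximum buffer size, ms
jbtargetextra=40          ; extra delay target for the adaptive jitterbuffer, ms
jbresyncthreshold=1000    ; timing jump (ms) that triggers a resync
jblog=yes                 ; log jitterbuffer frames for debugging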
```
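The `jbresyncthreshold` setting is the key one here: the idea is to set it high enough that the jitterbuffer doesn't resync after every silence gap, since a resync there can cause exactly the clipping you describe. Be aware that 1000 ms is already the documented default and your pauses run 1-2 seconds, so if the far end's timing jumps by roughly the length of the pause, you may need a larger value.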
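To see whether a 1-2 second pause actually looks like a timing jump, you can compare arrival times with RTP timestamps for the provider's stream. This rough Python sketch rests on a simplifying assumption: it flags any packet where the offset between arrival time and RTP timestamp shifts by more than a threshold, which is roughly the kind of discontinuity a resync threshold reacts to. Asterisk's actual jitterbuffer logic is more involved, so treat the result as a hint rather than a prediction. `offset_jumps` is a hypothetical helper; input pairs can be exported with something like `tshark -r /tmp/inbound.pcap -Y 'rtp.ssrc == <ssrc>' -T fields -e frame.time_relative -e rtp.timestamp`.

```python
def offset_jumps(packets, clock_rate=8000, threshold_ms=1000.0):
    """Return (index, jump_ms) wherever the arrival-vs-timestamp offset jumps.

    packets: (arrival_seconds, rtp_timestamp) pairs for ONE SSRC, in arrival order.
    clock_rate: RTP clock in Hz (assumed 8000, as for narrowband codecs like G.711).
    threshold_ms: size of offset shift to flag, e.g. your jbresyncthreshold.
    """
    flagged = []
    prev_offset = prev_ts = ext_ts = None
    for i, (arrival_s, ts) in enumerate(packets):
        if prev_ts is None:
            ext_ts = ts
        else:
            delta = (ts - prev_ts) % 2**32  # RTP timestamps wrap at 2^32
            if delta >= 2**31:
                delta -= 2**32  # interpret as a backwards step
            ext_ts += delta
        prev_ts = ts
        # How far arrival time has drifted from where the timestamp says we are
        offset_ms = arrival_s * 1000.0 - ext_ts * 1000.0 / clock_rate
        if prev_offset is not None and abs(offset_ms - prev_offset) > threshold_ms:
            flagged.append((i, offset_ms - prev_offset))
        prev_offset = offset_ms
    return flagged
```

If a well-behaved sender advances its timestamps through each pause, nothing is flagged. If the provider's stream shows jumps of about the pause length at each talkspurt, the timestamps aren't tracking the silence, and that is the case where the resync threshold matters.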
**4. rtp_keepalive isn’t enough**
You have `rtp_keepalive=2`, which sends keepalive packets every 2 seconds. That keeps the NAT pinhole open but doesn't solve the actual problem: the provider's media gateway will still clip the leading edge when it restarts the audio stream after silence. `send_rpid=yes` won't help here either, since it only controls the Remote-Party-ID caller-identity header and has no effect on media. What would help is getting the provider to keep the media path active, i.e. turn off silence suppression on your trunk.
**5. AudioSocket-specific workaround**
If the provider won't disable CNG and the jitterbuffer tuning doesn't help, you can work around it in your AudioSocket application: when you detect a transition from silence to voice, hold back roughly the first 100 ms of audio and discard it instead of passing it to speech processing. This eats the garbled leading edge and leaves the rest of the utterance clean. Not ideal, since it also trims a little real speech, but it works when the trunk is the problem.
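A minimal Python sketch of this workaround, reading "eats the garbled leading edge" as discarding those frames. It assumes AudioSocket audio payloads are 8 kHz, 16-bit signed little-endian mono PCM in 20 ms frames (320 bytes), and it uses a crude RMS energy threshold as a stand-in for whatever VAD your application already has. `LeadingEdgeGate`, `frame_rms` and all threshold values are hypothetical; tune them against your own audio. Only speech that follows a pause of at least `min_silence_ms` is gated, matching the 1-2 second pattern you're seeing.

```python
import array
import math
import sys
from typing import Optional

FRAME_MS = 20  # assumption: 20 ms frames of 8 kHz, 16-bit mono PCM (320 bytes)


def frame_rms(frame: bytes) -> float:
    """RMS level of a frame of little-endian signed 16-bit PCM samples."""
    samples = array.array("h")
    samples.frombytes(frame)
    if sys.byteorder == "big":
        samples.byteswap()  # payload is assumed little-endian
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class LeadingEdgeGate:
    """Discard the first few frames of speech that follows a long pause."""

    def __init__(self, threshold=500.0, min_silence_ms=1000, drop_ms=100,
                 frame_ms=FRAME_MS):
        self.threshold = threshold                           # RMS level counted as voice
        self.min_silence_frames = min_silence_ms // frame_ms  # pause length that arms the gate
        self.drop_frames = drop_ms // frame_ms               # 100 ms -> 5 frames of 20 ms
        self.silent_run = 0                                  # consecutive quiet frames seen
        self.to_drop = 0                                     # frames still to discard

    def process(self, frame: bytes) -> Optional[bytes]:
        """Return the frame to pass on, or None if it should be discarded."""
        voiced = frame_rms(frame) >= self.threshold
        if voiced and self.silent_run >= self.min_silence_frames:
            self.to_drop = self.drop_frames  # silence -> voice after a long pause
        self.silent_run = 0 if voiced else self.silent_run + 1
        if self.to_drop > 0:
            self.to_drop -= 1
            return None
        return frame
```

In your AudioSocket read loop, pass each audio payload through `gate.process()` and forward only non-`None` frames to the recogniser. Dropping rather than delaying keeps latency flat; the trade-off is that quiet word onsets after long pauses get trimmed too.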
One more thing: you’re on 20.6.0-rc1 which is a release candidate. There were some AudioSocket fixes in 20.7.0 and 20.8.0 related to frame timing. Worth upgrading to the latest stable 20.x if you haven’t already.