Hi,
Hoping to get some guidance here. I have been trying to track down a very weird problem for months, and today I finally ran across a “smoking gun”. I am looking to understand what’s happening a little better.
In our company we have a CUCM 8.6 system and make very heavy use of Asterisk for a MeetMe-based bridge (among other things). Since an upgrade to Asterisk 11.5 last summer, we have been getting sketchy reports of individual callers to the bridge getting “bad quality” audio, isolated to Cisco 7962G phones. (We have all sorts of phones in our fleet, but only the 7962 has this issue.) It’s relatively rare, and I have not been able to reproduce it on demand, but I have personally experienced it. When it does happen, it only affects one conference participant at a time.
The issue sounds exactly like the one in this thread (and forgive me for posting a link to Cisco’s forum here):
supportforums.cisco.com/thread/2180971
Pretty weird. It lasts about 90 seconds and then goes away, or you can hold/unhold to restart the RTP stream and it also goes away.
Today I finally captured the network traffic, and I see that exactly when this issue starts, there is a “marked” RTP packet (marker bit set) in the stream. I’m also missing a number of packets leading up to the marked packet, so it looks just like this issue:
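For anyone who wants to check their own capture for the same thing, here is a rough sketch of how the marker bit can be read out of a raw RTP packet. It only relies on the 12-byte fixed header layout from RFC 3550; the function name `parse_rtp_header` and the example packet are made up for illustration and are not taken from Asterisk or from my capture.

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Byte 0: version(2) padding(1) extension(1) CSRC count(4)
    # Byte 1: marker(1) payload type(7)
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": ts,
        "ssrc": ssrc,
    }

# Hypothetical packet: version 2, marker set, payload type 0 (PCMU),
# followed by 160 bytes of dummy payload (20 ms at 8 kHz).
example = struct.pack("!BBHII", 0x80, 0x80, 1234, 160000, 0xDEADBEEF) + b"\x00" * 160
header = parse_rtp_header(example)
```

In Wireshark the same bit shows up as the "Marker" field of the decoded RTP header, so this is only useful if you are working from raw UDP payloads.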
issues.asterisk.org/jira/i#brow … RISK-17952
I have a capture at the phone and also a tcpdump from the server that shows this, so I know it’s coming from the server itself. The capture glitches a little at that point when I play it back using Wireshark, so it appears the 7962 has some particular problem with this marked packet and basically loses its mind for a bit. Other phones seem more resilient.
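To compare the phone-side and server-side captures, it may help to separate two cases: a jump in RTP sequence numbers (packets were sent but are not in the capture) versus contiguous sequence numbers with a jump in the RTP timestamp (the sender itself skipped ahead). The sketch below is only an illustration under my own assumptions: `find_discontinuities` is a hypothetical helper, the input is a list of `(sequence, timestamp, marker)` tuples in capture order, and 160 samples per packet assumes 20 ms of 8 kHz audio.

```python
def find_discontinuities(headers, samples_per_packet=160):
    """Report packets that carry a marker bit or break the expected cadence.

    headers: list of (sequence, timestamp, marker) tuples in capture order.
    samples_per_packet: assumed timestamp step between packets (160 = 20 ms at 8 kHz).
    """
    events = []
    for prev, cur in zip(headers, headers[1:]):
        # Mask the differences so 16-bit sequence and 32-bit timestamp wrap-around
        # is not reported as a gap.
        seq_step = (cur[0] - prev[0]) & 0xFFFF
        ts_step = (cur[1] - prev[1]) & 0xFFFFFFFF
        if cur[2] or seq_step != 1 or ts_step != samples_per_packet:
            events.append({
                "sequence": cur[0],
                "marker": cur[2],
                "sequence_step": seq_step,
                "timestamp_step": ts_step,
            })
    return events

# Made-up data: a normal sequence wrap, then three packets missing before a marked packet.
sample = [
    (65534, 0, False),
    (65535, 160, False),
    (0, 320, False),
    (4, 960, True),
]
events = find_discontinuities(sample)
```

Running this over the headers from both captures and comparing the reported events should show whether the gap already exists when the stream leaves the server.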
In the Asterisk issue 17952, the behavior is listed as “normal” since Asterisk is not the source of the audio and is not the source of the skew leading to the marking.
But…
In this case the source of the RTP is Asterisk MeetMe on that server, so I can’t figure out why Asterisk would suddenly think it’s a good idea to skip some audio. The server definitely knows that something happened, or else it would not mark the packet, right?
And no, I don’t think we have issues with server capacity… this HP blade has 24 cores, RHEL 6.4, and 256 GB RAM. It typically runs with a load average of 0.2. I’ve synthetically tested the server all the way up to the 512-call DAHDI limit, and tested ConfBridge to a couple of thousand calls. Normal load is a small fraction of that.
And yes, I do hope to migrate to ConfBridge some day… the problem is that the WebMeetme-based portal we use would need to be totally redone, and that takes time.
I thought I would reach out for some guidance while I try to dig through the code.
For now, can anyone point me to where this RTP header is built? Is it DAHDI or Asterisk? Is it rebuilt by the bridge, or is it copied from a source packet?
Any thoughts on this issue would be greatly appreciated.
Thank you!
-Brian