MOH interrupts Queue after PRI provider upgrades switch

Something interesting…

We have had no code or hardware changes in a long time (months). Our PRI provider upgraded their switches two days ago and many old complaints got fixed: our CID started working, some line noise went away, and there were other benefits. But one thing broke immediately and I haven't been able to figure it out yet.

It's an old Asterisk box running 1.2.12.1. We can't upgrade yet for various reasons, and we just need to keep it running for a few more months before retiring it.

Before the switch upgrade: calling a number rang several phones at the same time via a queue. It rang for 20 seconds (or however long we set it) and then went to voicemail. The caller heard ringing. The agents saw and heard their phones ringing and had 20 seconds to answer. These are mostly small doctors' offices with two or three front-office phones in the queue, so if one person is at lunch the other can grab the call without moving desks.

After the switch upgrade: the caller hears one ring, then immediately hears music on hold for about 15 seconds, and then goes to voicemail. The agents hear only one quick ring, and if they don't catch it, it no longer shows on their SIP device (no lights, no ring).

I have tested calling a SIP device directly (no queue) and it works perfectly. Every queue on every DID broke at the time of the upgrade, and all of them show the same behaviour whether or not they have announcements, whether they have 1 agent or 20, and whether or not anyone is logged in.

I would think that rules out queues.conf, agents.conf, musiconhold.conf and extensions.conf (the dialplan). I could be wrong, so I've put a generic dialplan below.

I suspect it is more signalling related, i.e. something in zaptel or zapata. I have checked all settings with the PRI provider and they have confirmed them.

Extensions.conf
exten => 1234567890,1,Ringing
exten => 1234567890,2,Wait(2)
exten => 1234567890,3,Queue(TestQ|tT|||20)
exten => 1234567890,4,Voicemail(u123)
exten => 1234567890,5,Hangup

Queues.conf
[TestQ]
strategy=ringall
timeout=25
wrapuptime=0
maxlen=0
member => Agent/1
member => Agent/2

Agents.conf
agent=1,1234,Bob
agent=2,1234,John

Any help would be great. In the output we often see congestion/busy notices, but they have never stopped calls from going through.

Output

-- Accepting call from 'xxxxxxxxxx' to '1234567890' on channel 0/22, span 1
-- Executing Ringing("Zap/22-1", "") in new stack
-- Executing Wait("Zap/22-1", "2") in new stack
-- Executing Queue("Zap/22-1", "TestQ|tT|||20") in new stack
-- Started music on hold, class 'default', on channel 'Zap/22-1'[color=#FF0000] - I don't think it used to do this but not sure...[/color]
-- outgoing agentcall, to agent '1', on 'Local/1@default-a2d4,1'
-- Called Agent/1
-- outgoing agentcall, to agent '2', on 'Local/2@default-0688,1'
-- Executing Dial("Local/1@default-a2d4,2", "SIP/ABC1") in new stack

Jul 13 16:36:29 NOTICE[7668]: app_dial.c:1056 dial_exec_full: Unable to create channel of type 'SIP' (cause 3 - No route to destination)
== Everyone is busy/congested at this time (1:0/0/1)
-- Called Agent/2
-- Executing Dial("Local/2@default-0688,2", "SIP/ABC2") in new stack
== Auto fallthrough, channel 'Local/1@default-a2d4,2' status is 'CHANUNAVAIL'
-- Agent/1 is circuit-busy
Jul 13 16:36:29 NOTICE[7669]: app_dial.c:1056 dial_exec_full: Unable to create channel of type 'SIP' (cause 3 - No route to destination)
== Everyone is busy/congested at this time (1:0/0/1)
== Auto fallthrough, channel 'Local/2@default-0688,2' status is 'CHANUNAVAIL'
-- Agent/2 is circuit-busy
-- Channel 0/22, span 1 got hangup request [color=#FF0000] - I HUNG UP[/color]
-- Stopped music on hold on Zap/22-1
-- User disconnected from queue TestQ when they almost made it
== Spawn extension (default, 1234567890, 3) exited non-zero on 'Zap/22-1'
-- Hungup 'Zap/22-1'
voip*CLI>

The problems all seem to be on the outgoing side, specifically IP routing issues.

I am trying to make sense of that. Everything worked great, then the PRI provider upgraded their phone core switches (not network switches), the queue broke, and now it's an outgoing IP routing issue? I just don't see where anything on the IP side changed. The only thing that changed is the provider's switch feeding the PRI, which should only involve signalling. Everything network related is the same and unchanged.

We've tried tweaking the LBO because the new switch is literally 100 feet farther away than the old one. According to the PRI provider, all the other signalling settings are correct. Has anyone else experienced a similar issue?
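For anyone following along, the LBO (line build-out) is set per span in zaptel.conf, as the third field of the span line. A minimal sketch, assuming a single T1 PRI on span 1 with ESF/B8ZS; the span number, timing source and channel ranges below are hypothetical, illustrative values, not the poster's actual configuration:

```ini
# zaptel.conf (hypothetical single-T1 PRI; values are illustrative only)
# span=<span num>,<timing source>,<line build out (LBO)>,<framing>,<coding>
# LBO values: 0 = 0 dB (CSU) / 0-133 ft (DSX-1), 1 = 133-266 ft,
#             2 = 266-399 ft, 3 = 399-533 ft, 4 = 533-655 ft,
#             5 = -7.5 dB (CSU), 6 = -15 dB (CSU), 7 = -22.5 dB (CSU)
span=1,1,0,esf,b8zs
# 23 bearer channels, D-channel on channel 24
bchan=1-23
dchan=24
```

The DSX-1 distances refer to the cable run between the card and the local demarcation equipment, so a change at the provider's end does not necessarily call for a different LBO.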

The diagnostics clearly show a purely IP problem. The supplier update is a red herring, unless they disturbed the networking as a side effect of the official change.

The strange thing is that some of the errors do indicate a networking issue: "Unable to create channel of type 'SIP' (cause 3 - No route to destination)". Also, your dialplan really should have Queue(TestQ|tTr|||20), adding the "r" if you want the caller to hear ringing rather than MoH. I'm not sure why this worked before, unless the defaults in 1.2 differ from later versions.

As you can dial the agents' extensions as normal, I don't think it's an issue with the phones themselves. Have you simply tried rebooting?

What sort of ISDN adapter do you have and can you post the Zaptel configuration?

I think it worked before because he always had at least one available agent, so it never actually queued.

I guess you could be correct. However, if you are playing MoH in a queue and an agent phone is ringing, I thought the caller still got MoH whilst the agent line was ringing (although I could be wrong).

David: you sound as if you are in the UK. Where are you based?

Adding the "r" worked: Queue(TestQ|tTr|||20). It should have been there in the first place.
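For reference, here is a sketch of the test dialplan from the original post with the "r" option added. The [default] context name is taken from the "Spawn extension (default, 1234567890, 3)" line in the log; the option descriptions in the comments are my reading of the 1.2 Queue() help, so check `show application Queue` on your own box:

```ini
; extensions.conf (test dialplan from above, with "r" added to Queue)
[default]
exten => 1234567890,1,Ringing
exten => 1234567890,2,Wait(2)
; 1.2 syntax: Queue(queuename|options|URL|announceoverride|timeout)
;   t = called party may transfer the call
;   T = calling party may transfer the call
;   r = play ringing to the caller instead of music on hold
exten => 1234567890,3,Queue(TestQ|tTr|||20)
exten => 1234567890,4,Voicemail(u123)
exten => 1234567890,5,Hangup
```

Without "r", Queue() starts music on hold for the caller as soon as the call enters the queue, which matches the "Started music on hold, class 'default'" line in the log above.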

For some reason things worked with the old switch, and with the new switch (with properly provisioned PRIs) things stopped working, or rather started working correctly. We tested pulling the CID and it fixed some other minor bugs I was having too. Basically, there were some coding errors, or things we had put in to make things more or less work with the old switch. They weren't right, but they made it work. Now that everything is provisioned correctly, we don't need the "jerry-rigged" code anymore.

Very odd. I don't quite understand it all, but I'm very happy it is working now.