Calls in queue being delivered out of order


#1

I’m receiving complaints about the following scenario a few times a week-

  • Caller #1 enters queue at 11:26:04
  • Caller #2 enters queue at 11:28:50
  • Agent “Chris” becomes available at 11:32:44 and receives Caller #2, who had been waiting for 3m54s, even though Caller #1 had been waiting for 6m40s

Or, put more simply, callers are being delivered to agents out of order from when they joined the queue.

What could possibly be going on? I’m running Asterisk 1.6.2.18 from the Fedora 14 repository.


#2

Announcements cause this type of behavior. While the system is telling the customer how important they are to you, that caller cannot be picked up by an agent who has just become available, so the next caller in line gets handled instead.


#3

We do queue position announcements - “you are currently caller number x in line” - I bet that’s it.

I’ll take them off now and see if it fixes the issue. Thanks a lot for the suggestion!


#4

Hm, still happening.

In the two screenshots below, the call at 12:52:08 which is first in line can only be handled by “Rowan” because they’re the only person logged in to that particular queue. “Rowan” is also logged in to other queues, however.

So “Rowan” becomes available, but they receive the call that’s second in line instead of the one that’s first in line-

Then after “Rowan” finishes that call, they again receive the second call in line-


#5

And here’s another complaint that came in just now, where “Will” receives the second call in line which has been waiting approx. 11 seconds less than the first call-

These reports are actually coming from two separate call centers running two separate phone systems. One is running Asterisk 1.6.2.19 on CentOS 5.6 x64, and the other is running Asterisk 1.6.2.18 on Fedora 14 x64.


#6

Although recorded announcements are the only detailed mechanism I know of for this, I think this sort of behaviour basically comes from the way the code is structured: the calls look for available members, and only do so once a second. This includes making sure that the member wouldn’t find a better choice in another queue.

There is no thread for the queue which looks for available work.

Incidentally, your original redaction using blurring is almost certainly insecure, as the limited number of possible symbols should make it possible to find a combination that produces the blurred image.


#7

Hi

Announcements will do this, but it’s odd that it’s always the second call getting delivered. What does the first caller hear?

Also, what does queues.conf look like for the queues? Have you got autofill set to yes?

You might have to watch the CLI for a while to see what’s happening.

Ian


#8

So if I’m understanding you correctly, you’re saying the above can be explained as follows-

Two calls in line - Call#1 is first, Call#2 is second…

  • Call#1 wakes up, checks for available agents, but doesn’t find any, and so it sleeps for 1000ms.
  • Call#2 wakes up 500ms after Call#1 went to sleep, and checks for available agents. It finds one, so it calls the agent.
  • Call#1 wakes up again after sleeping 1000ms, checks for available agents, and again doesn’t find any, so it sleeps for 1000ms.
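
The timeline in the list above can be sketched as a tiny standalone model. This is not Asterisk code: `first_to_poll` and its parameters are hypothetical names, and the sketch assumes each waiting call polls on a fixed period starting from its own offset (times are non-negative milliseconds), with nothing else enforcing queue order.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model, not Asterisk code. Caller i polls for a free agent at
 * offset_ms[i], offset_ms[i] + period_ms, offset_ms[i] + 2 * period_ms, ...
 * Returns the index of the caller whose poll is the first one at or after
 * agent_free_ms, i.e. the caller that would grab the agent if nothing else
 * enforced queue order. Ties go to the lower index (earlier in line).
 */
size_t first_to_poll(const long *offset_ms, size_t n,
                     long period_ms, long agent_free_ms)
{
    assert(n > 0 && period_ms > 0);

    size_t winner = 0;
    long best = 0;

    for (size_t i = 0; i < n; i++) {
        long next = offset_ms[i];
        if (next < agent_free_ms) {
            /* Round up to this caller's first poll at or after agent_free_ms. */
            long missed = (agent_free_ms - offset_ms[i] + period_ms - 1) / period_ms;
            next = offset_ms[i] + missed * period_ms;
        }
        if (i == 0 || next < best) {
            best = next;
            winner = i;
        }
    }
    return winner;
}
```

With offsets of 0 ms and 500 ms, a 1000 ms period, and an agent that frees up at 200 ms, the function returns index 1: the second caller in line is the first to poll after the agent becomes available.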

But then wouldn’t that take care of the situation above?

As for the blurred redaction, I’m not too concerned: all callers are employees with company-provided phones whose numbers are regularly given to customers.


#9

Not a clue. I presume they just hear MoH since I took position announcements off.

queues.conf is being configured by the latest stable version of FreePBX. And yes, autofill is enabled.

All queues follow this pattern, with none weighted above the others-

[1]
announce-frequency=0
announce-holdtime=no
announce-position=no
autofill=yes
eventmemberstatus=yes
eventwhencalled=yes
joinempty=no
leavewhenempty=yes
maxlen=0
memberdelay=0
music=default
penaltymemberslimit=0
periodic-announce-frequency=0
queue-callswaiting=silence/1
queue-thereare=silence/1
queue-youarenext=silence/1
reportholdtime=no
retry=1
ringinuse=no
servicelevel=60
strategy=linear
timeout=15
timeoutpriority=app
timeoutrestart=yes
weight=0
wrapuptime=1

#10

[quote]
But then wouldn’t that take care of the situation above?
[/quote]

I’d have to spend more time than is reasonable checking the code, but the thought was that a call may skip an agent because that agent might otherwise pick up a higher-priority call on another queue, when in reality some other agent ends up getting that higher-priority call.


#11

One second of silence is still an announcement, although you may have turned the announcements off by other means.


#12

Looks like that’s the point I’m at, then, so here it goes-

wait_our_turn() in app_queue.c appears to be the meat of the queue distribution logic. It’s called from queue_exec(), which is a function that runs on a separate thread for each caller in a loop until the caller is bridged with an agent.

The decision to place the distribution logic for each caller on its own thread appears to be because of the below code in wait_our_turn(), which makes the following blocking calls to say_position() and ast_waitfordigit()-

/* Make a position announcement, if enabled */
if (qe->parent->announcefrequency && (res = say_position(qe,ringing)))
	break;
...
/* Wait a second before checking again */
if ((res = ast_waitfordigit(qe->chan, RECHECK * 1000))) {
...

It’s pretty obvious from this code how the scenario I laid out above could actually happen, where Call#2 steals an agent while Call#1 is blocking.

But wait - the first thing wait_our_turn() does is make a call to is_our_turn(), which is supposed to act as the giant synchronization primitive between all callers. The intention appears to be that is_our_turn() will only let a caller be bridged with an agent if that caller is first in line. But this invariant is broken by the “autofill” feature, which, if enabled, will basically let any caller in line randomly bridge with an agent, as long as the caller is “close enough” to position 1 in the queue.
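
A minimal sketch of the eligibility rule described above may help. It is a simplified model and not the actual app_queue.c code: `may_try_agent`, `position`, and `available_members` are hypothetical names, and it assumes that “close enough” means the caller's place in line is within the number of agents that are currently free.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified model, not the actual app_queue.c code. position is the
 * caller's zero-based place in line (0 = first) and available_members is
 * how many agents are free right now. Both names are hypothetical.
 */
bool may_try_agent(int position, int available_members, bool autofill)
{
    assert(position >= 0 && available_members >= 0);

    /* Without autofill, at most one caller (the head of the line) is eligible. */
    int slots = available_members;
    if (!autofill && slots > 1) {
        slots = 1;
    }

    /* With autofill, every caller within the first `slots` positions is eligible. */
    return position < slots;
}
```

Under this assumption, two free agents with autofill on make positions 0 and 1 eligible at the same moment, so whichever of those callers polls first gets first pick of the agents.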

This is a huge problem when there is high contention for agents, where an agent comes available only once every few minutes. When “autofill” is enabled, if Call#1 doesn’t happen to grab the agent when they become available, there’s a relatively large chance that several other callers could steal the agent from them, causing Call#1 to have to wait minutes longer.

But it gets worse. is_our_turn() only synchronizes callers in its own queue. It doesn’t look to see what callers or agents are in any other queues, so if an agent is logged in to multiple queues, there is no guarantee that Call#1 who has been waiting the longest will be delivered to the next available agent, because a call from another queue could have already stolen that agent while Call#1 was sleeping.
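
For comparison, here is a sketch of the ordering callers would expect when an agent serves several queues: pick the longest-waiting caller across all of them. It only illustrates the missing guarantee and does not reflect how app_queue.c is organized; `struct waiting_call` and `longest_waiting` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one waiting caller; not an Asterisk structure. */
struct waiting_call {
    int queue_id;    /* queue the caller is waiting in */
    long joined_at;  /* time the caller entered that queue, in seconds */
};

/*
 * Returns the index of the longest-waiting caller across every queue the
 * agent serves, or -1 if none of the agent's queues has a caller waiting.
 */
int longest_waiting(const struct waiting_call *calls, size_t n_calls,
                    const int *agent_queues, size_t n_queues)
{
    assert(calls != NULL || n_calls == 0);

    int best = -1;

    for (size_t i = 0; i < n_calls; i++) {
        /* Skip callers in queues this agent is not logged in to. */
        int served = 0;
        for (size_t q = 0; q < n_queues; q++) {
            if (agent_queues[q] == calls[i].queue_id) {
                served = 1;
                break;
            }
        }
        if (!served) {
            continue;
        }
        /* Keep the caller with the earliest join time seen so far. */
        if (best < 0 || calls[i].joined_at < calls[best].joined_at) {
            best = (int)i;
        }
    }
    return best;
}
```

A per-queue check cannot produce this result on its own, because each queue only ever compares callers within itself.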

Could a developer confirm this, and would it be reasonable to report this as one gigantic race condition? It seems like currently, the only way for callers in queue to actually be delivered in order is for there to be only a single queue with “autofill” turned off.


#13

Howdy,

The best places to interact with developers are over on the #asterisk-dev IRC channel on Freenode, or on the asterisk-dev mailing list (lists.digium.com).

I’m sure they’re interested in your analysis.