Looks like that’s the point I’m at, then, so here goes:
wait_our_turn() in app_queue.c appears to be the meat of the queue distribution logic. It’s called from queue_exec(), which is a function that runs on a separate thread for each caller in a loop until the caller is bridged with an agent.
The decision to place the distribution logic for each caller on its own thread appears to stem from the code below in wait_our_turn(), which makes blocking calls to say_position() and ast_waitfordigit():
/* Make a position announcement, if enabled */
if (qe->parent->announcefrequency && (res = say_position(qe,ringing)))
    break;
...
/* Wait a second before checking again */
if ((res = ast_waitfordigit(qe->chan, RECHECK * 1000))) {
...
It’s pretty obvious from this code how the scenario I laid out above could actually happen, where Call#2 steals an agent while Call#1 is blocking.
But wait - the first thing wait_our_turn() does is make a call to is_our_turn(), which is supposed to act as the giant synchronization primitive between all callers. The intention appears to be that is_our_turn() will only let a caller be bridged with an agent if that caller is first in line. But this invariant is broken by the “autofill” feature, which, if enabled, will basically let any caller in line randomly bridge with an agent, as long as the caller is “close enough” to position 1 in the queue.
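To make the two modes concrete, here is a rough C model of the eligibility check as described above. It is not the real is_our_turn(): the name model_is_our_turn() and its parameters are made up for illustration, and the autofill rule shown (a caller may proceed if its position is no greater than the number of free agents) is an assumption standing in for “close enough”.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified stand-in for the eligibility check; NOT the real is_our_turn().
 * position is 1-based (1 = head of the queue).
 */
static bool model_is_our_turn(int position, bool autofill, int available_agents)
{
    assert(position >= 1);

    if (!autofill) {
        /* Strict ordering: only the head of the queue may proceed. */
        return position == 1;
    }

    /*
     * Assumed autofill rule: the first N callers may proceed, where N is
     * the number of agents that are currently free.
     */
    return position <= available_agents;
}
```

With autofill off, model_is_our_turn(2, false, 3) is false, so the caller in position 2 has to wait. With autofill on and three free agents, model_is_our_turn(2, true, 3) is true, so that caller can go after an agent at the same time as the caller in position 1.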
This is a huge problem when there is high contention for agents, where an agent becomes available only once every few minutes. When “autofill” is enabled, if Call#1 doesn’t happen to grab the agent at the moment they become available, there’s a relatively large chance that one of several other callers could steal the agent from them, causing Call#1 to wait minutes longer.
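The timing can be sketched as a small deterministic simulation in C. Everything here is hypothetical and only models the behaviour described in this post: each caller polls for the agent only when it is not blocked (for example in an announcement), and window is an invented knob where 1 means strict ordering and anything larger means an autofill-style rule.

```c
#include <assert.h>
#include <stdbool.h>

struct sim_caller {
    int position;       /* 1 = head of the queue */
    int blocked_until;  /* first tick at which this caller polls again */
};

/*
 * Returns the position of the caller that gets the agent, or 0 if nobody
 * claims it before max_ticks. The callers array must be sorted by position,
 * so that when two callers poll on the same tick the one nearer the head wins.
 */
static int sim_who_gets_agent(const struct sim_caller *callers, int n,
                              int agent_free_at, int window, int max_ticks)
{
    assert(window >= 1);

    for (int tick = agent_free_at; tick < max_ticks; tick++) {
        for (int i = 0; i < n; i++) {
            bool polling = tick >= callers[i].blocked_until;
            bool eligible = callers[i].position <= window;
            if (polling && eligible)
                return callers[i].position;
        }
    }
    return 0;
}
```

Take Call#1 blocked until tick 10, Call#2 never blocked, and the agent freeing up at tick 3. With window set to 1 the function returns 1, because the agent sits idle until Call#1 wakes up. With window set to 2 it returns 2, because Call#2 takes the agent at tick 3.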
But it gets worse. is_our_turn() only synchronizes callers within its own queue. It doesn’t look at the callers or agents in any other queue, so if an agent is logged in to multiple queues, there is no guarantee that Call#1, who has been waiting the longest, will be delivered to the next available agent: a call from another queue could already have stolen that agent while Call#1 was sleeping.
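The cross-queue gap can be illustrated with another small C sketch. The struct and both functions are invented for this example and do not exist in app_queue.c. The first function models the per-queue view, where every queue's head thinks it is next. The second shows the cross-queue comparison that would be needed to find the caller who has actually waited longest.

```c
#include <assert.h>
#include <stddef.h>

struct model_queue {
    const char *name;      /* hypothetical queue name */
    long head_wait_secs;   /* wait time of this queue's head caller; < 0 if empty */
};

/* Per-queue view: the head of every non-empty queue believes it is next. */
static int per_queue_head_eligible(const struct model_queue *q)
{
    assert(q != NULL);
    return q->head_wait_secs >= 0;
}

/* Global view: index of the queue whose head has waited longest, or -1. */
static int longest_waiting_queue(const struct model_queue *qs, size_t n)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (qs[i].head_wait_secs < 0)
            continue;  /* skip empty queues */
        if (best < 0 || qs[i].head_wait_secs > qs[best].head_wait_secs)
            best = (int)i;
    }
    return best;
}
```

Given two queues sharing one agent, with heads that have waited 300 and 20 seconds, per_queue_head_eligible() is true for both, so either thread may win the agent. Only longest_waiting_queue() singles out the 300-second caller, and nothing in the per-queue check performs that comparison.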
Could a developer confirm this, and would it be reasonable to report this as one gigantic race condition? It seems like currently, the only way for callers in queue to actually be delivered in order is for there to be only a single queue with “autofill” turned off.