SIP XFERs via reINVITE/REFER and order of NOTIFYs

I’m trying to troubleshoot some MS Teams integration hiccups.
I’m using an Asterisk box to interface between an IP PBX and MS Teams, and it sort of works. But sometimes MS Teams loses track of a call put on hold
(i.e. you cannot resume it).

HOLD is done by a reINVITE, resume is done with a REFER.
But it seems Teams does not like some NOTIFYs, and responds to some of them with a 500.
Oddly, the (console) debug output is not in the order I would expect, with a Ringing showing before a Trying, and then an OK, although the CSeq numbers are in the expected order. And the 500 response is to the “out of order” NOTIFY, which is curious…

So, question: is the console output in sync with the actual messages?
If it is, what would make an out-of-order TX possible?

TIA,
-Carlos

Console output is in the order that things occur. As to being out of order, without a complete SIP trace I can’t really say. A REFER also isn’t used for resuming exactly, it’s used for transferring.

My reading that they are out of order comes from two things:
- I expect Trying to happen before Ringing
- The Trying event has CSeq: 32434 NOTIFY and the Ringing event has CSeq: 32435 NOTIFY
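
One way to check this against a saved console capture is to list the CSeq of each NOTIFY in the order it appears and flag any number that goes backwards. The following is a minimal Python sketch, not anything shipped with Asterisk. The function name `find_cseq_regressions` and the sample text are made up for illustration, and it assumes the text has been trimmed to the transmitted NOTIFY requests of a single dialog (responses carry the same CSeq header and would otherwise be counted too).

```python
import re

# Matches a CSeq header line of a NOTIFY, e.g. "CSeq: 32435 NOTIFY".
CSEQ_NOTIFY = re.compile(r"^CSeq:[ \t]*(\d+)[ \t]+NOTIFY[ \t]*\r?$",
                         re.MULTILINE | re.IGNORECASE)


def find_cseq_regressions(trace_text):
    """Return (previous, current) pairs where a CSeq is lower than the one logged before it."""
    numbers = [int(m.group(1)) for m in CSEQ_NOTIFY.finditer(trace_text)]
    return [(prev, cur) for prev, cur in zip(numbers, numbers[1:]) if cur < prev]


# Hypothetical excerpt: Ringing (32435) logged before Trying (32434).
sample = "CSeq: 32435 NOTIFY\nCSeq: 32434 NOTIFY\nCSeq: 32436 NOTIFY\n"
print(find_cseq_regressions(sample))
```

An empty list means the NOTIFYs were logged in increasing CSeq order.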

The order in which “things occur” might not be a good definition here.
I don’t really want to post the trace without sanitizing it, and I don’t mind sending it to you directly, but that’s what happened, more than once.

Are you having problems with blind and consult transfers AND with just putting a call on hold? Hold uses inactive, and this can lead to the call being put on hold forever and MS Teams hanging up the call. Does the call disappear in Teams but stay on hold on the PBX handset? Or just to the PSTN? Where does the PSTN terminate?

Some more info: MS Teams proxy limitations

RFC and section: RFC 6337, section 5.3
Description: Hold and Resume of Media
Deviation: The RFC allows using “a=inactive”, “a=sendonly”, “a=recvonly” to place a call on hold. The SIP proxy only supports “a=inactive” and does not understand if the SBC sends “a=sendonly” or “a=recvonly”.
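
To see which direction attribute a given reINVITE actually carries, the SDP body can be scanned for it. This is a rough Python sketch under the limitation quoted above. The function names `media_directions` and `is_inactive_hold` are hypothetical, and the parsing only looks at `m=` lines and the four direction attributes, with a session-level attribute used as the default for media sections that do not set their own.

```python
DIRECTIONS = ("sendrecv", "sendonly", "recvonly", "inactive")


def media_directions(sdp):
    """Return the direction of each m= section; sendrecv is the default when none is given."""
    session_dir = None
    result = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # New media section: start from the session-level direction, if any.
            result.append(session_dir or "sendrecv")
        elif line.startswith("a=") and line[2:] in DIRECTIONS:
            if result:
                result[-1] = line[2:]   # media-level attribute overrides the default
            else:
                session_dir = line[2:]  # attribute seen before the first m= line
    return result


def is_inactive_hold(sdp):
    """True only if every media section is a=inactive, the one hold form the proxy is said to support."""
    dirs = media_directions(sdp)
    return bool(dirs) and all(d == "inactive" for d in dirs)


# Hypothetical hold offer, addresses from the documentation range.
hold_offer = (
    "v=0\r\no=- 1 1 IN IP4 192.0.2.1\r\ns=-\r\nc=IN IP4 192.0.2.1\r\n"
    "t=0 0\r\nm=audio 4000 RTP/AVP 0\r\na=inactive\r\n"
)
print(media_directions(hold_offer), is_inactive_hold(hold_offer))
```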

I was just testing; transfers (attendant initiated) seem to work.
I called myself on a Teams extension and put the call on hold. Teams sent a reINVITE with a=inactive, and Asterisk gave me music. Fine.

Then I tried to resume and … it works sometimes. When it does not work, Teams goes into an inconsistent state with the call stuck and unusable, and that seems to correlate with a “500 Internal Server Error” sent in answer to an out-of-order NOTIFY.

PSTN is on another SIP trunk. PSTN -> Asterisk -> Teams.
(Actually PSTN -> CUBE -> Asterisk -> Teams)

What version of Asterisk? Latest? I upgraded mine to the latest due to some inconsistencies in call handling with MS Teams.

Yep, 17.3.
Trace does not fit forum limits, pasted here:
https://pastebin.com/QaCtsWFE

I’d be interested to see if you get the same issue with a SIP phone registered to Asterisk. What is the endpoint you are testing with registered to? Or is it just PSTN to Cisco CUBE to Asterisk? I’ll test on my setup as well when I get a chance.

There are no phones registered to Asterisk in this setup…

Trying NOTIFY does seem to have been sent out of sequence (and with its CSEQ out of sequence).

Is this chan_sip or chan_pjsip? I can’t think how the former could do this, as I think everything relevant is on one thread.

It should be harmless, though.

Yup, you are right, this is pjsip; I should have noted that.
It SHOULD be harmless, we agree, but if you have a strict state machine, it could be that you do not expect a Trying after you already got a Ringing, right?
And that 500 seems (to me, at this stage) to point at exactly that.

I do not know the workings of the channel driver, but this “slippage” or reordering needs some kind of buffer somewhere. So it would seem to come down to locking, ownership, or some sorting mechanism?

It doesn’t need a buffer, the messages themselves aren’t out of order. The code which produces the messages may be generating them out of order in the first place. A change went in[1] which touched this area of code, so it’s possible your specific scenario exposed an issue with it. If you undo the change and rebuild Asterisk and it is fixed, then that is the problem and you’d need to file an issue[2] with all the information including packet trace and console output.

[1] https://gerrit.asterisk.org/c/asterisk/+/13852
[2] https://issues.asterisk.org/jira

Ok, I will try that, but could you please tell us how the code would generate messages with inverted sequence numbers, as in 2 1 3?
I do not understand what you mean by “the messages aren’t out of order” for that sequence.

I don’t know, but since that code was recently touched I’d rather investigate that first to see if it is indeed the problem and is having some kind of weird interaction. If the problem still occurs, then an issue would still need to be filed with the information I mentioned and there would be no time frame on when it would be resolved.

A correct implementation of the UAS protocol will send 500 and ignore the out of sequence message, so won’t be confused. Trying is completely optional.

The only thing that might change state is Asterisk itself, in response to the 500.
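
The 500 described here matches the in-dialog CSeq check in RFC 3261, section 12.2.2: a request whose sequence number is lower than the dialog’s remote sequence number must be rejected with 500. Below is a simplified Python sketch of that rule alone. The class name `DialogUAS` is hypothetical, returning 200 for every accepted request is a stand-in for real request handling, and nothing here is claimed about how Teams is actually implemented.

```python
class DialogUAS:
    """Tracks the remote CSeq of one dialog, as a UAS would for in-dialog requests."""

    def __init__(self):
        # Empty until the first in-dialog request arrives.
        self.remote_cseq = None

    def on_request(self, cseq):
        """Return 500 for a request numbered below the remote CSeq, else accept it."""
        if self.remote_cseq is not None and cseq < self.remote_cseq:
            return 500  # out of order: rejected, dialog state left untouched
        # Equal numbers are normally retransmissions absorbed by the transaction
        # layer; this sketch simply accepts them.
        self.remote_cseq = cseq
        return 200  # stand-in for normal processing of the request


# The order reported in this thread: Ringing (32435), then Trying (32434), then OK (32436).
uas = DialogUAS()
print([uas.on_request(n) for n in (32435, 32434, 32436)])
```

With that arrival order, only the late Trying is rejected; the Ringing and the final NOTIFY are both accepted.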

David,
you are assuming a correct implementation; that would be an ideal world, right?
As someone used to say, better be strict on TX, tolerant on RX.
The NOTIFYs are being pushed into a task pool, so there is a buffer in the sense of my previous message. The system seems to tie related state to a thread to avoid reordering, but AFAIK the evidence says it does not manage to. Given that I have access to this side’s source and not the other’s… I would rather omit the Trying altogether (the other side has had the REFER accepted, so there is really no new info in the Trying, is there?)

Ok, there is definitely more to it.
There is specific code to send a 100 Trying (as a NOTIFY, that is) if it was not already sent. And this code is the one somehow sending in rapid succession the Trying/Ringing that get swapped.
I kind of disabled that (pretended a Trying was already sent when generating the Ringing) and the out-of-order Trying/500 went away. The bad news is that the held call still gets stuck.

Someone putting effort into sending the Trying means it is not that “completely” optional, it seems, at least in that someone’s view.

Hats off to Joshua. Reverting 13852 fixes the problem.
The out of order Trying is still inserted, but now it does not generate a 500 response.
(nor does the call stay in limbo at Teams after resume is tried).

Thanks.
