I’ve been struggling for a few days now with something strange in my setup.
I’m running Asterisk 13.3.0 with pjsip, and the phones send a REGISTER with the parameter Expires: 3600.
After a lookup in ps_endpoints, the contact is successfully added:
– Added contact ‘sip:+123456789@10.20.30.40:5060’ to AOR ‘+123456789’ with expiration of 3600 seconds
All goes well, but after some 12 minutes the contact is gone. All I can see is a database delete action when debug logging is enabled:
SQL: DELETE FROM ps_contacts WHERE id=
Sadly, I have no control over the registration frequency on the phone side.
I have debug logging at 1 and verbose logging at 3, and all I can see is the SQL query removing the record, with no other messages. The pjsip logger is also on, and no SIP packet arrives from the phone before the delete.
I just saw that the interval is rather random, mostly between 5 and 45 minutes I guess.
Contacts just disappear and of course come back every hour due to the phone registration.
I’m sorry for answering my own post, but I think I have found the problem.
I forgot to mention that we are using two Asterisk machines that share the same database.
Both machines receive SIP traffic from the provider and the handsets in a round-robin way.
So what happens:
A registration comes in on Asterisk1 and is stored in contacts with an expiry of 3600 sec.
Before the expiry ends, the phone sends another REGISTER, but it arrives on Asterisk2 due to the round-robin strategy.
Asterisk2 does a nice update of the shared record, but it seems Asterisk1 isn’t aware of this.
The timer on Asterisk1 for this phone expires, and Asterisk1 removes the record from the shared contacts table.
Some time later, a new registration comes in on Asterisk1.
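
To make the sequence above concrete, here is a minimal toy model of it in Python. It is only a sketch under assumptions: the class and method names are hypothetical and have nothing to do with Asterisk's real code, the shared table is a plain dict, and each instance keeps its expiry timers only in its own memory.

```python
# Toy model: two registrar instances share one contacts table, but each keeps
# its own in-memory expiry timers. All names are illustrative, not Asterisk APIs.

class Instance:
    def __init__(self, name, shared_contacts):
        self.name = name
        self.contacts = shared_contacts  # stands in for the shared ps_contacts table
        self.timers = {}                 # local only: contact id -> expiry time

    def register(self, contact_id, now, expires):
        # Write or refresh the shared record and arm a timer on THIS instance only.
        self.contacts[contact_id] = now + expires
        self.timers[contact_id] = now + expires

    def tick(self, now):
        # Fire local timers and delete from the shared table without re-checking
        # whether another instance refreshed the record in the meantime.
        deleted = []
        for contact_id, expiry in list(self.timers.items()):
            if now >= expiry:
                del self.timers[contact_id]
                if self.contacts.pop(contact_id, None) is not None:
                    deleted.append(contact_id)
        return deleted


def simulate():
    shared = {}
    a1 = Instance("Asterisk1", shared)
    a2 = Instance("Asterisk2", shared)
    events = []

    a1.register("+123456789", now=0, expires=3600)
    a2.register("+123456789", now=3000, expires=3600)  # refresh lands on Asterisk2

    for now in (3000, 3600):
        for inst in (a1, a2):
            for contact_id in inst.tick(now):
                events.append((now, inst.name, contact_id))
    return events, shared


if __name__ == "__main__":
    events, shared = simulate()
    print(events)  # the delete comes from Asterisk1 at t=3600
    print(shared)  # the refreshed record is gone
```

In this model the record refreshed by Asterisk2 at t=3000 should have lived until t=6600, but the stale timer on Asterisk1 removes it at t=3600.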
It seemed random to me, but the phones re-register at around 70-90% of the 3600 sec or so. After analyzing the debug log, I saw a delete exactly 3600 sec after the initial registration.
For testing purposes, I have set all expires to 7200 sec, just to see if this is really the case.
Is there maybe something in the settings I’m missing, or is it just normal behaviour that both Asterisk machines are unaware of the updated record, even if they share the same database?
It’s normal. You can’t have the res_pjsip_registrar_expire module loaded in such an environment. It reacts based on actions from the registrar on the instance itself. If another instance gets a REGISTER then it won’t know. The only way to make it work otherwise would be either having it query the database constantly to update its state or some communication between instances to push the information around. Neither of these exists today.
When a registration is sent to instance 1, its internal timer counts down to purge the record.
When I increased the expiration time, the problem only occurred less often.
At some point the re-registrations will be sent only to instance 2 for a couple of rounds because of the round-robin method, and the record will be purged by instance 1. When we deploy more instances in the future, this will happen even more often.
Are you sure there is no way two (or more) instances can keep the counter in the external database?
Otherwise, is there a way to completely disable the purge routine, for example by setting the expire to 0?
Or just set the module res_pjsip_registrar_expire to noload in modules.conf?
I can have a custom external process keep track of this and purge only when it’s needed.
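
As a rough idea of what such an external purge process could look like, here is a minimal Python sketch. It rests on assumptions: the ps_contacts table is assumed to have an expiration_time column holding the expiry as epoch seconds (check your own schema), the function name purge_expired_contacts is made up for this example, and an in-memory SQLite table stands in for the real shared database only to keep the example self-contained.

```python
import sqlite3
import time


def purge_expired_contacts(conn, now=None):
    """Delete contacts whose stored expiry has passed and return how many were removed.

    Assumes ps_contacts.expiration_time holds epoch seconds (as text or integer).
    """
    now = int(time.time()) if now is None else int(now)
    cur = conn.execute(
        "DELETE FROM ps_contacts WHERE CAST(expiration_time AS INTEGER) <= ?",
        (now,),
    )
    conn.commit()
    return cur.rowcount


def demo():
    # Stand-in for the shared database, with only the columns this sketch needs.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ps_contacts (id TEXT PRIMARY KEY, uri TEXT, expiration_time TEXT)"
    )
    conn.executemany(
        "INSERT INTO ps_contacts VALUES (?, ?, ?)",
        [
            ("stale", "sip:+123456789@10.20.30.40:5060", "1000"),
            ("fresh", "sip:+123456789@10.20.30.40:5060", "5000"),
        ],
    )
    removed = purge_expired_contacts(conn, now=3600)
    remaining = [row[0] for row in conn.execute("SELECT id FROM ps_contacts")]
    return removed, remaining


if __name__ == "__main__":
    print(demo())
```

Because the decision is based on the expiry stored in the shared record rather than on a timer local to one instance, a refresh written by any instance keeps the contact alive. Such a job could be run periodically, for example from cron.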