[Solved] Endpoint removed from ps_contacts before expiration

Hello all,

I’m struggeling for a few days now with something strange in my setup.

Running Asterisk 13.3.0 with pjsip, phones are sending a REGISTER with the parameter Expires: 3600
After a lookup in ps_endpoints, the contact is sucessfully added:
– Added contact ‘sip:+123456789@10.20.30.40:5060’ to AOR ‘+123456789’ with expiration of 3600 seconds

All goes well, but after some 12 minutes, the contact is gone, all I can see is a database delete action when debug logging is enabled:

SQL: DELETE FROM ps_contacts WHERE id=

Sadly, I have no control about the register frequency on the phones side.

Is there a parameter I’m missing?

Thanks!

Have you confirmed that the phone is not removing the contact? What is the complete console output around when it is deleted?

Have debug logging on 1, verbose logging on 3, all I can see is the database SQL query removing the record, no other messages. Also pjsip logger on, no sip packet coming from the phone before the delete.

Just saw that the interval is rather random. Mostly between 5 and 45 minutes i guess.
Contacts just dissapear and of course coming back every hour due to the phone registration.

Any suggestions?

Ok, additional output debuglevel increased.
This is the relevant logging part:

[17-Sep 16:56:51] DEBUG[1223]: pjsip0 <?>     tdta0x7f540c00 .Destroying txdata Response msg 200/OPTIONS/cseq=13723 (tdta0x7f540c008160)
[17-Sep 16:56:54] DEBUG[32410]: res_config_odbc.c113 custom_prepare Skip: 0; SQL: DELETE FROM ps_contacts WHERE id=?
[17-Sep 16:56:54] DEBUG[32410]: res_odbc.c1057 odbc_release_obj2 odbc_release_obj2(0x1c17f08) called (obj->txf = (nil))
[17-Sep 16:56:54] DEBUG[32372]: threadpool.c508 grow Increasing threadpool Sorcery's size by 1
[17-Sep 16:56:54] DEBUG[1722]: res_config_odbc.c113 custom_prepare Skip: 0; SQL: SELECT * FROM ps_aors WHERE id = ?
[17-Sep 16:56:54] DEBUG[1722]: res_config_odbc.c129 custom_prepare Parameter 1 ('id') = '+12345678901'
[17-Sep 16:56:54] DEBUG[1722]: res_odbc.c1057 odbc_release_obj2 odbc_release_obj2(0x1c17f08) called (obj->txf = (nil))
[17-Sep 16:56:54] DEBUG[1722]: config.c3530 ast_parse_arg extract int from [60] in [-2147483648, 2147483647] gives [60](0)
[17-Sep 16:56:54] DEBUG[1722]: config.c3530 ast_parse_arg extract int from [3600] in [-2147483648, 2147483647] gives [3600](0)
[17-Sep 16:56:54] DEBUG[1722]: config.c3530 ast_parse_arg extract int from [0] in [-2147483648, 2147483647] gives [0](0)
[17-Sep 16:56:54] DEBUG[1722]: config.c3530 ast_parse_arg extract int from [7200] in [-2147483648, 2147483647] gives [7200](0)
[17-Sep 16:56:54] DEBUG[1722]: config.c3530 ast_parse_arg extract int from [3600] in [-2147483648, 2147483647] gives [3600](0)
[17-Sep 16:56:54] DEBUG[1722]: config.c3530 ast_parse_arg extract int from [10] in [-2147483648, 2147483647] gives [10](0)
[17-Sep 16:56:54] DEBUG[1722]: config.c3530 ast_parse_arg extract int from [3600] in [-2147483648, 2147483647] gives [3600](0)
[17-Sep 16:56:54] DEBUG[1722]: config.c3530 ast_parse_arg extract int from [60] in [0, 86400] gives [60](0)
[17-Sep 16:56:54] DEBUG[1722]: res_config_odbc.c113 custom_prepare Skip: 0; SQL: SELECT * FROM ps_contacts WHERE id LIKE ? ORDER BY id
[17-Sep 16:56:54] DEBUG[1722]: res_config_odbc.c129 custom_prepare Parameter 1 ('id LIKE') = '+12345678901;@%'
[17-Sep 16:56:54] DEBUG[1722]: res_odbc.c1057 odbc_release_obj2 odbc_release_obj2(0x1c17f08) called (obj->txf = (nil))
[17-Sep 16:56:54] DEBUG[32373]: threadpool.c508 grow Increasing threadpool stasis-core's size by 1
[17-Sep 16:56:54] DEBUG[32380]: devicestate.c368 _ast_device_state No provider found, checking channel drivers for PJSIP - +12345678901
[17-Sep 16:56:54] DEBUG[32380]: res_config_odbc.c113 custom_prepare Skip: 0; SQL: SELECT * FROM ps_endpoints WHERE id = ?
[17-Sep 16:56:54] DEBUG[32380]: res_config_odbc.c129 custom_prepare Parameter 1 ('id') = '+12345678901'
[17-Sep 16:56:54] DEBUG[32380]: res_odbc.c1057 odbc_release_obj2 odbc_release_obj2(0x1c17f08) called (obj->txf = (nil))
[17-Sep 16:56:54] DEBUG[32380]: res_sorcery_realtime.c119 sorcery_realtime_filter_objectset Filtering out realtime field 'disallow' from retrieval
[17-Sep 16:56:54] DEBUG[32380]: config.c3530 ast_parse_arg extract int from [0] in [-2147483648, 2147483647] gives [0](0)
[17-Sep 16:56:54] DEBUG[32380]: config.c3530 ast_parse_arg extract int from [0] in [-2147483648, 2147483647] gives [0](0)
[17-Sep 16:56:54] DEBUG[32380]: config.c3530 ast_parse_arg extract int from [1800] in [-2147483648, 2147483647] gives [1800](0)
[17-Sep 16:56:54] DEBUG[32380]: config.c3530 ast_parse_arg extract int from [0] in [-2147483648, 2147483647] gives [0](0)
[17-Sep 16:56:54] DEBUG[32380]: config.c3530 ast_parse_arg extract int from [0] in [-2147483648, 2147483647] gives [0](0)
[17-Sep 16:56:54] DEBUG[32380]: config.c3530 ast_parse_arg extract int from [90] in [-2147483648, 2147483647] gives [90](0)
[17-Sep 16:56:54] DEBUG[32380]: config.c3530 ast_parse_arg extract int from [0] in [-2147483648, 2147483647] gives [0](0)
[17-Sep 16:56:54] DEBUG[32380]: devicestate.c464 do_state_change Changing state for PJSIP/+12345678901 - state 5 (Unavailable)

Any ideas?

I’m sorry for answering my own post, but I think I have found the problem.

I did forget to mention that we are using two Asterisk machines, who share the same database.
Both machines are getting SIP traffic from the provider and handsets in a roundrobin way.

So what happens:

Registration comes in on Asterisk1, is set in contacts with a expire of 3600 sec.
Before the end of the expire, phone sends another register, but it arrives on Asterisk2 due to the roundrobin strategy.
Asterisk2 does a nice update of the shared record, but it seems Asterisk1 isn’t aware of this.

The timer of Asterisk1 for this phone expires and does a remove from the shared contacts table.

Some time later, a new registration comes in on Asterisk1.

It seemed random to me, but the registration attempts are within the 3600 sec on a 70-90% timespan base or so. After analyzing the debuglog, I saw a delete exactly after 3600 sec of the initial register attempt.

For testing purposes, I have set all expires on 7200 sec, just to see if this is really the case.

Is there maybe something in the settings I’m missing, or is this just normal behaviour that both Asterisk machines are unaware of the update record, even if they share the same database?

Thanks a lot!

It’s normal, you can’t have the res_pjsip_registrar_expire module loaded in such an environment. It reacts based on actions from the registrar on the instance itself. If another instance gets a REGISTER then it won’t know. The only way to make it work otherwise would be either having it query the database constantly to update its state or some communication between instances to push the information around. Neither of these exist today.

Ok, consider it solved now, with increased expiration (double) due to roundrobin registars on two instances.

Thanks for your help!

Unfortunately, it’s not resolved yet.

When a registrations is send to instance 1, the internal timer will countdown to purge the record.
When I increased the expiration time, the problem only occurs less.
At some point, the re-registration will be send to only instance 2 for a couple of rounds, due to the roundrobin method and will be purged by instance 1. When we deploy more instances in the future, this will happen even ofter.

Are you sure, there is no way two (or more) instances can keep the counter in the external database?

Otherwise, is there a way to completely disable the purge routine like set the expire to 0?
Or just set the module res_pjsip_registrar_expire to noload in modules.conf?
I can have a custom external process keep track of this and purge only when it’s needed.

Thanks for your help.

The res_pjsip_registrar_expire module is what implements that logic. Disabling it is noloading it. Contacts will revert to being purged on retrieval.

Ok, thanks, not using the res_pjsip_registrar_expire module is the solution for me.

Changed subject to [Solved].