I am looking for advice on an issue that affects realtime endpoints only: the Endpoint: state shows “Unavailable” while the Contact: state shows “Avail”. The CLI and log file show the usual Contact/AOR deletion and creation messages but nothing about the endpoint. Calls can still be sent to the endpoint.
If I remove sorcery caching for endpoints, the realtime SIP extensions behave the same as those in the static configuration file: the Endpoint: state is “Not in use” (correct) and I receive the “Endpoint XXX is now Reachable” message on the CLI.
The configuration of my sorcery.conf (for endpoints) is:
endpoint=config,pjsip.conf,criteria=type=endpoint
endpoint/cache=memory_cache,object_lifetime_stale=600,object_lifetime_maximum=1800
endpoint=realtime,ps_endpoints
Commenting out the middle line (endpoint/cache) is what makes the realtime endpoints behave the same as the statically configured endpoints. Re-enabling the cache brings back the old behaviour: the endpoint is Unavailable whilst the contact state is still ‘Avail’.
This affects endpoint availability when a queue checks for available members. If the endpoint is marked as unavailable, calls from the queue will not be forwarded to it.
There are quite a few posts about endpoints, but I couldn’t find one that specifically related to the cache configuration and this symptom. I tested this on versions 18.10.1 and 20.7.0. Versions 18.11 and 18.14 also have the problem, although I haven’t been able to test disabling endpoint caching there because those servers are in production and can’t be restarted at the moment.
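As a sketch, the cache-disabled variant of the three lines above would look like this, assuming they sit under the [res_pjsip] section of sorcery.conf (the section header is not shown in the lines quoted above):

```ini
[res_pjsip]
endpoint=config,pjsip.conf,criteria=type=endpoint
; cache wizard commented out (the change described above)
;endpoint/cache=memory_cache,object_lifetime_stale=600,object_lifetime_maximum=1800
endpoint=realtime,ps_endpoints
```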
The following is the difference between the ‘pjsip show endpoint 157’ with and without sorcery caching of endpoints (in both cases the contact line shows ‘Avail’):
No caching - Endpoint: 157 Not in use 0 of inf
Caching - Endpoint: 157 Unavailable 0 of inf
You should actually show the full configuration, including the AOR, and provide console output with debug enabled. The logging will generally tell you what is going on and may even shed light on why it is doing what it is doing.
Thank you for your response. Please see attached copies of both the realtime and statically configured peers configuration settings. There is also a log from the CLI and the debug log showing initial registration. 157 is the realtime peer and the endpoint status is not updated in the log file.
This is the configuration for sorcery.conf under res_pjsip:
Hi Josh. Thank you for the suggestions. I will run those tests tomorrow and see if there are any changes in behaviour. If I remember correctly, the order (in sorcery.conf) of the .conf first, cache second and realtime third was based on documentation about the order in which the sources would be accessed (ie cache had to be higher than realtime/DB or the DB would be read and the cache never used).
You’re using an uncommon configuration of both .conf and realtime, so I can’t remember the specifics of that, but in general the cache may need to be first to operate properly.
I did test the scenarios as requested and the initial results show that by placing the cache first (before pjsip.conf and realtime) the endpoint status is updated correctly for both realtime and statically configured endpoints.
[2024-05-01 12:46:06.626] -- Added contact 'sip:157@CPE.Public.IP:31037' to AOR '157' with expiration of 3600 seconds
[2024-05-01 12:46:06.628] -- Removed contact 'sip:157@CPE.Public.IP:33258' from AOR '157' due to remove existing
[2024-05-01 12:46:06.633] == Contact 157/sip:157@CPE.Public.IP:33258 has been deleted
[2024-05-01 12:46:06.714] == Endpoint 157 is now Reachable
[2024-05-01 12:46:06.714] -- Contact 157/sip:157@CPE.Public.IP:31037 is now Reachable. RTT: 86.970 msec
[2024-05-01 12:46:07.833] -- Added contact 'sip:154@CPE.Public.IP:51528' to AOR '154' with expiration of 3600 seconds
[2024-05-01 12:46:07.837] -- Removed contact 'sip:154@CPE.Public.IP:10973' from AOR '154' due to remove existing
[2024-05-01 12:46:07.837] == Contact 154/sip:154@CPE.Public.IP:10973 has been deleted
[2024-05-01 12:46:07.913] == Endpoint 154 is now Reachable
[2024-05-01 12:46:07.913] -- Contact 154/sip:154@CPE.Public.IP:51528 is now Reachable. RTT: 78.985 msec
Endpoint: 157/157 Not in use 0 of inf
InAuth: 157/157
Aor: 157 1
Contact: 157/sip:157@CPE.Public.IP:31037 f4e08b536d Avail 79.676
Endpoint: 154/154 Not in use 0 of inf
InAuth: 154/154
Aor: 154 1
Contact: 154/sip:154@CPE.Public.IP:51528 9bdc7e342d Avail 80.011
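For reference, the cache-first ordering that produced the output above might look like the following. This is a reconstruction from the three lines quoted earlier in the thread, simply reordered, and not a copy of the tested file; the [res_pjsip] section header is likewise assumed.

```ini
[res_pjsip]
; cache wizard listed first, ahead of both backing sources
endpoint/cache=memory_cache,object_lifetime_stale=600,object_lifetime_maximum=1800
endpoint=config,pjsip.conf,criteria=type=endpoint
endpoint=realtime,ps_endpoints
```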
The behaviour differs depending on which source comes first in sorcery.conf:
Caching first: the endpoint status is shown correctly. However, when Asterisk is started (or possibly reloaded), the entire ps_endpoints database table (thousands of entries) is retrieved, which takes a few seconds and creates an unacceptably large load on both the Asterisk server (which has to process the results) and the database server. Adding full_backend_cache=no stops Asterisk performing the initial load, but running the CLI command ‘pjsip show endpoints’ at any point still loads all the endpoints from the database.
pjsip.conf first: the endpoint status is not shown properly (Unavailable), which affects functions like queueing. When ‘pjsip show endpoints’ is run, it shows only the endpoints configured in the static file plus any endpoints that have registered via realtime, without loading every database entry (this is preferred).
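As a sketch of the full_backend_cache=no variant mentioned under the caching-first case, only the cache line would change. The other option values are carried over from the lines quoted earlier in the thread, and the exact line used in the test is not shown, so treat this as an assumption:

```ini
; cache wizard with the full-backend preload switched off
endpoint/cache=memory_cache,object_lifetime_stale=600,object_lifetime_maximum=1800,full_backend_cache=no
```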
I haven’t tested realtime first (yet). Based on the testing so far, I think there is a bug that stops the endpoint status from being updated if caching is not the first entry. I would assume that the order of the file/cache/realtime entries shouldn’t affect whether the endpoint status is recorded.
It probably shouldn’t, but I think I originally wrote caching to be first. That isn’t enforced in code; it probably should have been, or the code should have been written not to expect it. Oh, and as for pulling all the endpoints in cases such as “pjsip show endpoints”, that is intended. Things are supposed to give you a complete view.
Thank you for your feedback. I did manage to test with realtime first (ie before the cache and static configuration) and an endpoint was created. The test sorcery configuration is below:
That leads me to conclude that the only scenario where an endpoint isn’t created is where sorcery is configured to use static → cache → realtime. Do you think it would be possible for you to have a look at the code and see if there is an improvement that can be made so the endpoint is created even if I use my preferred configuration? Unfortunately my coding skills are certainly not up to a task of this complexity.
The static → cache → realtime flow seems to be the only one that will reduce the amount of DB/Asterisk processing required because it doesn’t retrieve all the endpoints from the database server.
Appreciate any input you have or guidance you can give.