Endpoints periodically deemed Unreachable after responding to OPTIONS request

Soelvberg · July 3, 2024, 1:29pm

I have a customer that has two siptrunks connections - both using TLS.
Frequently, but not always, Asterisk marks the endpoints unreachable immediately after resceiving their response to an OPTIONS request sent by us.
The qualify_frequency is set to 60, and when we 1 minute later send a new OPTIONS request, Asterisk now believe that the endpoint is reachable again.when we receive their response.
I see no pattern for when this happens, but typically happens 5-6 times per day and other days not at all. It happens both when there is high load on the system, and early moring when the (almost) only activity is OPTIONS requests and responses.

I have attached 3 examples from today, and the transport and endpoint configuration that we have.

gjoseph · July 3, 2024, 2:54pm

I deleted your attachments because they had ip addresses in them but try setting the qualify_frequency to something other than exactly 60 seconds. Try 55 seconds.

Soelvberg · July 3, 2024, 3:04pm

Here is a redacted example of an incident.

Incident1_redacted.txt (2.6 KB)

Will try with 55, and let you know

Soelvberg · July 8, 2024, 1:17pm

Unfortunately setting the qualify frequency to 55 seconds did not change anything.
The SipTrunks are still wrongly reported as unreachable 2-3 times a day.

It “only” lasts for a minute, but this is a very busy customer, and calls are getting lost because the agents can’t answer the calls. When they fail to answer, then our system marks the agent as unreachable, and within seconds all agents become unreachable.

Please note that we have two siptrunks configured for this tenant.
One for international numbers and another for national ones.
Could Asterisk get confused and marking the wrong endpoint as unrechable when it receives a response to OPTIONS?
And why will Asterisk ever mark anything as unrechable when it receives a reply to OPTIONS?

Do you have another Idea about where the error lies?
And how to avoid it?

jcolp · July 8, 2024, 1:22pm

From a SIP protocol perspective if a response is received after the request transaction goes away, then it wouldn’t be marked as reachable because it has no idea about it anymore. Is that the cause? Unknown, but just answering that question.

A debug level log would be needed[1] as that shows the logic of what is happening inside.

It’s also unlikely to get confused.

[1] Collecting Debug Information - Asterisk Documentation

jcolp · July 8, 2024, 1:23pm

Oh, and you haven’t stated what version of Asterisk is in use.

Soelvberg · July 8, 2024, 1:37pm

Sorry that I forgot the version
We are using v. 20.6.0

I have set the debug level to 1, and will let you know whether this reveals anything

The response is received within trhe same second that we sent our OPTIONS request, so it is not a timeout problem:

[Jul 8 12:06:51] VERBOSE[6870] res_pjsip_logger.c: <— Transmitting SIP request (498 bytes) to TLS:XX.233.237.XX:5063 —>
OPTIONS sip:sbc1.xxxxxxxxxx.com:5063 SIP/2.0
Via: SIP/2.0/TLS XX.229.231.XX:5067;rport;branch=z9hG4bKPj32dd5d61-5645-4f08-8d4c-c4ac339e28c2;alias
From: sip:3_sbc1xxxxxxxxxxcom@10.0.0.4;tag=4706a3ba-7f33-42f7-ac16-f92d3066226a
To: sip:sbc1.xxxxxxxxxx.com
Contact: sip:zylinc@XX.229.231.XX:5067;transport=TLS
Call-ID: 6518f3ac-40f9-4d67-be84-99d560a4fb71
CSeq: 40394 OPTIONS
Route: sip:52.233.237.78:5063;transport=tls;lr
Max-Forwards: 70
User-Agent: ZySwitch
Content-Length: 0

[Jul 8 12:06:51] VERBOSE[57] res_pjsip_logger.c: <— Received SIP response (462 bytes) from TLS:XX.233.237.XX:5063 —>
SIP/2.0 200 OK
Via: SIP/2.0/TLS XX.229.231.XX:5067;alias;received=XX.229.231.XX;rport=33335;branch=z9hG4bKPj32dd5d61-5645-4f08-8d4c-c4ac339e28c2
From: sip:3_sbc1xxxxxxxxxxcom@10.0.0.4;tag=4706a3ba-7f33-42f7-ac16-f92d3066226a
To: sip:sbc1.xxxxxxxxxx.com;tag=1c1624881152
Call-ID: 6518f3ac-40f9-4d67-be84-99d560a4fb71
CSeq: 40394 OPTIONS
Contact: sip:sbc1.xxxxxxxxxx.com:5063;transport=tls
Server: Mediant VE SBC/v.7.20A.259.221
Content-Length: 0

[Jul 8 12:06:51] VERBOSE[575] res_pjsip/pjsip_configuration.c: Endpoint 3_sbc1xxxxxxxxxxcom is now Unreachable
[Jul 8 12:06:51] VERBOSE[575] res_pjsip/pjsip_options.c: Contact 3_sbc1xxxxxxxxxxcom/sip:sbc1.xxxxxxxxxx.com:5063 is now Unreachable. RTT: 0.000 msec

jcolp · July 8, 2024, 1:43pm

Please follow the instructions on the documentation site for collecting information.

gjoseph · July 8, 2024, 2:00pm

The output of pjsip show aor <trunk aor> and pjsip show settings would be helpful.

Are REGISTER requests being sent on this trunk? If so, the output of pjsip show registration <trunk registration> would also help.

Soelvberg · July 8, 2024, 2:05pm

I tried setting the debug level to 5 as written, but anything higher than 2 results in a massive log, that I would never be able to send to you.
Running it just for a 30-40 seconds,resulted in a full log of 6MB.
I have now set it to 2, and hope that this is information enough.

Soelvberg · July 8, 2024, 2:12pm

There is actually a third SipTrunk that we do register towards, however never see the issue for this.
We do not register the 2 siptrunks that peridically becomes unreachable.
We and the provider has hardcoded the remote contact to use.
Please note the these SipTrunks are using TLS - don’t know whether this can explain something…

pjsip show registrations

<Registration/ServerURI…> <Auth…> <Status…>

3_Unitel-32422666-32422669/sip:sip3.uni-tel.dk 3_Unitel-32422666-32422669 Registered (exp. 3007s)

pjsip show settings:

Global Settings:

ParameterName : ParameterValue

all_codecs_on_empty_reinvite : false
allow_sending_180_after_183 : false
contact_expiration_check_interval : 30
debug : yes
default_from_user : asterisk
default_outbound_endpoint : default_outbound_endpoint
default_realm : asterisk
default_voicemail_extension :
disable_multi_domain : false
endpoint_identifier_order : ip,username,anonymous
ignore_uri_user_options : false
keep_alive_interval : 90
max_forwards : 70
max_initial_qualify_time : 0
mwi_disable_initial_unsolicited : false
mwi_tps_queue_high : 500
mwi_tps_queue_low : -1
norefersub : yes
regcontext :
send_contact_status_on_update_registration : no
taskprocessor_overload_trigger : global
unidentified_request_count : 5
unidentified_request_period : 5
unidentified_request_prune_interval : 30
use_callerid_contact : no
user_agent : ZySwitch

System Settings:

ParameterName : ParameterValue

accept_multiple_sdp_answers : false
compact_headers : false
disable_rport : false
disable_tcp_switch : true
follow_early_media_fork : true
threadpool_auto_increment : 5
threadpool_idle_timeout : 60
threadpool_initial_size : 0
threadpool_max_size : 50
timer_b : 32000
timer_t1 : 500

Soelvberg · July 8, 2024, 2:23pm

The AORs that you requested:

pjsip show aor 3_sbc1bestsellercom

  Aor:  <Aor..............................................>  <MaxContact>
Contact:  <Aor/ContactUri............................> <Hash....> <Status> <RTT(ms)..>

==========================================================================================

  Aor:  3_sbc1bestsellercom                                  0
Contact:  3_sbc1bestsellercom/sip:sbc1.bestseller.com: 785685b8cd Avail        15.368

ParameterName : ParameterValue

authenticate_qualify : false
contact : sip:sbc1.bestseller.com:5063
default_expiration : 3600
mailboxes :
max_contacts : 0
maximum_expiration : 7200
minimum_expiration : 60
outbound_proxy :
qualify_frequency : 60
qualify_timeout : 3.000000
remove_existing : false
remove_unavailable : false
support_path : false
voicemail_extension :

clientprod1*CLI> pjsip show aor 3_sbc2bestsellercom

  Aor:  <Aor..............................................>  <MaxContact>
Contact:  <Aor/ContactUri............................> <Hash....> <Status> <RTT(ms)..>

==========================================================================================

  Aor:  3_sbc2bestsellercom                                  0
Contact:  3_sbc2bestsellercom/sip:sbc2.bestseller.com: 4fd6efc620 Avail         7.874

ParameterName : ParameterValue

authenticate_qualify : false
contact : sip:sbc2.bestseller.com:5063
default_expiration : 3600
mailboxes :
max_contacts : 0
maximum_expiration : 7200
minimum_expiration : 60
outbound_proxy :
qualify_frequency : 60
qualify_timeout : 3.000000
remove_existing : false
remove_unavailable : false
support_path : false
voicemail_extension :

Soelvberg · July 8, 2024, 2:25pm

And the transport configuration for TLS in case that provides anything:

pjsip show transport transport-tls

Transport: <TransportId…> <BindAddress…>

Transport: transport-tls tls 3 96 0.0.0.0:5067

ParameterName : ParameterValue

allow_reload : false
allow_wildcard_certs : No
async_operations : 1
bind : 0.0.0.0:5067
ca_list_file :
ca_list_path :
cert_file : /var/lib/asterisk/certs/novus.crt
cipher :
cos : 3
domain :
external_media_address : XX.229.231.XX
external_signaling_address : XX.229.231.XX
external_signaling_port : 5067
local_net : 10.0.0.0/255.0.0.0
local_net : 172.16.0.0/255.240.0.0
local_net : 172.16.0.0/255.240.0.0
local_net : 172.16.0.0/255.240.0.0
local_net : 192.168.0.0/255.255.0.0
method : tlsv1_2
password :
priv_key_file : /var/lib/asterisk/certs/novus.key
protocol : tls
require_client_cert : No
symmetric_transport : false
tos : 96
verify_client : No
verify_server : No
websocket_write_timeout : 100

Soelvberg · July 8, 2024, 2:57pm

Just noticed that the same localnet was shown 3 times when I requested the transport settings from the CLI.
This is however not what is writted in the pjsip.conf.
I don’t know whether this could be related to the unreachable problem, but just to be sure, I have pasted the entire transport configuratin as writted in pjsip.conf below:

; *********************************************************************************
; PJSIP_TRANSPORTS.CONF
; *********************************************************************************

[transport-udp]
type=transport
protocol=udp
bind=0.0.0.0:5090
tos=cs3
cos=3
allow_reload=false
external_media_address=XX.97.242.XXX
external_signaling_address=XX.97.242.XXX
local_net=10.0.0.0/8
local_net=172.16.0.0/12
local_net=172.18.0.0/12
local_net=172.19.0.0/12
local_net=192.168.0.0/16
external_signaling_port=5090

[transport-tcp]
type=transport
protocol=tcp
bind=0.0.0.0:5090
tos=cs3
cos=3
allow_reload=false
external_media_address=XX.97.242.XXX
external_signaling_address=XX.97.242.XXX
local_net=10.0.0.0/8
local_net=172.16.0.0/12
local_net=172.18.0.0/12
local_net=172.19.0.0/12
local_net=192.168.0.0/16
external_signaling_port=5090

[transport-tls]
type=transport
protocol=tls
bind=0.0.0.0:5091
tos=cs3
cos=3
allow_reload=false
method=tlsv1_2
external_media_address=XX.97.242.XXX
external_signaling_address=XX.97.242.XXX
local_net=10.0.0.0/8
local_net=172.16.0.0/12
local_net=172.18.0.0/12
local_net=172.19.0.0/12
local_net=192.168.0.0/16
external_signaling_port=5091
cert_file=/var/lib/asterisk/certs/novus.crt
priv_key_file=/var/lib/asterisk/certs/novus.key

[transport-wss]
type=transport
protocol=wss
bind=0.0.0.0
; All other transport parameters are ignored for wss transports

[anonymous]
type=endpoint
context=default
tos_audio=ef
cos_audio=5
auth=
disallow=all
allow=alaw
allow=ulaw
allow=g722
allow=opus
allow=alaw
allow=ulaw
allow=gsm
force_rport=yes
100rel=no
connected_line_method=invite
direct_media_method=invite
direct_media=no
trust_id_inbound=no
trust_id_outbound=no
media_encryption=sdes
media_encryption_optimistic=true
timers=yes
timers_min_se=90
timers_sess_expires=1800

FrancoSmash · July 8, 2024, 3:11pm

OPTIONS sip:sbc1.xxxxxxxxxx.com:5063 SIP/2.
bind : 0.0.0.0:5067
bind=0.0.0.0:5091
bind=0.0.0.0:5090
external_signaling_port=5091

Thru your entire notes I see those outputs which I believe creates a conflict, would it not ??

Soelvberg · July 9, 2024, 8:30am

You are right. I inadvertently gave you the transports for a wrong Tenant.
This tenant uses the ports 5066 and 5067 for unencrypted and encrypted respectively.
All tenants is however set up in same way - only the poprts differ, which is how we seperate tenants from each other.
The port 5063 is the port of the SBCs that we communicate with.
I have attached the correct transport configuration.

attached is also the full log with debug level set to 2. I have however removed the start and the end of the log to be able to attach it. The entire log is 42MB and seemingly you don’t accept zip attachments. If you need the entitre log, then just say so.

There are two incidents on the log where the two SBCs were deemed unreachable:

[Jul 9 07:18:45] SBC2 Unreachable
[Jul 9 07:19:45] SBC2 Reachable
[Jul 9 07:29:14] SBC1 Unreachable
[Jul 9 07:30:14] SBC1 Reachable

Just search for "bestsellercom is now "

You will see a lot of other “is now Unreachable” log messages, but that is merely webRtc softphones logging off which is working like a charm, and is not part of the problem.

debug_log_103184_redacted.txt (3.5 MB)
TransportConfiguration.txt (1.9 KB)

jcolp · July 9, 2024, 8:32am

That debug level didn’t actually have the messages from the OPTIONS and qualify handling. That appears to be at level 3.

Soelvberg · July 9, 2024, 8:36am

If I set it to 3, thenthe log will be filled with this

[Jul 9 08:33:07] DEBUG[9544][C-000004c3]: res_rtp_asterisk.c:5725 calc_rxstamp_and_jitter: clientprod1-1720513877.3091: pkt: 5044 Arrival sec: 101.123 Arrival ts: 808983 RX ts: 1926230961 Transit samp: -1925421978 Last transit samp: -1925421978 d: 0 Curr jitter: -1(536870.912) Prev Jitter: 17( 0.002) New Jitter: 16( 0.002)

The logfile would be huge, and will probably exceed the size allocated - which is around 15GB.

Is there a way that I can get rid of the rtp log lines above?

Soelvberg · July 9, 2024, 8:39am

I could naturally change it in the source code, but the customer is running 24/7, and it is very hard to get a slot for updating their Asterisk

jcolp · July 9, 2024, 8:44am

Nothing immediately comes to mind.

Topic		Replies	Views
Reachable and Unreachable Asterisk Support	1	300	January 2, 2016
PJSIP endpoints become unavailable for a few seconds	1	1991	February 10, 2016
Endpoints become unavaiable every few minutes Asterisk SIP	29	2161	January 27, 2023
Endpoint xxxxx is now Unreachable. Then goes to Reachable even through OPTIONS keepalive response is being received Asterisk Endpoints	12	3129	November 11, 2021
A lot of peers become unreachable/reachable at the same time Asterisk Support	0	839	April 3, 2017

Endpoints periodically deemed Unreachable after responding to OPTIONS request

<Registration/ServerURI…> <Auth…> <Status…>

ParameterName : ParameterValue

ParameterName : ParameterValue

ParameterName : ParameterValue

ParameterName : ParameterValue

Transport: <TransportId…> <BindAddress…>

ParameterName : ParameterValue

Related topics