Endpoints periodically deemed Unreachable after responding to OPTIONS request

I have a customer that has two siptrunks connections - both using TLS.
Frequently, but not always, Asterisk marks the endpoints unreachable immediately after resceiving their response to an OPTIONS request sent by us.
The qualify_frequency is set to 60, and when we 1 minute later send a new OPTIONS request, Asterisk now believe that the endpoint is reachable again.when we receive their response.
I see no pattern for when this happens, but typically happens 5-6 times per day and other days not at all. It happens both when there is high load on the system, and early moring when the (almost) only activity is OPTIONS requests and responses.

I have attached 3 examples from today, and the transport and endpoint configuration that we have.

I deleted your attachments because they had ip addresses in them but try setting the qualify_frequency to something other than exactly 60 seconds. Try 55 seconds.

Here is a redacted example of an incident.

Incident1_redacted.txt (2.6 KB)

Will try with 55, and let you know

Unfortunately setting the qualify frequency to 55 seconds did not change anything.
The SipTrunks are still wrongly reported as unreachable 2-3 times a day.

It “only” lasts for a minute, but this is a very busy customer, and calls are getting lost because the agents can’t answer the calls. When they fail to answer, then our system marks the agent as unreachable, and within seconds all agents become unreachable.

Please note that we have two siptrunks configured for this tenant.
One for international numbers and another for national ones.
Could Asterisk get confused and marking the wrong endpoint as unrechable when it receives a response to OPTIONS?
And why will Asterisk ever mark anything as unrechable when it receives a reply to OPTIONS?

Do you have another Idea about where the error lies?
And how to avoid it?

From a SIP protocol perspective if a response is received after the request transaction goes away, then it wouldn’t be marked as reachable because it has no idea about it anymore. Is that the cause? Unknown, but just answering that question.

A debug level log would be needed[1] as that shows the logic of what is happening inside.

It’s also unlikely to get confused.

[1] Collecting Debug Information - Asterisk Documentation

Oh, and you haven’t stated what version of Asterisk is in use.

Sorry that I forgot the version
We are using v. 20.6.0

I have set the debug level to 1, and will let you know whether this reveals anything

The response is received within trhe same second that we sent our OPTIONS request, so it is not a timeout problem:

[Jul 8 12:06:51] VERBOSE[6870] res_pjsip_logger.c: <— Transmitting SIP request (498 bytes) to TLS:XX.233.237.XX:5063 —>
OPTIONS sip:sbc1.xxxxxxxxxx.com:5063 SIP/2.0
Via: SIP/2.0/TLS XX.229.231.XX:5067;rport;branch=z9hG4bKPj32dd5d61-5645-4f08-8d4c-c4ac339e28c2;alias
From: sip:3_sbc1xxxxxxxxxxcom@10.0.0.4;tag=4706a3ba-7f33-42f7-ac16-f92d3066226a
To: sip:sbc1.xxxxxxxxxx.com
Contact: sip:zylinc@XX.229.231.XX:5067;transport=TLS
Call-ID: 6518f3ac-40f9-4d67-be84-99d560a4fb71
CSeq: 40394 OPTIONS
Route: sip:52.233.237.78:5063;transport=tls;lr
Max-Forwards: 70
User-Agent: ZySwitch
Content-Length: 0

[Jul 8 12:06:51] VERBOSE[57] res_pjsip_logger.c: <— Received SIP response (462 bytes) from TLS:XX.233.237.XX:5063 —>
SIP/2.0 200 OK
Via: SIP/2.0/TLS XX.229.231.XX:5067;alias;received=XX.229.231.XX;rport=33335;branch=z9hG4bKPj32dd5d61-5645-4f08-8d4c-c4ac339e28c2
From: sip:3_sbc1xxxxxxxxxxcom@10.0.0.4;tag=4706a3ba-7f33-42f7-ac16-f92d3066226a
To: sip:sbc1.xxxxxxxxxx.com;tag=1c1624881152
Call-ID: 6518f3ac-40f9-4d67-be84-99d560a4fb71
CSeq: 40394 OPTIONS
Contact: sip:sbc1.xxxxxxxxxx.com:5063;transport=tls
Server: Mediant VE SBC/v.7.20A.259.221
Content-Length: 0

[Jul 8 12:06:51] VERBOSE[575] res_pjsip/pjsip_configuration.c: Endpoint 3_sbc1xxxxxxxxxxcom is now Unreachable
[Jul 8 12:06:51] VERBOSE[575] res_pjsip/pjsip_options.c: Contact 3_sbc1xxxxxxxxxxcom/sip:sbc1.xxxxxxxxxx.com:5063 is now Unreachable. RTT: 0.000 msec

Please follow the instructions on the documentation site for collecting information.

The output of pjsip show aor <trunk aor> and pjsip show settings would be helpful.

Are REGISTER requests being sent on this trunk? If so, the output of pjsip show registration <trunk registration> would also help.

I tried setting the debug level to 5 as written, but anything higher than 2 results in a massive log, that I would never be able to send to you.
Running it just for a 30-40 seconds,resulted in a full log of 6MB.
I have now set it to 2, and hope that this is information enough.

There is actually a third SipTrunk that we do register towards, however never see the issue for this.
We do not register the 2 siptrunks that peridically becomes unreachable.
We and the provider has hardcoded the remote contact to use.
Please note the these SipTrunks are using TLS - don’t know whether this can explain something…

pjsip show registrations

<Registration/ServerURI…> <Auth…> <Status…>

3_Unitel-32422666-32422669/sip:sip3.uni-tel.dk 3_Unitel-32422666-32422669 Registered (exp. 3007s)


pjsip show settings:

Global Settings:

ParameterName : ParameterValue

all_codecs_on_empty_reinvite : false
allow_sending_180_after_183 : false
contact_expiration_check_interval : 30
debug : yes
default_from_user : asterisk
default_outbound_endpoint : default_outbound_endpoint
default_realm : asterisk
default_voicemail_extension :
disable_multi_domain : false
endpoint_identifier_order : ip,username,anonymous
ignore_uri_user_options : false
keep_alive_interval : 90
max_forwards : 70
max_initial_qualify_time : 0
mwi_disable_initial_unsolicited : false
mwi_tps_queue_high : 500
mwi_tps_queue_low : -1
norefersub : yes
regcontext :
send_contact_status_on_update_registration : no
taskprocessor_overload_trigger : global
unidentified_request_count : 5
unidentified_request_period : 5
unidentified_request_prune_interval : 30
use_callerid_contact : no
user_agent : ZySwitch

System Settings:

ParameterName : ParameterValue

accept_multiple_sdp_answers : false
compact_headers : false
disable_rport : false
disable_tcp_switch : true
follow_early_media_fork : true
threadpool_auto_increment : 5
threadpool_idle_timeout : 60
threadpool_initial_size : 0
threadpool_max_size : 50
timer_b : 32000
timer_t1 : 500

The AORs that you requested:

pjsip show aor 3_sbc1bestsellercom

  Aor:  <Aor..............................................>  <MaxContact>
Contact:  <Aor/ContactUri............................> <Hash....> <Status> <RTT(ms)..>

==========================================================================================

  Aor:  3_sbc1bestsellercom                                  0
Contact:  3_sbc1bestsellercom/sip:sbc1.bestseller.com: 785685b8cd Avail        15.368

ParameterName : ParameterValue

authenticate_qualify : false
contact : sip:sbc1.bestseller.com:5063
default_expiration : 3600
mailboxes :
max_contacts : 0
maximum_expiration : 7200
minimum_expiration : 60
outbound_proxy :
qualify_frequency : 60
qualify_timeout : 3.000000
remove_existing : false
remove_unavailable : false
support_path : false
voicemail_extension :


clientprod1*CLI> pjsip show aor 3_sbc2bestsellercom

  Aor:  <Aor..............................................>  <MaxContact>
Contact:  <Aor/ContactUri............................> <Hash....> <Status> <RTT(ms)..>

==========================================================================================

  Aor:  3_sbc2bestsellercom                                  0
Contact:  3_sbc2bestsellercom/sip:sbc2.bestseller.com: 4fd6efc620 Avail         7.874

ParameterName : ParameterValue

authenticate_qualify : false
contact : sip:sbc2.bestseller.com:5063
default_expiration : 3600
mailboxes :
max_contacts : 0
maximum_expiration : 7200
minimum_expiration : 60
outbound_proxy :
qualify_frequency : 60
qualify_timeout : 3.000000
remove_existing : false
remove_unavailable : false
support_path : false
voicemail_extension :

And the transport configuration for TLS in case that provides anything:

pjsip show transport transport-tls

Transport: <TransportId…> <BindAddress…>

Transport: transport-tls tls 3 96 0.0.0.0:5067

ParameterName : ParameterValue

allow_reload : false
allow_wildcard_certs : No
async_operations : 1
bind : 0.0.0.0:5067
ca_list_file :
ca_list_path :
cert_file : /var/lib/asterisk/certs/novus.crt
cipher :
cos : 3
domain :
external_media_address : XX.229.231.XX
external_signaling_address : XX.229.231.XX
external_signaling_port : 5067
local_net : 10.0.0.0/255.0.0.0
local_net : 172.16.0.0/255.240.0.0
local_net : 172.16.0.0/255.240.0.0
local_net : 172.16.0.0/255.240.0.0
local_net : 192.168.0.0/255.255.0.0
method : tlsv1_2
password :
priv_key_file : /var/lib/asterisk/certs/novus.key
protocol : tls
require_client_cert : No
symmetric_transport : false
tos : 96
verify_client : No
verify_server : No
websocket_write_timeout : 100

Just noticed that the same localnet was shown 3 times when I requested the transport settings from the CLI.
This is however not what is writted in the pjsip.conf.
I don’t know whether this could be related to the unreachable problem, but just to be sure, I have pasted the entire transport configuratin as writted in pjsip.conf below:

; *********************************************************************************
; PJSIP_TRANSPORTS.CONF
; *********************************************************************************

[transport-udp]
type=transport
protocol=udp
bind=0.0.0.0:5090
tos=cs3
cos=3
allow_reload=false
external_media_address=XX.97.242.XXX
external_signaling_address=XX.97.242.XXX
local_net=10.0.0.0/8
local_net=172.16.0.0/12
local_net=172.18.0.0/12
local_net=172.19.0.0/12
local_net=192.168.0.0/16
external_signaling_port=5090

[transport-tcp]
type=transport
protocol=tcp
bind=0.0.0.0:5090
tos=cs3
cos=3
allow_reload=false
external_media_address=XX.97.242.XXX
external_signaling_address=XX.97.242.XXX
local_net=10.0.0.0/8
local_net=172.16.0.0/12
local_net=172.18.0.0/12
local_net=172.19.0.0/12
local_net=192.168.0.0/16
external_signaling_port=5090

[transport-tls]
type=transport
protocol=tls
bind=0.0.0.0:5091
tos=cs3
cos=3
allow_reload=false
method=tlsv1_2
external_media_address=XX.97.242.XXX
external_signaling_address=XX.97.242.XXX
local_net=10.0.0.0/8
local_net=172.16.0.0/12
local_net=172.18.0.0/12
local_net=172.19.0.0/12
local_net=192.168.0.0/16
external_signaling_port=5091
cert_file=/var/lib/asterisk/certs/novus.crt
priv_key_file=/var/lib/asterisk/certs/novus.key

[transport-wss]
type=transport
protocol=wss
bind=0.0.0.0
; All other transport parameters are ignored for wss transports

[anonymous]
type=endpoint
context=default
tos_audio=ef
cos_audio=5
auth=
disallow=all
allow=alaw
allow=ulaw
allow=g722
allow=opus
allow=alaw
allow=ulaw
allow=gsm
force_rport=yes
100rel=no
connected_line_method=invite
direct_media_method=invite
direct_media=no
trust_id_inbound=no
trust_id_outbound=no
media_encryption=sdes
media_encryption_optimistic=true
timers=yes
timers_min_se=90
timers_sess_expires=1800

OPTIONS sip:sbc1.xxxxxxxxxx.com:5063 SIP/2.
bind : 0.0.0.0:5067
bind=0.0.0.0:5091
bind=0.0.0.0:5090
external_signaling_port=5091

Thru your entire notes I see those outputs which I believe creates a conflict, would it not ??

You are right. I inadvertently gave you the transports for a wrong Tenant.
This tenant uses the ports 5066 and 5067 for unencrypted and encrypted respectively.
All tenants is however set up in same way - only the poprts differ, which is how we seperate tenants from each other.
The port 5063 is the port of the SBCs that we communicate with.
I have attached the correct transport configuration.

attached is also the full log with debug level set to 2. I have however removed the start and the end of the log to be able to attach it. The entire log is 42MB and seemingly you don’t accept zip attachments. If you need the entitre log, then just say so.

There are two incidents on the log where the two SBCs were deemed unreachable:

[Jul 9 07:18:45] SBC2 Unreachable
[Jul 9 07:19:45] SBC2 Reachable
[Jul 9 07:29:14] SBC1 Unreachable
[Jul 9 07:30:14] SBC1 Reachable

Just search for "bestsellercom is now "

You will see a lot of other “is now Unreachable” log messages, but that is merely webRtc softphones logging off which is working like a charm, and is not part of the problem.

debug_log_103184_redacted.txt (3.5 MB)
TransportConfiguration.txt (1.9 KB)

That debug level didn’t actually have the messages from the OPTIONS and qualify handling. That appears to be at level 3.

If I set it to 3, thenthe log will be filled with this

[Jul 9 08:33:07] DEBUG[9544][C-000004c3]: res_rtp_asterisk.c:5725 calc_rxstamp_and_jitter: clientprod1-1720513877.3091: pkt: 5044 Arrival sec: 101.123 Arrival ts: 808983 RX ts: 1926230961 Transit samp: -1925421978 Last transit samp: -1925421978 d: 0 Curr jitter: -1(536870.912) Prev Jitter: 17( 0.002) New Jitter: 16( 0.002)

The logfile would be huge, and will probably exceed the size allocated - which is around 15GB.

Is there a way that I can get rid of the rtp log lines above?

I could naturally change it in the source code, but the customer is running 24/7, and it is very hard to get a slot for updating their Asterisk

Nothing immediately comes to mind.