Pjproject: SSL_ERROR_SSL error then endpoint is unreachable

Hi,

Recently, we are experiencing some endpoints losing connectivity, sometimes multiple in a short period of time with errors relating to SSL appearing in the logs:

 [2020-02-07 12:10:16] WARNING[13283] pjproject:                      SSL SSL_ERROR_SSL (Read): Level: 0 err: <336151528> <SSL routines-ssl3_read_bytes-???> len: 32000 peer: 10.105.6.3:51157
 [2020-02-07 12:10:16] VERBOSE[4930] res_pjsip_registrar.c: Removed contact 'sips:endpoint1@10.105.6.3:51157;transport=TLS' from AOR 'endpoint1' due to shutdown
 [2020-02-07 14:24:25] WARNING[13283] pjproject:                      SSL SSL_ERROR_SSL (Read): Level: 0 err: <336130329> <SSL routines-SSL3_GET_RECORD-decryption failed or bad record mac> len: 32000 peer: 10.96.70.25:34540
 [2020-02-07 14:24:25] VERBOSE[11913] res_pjsip/pjsip_configuration.c: Endpoint endpoint2 is now Unreachable                                                         
 [2020-02-07 14:24:25] VERBOSE[11913] res_pjsip/pjsip_options.c: Contact endpoint2/sips:endpoint2@10.96.70.25:34540;transport=TLS is now Unreachable.  RTT: 0.000 msec
 [2020-02-07 15:18:19] WARNING[13283] pjproject:                      SSL SSL_ERROR_SSL (Read): Level: 0 err: <336130315> <SSL routines-SSL3_GET_RECORD-wrong version number> len: 32000 peer: 10.90.1.212:56583                 
 [2020-02-07 15:18:19] VERBOSE[2800] res_pjsip_registrar.c: Removed contact 'sips:endpoint3@10.90.1.212:56583;transport=TLS' from AOR 'endpoint3' due to shutdown            
 [2020-02-07 15:18:19] VERBOSE[8877] res_pjsip/pjsip_options.c: Contact endpoint3/sips:endpoint3@10.90.1.212:56583;transport=TLS has been deleted                       

We upgraded from Asterisk 13.20.0 to 13.30.0 about a month ago. We have seen similar looking errors in the past but not with the endpoint going unreachable. After the upgrade to 13.30.0 the end of the error has “peer:” and the IP address appended to it.

Could this be an issue with our pjsip.conf? We are using Let’s Encrypt and the certificate is valid. The folder containing the cert files is accessible by Asterisk. Here’s the TLS transport section with our real domain replaced with ‘domain.ca’:

[transport-tls]    
type = transport    
protocol = tls    
bind = 0.0.0.0:5061    
cert_file = /etc/letsencrypt/live/domain.ca/cert.pem    
priv_key_file = /etc/letsencrypt/live/domain.ca/privkey.pem    
method = tlsv1    

I’ve always wondered if we should be using fullchain.pem for the cert_file… but it seemed to be working.

Has anyone experienced this?

Thanks

Edit:

Here’s a sample of the logs when debug was enabled:

[2020-02-02 03:00:10] WARNING[13283] pjproject:                      SSL SSL_ERROR_SSL (Read): Level: 0 err: <336130315> <SSL routines-SSL3_GET_RECORD-wrong version number> len: 32000 peer: 10.90.7.3:59782                        
[2020-02-02 03:00:10] DEBUG[13283] res_pjsip/pjsip_transport_events.c: Reliable transport 'tlss0x7f90c9813e78' state:DISCONNECTED                  
[2020-02-02 03:00:10] DEBUG[13283] res_pjsip/pjsip_transport_events.c: Reliable transport 'tlss0x7f90c9813e78' state:SHUTDOWN                         
[2020-02-02 03:00:10] DEBUG[13283] res_pjsip/pjsip_transport_events.c: running callback 0x7f90bfdf917a(0x7f9060070748) for transport tlss0x7f90c9813e78         
...
[2020-02-02 03:00:10] DEBUG[13283] res_pjsip/pjsip_distributor.c: Could not find matching transaction for Request msg SUBSCRIBE/cseq=783020 (rdata0x7f90c98aec08)         
[2020-02-02 03:00:10] DEBUG[13283] res_pjsip/pjsip_distributor.c: Calculated serializer pjsip/distributor-000000df to use for Request msg SUBSCRIBE/cseq=783020 (rdata0x7f90c98aec08)         
...
[2020-02-02 03:00:10] DEBUG[13283] res_pjsip.c: 0x7f90b0020748: PJSIP tsx response received                                                        
[2020-02-02 03:00:10] DEBUG[13283] res_pjsip.c: 0x7f90b0020748: Cancelling timer                                                                      
[2020-02-02 03:00:10] DEBUG[13283] res_pjsip.c: 0x7f90b0020748: Timer cancelled                                                                  
...
[2020-02-02 03:00:10] DEBUG[13283] res_pjsip.c: 0x7f90b0020748: Callbacks executed                                                                                                            
[2020-02-02 03:00:10] DEBUG[13283] res_pjsip.c: 0x7f90b0020748: wrapper destroyed                                                    
...
[2020-02-02 03:00:10] DEBUG[13283] res_pjsip/pjsip_transport_events.c: Reliable transport 'tlss0x7f90c9813e78' state:DESTROY               
...
[2020-02-02 03:00:10] VERBOSE[22863] res_pjsip/pjsip_options.c: Contact endpoint4/sip:endpoint4@10.90.7.3:59782;transport=TLS is now Unreachable.  RTT: 0.000 msec                                                                     

Looks like the transport stops working?

Edit2:

Here’s the pjproject version:

asterisk*CLI> pjsip show version 
PJPROJECT version currently running against: 2.9

I’m starting to think this is a symptom of other network related issues, not with PJSIP or Asterisk.