Transport won't load on upgrade to 18.12.0

Hi,

I attempted an upgrade to 18.12.0 this morning from 18.11.2, and the new version does not work at all. I’ve begun digging into why, and it seems like it’s probably related to updating pjproject to 2.12, but I’m not sure - I’m posting here to hopefully gain a little more insight so I can create a useful bug report.

First off, I’m on a platform that won’t let you bind IPv4 addresses on an IPv6 socket, and I suspect this is the core of my issue, but I haven’t yet found the bad bind in code. Another quirk of this platform is that AF_NETLINK sockets are all but completely unsupported, in case it’s important.

The first 3 errors when asterisk is started, from the logs:

[May 12 08:31:31] ERROR[888860] res_pjsip/config_system.c: Could not create DNS resolver(120022), resorting to system resolution
[May 12 08:31:31] ERROR[15627] res_pjsip/config_transport.c: Transport 'transport-udp' could not be started: Address already in use
[May 12 08:31:31] ERROR[15627] res_sorcery_config.c: Could not create an object of type 'transport' with id 'transport-udp' from configuration file 'pjsip.conf'

I have a bunch more errors down the line related to not being able to load transport-udp, of course. None of these three show up in logs for the previous versions. Digging in, it seems the 120022 refers to EAFNOSUPPORT (address family not supported) - which is what makes me think there’s one of those improperly-IPv6 sockets in play. “Address already in use” doesn’t make sense, since a sockstat shows no open network sockets from asterisk, and no sockets on port 5060 at all (where asterisk should bind).
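For anyone decoding a status like 120022 themselves, here is a minimal sketch in C. It assumes pjlib's system-error range starts at 120000 (PJ_ERRNO_START_SYS) and spans 50000 codes (PJ_ERRNO_SPACE_SIZE); both constants are hard-coded here as assumptions rather than pulled from the pjlib headers, and the helper names are made up for illustration. Which symbolic name the resulting errno corresponds to depends on the platform's errno.h, so it is worth checking with strerror() rather than guessing.

```c
#include <stdio.h>
#include <string.h>

/* Assumed values mirroring pjlib's errno layout; not read from pjlib headers. */
#define ASSUMED_PJ_ERRNO_START_SYS  120000
#define ASSUMED_PJ_ERRNO_SPACE_SIZE 50000

/* Hypothetical helper: returns the OS errno embedded in a pjlib status,
 * or -1 if the status is outside the assumed system-error range. */
int pj_status_to_os_errno(int status)
{
    if (status < ASSUMED_PJ_ERRNO_START_SYS ||
        status >= ASSUMED_PJ_ERRNO_START_SYS + ASSUMED_PJ_ERRNO_SPACE_SIZE)
        return -1;
    return status - ASSUMED_PJ_ERRNO_START_SYS;
}

/* Hypothetical helper: prints the decoded errno with the platform's own
 * description, so the mapping is taken from the local errno table. */
void print_pj_status(int status)
{
    int os_errno = pj_status_to_os_errno(status);
    if (os_errno < 0)
        printf("%d: not in the assumed system-error range\n", status);
    else
        printf("%d: OS errno %d (%s)\n", status, os_errno, strerror(os_errno));
}
```

Calling print_pj_status(120022) on the affected machine shows which OS error the resolver actually hit on that platform.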

The transport section of pjsip.conf:

[transport-udp]
type=transport
protocol=udp
bind=192.168.3.7
local_net=192.168.0.0/16
local_net=172.20.0.0/15
external_media_address=64.XX.YY.ZZ
external_signaling_address=64.XX.YY.ZZ

(of course my actual WAN ip is in there, but clipping for privacy).

Any suggestions on where to start digging with this one? Otherwise, I’ll start digging further into the code to try to find this problem.

Thanks in advance!

SIP transports are completely implemented within PJSIP and are each in their own file[1].

[1] pjproject/sip_transport_udp.c at master · pjsip/pjproject · GitHub

I see - and sure enough, reverting the patch in commit e5e02f783d66c6ea10001934c9da20b11fa2effc ("[PATCH] pjproject: Update bundled to 2.12 release.") and keeping everything else in 18.12.0 works perfectly for me. Haven’t found the offending changes in 2.12 yet, but this confirms to me that it’s not an asterisk-specific issue, and I need to keep digging into pjproject instead - thanks for the pointer.

I’m sure no one here cares, but I finally found the problem (and I’m not the type to figure something out and then not document the answer with my question) - pjproject PR #2604 uses EPOLLEXCLUSIVE for ioqueues, which my platform does not support, but does define. Since they only check whether it’s defined, it builds with epoll-exclusive mode, but this does not function. Manually changing the defines at the top of pjlib/src/pj/ioqueue_epoll.c to make sure USE_EPOLLEXCLUSIVE and USE_EPOLLONESHOT are both 0 builds to a working configuration.
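To illustrate the difference between "defined" and "functional", here is a rough sketch of a runtime probe in C. This is not pjproject's code and not a proposed patch; probe_epoll_exclusive is a made-up name. It assumes the platform at least ships <sys/epoll.h>. It only checks that a registration with EPOLLEXCLUSIVE is accepted and that a readiness event is then delivered; it does not verify the exclusive-wakeup semantics themselves, and it may not catch every way the flag can misbehave.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical probe. Returns 1 if an EPOLLEXCLUSIVE registration is accepted
 * and delivers a readiness event, 0 if the flag is undefined or does not
 * work, and -1 if the probe itself could not be set up. */
int probe_epoll_exclusive(void)
{
#ifndef EPOLLEXCLUSIVE
    return 0;                       /* flag not even defined at build time */
#else
    int result = 0;
    int fds[2];
    int epfd = epoll_create(1);

    if (epfd < 0)
        return -1;
    if (pipe(fds) < 0) {
        close(epfd);
        return -1;
    }

    struct epoll_event ev = {0};
    ev.events = EPOLLIN | EPOLLEXCLUSIVE;
    ev.data.fd = fds[0];

    /* A platform that defines the flag but does not implement it may reject
     * the registration here, or accept it and never report the event. */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) == 0) {
        char byte = 'x';
        struct epoll_event out;

        /* Make the read end readable, then wait up to one second for it. */
        if (write(fds[1], &byte, 1) == 1 &&
            epoll_wait(epfd, &out, 1, 1000) == 1 &&
            (out.events & EPOLLIN) && out.data.fd == fds[0])
            result = 1;
    }

    close(fds[0]);
    close(fds[1]);
    close(epfd);
    return result;
#endif
}
```

A configure-time or startup check along these lines could fall back to plain epoll (or select) when the probe returns 0, instead of relying on the #ifdef alone.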

Alternatively, deleting the last 3 lines from (Asterisk’s) third-party/pjproject/Makefile.rules (which set --enable-epoll in pjproject’s configure) creates a build that uses select instead - I haven’t tested it in its full configuration for any unexpected issues, but I do see that it seems to work properly in testing. I wish this were a straightforward configure option for Asterisk, but even just a quick 3 line delete is doable, if I can qualify that version in my environment.

I’ll probably go ahead and file a bug report against pjproject; hopefully there can be a quick patch that checks whether epoll-exclusive actually works before it gets enabled.
