Asterisk freezes (SIP distributor stalls) when res_odbc max_connections defaults to 1 with PostgreSQL realtime

Setup: Asterisk 20 realtime via res_odbc → PostgreSQL (unixODBC). ~20 endpoints. Zero/low traffic. Random unauthorized REGISTER/INVITE attempts.
Symptom: Asterisk randomly stops processing SIP; pjsip/distributor taskprocessor queue fills and never drains.

What I saw:

  • GDB backtrace shows threads waiting on an ODBC/PostgreSQL connection (no timeout).

  • Happens more often when unsolicited REGISTER/INVITE arrive; it almost disappears if I apply an ACL on the SIP port (see the sketch after this list).

  • No logs hinting at ODBC pool exhaustion.
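A minimal sketch of the kind of ACL I mean, assuming plain iptables and SIP on UDP/5060 (the trusted subnet is a placeholder for your endpoints' networks):

# drop all SIP traffic except from trusted networks, so unauthorized
# REGISTER/INVITE packets never reach Asterisk's distributor
iptables -A INPUT -p udp --dport 5060 -s 203.0.113.0/24 -j ACCEPT
iptables -A INPUT -p udp --dport 5060 -j DROP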

Cause (my findings):
With the default max_connections=1, res_odbc serializes every realtime lookup through a single pooled ODBC connection. When one query blocks (the trace below shows the thread stuck in poll() with timeout=-1 inside libpq), every other lookup, including those triggered by unauthorized REGISTER/INVITE traffic via endpoint identification, queues behind it, so the pjsip distributor taskprocessor fills and never drains.

Fix/workaround:
Set max_connections to 20 (increase from the default of 1).
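For completeness, a minimal res_odbc.conf sketch of the change (class name, DSN, and the logging option reflect my setup; adjust to yours):

[asterisk]
enabled => yes
dsn => asterisk
pre-connect => yes
max_connections => 20   ; default is 1: a single blocked query stalls every realtime lookup
logging => yes          ; enables the per-class counters visible in "odbc show all"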

Open questions:

  • Is default max_connections=1 intentional? It’s not clearly documented.

  • Should res_odbc warn on pool exhaustion and/or support an acquire timeout?

  • Any best practices to avoid SIP distributor stalls from unauthorized traffic triggering realtime lookups?

  • Why is Asterisk unable to drain the queue in that condition (max_connections at the default of 1), and why do I have to completely restart Asterisk to recover?

This change reliably fixes the freeze for me. Curious what others think.

Thanks!

Below is a snippet of the GDB stack trace:

Thread 16 (Thread 0x76b41b7b26c0 (LWP 3513642) "asterisk"):
#0  0x000076b44211b4fd in __GI___poll (fds=0x76b41b7b0298, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x000076b43f3eb9d4 in ?? () from /lib/x86_64-linux-gnu/libpq.so.5
#2  0x000076b43f3f1920 in PQgetResult () from /lib/x86_64-linux-gnu/libpq.so.5
#3  0x000076b43f3f351e in PQdescribePrepared () from /lib/x86_64-linux-gnu/libpq.so.5
#4  0x000076b436a6989b in ?? () from /usr/lib/x86_64-linux-gnu/odbc/psqlodbcw.so
#5  0x000076b436a2e041 in ?? () from /usr/lib/x86_64-linux-gnu/odbc/psqlodbcw.so
#6  0x000076b436a52f20 in ?? () from /usr/lib/x86_64-linux-gnu/odbc/psqlodbcw.so
#7  0x000076b436a37592 in ?? () from /usr/lib/x86_64-linux-gnu/odbc/psqlodbcw.so
#8  0x000076b436a3c9c3 in ?? () from /usr/lib/x86_64-linux-gnu/odbc/psqlodbcw.so
#9  0x000076b436a56c10 in SQLExecute () from /usr/lib/x86_64-linux-gnu/odbc/psqlodbcw.so
#10 0x000076b43f4397dd in SQLExecute () from /lib/x86_64-linux-gnu/libodbc.so.2
#11 0x000076b4422193da in ast_odbc_prepare_and_execute () from /usr/lib/x86_64-linux-gnu/asterisk/modules/res_odbc.so
#12 0x000076b4419bbab5 in ?? () from /usr/lib/x86_64-linux-gnu/asterisk/modules/res_config_odbc.so
#13 0x0000584c72dcf69c in ast_load_realtime_all_fields ()
#14 0x0000584c72dcf8a7 in ast_load_realtime_fields ()
#15 0x000076b43c17dc7e in ?? () from /usr/lib/x86_64-linux-gnu/asterisk/modules/res_sorcery_realtime.so
#16 0x000076b43c17dd96 in ?? () from /usr/lib/x86_64-linux-gnu/asterisk/modules/res_sorcery_realtime.so
#17 0x0000584c72d4ad4b in ast_sorcery_retrieve_by_id ()
#18 0x000076b436f774a9 in ?? () from /usr/lib/x86_64-linux-gnu/asterisk/modules/res_pjsip_endpoint_identifier_anonymous.so
#19 0x000076b437f16d0a in ast_sip_identify_endpoint () from /usr/lib/x86_64-linux-gnu/asterisk/modules/res_pjsip.so
#20 0x000076b437f38765 in ?? () from /usr/lib/x86_64-linux-gnu/asterisk/modules/res_pjsip.so
#21 0x000076b442e869cb in pjsip_endpt_process_rx_data () from /lib/x86_64-linux-gnu/libasteriskpj.so.2
#22 0x000076b437f37a80 in ?? () from /usr/lib/x86_64-linux-gnu/asterisk/modules/res_pjsip.so
#23 0x0000584c72d80df5 in ast_taskprocessor_execute ()
#24 0x0000584c72d931c0 in ?? ()
#25 0x0000584c72d80df5 in ast_taskprocessor_execute ()
#26 0x0000584c72d92c10 in ?? ()
#27 0x0000584c72da430f in ?? ()
#28 0x000076b44209caa4 in start_thread (arg=) at ./nptl/pthread_create.c:447
#29 0x000076b442129c6c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

Yes. The default mirrors what it has been for many years, before pooling ever existed.

If it warned on pool exhaustion, the log output could very well become extremely heavy depending on how the system is used. Supporting an acquire timeout would require unixODBC to provide such a thing, which may or may not exist depending on the underlying connector, and may or may not work.

Asterisk is an extremely simple unixODBC user. In my experience over many years, the issues themselves are in unixODBC or the connector, and our ability to do anything at the Asterisk level is extremely limited. Any improvement or resolution in this area ends up coming through unixODBC configuration or tweaking instead. If the connection blocks for some reason, then we block, and have to wait until unixODBC returns.

Thanks for the feedback!

I see that unixODBC is currently the only way to use Realtime in Asterisk. And unixODBC, as we saw, is not exactly realtime :sweat_smile:

What do you suggest to keep Asterisk healthy and avoid hitting the connection limit (which blocks)? There seems to be no way to monitor it.

asterisk01*CLI> odbc show all

ODBC DSN Settings

Name:   asterisk
DSN:    asterisk
Number of active connections: 20 (out of 20)
Logging: Enabled
Number of prepares executed: 9135
Number of queries executed: 9134

With max_connections set to 20, it seems normal that all 20 connections are in use (pooled), so I don’t have any evidence of “connection pressure”.

In the meantime, I created a Checkmk/Nagios script that monitors the “in-queue” values of task processors, which should trend toward zero. If these values increase, there’s a serious problem.
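The script is essentially a thin wrapper around the CLI; a minimal sketch in Python (the thresholds are illustrative, and it assumes the monitoring user may run asterisk -rx and that “In Queue” is the third column of “core show taskprocessors”; verify against your Asterisk version):

#!/usr/bin/env python3
"""Checkmk/Nagios-style check for growing Asterisk taskprocessor queues."""
import subprocess
import sys

WARN_DEPTH = 50    # illustrative thresholds; tune for your system
CRIT_DEPTH = 500

def main() -> int:
    out = subprocess.run(
        ["asterisk", "-rx", "core show taskprocessors"],
        capture_output=True, text=True, check=True,
    ).stdout

    worst_name, worst_depth = "none", 0
    for line in out.splitlines():
        parts = line.split()
        # Data rows look like: <name> <processed> <in queue> <max depth> ...
        if len(parts) >= 3 and parts[1].isdigit() and parts[2].isdigit():
            if int(parts[2]) > worst_depth:
                worst_name, worst_depth = parts[0], int(parts[2])

    if worst_depth >= CRIT_DEPTH:
        print(f"CRITICAL - taskprocessor {worst_name} in-queue={worst_depth}")
        return 2
    if worst_depth >= WARN_DEPTH:
        print(f"WARNING - taskprocessor {worst_name} in-queue={worst_depth}")
        return 1
    print(f"OK - max taskprocessor in-queue={worst_depth}")
    return 0

if __name__ == "__main__":
    sys.exit(main())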

Or do you have a best-practice ODBC config to avoid these types of issues? This part seems completely undocumented, and there is not much information available online…

Thanks

M.

I would determine why you have blocking queries in the first place. If you are directly connecting Asterisk to a database, then that database has to respond quickly and can’t block. If it blocks, then you’ll have problems. You can’t fix that 100% in Asterisk. You can try to smooth it over to a degree by doing things like caching, but that just moves the threshold at which things become evident.
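For example, a sorcery memory cache can sit in front of the realtime backend so repeated lookups don’t each hit ODBC. A minimal sorcery.conf sketch (the object lifetimes are illustrative, and the realtime mappings assume the standard ps_endpoints/ps_aors tables):

[res_pjsip]
endpoint/cache=memory_cache,object_lifetime_maximum=300   ; serve cached endpoints for up to 5 minutes
endpoint=realtime,ps_endpoints
aor/cache=memory_cache,object_lifetime_maximum=300
aor=realtime,ps_aors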

Where is the database? Is it remote or local? Is the disk I/O fast? Does anything show up in its logs?

Personally, every time I’ve had to investigate this, the fundamental issue has been outside of Asterisk: a SAN encountering issues, a network sporadically blocking traffic.

And this is why I’m not a fan of tightly coupling Asterisk with a database. You make it so that if the database has issues, Asterisk has issues.