DB Connection Issues

PitzKey · March 9, 2023, 7:55am

Hello,

We are using Asterisk 18.12.1-1 and have an issue: when there is “congestion” on the MySQL side, Asterisk does not process any new calls until the congestion clears.

For example, if we create 2k channels (1k calls) and then run channel request hangup all, it still takes a while after the channels are destroyed for Asterisk to accept new calls. We can see that Asterisk is still trying to write to the CDR DB.

I am not sure whether this is a bug specific to this version of Asterisk or something else.

FWIW, I started looking at the ODBC connection, and I see that Asterisk only uses 1 connection, no matter how many calls are on the system.

[root@chart ~]# cat /etc/odbc.ini
[MySQL-asterisk]
Description  = Asterisk CDR
Driver       = MariaDB
Database     = asterisk
User         = asterisk
Password     = asterisk
Server       = localhost
Socket       = /var/lib/mysql/mysql.sock

[MySQL-chartpbx]
Description  = Asterisk CDR
Driver       = MariaDB
Database     = chartdata
User         = <user>
Password     = <pass>
Server       = localhost
Socket       = /var/lib/mysql/mysql.sock

[root@chart ~]# cat /etc/asterisk/res_odbc.conf
[asterisk]
enabled => yes
dsn => MySQL-asterisk
max_connections => 5
username => asterisk
password => asterisk
pre-connect => yes

[chartpbx]
enabled => yes
dsn => MySQL-chartpbx
max_connections => 5
username => vitalpbx
password => vitalpbx
pre-connect => yes

[root@chart ~]# cat /etc/asterisk/cdr_adaptive_odbc.conf
[asterisk]
connection=asterisk
loguniqueid=yes
table=cdr
usegmtime=yes
alias start => calldate

[root@chart ~]# asterisk -x"odbc show all"

ODBC DSN Settings
-----------------

  Name:   asterisk
  DSN:    MySQL-asterisk
    Number of active connections: 1 (out of 5)
    Logging: Disabled

  Name:   chartpbx
  DSN:    MySQL-chartpbx
    Number of active connections: 1 (out of 5)
    Logging: Disabled

[root@chart ~]# asterisk -x"cdr show status"

Call Detail Record (CDR) settings
----------------------------------
  Logging:                    Enabled
  Mode:                       Simple
  Log calls by default:       Yes
  Log unanswered calls:       Yes
  Log congestion:             Yes

* Registered Backends
  -------------------
    Adaptive ODBC
    csv
    cdr-custom

I reloaded Asterisk and rebooted the server; it still seems to keep only one active connection open at all times.

On another system running Asterisk 18.15.1, I do see the connection count increase occasionally, but I can’t really spot any differences in the configuration:

  Name:   asterisk
  DSN:    MySQL-asterisk
    Number of active connections: 2 (out of 5)
    Logging: Disabled

What am I missing?

Thanks

jcolp · March 9, 2023, 9:49am

The CDR logic is single threaded, so it will only ever use a single connection. CDR batching exists for this reason[1]: specifically, to reduce the database impact on CDRs and calls.

[1] cdr.conf.sample (branch 20), asterisk/asterisk repository on GitHub
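To illustrate what batching buys, here is a minimal Python sketch of a size-or-time batch buffer. It is not Asterisk's cdr.c implementation: the `CdrBatcher` class and its `flush` callback are hypothetical, with the callback standing in for one multi-row database write, and `size`/`time_limit` only loosely mirror the `size` and `time` options in cdr.conf. The point is that the database is touched once per batch rather than once per call.

```python
import time
from typing import Any, Callable, List, Optional


class CdrBatcher:
    """Hypothetical size-or-time batch buffer for CDR-like records."""

    def __init__(self, size: int, time_limit: float,
                 flush: Callable[[List[Any]], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.size = size              # flush once this many records are buffered
        self.time_limit = time_limit  # ...or once the oldest record is this old (seconds)
        self.flush_fn = flush         # stands in for one multi-row database write
        self.clock = clock
        self.buffer: List[Any] = []
        self.first_ts: Optional[float] = None

    def add(self, record: Any) -> None:
        if not self.buffer:
            self.first_ts = self.clock()
        self.buffer.append(record)
        if len(self.buffer) >= self.size:
            self.flush()

    def tick(self) -> None:
        """Called periodically; flushes if the oldest record has waited too long."""
        if self.buffer and self.clock() - self.first_ts >= self.time_limit:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.first_ts = None
            self.flush_fn(batch)


# Usage with a fake clock so the time-based flush is deterministic.
batches: List[List[Any]] = []
fake_now = [0.0]
b = CdrBatcher(size=3, time_limit=100, flush=batches.append, clock=lambda: fake_now[0])
for i in range(7):
    b.add(i)
print(batches)  # [[0, 1, 2], [3, 4, 5]]
fake_now[0] = 150.0
b.tick()
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

The trade-off is latency: records sit in memory until a size or time threshold is hit, so a crash before a flush can lose them unless the buffer is drained on shutdown.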

PitzKey · March 9, 2023, 10:25am

Interesting. However, on the system where I do see the connection count increase, it happens right after a channel request hangup all.

jcolp · March 9, 2023, 10:28am

Are there any other uses of ODBC, such as func_odbc, that would be executing? Otherwise there may be cases I’m not aware of where this is possible, but fundamentally the CDR core is single threaded.

PitzKey · March 9, 2023, 10:51am

No. For testing purposes, I booted a stock FreePBX.

No calls on the system:

Generated 130 calls and then killed them with channel request hangup all

And it seems like the connection count decreases only after a core reload.

jcolp · March 9, 2023, 10:55am

Not decreasing once a connection exists is normal. It’s a connection pool and will keep connections in the pool after use for reuse.

As for why there are 2, I don’t know for this specific scenario. A second one was requested by something.

jcolp · March 9, 2023, 10:56am

Oh, is CEL going to the database? That would do it. Basically if a connection is in use at the time of the request a second connection would be created. If instead the connection is not in use, then the same one would get used by both CEL and CDR over and over.
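The pooling behavior described above can be sketched as follows. This is a simplified, hypothetical model in Python, not res_odbc's code: an idle connection is always reused, a new one is opened only when every existing connection is busy, and released connections stay in the pool instead of being closed. The `connect` callable is a placeholder for opening a real ODBC connection.

```python
import threading
from typing import Callable, List


class ConnectionPool:
    """Hypothetical pool modeled on the reuse-first behavior described above."""

    def __init__(self, connect: Callable[[], object], max_connections: int) -> None:
        self.connect = connect                  # placeholder for opening a real connection
        self.max_connections = max_connections  # like max_connections in res_odbc.conf
        self.idle: List[object] = []            # connections kept around for reuse
        self.total = 0                          # connections opened so far
        self.cond = threading.Condition()

    def acquire(self) -> object:
        with self.cond:
            while True:
                if self.idle:
                    return self.idle.pop()      # reuse an idle connection
                if self.total < self.max_connections:
                    self.total += 1             # all existing ones are busy: open another
                    return self.connect()
                self.cond.wait()                # pool exhausted: wait for a release

    def release(self, conn: object) -> None:
        with self.cond:
            self.idle.append(conn)              # returned to the pool, not closed
            self.cond.notify()


pool = ConnectionPool(connect=object, max_connections=5)

# CDR and CEL take turns: the single connection is simply reused.
conn = pool.acquire()
pool.release(conn)
conn = pool.acquire()
pool.release(conn)
print(pool.total)  # 1

# CEL asks while CDR still holds the connection: a second one is opened.
cdr_conn = pool.acquire()
cel_conn = pool.acquire()
print(pool.total)  # 2
```

Because connections are only returned to `idle` and never closed, the count stays at its high-water mark until the pool itself is torn down, which is consistent with it dropping only after a reload.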

PitzKey · March 9, 2023, 11:09am

Indeed. After disabling CEL, the connection stays at 1.

So, I guess, under heavy load, if CDR is not in batch mode, Asterisk will “hang” until the queue clears. Is my understanding correct?

If so, was it always like this? Has something changed in the last year or so?
What would it take to get the CDR logic to be multi threaded?

david551 · March 9, 2023, 11:12am

In that case, you are probably using a database for other purposes as well. FreePBX introduces 100s of lines of dialplan into every call, and, I think, some of those do database lookups. As a general principle, this forum is not suitable for issues due to FreePBX dialplan.

Dead stopping 2,000 channels all at the same time is not a normal thing to do.

Before I retired, one of the things the company I worked for did was call logging reporting software. Their decision was to use plain text files to do the initial capture and import into a database offline. Their priority was getting the calls written quickly and reliably. I sometimes wonder if people logging direct to databases are too enamoured with databases, although FreePBX makes that difficult to avoid.

jcolp · March 9, 2023, 11:18am

If the database blocks Asterisk, be it for a short time or a long time, then other things can and will be blocked. This applies to CDR and CEL, amongst other things.

CDR functionality has not been substantially changed since Asterisk 12, and nothing done since then would alter this behavior. Before Asterisk 12 it would still happen, since database pooling wasn’t built in; it relied on UnixODBC, which was iffy at best.

Making the CDR logic multithreaded would likely be a major rearchitecture of CDRs, with a risk of pushing the problem further into the core, or introducing out-of-order CDR handling which would end poorly. It’s a project.

PitzKey · March 9, 2023, 11:28am

Got it. It seems like the CDR logic needs some enhancements, for example: ASTERISK-30341: cdr_adaptive_odbc: fails to write CDRs after database reload

Indeed. It was a POC.

I guess the most efficient way would be an AMI app that listens to events and stores CDRs in a DB.

jcolp · March 9, 2023, 11:36am

Aye, I’m sure there’s improvements that could be done for the database part of it.

The most efficient way is to just write the records out somewhere and have a post-processing solution that then puts them in the database. If that fails (it hangs, database is down, etc) then you still have records and can process them afterwards.

My recommendation for everything is to not directly connect to a database. If you do then you double your exposure to problems and reduce resiliency.
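A minimal sketch of the write-first, import-later approach, assuming a CSV file as the capture format and using Python's built-in `sqlite3` as a stand-in for MySQL. The column names and the `write_record`/`import_records` helpers are hypothetical. Keying the table on `uniqueid` and using `INSERT OR IGNORE` (SQLite syntax) makes the import safe to rerun after a failure.

```python
import csv
import os
import sqlite3
import tempfile

FIELDS = ["uniqueid", "src", "dst", "duration"]  # hypothetical subset of CDR columns


def write_record(path: str, record: dict) -> None:
    """Call-time path: append one line to a local file; no database involved."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow(record)


def import_records(path: str, db: sqlite3.Connection) -> int:
    """Offline path: load the file into the database. Returns the number of rows read."""
    with open(path, newline="") as f:
        rows = [(r["uniqueid"], r["src"], r["dst"], int(r["duration"]))
                for r in csv.DictReader(f)]
    with db:  # one transaction; rolled back on error so the file can simply be retried
        db.executemany(
            "INSERT OR IGNORE INTO cdr (uniqueid, src, dst, duration) VALUES (?, ?, ?, ?)",
            rows)
    return len(rows)


path = os.path.join(tempfile.mkdtemp(), "cdr.csv")
write_record(path, {"uniqueid": "1678.1", "src": "100", "dst": "200", "duration": 42})
write_record(path, {"uniqueid": "1678.2", "src": "101", "dst": "201", "duration": 7})

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cdr (uniqueid TEXT PRIMARY KEY, src TEXT, dst TEXT, duration INTEGER)")
import_records(path, db)
import_records(path, db)  # a retry does not duplicate rows
print(db.execute("SELECT COUNT(*) FROM cdr").fetchone()[0])  # 2
```

If the database is down or slow, only the importer waits; the call-time path keeps appending to the file.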

ldo · March 9, 2023, 9:40pm

As soon as I see “ODBC”, I twitch a bit. As I understand it, that is a suboptimal way to access a database; it is far better to have a direct DBMS-specific driver.

But I don’t think Asterisk offers such an option. So perhaps the alternative is to hand the call details off to some process written in a language that does provide such native DBMS access, for example an AGI written in Python. The communication with Asterisk can be asynchronous, to avoid blocking.
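One way to get the asynchronous handoff described above, sketched in Python under stated assumptions: the `AsyncRecorder` class is hypothetical, the `write` callable stands in for whatever native DBMS driver the helper process would use, and the bounded queue size is arbitrary. The caller only ever does a non-blocking enqueue, so a slow database delays the background writer thread, not the call path.

```python
import queue
import threading
import time
from typing import Callable, List


class AsyncRecorder:
    """Hypothetical non-blocking handoff: callers enqueue, one background thread writes."""

    def __init__(self, write: Callable[[dict], None], maxsize: int = 10000) -> None:
        self.q: "queue.Queue[dict]" = queue.Queue(maxsize=maxsize)
        self.write = write
        self.dropped = 0                # records refused because the queue was full
        self.failed: List[dict] = []    # records whose write raised, kept for a retry
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def record(self, details: dict) -> None:
        try:
            self.q.put_nowait(details)  # never waits on the database
        except queue.Full:
            self.dropped += 1           # or spill to a file; the call path must not block

    def _run(self) -> None:
        while True:
            item = self.q.get()
            try:
                self.write(item)        # a slow database only delays this thread
            except Exception:
                self.failed.append(item)
            finally:
                self.q.task_done()


written: List[dict] = []


def slow_write(item: dict) -> None:
    time.sleep(0.01)                    # simulated slow database
    written.append(item)


rec = AsyncRecorder(slow_write)
start = time.monotonic()
for i in range(50):
    rec.record({"call": i})
enqueue_time = time.monotonic() - start  # far less than the ~0.5 s the writes take
rec.q.join()                             # wait for the writer to catch up
print(len(written))                      # 50
```

The bounded queue is a deliberate choice: under a sustained overload it is better to count or spill dropped records than to let memory grow without limit.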

sedwards · March 9, 2023, 10:55pm

In a system I wrote a long time ago (2003), I needed to write a CDR for every significant step (context) in the call for billing purposes. Each product was a separate context so I needed to write a row to MySQL to record how much time was spent in each product. I wrote an AGI in C and never had any issues with it. This system would process 20 thousand calls a day.

In another system I wrote (2009), I accumulate the steps as channel variables and then in the hangup context, launch an AGI in C to write all the steps out at once.
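The accumulate-then-write pattern can be sketched like this. It is a hypothetical Python model, not the original C AGI: `step()` plays the role of appending to a channel variable, `hangup()` plays the role of the AGI launched from the hangup context, and `sqlite3` with a made-up `call_steps` table stands in for MySQL. Each call costs one database transaction at the end instead of one write per step.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, List, Tuple


class StepAccumulator:
    """Hypothetical: collect per-call steps in memory, write them in one go at hangup."""

    def __init__(self, db: sqlite3.Connection) -> None:
        self.db = db
        self.steps: Dict[str, List[Tuple[str, int]]] = defaultdict(list)

    def step(self, channel: str, context: str, seconds: int) -> None:
        # Analogous to appending to a channel variable: no database access here.
        self.steps[channel].append((context, seconds))

    def hangup(self, channel: str) -> int:
        # Analogous to the AGI run from the hangup context: one transaction per call.
        rows = [(channel, ctx, secs) for ctx, secs in self.steps.pop(channel, [])]
        with self.db:
            self.db.executemany(
                "INSERT INTO call_steps (channel, context, seconds) VALUES (?, ?, ?)", rows)
        return len(rows)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE call_steps (channel TEXT, context TEXT, seconds INTEGER)")
acc = StepAccumulator(db)
acc.step("PJSIP/100-00000001", "product-a", 30)
acc.step("PJSIP/100-00000001", "product-b", 45)
print(acc.hangup("PJSIP/100-00000001"))  # 2
print(db.execute("SELECT COUNT(*), SUM(seconds) FROM call_steps").fetchone())  # (2, 75)
```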

Many ways to skin a cat.

PitzKey · March 14, 2023, 4:25pm

Revisiting this…

I enabled batch mode and set the batch size to 1000 and the maximum batch time to 100.

It seems like it reduced the load a lot.

However, I did notice that after killing 1k channels, it takes about 30 seconds for Asterisk to write all the calls to the buffer, and while that is happening, Asterisk does not respond to any messages. core show channels shows 0, but Asterisk is not responding to INVITEs.

How can I troubleshoot this?
What would be the ideal buffer setting? I am thinking the size should be 300 and the timeout 100.

Any suggestions appreciated

Thanks

jcolp · March 14, 2023, 4:46pm

What’s the status of the taskprocessors (core show taskprocessors)?

Ideal is relative to the environment, expected load, etc. There is no magical answer.

PitzKey · March 15, 2023, 1:26pm

Well, it seems like that is it.
(There are many more in the list; this seems to be the only one that went over.)

Processor                                                               Processed   In Queue  Max Depth  Low water High water
stasis/m:cdr:aggregator-00000005                                          1686453      74075     157329       4500       5000

And then again

Processor                                                               Processed   In Queue  Max Depth  Low water High water
stasis/m:cdr:aggregator-00000005                                          1694131      66401     157329       4500       5000
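A quick way to read the two snapshots above: between them, Processed rose by 7,678 while In Queue fell by 7,674, so the aggregator was draining its backlog with almost nothing new arriving. Turning that into a drain rate needs the time between the two commands, which was not recorded; the Python sketch below takes it as an input, and the 10-second value in the example is purely an assumption. The `Snapshot` type and `drain_estimate` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    processed: int
    in_queue: int


def drain_estimate(before: Snapshot, after: Snapshot, interval_s: float) -> dict:
    """Compare two `core show taskprocessors` rows taken interval_s seconds apart."""
    done = after.processed - before.processed     # tasks completed in the interval
    net_drain = before.in_queue - after.in_queue  # how much the backlog shrank
    rate = net_drain / interval_s                 # net tasks per second
    eta = after.in_queue / rate if rate > 0 else float("inf")  # seconds to empty
    return {"processed": done, "net_drain": net_drain, "rate": rate, "eta_s": eta}


# Values from the two stasis/m:cdr:aggregator-00000005 rows above.
before = Snapshot(processed=1686453, in_queue=74075)
after = Snapshot(processed=1694131, in_queue=66401)

# The real interval between the two commands is unknown; 10 s is an assumed value.
result = drain_estimate(before, after, interval_s=10.0)
print(result)
```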

Is there a way we can increase the limit?

jcolp · March 15, 2023, 1:30pm

Unless it’s configurable in cdr.conf then no. You can set PJSIP to only care about PJSIP taskprocessors though[1].

[1] pjsip.conf.sample (branch 20), asterisk/asterisk repository on GitHub
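A hypothetical Python model of the two mechanisms involved: a taskprocessor alert that trips at its high-water mark and clears only after dropping back to its low-water mark, and a PJSIP check that honors the `none`, `global`, and `pjsip_only` values of `taskprocessor_overload_trigger`. This is a sketch, not Asterisk's taskprocessor.c logic; in particular, identifying PJSIP taskprocessors by a `pjsip` name prefix, and the `pjsip/hypothetical-serializer` queue with its watermarks, are simplifying assumptions. The stasis watermarks (5000/4500) come from the output above.

```python
from typing import Dict


class TaskQueue:
    """Hypothetical high/low water alert with hysteresis."""

    def __init__(self, high_water: int, low_water: int) -> None:
        self.high = high_water
        self.low = low_water
        self.depth = 0
        self.alert = False

    def set_depth(self, depth: int) -> None:
        self.depth = depth
        if not self.alert and depth >= self.high:
            self.alert = True   # trips at the high-water mark...
        elif self.alert and depth <= self.low:
            self.alert = False  # ...and clears only once back at the low-water mark


def is_pjsip(name: str) -> bool:
    # Simplifying assumption: PJSIP taskprocessors recognized by name prefix.
    return name.startswith("pjsip")


def pjsip_should_reject(queues: Dict[str, TaskQueue], trigger: str) -> bool:
    """Sketch of the three taskprocessor_overload_trigger modes."""
    if trigger == "none":
        return False
    if trigger == "pjsip_only":
        return any(q.alert for name, q in queues.items() if is_pjsip(name))
    if trigger == "global":
        return any(q.alert for q in queues.values())
    raise ValueError(f"unknown trigger: {trigger}")


queues = {
    "stasis/m:cdr:aggregator-00000005": TaskQueue(high_water=5000, low_water=4500),
    "pjsip/hypothetical-serializer": TaskQueue(high_water=500, low_water=450),
}
cdr = queues["stasis/m:cdr:aggregator-00000005"]
cdr.set_depth(74075)                              # backlog from the first snapshot
print(pjsip_should_reject(queues, "global"))      # True
print(pjsip_should_reject(queues, "pjsip_only"))  # False
```

Under this model, a stasis backlog alone does not make PJSIP reject requests when the trigger is `pjsip_only`.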

PitzKey · March 15, 2023, 2:02pm

Got it. I don’t see any specific settings in cdr.conf that would allow me to change that.

I changed the PJSIP taskprocessor overload trigger to care only about PJSIP taskprocessors and rebooted the server:

[root@yplab ~]# asterisk -x"pjsip show settings" | grep task
 taskprocessor_overload_trigger             : pjsip_only

However, when looking at the asterisk logs, I see:

;When channels were created:
[2023-03-15 09:55:14] WARNING[7732][C-0000069d] taskprocessor.c: The 'stasis/m:cdr:aggregator-00000005' task processor queue reached 5000 scheduled tasks again.
[2023-03-15 09:55:18] WARNING[7883][C-000006d7] taskprocessor.c: The 'stasis/m:cdr:aggregator-00000005' task processor queue reached 5000 scheduled tasks again.

;When channels were destroyed:
[2023-03-15 09:57:08] WARNING[7413][C-0000061e] taskprocessor.c: The 'stasis/m:manager:core-00000006' task processor queue reached 3000 scheduled tasks again.
[2023-03-15 09:57:09] WARNING[7312][C-000005f7] taskprocessor.c: The 'stasis/m:cdr:aggregator-00000005' task processor queue reached 5000 scheduled tasks again.
[2023-03-15 09:57:09] WARNING[6029][C-00000413] taskprocessor.c: The 'stasis/m:manager:core-00000006' task processor queue reached 3000 scheduled tasks again.
[2023-03-15 09:57:09] WARNING[7813][C-000006bd] taskprocessor.c: The 'stasis/m:manager:core-00000006' task processor queue reached 3000 scheduled tasks again.
[2023-03-15 09:57:09] WARNING[7447][C-0000062b] taskprocessor.c: The 'stasis/m:manager:core-00000006' task processor queue reached 3000 scheduled tasks again.
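When reproducing this, a small log parser helps show which taskprocessors hit their high-water mark and how often. A Python sketch, assuming the `taskprocessor.c` warning format shown in the excerpt above; the regular expression and the `overload_counts` helper are hypothetical, the call ID field (`[C-...]`) is treated as optional, and other log formats may need adjusting.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the taskprocessor.c warning lines shown in the log excerpt above.
OVERLOAD_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] WARNING\[\d+\](?:\[[^\]]+\])? taskprocessor\.c: "
    r"The '(?P<name>[^']+)' task processor queue reached (?P<n>\d+) scheduled tasks")


def overload_counts(lines: Iterable[str]) -> Counter:
    """Count high-water warnings per taskprocessor name."""
    counts: Counter = Counter()
    for line in lines:
        m = OVERLOAD_RE.match(line)
        if m:
            counts[m.group("name")] += 1
    return counts


SAMPLE = """\
[2023-03-15 09:55:14] WARNING[7732][C-0000069d] taskprocessor.c: The 'stasis/m:cdr:aggregator-00000005' task processor queue reached 5000 scheduled tasks again.
[2023-03-15 09:55:18] WARNING[7883][C-000006d7] taskprocessor.c: The 'stasis/m:cdr:aggregator-00000005' task processor queue reached 5000 scheduled tasks again.
[2023-03-15 09:57:08] WARNING[7413][C-0000061e] taskprocessor.c: The 'stasis/m:manager:core-00000006' task processor queue reached 3000 scheduled tasks again.
[2023-03-15 09:57:09] WARNING[7312][C-000005f7] taskprocessor.c: The 'stasis/m:cdr:aggregator-00000005' task processor queue reached 5000 scheduled tasks again.
[2023-03-15 09:57:09] WARNING[6029][C-00000413] taskprocessor.c: The 'stasis/m:manager:core-00000006' task processor queue reached 3000 scheduled tasks again.
[2023-03-15 09:57:09] WARNING[7813][C-000006bd] taskprocessor.c: The 'stasis/m:manager:core-00000006' task processor queue reached 3000 scheduled tasks again.
[2023-03-15 09:57:09] WARNING[7447][C-0000062b] taskprocessor.c: The 'stasis/m:manager:core-00000006' task processor queue reached 3000 scheduled tasks again.
"""

counts = overload_counts(SAMPLE.splitlines())
for name, n in counts.most_common():
    print(f"{n:3d}  {name}")
```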

Is stasis counted among the PJSIP taskprocessors, so that it still cares?
Also, interestingly, Asterisk stops accepting new messages only after the channels are destroyed; there are no issues when they are created.

Thank you for your time, Josh!

jcolp · March 15, 2023, 2:03pm

Nope. It doesn’t care about those.