Asterisk crash followed by system death on restart

Hi,

I’ve now had this twice, once on a trixbox system and once on an Ubuntu build (recent 1.2 code), both on 100% different hardware.

Effectively the system dies and won’t restart; or rather, within a few seconds of restarting, Asterisk has completely hung the system (it still answers pings, but apart from that it’s totally dead).

Turns out (100% degree of certainty) that this is caused by corruption in “astdb”. Removing astdb, restarting and then recreating the missing sip entries seems to be a 100% fix.

This is twice in 4 months … the fact that astdb corrupts, and that when it does it renders the system unbootable (you need to boot single user to recover), seems to be an absolutely critical flaw, certainly from a user’s perspective.

Both systems were running TE110P cards with 15 incoming lines.
Both systems were 100% diverse hardware down to the TE110.

Please can someone tell me whether this is a known problem and whether there is a transparent fix?

Thanks,
Gareth

I’ve had my doubts about ASTDB in the past, and now I tend to keep all the stuff I would usually keep in there in a MySQL DB … sure, it costs more, but it’s more flexible, and for the size of installs I do it’s fine.

How big is ASTDB getting? Were you running FreePBX on both installs or just the trixbox one?

-rw-rw-r-- 1 asterisk asterisk 49152 2007-01-30 10:53 astdb
-rw-rw-r-- 1 asterisk asterisk 57344 2007-01-30 08:04 astdb.old

FreePBX on both.

I have res_mysql.so installed and res_mysql.conf loads, but I’m not overly clear on the setup.

Specifically, what tables do I need to create, and what do I put in extconfig.conf, to eradicate astdb?

Ok, this has become a repeating problem. Every now and again astdb corrupts and the system becomes unbootable.

The system then needs to be booted in recovery mode with asterisk not started, astdb removed, asterisk started and the critical entries (freepbx) in astdb re-created.

Surely there is a fix for this???!!!

Hi

What are you using astdb for ?

And you need to look at why it’s being corrupted.

But to get over it, why don’t you make a nightly copy of it and then use that to replace the corrupt one?
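A nightly copy along these lines could be run from cron. This is only a minimal sketch: the backup directory and the `keep` count are assumptions, and copying the file while Asterisk is writing to it may capture a half-written state, so several generations are kept rather than one.

```python
import shutil
import time
from pathlib import Path


def backup_astdb(src, backup_dir, keep=7):
    """Copy src into backup_dir under a timestamped name and prune old copies.

    Returns the path of the new copy. The paths and the keep count are
    illustrative assumptions, not anything FreePBX or Asterisk mandates.
    """
    src = Path(src)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = backup_dir / f"{src.name}.{stamp}"
    shutil.copy2(src, dest)

    # Timestamped names sort chronologically, so the oldest copies come first.
    copies = sorted(backup_dir.glob(f"{src.name}.*"))
    excess = max(len(copies) - keep, 0)
    for old in copies[:excess]:
        old.unlink()
    return dest


# Example call from a nightly cron script (backup directory is hypothetical):
# backup_astdb("/var/lib/asterisk/astdb", "/var/backups/astdb")
```

Restoring would then mean stopping Asterisk, copying one of the saved files back over /var/lib/asterisk/astdb and starting it again. A copy taken after the corruption happened is of course no better than the original, which is the reason for keeping more than one night.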

Personally I have never had astdb corrupt on me and I use it a lot in the dialplans I have.

Ian

Ok, just to put this in context;

When it “goes”, it takes the system out - which is 40 people. Restoring astdb is relatively trivial compared to the time involved in getting someone to boot the system into single user mode and do something to the corrupt DB.

Bear in mind that starting asterisk on the corrupt DB hangs the entire system, so if you autoboot asterisk, you don’t get the chance to remove astdb following a crash (!)

This is a standard FreePBX installation. I’m not using Astdb for anything - this is pure FreePBX + Asterisk … (so it’s FreePBX doing the using…)

I’m not in a position to debug this combination with regards to finding out what is corrupting it, as we only find out about the corruption when asterisk is restarted, which is infrequently (!)

Hi

If it’s getting corrupted then I would assume it will be unreadable, so a simple cron job to check it daily should alert you to any issues.
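As a rough sketch of such a check, the script below runs a command that reads the database and treats a non-zero exit, a missing binary or a timeout as a failure. It assumes that `asterisk -rx "database show"` against the running instance would actually trip over the corruption, which is not confirmed anywhere in this thread; the timeout value is also an assumption.

```python
import subprocess


def astdb_readable(cmd=("asterisk", "-rx", "database show"), timeout=10):
    """Return True if cmd exits with status 0 within timeout seconds.

    The default command asks a running Asterisk to dump astdb. Whether that
    detects corruption is an assumption; pass a different cmd if it does not.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # the child is killed if it hangs past this
        )
    except (subprocess.TimeoutExpired, OSError):
        # Timed out, or the command could not be started at all.
        return False
    return result.returncode == 0
```

A cron wrapper could call `astdb_readable()` and send a mail when it returns False. The timeout only protects against the check itself hanging; it cannot help if the whole machine locks up.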

Also what do you see in the full log?

What entries are the

[quote]missing sip entries[/quote]? These are dynamic AFAIK, unless FreePBX is writing them.

Have you got your sip.conf set up correctly, i.e. limiting the number of retries etc.?

Ian

Hi,

Your idea to check whether the DB is readable via CRON is a good one - I may well try to implement that as a first step.

FreePBX writes an entry into astdb for each extension; without these entries asterisk does not know that any of the extensions are live and all calls go straight to voicemail.

Nothing appears in the full log when asterisk fails to load, it just dies randomly when loading a module - which is why it took so long to nail it down to the astdb.

Essentially the first process to try to access the astdb hangs … badly …

Which “system” dies? Asterisk or Linux? You said no log entry; is there any console message (Linux or Asterisk) when it dies? If Asterisk dies for some reason while it is writing to AstDB, or while simply holding uncommitted data in a buffer, AstDB is likely to corrupt. The same applies if Linux dies at an inopportune moment, but the troubleshooting route would differ. (Of course it is possible that AstDB corruption caused Asterisk to “die”.) If you can determine a way to reproduce this with some certainty, you should file a bug.

To eradicate such a problem, it’s very important to determine whether the application or the OS caused the corruption. I’d suggest that you google likely causes of Berkeley DB corruption. You may get good help from the *nix world. Remember: checking readability of a corrupt AstDB may also cause the system to hang.

Both Linux and asterisk!

Here’s the fun thing - I’ve replicated this on two completely diverse sets of hardware and software. (i.e. different digium ISDN-30 cards, different versions / installs of asterisk and freepbx) … Indeed it seems the problem will reproduce itself every few weeks.

If I reboot in single user mode following a corruption, then use /etc/init.d/asterisk start, the system will lock up while asterisk is loading. Effectively something is reading astdb while accessing a device driver and it really does not like it.

If I su asterisk and do asterisk -cvvvvvv, it gets through to the ISDN-30 module and asterisk locks up - BUT not the system.

following rm /var/lib/asterisk/astdb, either startup method will work fine.
(I found this solution by accident!)

Can’t help thinking that Asterisk have gone down the easy licensing route at the expense of stability. We still have an absolutely critical issue we can’t even begin to fix. Roll on a stable version … :cry:

Do you still have this issue ?

What happens when the DB is moved to a lab system and that is restarted?
There are apps available to read the DB, or even copying it and opening it with vi may show something. If it’s when it gets to loading zaptel, then there may be an issue in zapata.conf, and it doesn’t need much of an error in that for * not to start. Also, the zapata config can be written by FreePBX, which would be rewritten on a reload.

Ian

And yes we use FreePBX …

Very lately, the system has taken to just freezing, but then starting ok after a power down with no “apparent” database damage …

I’m afraid I simply have no confidence in DB1 anymore and the fact that there are tools (which incidentally I’ve failed to find) which can read DB1’s and make sense of them, really doesn’t do much for me.

Not least as there are lots of equivalents out there with decent tools which never experience DB corruption. [MySQL, SQLite etc]

Let alone DB4 is now available … embedding someone else’s blackbox code (yeah, I know it’s OS, but you try debugging it or validating the data) into your complex application just seems a little … well … maybe there are better options.

Aaaarhrhrhrhrhrh.

Hi

What are you storing in the DB that is causing these problems? All my systems use the astdb extensively and to date have never had DB corruption. These include systems of 100+ users and conference servers handling many thousands of calls per month, with every call writing to the DB and reading from the DB.

I am surprised nothing shows up in the full log just before a lockup. Is it worth running a “database show” to a file from cron and then diffing that against the last one? It might show what’s happening before a lockup, if it is the DB.
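The diffing half of that idea could look something like the sketch below. It assumes the two dumps have already been saved as text (for example by redirecting the CLI output to a file from cron) and only compares them; the function name is made up for illustration.

```python
import difflib


def diff_snapshots(previous, current):
    """Return unified-diff lines between two 'database show' dumps.

    Both arguments are the full text of a dump. Lines starting with '-'
    disappeared since the previous snapshot, lines starting with '+' are new.
    """
    return list(
        difflib.unified_diff(
            previous.splitlines(),
            current.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
            n=0,  # no context lines, only the changes themselves
        )
    )
```

An empty result means nothing changed between the two runs. Keeping the dated diffs around would give something to look back through after the next lockup.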

[quote]Not least as there are lots of equivalents out there with decent tools which never experience DB corruption. [MySQL, SQLite etc] [/quote] Personally I’ve seen corruption in all of these, as well as in Oracle, Paradox and MS SQL. But it depends on what’s happening to the system. Let’s not forget a database can’t corrupt itself; something has to happen to it or to the file system for it to corrupt.

Ian