After about 20 hours system stops responding

bullet117 · February 19, 2009, 11:20pm

My PBXIAF box stops responding after about 20 hours. When it hits a Read command or a SWIFT command, the CLI just sits there on the Read line. It looks like this:

-- Executing [s@MainMenu:5] Read("SIP/238-b3f0b250", "UseDTMF|WouldYouLikeSpeechOrDTMF|1||2|5") in new stack
-- Accepting a maximum of 1 digits.
-- <SIP/238-b3f0b250> Playing 'WouldYouLikeSpeechOrDTMF' (language 'en')

When I try an amportal restart I get:

SETTING FILE PERMISSIONS
Permissions OK

STARTING ASTERISK
Asterisk ended with exit status 1
Asterisk died with code 1.
cat: /var/run/asterisk.pid: No such file or directory
Automatically restarting Asterisk.
mpg123: no process killed
Asterisk ended with exit status 1
Asterisk died with code 1.
cat: /var/run/asterisk.pid: No such file or directory
Automatically restarting Asterisk.
mpg123: no process killed

Asterisk could not start!
Use 'tail /var/log/asterisk/full' to find out why.

The tail of the full file looks like this:

You only have to compile Zaptel support into Asterisk if you need it. One option is to recompile without Zaptel support.
You only have to load Zaptel drivers if you want to take advantage of Zaptel services. One option is to unload zaptel modules if you don’t need them.
If you need Zaptel services, you must correctly configure Zaptel.
[Feb 19 15:36:44] VERBOSE[20760] logger.c: Asterisk Event Logger Started /var/log/asterisk/event_log
[Feb 19 15:36:45] VERBOSE[20765] logger.c: -- Remote UNIX connection
[Feb 19 15:36:45] VERBOSE[20773] logger.c: -- Remote UNIX connection disconnected
[Feb 19 15:36:45] ERROR[20760] asterisk.c: Asterisk has detected a problem with your Zaptel configuration and will shutdown for your protection. You have options:
You only have to compile Zaptel support into Asterisk if you need it. One option is to recompile without Zaptel support.
You only have to load Zaptel drivers if you want to take advantage of Zaptel services. One option is to unload zaptel modules if you don’t need them.
If you need Zaptel services, you must correctly configure Zaptel.
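When the full log is long, it can help to pull out only the ERROR entries rather than reading the tail by eye. The following is a minimal sketch, assuming the log lines have the same `[timestamp] LEVEL[pid] file.c: message` shape as the ones pasted above; the function name `find_errors` is made up for illustration.

```python
import re

# Matches lines shaped like:
#   [Feb 19 15:36:45] ERROR[20760] asterisk.c: message text
LOG_LINE = re.compile(
    r"^\[(?P<when>[^\]]+)\]\s+"      # timestamp in square brackets
    r"(?P<level>[A-Z]+)"             # log level, e.g. VERBOSE or ERROR
    r"\[(?P<pid>\d+)\]\s+"           # numeric id in square brackets
    r"(?P<source>[^:]+):\s*"         # source file, e.g. asterisk.c
    r"(?P<message>.*)$"              # rest of the line
)


def find_errors(lines):
    """Return (timestamp, source, message) for every ERROR entry."""
    errors = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "ERROR":
            errors.append(
                (match.group("when"), match.group("source"), match.group("message"))
            )
    return errors


# Two lines copied from the log tail above.
sample = [
    "[Feb 19 15:36:44] VERBOSE[20760] logger.c: Asterisk Event Logger Started /var/log/asterisk/event_log",
    "[Feb 19 15:36:45] ERROR[20760] asterisk.c: Asterisk has detected a problem with your Zaptel configuration and will shutdown for your protection. You have options:",
]

for when, source, message in find_errors(sample):
    print(when, source, message)
```

To run it against the real file, pass an open file object for /var/log/asterisk/full instead of `sample`. Continuation lines that do not start with a timestamp are simply skipped.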

I have a Digium T1 card (TE122P) in the box. I configured it (or so I thought) and did some testing with the card: I plugged in the T1 line (this is a partial T1, 8 lines, 1 for data) and it worked.
I have had the cable unplugged while configuring the PBXIAF, because I need the T1 line on the Trixbox until we switch over to the PBXIAF box.

The zaptel.conf looks like this:

# Autogenerated by /usr/local/sbin/genzaptelconf -- do not hand edit
#
# Zaptel Configuration File
#
# This file is parsed by the Zaptel Configurator, ztcfg
#
# It must be in the module loading order

# Span 1: WCT1/0 "Wildcard TE122 Card 0" (MASTER) B8ZS/ESF RED
span=1,1,0,esf,b8zs
# termtype: te
bchan=1-23
dchan=24

# Span 2: WCTDM/0 "Wildcard TDM2400P Board 1"
fxoks=25
fxoks=26
fxoks=27
fxoks=28
fxoks=29
fxoks=30
fxoks=31
fxoks=32
# channel 33, WCTDM, no module.
# channel 34, WCTDM, no module.
# channel 35, WCTDM, no module.
# channel 36, WCTDM, no module.
# channel 37, WCTDM, no module.
# channel 38, WCTDM, no module.
# channel 39, WCTDM, no module.
# channel 40, WCTDM, no module.
# channel 41, WCTDM, no module.
# channel 42, WCTDM, no module.
# channel 43, WCTDM, no module.
# channel 44, WCTDM, no module.
# channel 45, WCTDM, no module.
# channel 46, WCTDM, no module.
# channel 47, WCTDM, no module.
# channel 48, WCTDM, no module.

# Global data

loadzone = us
defaultzone = us
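A quick way to see which channel numbers a zaptel.conf actually assigns, so they can be compared with what is provisioned on the circuit, is to expand the channel lines into a flat map. This is only a sketch: `expand_channels` and `channel_map` are hypothetical helper names, only a subset of signalling keywords is handled, and the parsing is deliberately simple rather than a copy of what ztcfg does.

```python
# Subset of zaptel.conf keywords that assign channels (assumption: enough
# for a config like the one above; other keywords are ignored here).
CHANNEL_KEYS = {"bchan", "dchan", "fxoks", "fxsks", "fxols", "fxsls"}


def expand_channels(spec):
    """Expand a channel spec such as '1-23' or '25,27-28' into a list of ints."""
    channels = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = part.split("-", 1)
            channels.extend(range(int(first), int(last) + 1))
        elif part:
            channels.append(int(part))
    return channels


def channel_map(text):
    """Map each configured channel number to the keyword that assigned it."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop '#' comments
        if "=" not in line:
            continue
        key, value = (piece.strip() for piece in line.split("=", 1))
        if key in CHANNEL_KEYS:
            for channel in expand_channels(value):
                mapping[channel] = key
    return mapping


# A shortened copy of the settings shown above.
SAMPLE = """
span=1,1,0,esf,b8zs
bchan=1-23
dchan=24
fxoks=25
fxoks=26
loadzone = us
"""

by_type = {}
for channel, kind in sorted(channel_map(SAMPLE).items()):
    by_type.setdefault(kind, []).append(channel)
for kind, channels in by_type.items():
    print(kind, channels[0], "to", channels[-1], "(%d channels)" % len(channels))
```

Reading the real file is a matter of passing the contents of /etc/zaptel.conf to `channel_map` instead of `SAMPLE`.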

The only way I can get it working again is by rebooting the box. I would like to have this all configured so that all I have to do is unplug the T1 from the Trixbox, plug it into the PBXIAF box, change some phone settings, and we are up and running.

Any ideas what may be causing this to hang?

bdawson · March 6, 2009, 9:24pm

Were you able to get this fixed? We are locking up every other day with a TE122 card as well. All will work fine and then I have to restart asterisk and reload the zapata drivers in order to get the phones working again.

thanks,

Brian

Brett_Matthews · March 8, 2009, 10:48pm

I had this happen when RTC timing was not supplying interrupts to ztdummy. My Asterisk box would hang on the Playback app, and nothing was heard.
I ended up fixing this by moving to a kernel that supplies ztdummy with highres timers.

zttest is probably hung showing 0%, which is why the Zaptel problem comes up when you restart Asterisk. Yes, you can fix it by restarting the zaptel service and then Asterisk afterwards, but the problem comes back after a while anyway.

I would like to know how you get on.

regards, Brett

bullet117 · March 9, 2009, 6:59pm

Brett, you mentioned that you "ended up fixing this by moving to a kernel that supplies ztdummy with highres timers."
I am using PBX in a Flash, which is similar to Trixbox. What would I have to do to add highres timers for ztdummy?

Brett_Matthews · March 9, 2009, 10:21pm

I think PBX in a Flash is similar to Elastix, which is what I use most of the time and is based on CentOS 5.2.

This version of CentOS uses kernel 2.6.18, and high res timers only came out as of about 2.6.22 or so. Some people have tried turning ACPI off to combat the problem, although what's the point in having a multi-core CPU with it all turned off?

I had to compile a vanilla kernel on the CentOS box to get high res timers. Not exactly what I wanted to do, but I didn't have much choice. The boxes in question were only using mISDN, so I have to rely on ztdummy for timing.

If you need more info, just ask.

bullet117 · March 24, 2009, 2:01pm

Brett, I put a post on the PBX in a Flash forum and got the following response from James, a Senior Member:

[i]This is all interesting. We were having discussions about the whole question of hardware timing sources and whether a timing-only device had a purpose. The outcome of the conversation was that with the newer kernels it does not, because the kernel timers with ztdummy kick the crap out of these fake timing devices. That said, I decided to dig more because you quote a recommendation by Digium about these “CentOS” based distros like Elastix and PIAF. I found it further odd that the free Digium distro AsteriskNOW uses CentOS 5, which would imply that the default kernel should be okay. So I decided to read the code, and sure enough the programmers seem to see no need to recompile but prefer HPET, which is what CentOS uses.
From the latest dahdi_dummy svn:
Code:
/*

To use the high resolution timers, in your kernel CONFIG_HIGH_RES_TIMERS
needs to be enabled (Processor type and features -> High Resolution
Timer Support), and optionally HPET (Processor type and features ->
HPET Timer Support) provides a better clock source.
*/
So there should be no need to change the kernel because HPET is already there.

Something to remember: CentOS is essentially RHEL. Red Hat, for contractual reasons, does not track the kernel.org tree. 2.6.18 from CentOS is not the same as 2.6.18 from kernel.org. Red Hat backports most of the updates from the newer kernel.org kernels.[/i]

What are your thoughts on this?

bullet117 · April 6, 2009, 2:39pm

I haven’t got it fixed yet. From what I have read, the highres timers have to be enabled to avoid the lock-ups, but I’m not sure how to enable them.
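Before rebuilding anything, it may be worth checking whether the running kernel was built with the two options named in the dahdi_dummy comment quoted earlier. The sketch below only reads the build config; it does not enable anything. It assumes the distro ships the config as /boot/config-<release> and that the HPET option appears as CONFIG_HPET_TIMER. `timer_options` is a made-up helper name.

```python
import os

# Options named in the dahdi_dummy comment. CONFIG_HPET_TIMER is assumed
# to be the symbol behind "HPET Timer Support".
WANTED = ("CONFIG_HIGH_RES_TIMERS", "CONFIG_HPET_TIMER")


def timer_options(config_text, wanted=WANTED):
    """Map each wanted option to its value ('y', 'm', ...) or None if unset."""
    found = {name: None for name in wanted}
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_X is not set"
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name in found:
            found[name] = value
    return found


def running_kernel_config_path():
    # Assumption: the build config is installed as /boot/config-<release>.
    return "/boot/config-" + os.uname().release


if __name__ == "__main__":
    path = running_kernel_config_path()
    if os.path.exists(path):
        with open(path) as handle:
            options = timer_options(handle.read())
        for name, value in options.items():
            print(name, "=", value if value is not None else "not set")
    else:
        print("No kernel config found at", path)
```

If CONFIG_HIGH_RES_TIMERS shows as not set, the option cannot be switched on at runtime; it needs a kernel built with it, which is what Brett describes doing.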

Brett_Matthews · April 17, 2009, 11:35pm

I have had no ztdummy failures since upgrading the kernel on 2 of our office PABXs. I admit it could be something to do with using Dell servers, but out of the box the Elastix ISO images install CentOS 5.2, and ztdummy uses the RTC for timing, which I have found to be unreliable on boxes that use Digium BRI cards and mISDN. There is no problem with PRI, as the timing comes from the telco.

I’m no genius, but this is just what I have found with our particular setup.