Monitoring the health of the Asterisk daemon, CO lines, etc.

Every once in a while, I get calls from clients about one of their phone services being down. Even if a phone system has been fine for months, a SIP device will mysteriously stop responding or a Zap channel won’t pick up incoming calls anymore. A restart or a power cycle usually fixes it.

I’d like to have some sort of health monitoring in place that I can run from cron once an hour during business hours to trigger red alarms in my own office. That way, even if I can’t always determine the exact cause, I can quickly bring whatever service is down back up before anyone at the site knows there was an issue.

If you’ve done anything like this, give me a hint or two as to how you went about it. Did you use the manager API? Is cron the only way?

The best thing I can think of would be to set up an extension that returns a valid response to some input, so that you could call it from your remote location to monitor the system…

something like this: (???)

exten => 99871,1,Answer()
exten => 99871,n,Read(TEST,,4) ; your system sends 3456 at this point
exten => 99871,n,GotoIf($["${TEST}" = "3456"]?101)
exten => 99871,n,Hangup()
; your system would be waiting for audio on the line; if none is played, the system is down
exten => 99871,101,Playback(success)

This is just a rough idea I had when we were having DTMF issues… I don’t know if it would help or not, but it’s there…

EDIT: As far as monitoring Asterisk itself goes, there are quite a few utilities out there that would work. I’m not sure about the telco lines, but THAT would be VERY useful for me, so if anyone does have a resource for that, please let me know.

How about running zap show status from the CLI and parsing the output to find alarms?
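A rough sketch of that parsing idea in Python, in case it helps. It assumes the output of zap show status is a header row followed by one row per span with the columns Description, Alarms, IRQ, bpviol and CRC4, and that a healthy span shows OK in the Alarms column. The layout may differ between Asterisk versions, so check it against your own output first. The function names (fetch_status, find_alarms) are made up for illustration.

```python
import subprocess


def fetch_status():
    """Run the CLI command through the local Asterisk console and return its text."""
    result = subprocess.run(
        ["asterisk", "-rx", "zap show status"],
        capture_output=True, text=True, timeout=10, check=True,
    )
    return result.stdout


def find_alarms(output):
    """Return (description, alarm) pairs for every span whose alarm state is not OK.

    Assumption: each data row ends with four whitespace-free fields
    (Alarms, IRQ, bpviol, CRC4); the description itself may contain spaces.
    """
    problems = []
    for line in output.splitlines():
        # Split from the right so spaces inside the description are preserved.
        fields = line.rsplit(None, 4)
        if len(fields) != 5 or fields[0].strip() == "Description":
            continue  # skip the header row and anything that is not a span row
        description, alarm = fields[0].strip(), fields[1]
        if alarm.upper() != "OK":
            problems.append((description, alarm))
    return problems
```

An hourly cron job could call fetch_status(), pass the text to find_alarms(), and send an alert whenever the returned list is not empty.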

That’s a good idea. I already have a manager socket application built, so I will have to play with that…

I’ll start on something using the Manager API. We’ll see if it’s wiki-worthy…
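For anyone who wants a starting point, here is a minimal sketch of a Manager API liveness check in Python. It assumes the manager interface is enabled in manager.conf and listening on the default port 5038, and that the account you pass in is allowed to log in from the monitoring host. The helper names (build_action, parse_message, read_until, manager_alive) are made up for illustration, and the Ping reply is checked for both Pong and Success because the wording appears to differ between Asterisk versions.

```python
import socket


def build_action(name, **fields):
    """Serialize one manager action as 'Key: Value' lines ending in a blank line."""
    lines = ["Action: " + name]
    lines += ["{}: {}".format(key, value) for key, value in fields.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_message(raw):
    """Turn one 'Key: Value' block into a dict, ignoring lines without a colon."""
    message = {}
    for line in raw.decode("ascii", "replace").splitlines():
        key, sep, value = line.partition(":")
        if sep:
            message[key.strip()] = value.strip()
    return message


def read_until(sock, terminator):
    """Read one byte at a time until the terminator arrives or the peer closes."""
    data = b""
    while not data.endswith(terminator):
        chunk = sock.recv(1)
        if not chunk:
            break
        data += chunk
    return data


def manager_alive(host, username, secret, port=5038, timeout=5.0):
    """Return True if the manager interface accepts the login and answers a Ping."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        read_until(sock, b"\r\n")  # discard the one-line greeting banner
        # Events: off keeps unsolicited events from mixing with our replies.
        sock.sendall(build_action("Login", Username=username,
                                  Secret=secret, Events="off"))
        if parse_message(read_until(sock, b"\r\n\r\n")).get("Response") != "Success":
            return False
        sock.sendall(build_action("Ping"))
        reply = parse_message(read_until(sock, b"\r\n\r\n"))
        sock.sendall(build_action("Logoff"))
        return reply.get("Response") in ("Pong", "Success")
```

A cron job would call manager_alive() with the host and credentials and raise the alarm if it returns False or throws a socket error (connection refused, timeout). That only covers the daemon itself; the Zap spans still need a separate check.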