Lately I am having problems with an Asterisk server that has been working just fine for the last couple of months. What happens is that Asterisk process hangs - it is running and is accessible with “asterisk -r” but it does not receive/respond to any SIP messages.
The server setup is quite simple and the system size is very small. The system consist of:
- HP Microserver server that is running CentOS 6.0 and Asterisk 1.8.5.0 (run as non-root)
- ISDN-to-VoIP gateway (connected to Asterisk via SIP Trunk)
- GSM-to-VoIP gateway (connected to Asterisk via SIP Trunk)
- 2 Grandstream GXP2100 IP Phones
The DialPlan is quite simple - all incomming calls are routed to ring group 450, that rings both SIP Phones (200 and 201). All the calls are recored with the help of MixMonitor (I suspect that that the problem is here). This is the relevant part of the dialplan:
;Call to ring group 450
exten => 450,1,Set(MixMon_Filename=${STRFTIME(${EPOCH},,%Y%m%d-%H%M%S)}-${CALLERID(num)}-${UNIQUEID})
exten => 450,n,MixMonitor(${MixMon_Filename}.wav)
exten => 450,n,Dial(SIP/200&SIP/201,30,t)
exten => 450,n,Hangup()
The server is running very good, but lately we have been having problems with telephony not working. The cause of the problem is in the Asterisk process just hanging and not receiving/responding to any more SIP messages. Asterisk hanged for the first time a week ago (after couple months of normal operation), the second time yesterday morning and the third time today morning. The Asterisk process hanging is quite randomly and is becomming more and more frequent, which is concerning me.
This is the Asterisk process behaviour when the server gets in a state when telephony is not working:
- asterisk and safe_asterisk processes are running
- I can locally login to asterisk console via “asterisk -r” command
- command “sip show peers” is showing all peers as up and running (gateways and IP Phones have status OK)
- command “sip set debug on” is working OK, but when I make a call, I do not see any packets showing in Asterisk console. If I make a packet capture with tcpdump I can see that the SIP packets are reaching the server’s NIC, it is just not reaching the Asterisk process
- command “core restart now” is not working (after I issue the command, nothing happens)
- Linux command “kill asterisk_pid” does not kill the Asterisk process - I have to use “kill -9 asterisk_pid” to kill the process
- after I manuall kill the Asterisk process, safe_asterisk starts up the Asterisk again - after that the Asterisk process is running OK (accepting and replying to SIP messages as expected)
When I checked the Asterisk logs I saw that the last call log before the Asterisk hangs was:
[Sep 17 08:42:26] VERBOSE[13940] pbx.c: -- Executing [450@incomming_calls:1] Set("SIP/GSM_Trunk_1-00000309", "MixMon_Filename=20110917-084226-055555555-1316241740.777") in new stack
[Sep 17 08:42:26] VERBOSE[13940] pbx.c: -- Executing [450@incomming_calls:2] MixMonitor("SIP/GSM_Trunk_1-00000309", "20110917-084226-055555555-1316241740.777.wav") in new stack
[Sep 17 08:42:26] VERBOSE[13940] pbx.c: -- Executing [450@incomming_calls:3] Dial("SIP/GSM_Trunk_1-00000309", "SIP/200&SIP/201,30,t") in new stack
[Sep 17 08:42:26] VERBOSE[13940] netsock2.c: == Using SIP RTP CoS mark 5
[Sep 17 08:42:26] VERBOSE[13940] app_dial.c: -- Called SIP/200
[Sep 17 08:42:26] VERBOSE[13940] netsock2.c: == Using SIP RTP CoS mark 5
[Sep 17 08:42:26] VERBOSE[13940] app_dial.c: -- Called SIP/201
[Sep 17 08:42:26] VERBOSE[13940] app_dial.c: -- SIP/200-0000030a connected line has changed. Saving it until answer for SIP/GSM_Trunk_1-00000309
[Sep 17 08:42:26] VERBOSE[13940] app_dial.c: -- SIP/201-0000030b connected line has changed. Saving it until answer for SIP/GSM_Trunk_1-00000309
[Sep 17 08:42:26] VERBOSE[13942] app_mixmonitor.c: == Begin MixMonitor Recording SIP/GSM_Trunk_1-00000309
[Sep 17 08:42:26] VERBOSE[13940] app_dial.c: -- SIP/201-0000030b is ringing
[Sep 17 08:42:26] VERBOSE[13940] app_dial.c: -- SIP/200-0000030a is ringing
For reference, this is a log of a “normal call”:
[Sep 17 08:32:27] VERBOSE[13848] pbx.c: -- Executing [450@incomming_calls:1] Set("SIP/GSM_Trunk_1-000002ff", "MixMon_Filename=20110917-083227-055555555-1316241141.767") in new stack
[Sep 17 08:32:27] VERBOSE[13848] pbx.c: -- Executing [450@incomming_calls:2] MixMonitor("SIP/GSM_Trunk_1-000002ff", "20110917-083227-055555555-1316241141.767.wav") in new stack
[Sep 17 08:32:27] VERBOSE[13848] pbx.c: -- Executing [450@incomming_calls:3] Dial("SIP/GSM_Trunk_1-000002ff", "SIP/200&SIP/201,30,t") in new stack
[Sep 17 08:32:27] VERBOSE[13849] app_mixmonitor.c: == Begin MixMonitor Recording SIP/GSM_Trunk_1-000002ff
[Sep 17 08:32:27] VERBOSE[13848] netsock2.c: == Using SIP RTP CoS mark 5
[Sep 17 08:32:27] VERBOSE[13848] app_dial.c: -- Called SIP/200
[Sep 17 08:32:27] VERBOSE[13848] netsock2.c: == Using SIP RTP CoS mark 5
[Sep 17 08:32:27] VERBOSE[13848] app_dial.c: -- Called SIP/201
[Sep 17 08:32:27] VERBOSE[13848] app_dial.c: -- SIP/200-00000300 connected line has changed. Saving it until answer for SIP/GSM_Trunk_1-000002ff
[Sep 17 08:32:27] VERBOSE[13848] app_dial.c: -- SIP/201-00000301 connected line has changed. Saving it until answer for SIP/GSM_Trunk_1-000002ff
[Sep 17 08:32:27] VERBOSE[13848] app_dial.c: -- SIP/201-00000301 is ringing
[Sep 17 08:32:27] VERBOSE[13848] app_dial.c: -- SIP/200-00000300 is ringing
[Sep 17 08:32:29] VERBOSE[13848] app_dial.c: -- SIP/200-00000300 connected line has changed. Saving it until answer for SIP/GSM_Trunk_1-000002ff
[Sep 17 08:32:29] VERBOSE[13848] app_dial.c: -- SIP/200-00000300 answered SIP/GSM_Trunk_1-000002ff
[Sep 17 08:33:25] VERBOSE[13848] pbx.c: == Spawn extension (incomming_calls, 450, 3) exited non-zero on 'SIP/GSM_Trunk_1-000002ff'
[Sep 17 08:33:25] VERBOSE[13849] app_mixmonitor.c: == MixMonitor close filestream
[Sep 17 08:33:25] VERBOSE[13849] app_mixmonitor.c: == End MixMonitor Recording SIP/GSM_Trunk_1-000002ff
From what I can gather here is that there is some problem with the MixMonitor that is causing the Asterisk to hang. I am not sure if I am correct (and even if I am right, I do not know how to fix it), so I need your help with identifying and fixing the problem.
Two things that I have also noticed and find a bit odd are also:
With the call that caused the Asterisk process to hang, the “Begin MixMonitor Recording” message came very late - after the “Called SIP/200” and “Called SIP/201” messages. I have been observing other call logs and I did not see the “Begin MixMonitor Recording” message showing up so late - usually the messages show up in the same order as in the upper call log.
- What does the “connected line has changed. Saving it until answer for” mean?
This is my first serious problem with Asterisk so I hope that we can get to the bottom of it. Thanks for all the help in advance.