Call File and SIP Channel Leak?

I’ve been scouring the nets and manuals for a solution to this, and can’t seem to make much headway.

I use python to query phonelists in an MySQL database and build call files for Asterisk.

The outbound calls play a survey and record responses.

I’m working on scaling up the number of calls I can make, and I keep running into trouble.

Currently, I can make three call files per second without trouble, (effectively between 120-180 per minute) but when I try to go past that, I see strange errors.

I’ve already taken care of many errors by increasing the number of file descriptors and other tuning-type operations, but now I believe that an open channel leak is coming into play. I’ve set an rtptimeout, but I still have many channels that seem to sit forever.

I’m putting a ton of data below, hopefully somebody can help me understand these errors.

This graph represents the call files per minute that I’m generating(green), the number of call files in the outgoing directory(red), and the number of open channels(blue), over time. You can see that, for about an hour, I wasn’t generating any new call files, but the number in the outgoing directory and the number of open channels stayed the same.

martythebanker.com/im/speedo/20120329.png

I’m using the latest build of asterisk from the debian repositories.

uname -a
Linux HFC-6800 2.6.32-5-686 #1 SMP Thu Nov 3 04:23:54 UTC 2011 i686 GNU/Linux

asterisk version info:
Connected to Asterisk 1.8.10.0-1digium1~squeeze currently running on HFC-6800 (pid = 19675)

system load:
top - 14:08:11 up 13 days,  1:01,  3 users,  load average: 5.51, 3.61, 2.93
Tasks: 232 total,   1 running, 231 sleeping,   0 stopped,   0 zombie
Cpu(s): 29.1%us,  5.0%sy,  0.0%ni, 61.6%id,  3.6%wa,  0.2%hi,  0.5%si,  0.0%st
Mem:   2069724k total,  1783064k used,   286660k free,    52008k buffers
Swap:  2951160k total,   106460k used,  2844700k free,   608856k cached

sip.conf

[general]
session-timers=refuse
session-expires=180
session-minse=90
session-refresher=uas
[authentication]
        dtmfmode=rfc2833
        context=from-office
        type=friend
        nat=yes
        directmedia=no
        host=dynamic
        nat=no
        directmedia=yes
        disallow=all
        allow=ilbc
        allow=g729
        allow=gsm
        allow=g723
        allow=ulaw
        disallow=all
        allow=ulaw
[Packet8-00]
username=xxxxxxxxxxxx
type=peer
ignoresdpversion=yes
secret=xxxxxxxxxxxx
fromuser=14073921813
context=incoming
insecure=port,invite
host=eps1.packet8.net
dtmfmode=rfc2833
allow=ulaw
allow=g729

[Packet8-01]
username=xxxxxxxxxxxx
type=peer
ignoresdpversion=yes
secret=xxxxxxxxxxxx
fromuser=14073921818
context=incoming
insecure=port,invite
host=eps1.packet8.net
dtmfmode=rfc2833
allow=ulaw
allow=g729

[Packet8-02]
username=xxxxxxxxxxxx
type=peer
ignoresdpversion=yes
secret=xxxxxxxxxxxx
fromuser=14073921819
context=incoming
insecure=port,invite
host=eps1.packet8.net
dtmfmode=rfc2833
allow=ulaw
allow=g729

[Packet8-03]
username=xxxxxxxxxxxx
type=peer
ignoresdpversion=yes
secret=xxxxxxxxxxxx
fromuser=14073921820
context=incoming
insecure=port,invite
host=eps1.packet8.net
dtmfmode=rfc2833
allow=ulaw
allow=g729

[Packet8-04]
username=xxxxxxxxxxxx
type=peer
ignoresdpversion=yes
secret=xxxxxxxxxxxx
fromuser=14073921821
context=incoming
insecure=port,invite
host=eps1.packet8.net
dtmfmode=rfc2833
allow=ulaw
allow=g729

[Packet8-05]
username=xxxxxxxxxxxx
type=peer
ignoresdpversion=yes
secret=xxxxxxxxxxxx
fromuser=14073921823
context=incoming
insecure=port,invite
host=eps1.packet8.net
dtmfmode=rfc2833
allow=ulaw
allow=g729

[Packet8-06]
username=xxxxxxxxxxxx
type=peer
ignoresdpversion=yes
secret=xxxxxxxxxxxx
fromuser=16783594220
context=incoming
insecure=port,invite
host=eps1.packet8.net
dtmfmode=rfc2833
allow=ulaw
allow=g729

[Packet8-07]
username=xxxxxxxxxxxx
type=peer
ignoresdpversion=yes
secret=xxxxxxxxxxxx
fromuser=16783594221
context=incoming
insecure=port,invite
host=eps1.packet8.net
dtmfmode=rfc2833
allow=ulaw
allow=g729

[Packet8-08]
username=xxxxxxxxxxxx
type=peer
ignoresdpversion=yes
secret=xxxxxxxxxxxx
fromuser=16783594222
context=incoming
insecure=port,invite
host=eps1.packet8.net
dtmfmode=rfc2833
allow=ulaw
allow=g729

[Packet8-09]
username=xxxxxxxxxxxx
type=peer
ignoresdpversion=yes
secret=xxxxxxxxxxxx
fromuser=16783594223
context=incoming
insecure=port,invite
host=eps1.packet8.net
dtmfmode=rfc2833
allow=ulaw
allow=g729

[Packet8-10]
username=xxxxxxxxxxxx
type=peer
ignoresdpversion=yes
secret=xxxxxxxxxxxx
fromuser=16783594224
context=incoming
insecure=port,invite
host=eps1.packet8.net
dtmfmode=rfc2833
allow=ulaw
allow=g729

/etc/security/limits.conf
asterisk        hard    nofile          205000

open files:
lsof | wc -l
6246
lsof -u asterisk | wc -l
1636

some errors: (these happen sporadically, some more often than others)

[Mar 29 11:45:03] ERROR[19708] netsock2.c: getaddrinfo("eps1.packet8.net", "(null)", ...): Name or service not known
[Mar 29 11:45:03] ERROR[10598] cdr_csv.c: Unable to re-open master file /var/log/asterisk//cdr-csv//Master.csv : Too many open files
[Mar 29 11:45:03] ERROR[19708] netsock2.c: getaddrinfo("eps1.packet8.net", "(null)", ...): Name or service not known
[Mar 29 11:45:03] ERROR[19679] asterisk.c: Unable to create pipe: Too many open files


[Mar 29 13:50:11] WARNING[19708] chan_sip.c: Re-invite to non-existing call leg on other UA. SIP dialog '7c004b7446dbeeeb066773e0243076e8@71.43.26.35:5060'. Giving up.

[Mar 29 13:50:27] WARNING[19708] chan_sip.c: Remote host can't match request CANCEL to call '3cf25eca6c8120a55942a79d2d5f06e0@71.43.26.35:5060'. Giving up.

[Mar 29 13:10:39] WARNING[9561] ast_expr2.fl: ast_yyerror():  syntax error: syntax error, unexpected '=', expecting $end; Input:

[Mar 29 13:39:55] WARNING[2012] file.c: Failed to write frame

[Mar 29 13:04:45] WARNING[6251] res_rtp_asterisk.c: RTP Read too short

[Mar 29 14:00:06] WARNING[19708] chan_sip.c: Autodestruct on dialog '698010542e5536cb21a0025563cb93b4@71.43.26.35:5060' with owner in place (Method: BYE)

[Mar 29 12:52:12] WARNING[19708] chan_sip.c: Received response: "Forbidden" from '"asterisk" <sip:14073921819@71.43.26.35>;tag=as4697bebd'

[Mar 29 11:51:29] WARNING[19708] chan_sip.c: Hanging up call 5f69c8e607a754b723a7b57c639b477b@71.43.26.35:5060 - no reply to our critical packet (see        https://wiki.asterisk.org/wiki/display/AST/SIP+Retransmissions).

[Mar 29 11:48:49] WARNING[15334] res_rtp_asterisk.c: RTP Read error: Bad file descriptor. Hanging up.

[Mar 29 11:47:14] WARNING[13596] res_rtp_asterisk.c: RTP Read error: Socket operation on non-socket. Hanging up.

[Mar 29 11:47:10] WARNING[12217] format_gsm.c: Short read (8) (Resource temporarily unavailable)!

[Mar 29 11:45:05] WARNING[10673] codec_gsm.c: Invalid GSM data (1)
[Mar 29 11:45:05] WARNING[10673] translate.c: gsmtolin did not update samples 0

[Mar 29 11:45:03] WARNING[19679] asterisk.c: Accept returned -1: Too many open files

I noticed with sar -r that my memory utilization at 3 per second was right around 96-97%, and at 4 per second, it started choking again, so I put in an order for more server RAM. I guess 2GB isn’t enough for the load I’m putting on this system.

I’m guessing that it doesn’t handle paged memory very well.

Still don’t know how that will affect the call file leak, last night I wound up with 500+ open channels that sat till this morning when I did a manual restart of asterisk. The call file queue was cleared after we stopped dialing at 8:30.

And now, mysteriously, asterisk has gone from a 450Mb memory footprint, down to about 28Mb.

???

I’m giving it the same load, it just suddenly dropped out of nowhere.