Asterisk Killing Errors

Hi all,

I have an installation of asterisk 1.2.10 on Fedora Core 4, with two 2.8GHz procs and 1 gig of memory, that has always worked well for my customers at relatively low traffic. We are now implementing an auto dialer, using 12 channels, on a 70 user site. All goes well until we turn on the auto dialer, at which point errors inevitably begin to occur and eventually bring asterisk to its knees.

This excites the customer greatly.

The error that we most often see first is this:
Apr 27 16:09:06 WARNING[18724]: format_wav.c:247 update_header: Unable to find our position

This is often followed closely by:
Apr 27 16:10:38 WARNING[19138]: rtp.c:390 ast_rtcp_read: RTP Read error: Bad file descriptor

and its close friend
Apr 27 16:11:07 WARNING[19138]: rtp.c:390 ast_rtcp_read: RTP Read error: Socket operation on non-socket

When these errors appear, they often come in batches of thousands and even thousands per second, making the messages log huge !

Finally, asterisk begins spitting out an “avoided deadlock” warning. Once we see this error, incoming calls are still coming in, but no outgoing calls can be made, including SIP to SIP calls. Asterisk has to be restarted to resume normal operations.

We are running call recording as well as the dialer, but shutting off call recording does not seem to have an effect on the outcome once the dialer is turned on. If we catch it in time, before the deadlock errors, we can shut down the dialer and save asterisk most times.

We monitor system resources, and have not seen the CPU or memory get even close to being sapped. The Dialer generates a WAV file and stores it to the hard drive. An idea being floated at the moment is that perhaps we are overwhelming the HD I/O with all the call recording and Dialer traffic on the drives, but I have not been able to find any data to support that theory.

Any and all help on this would be GREATLY appreciated. I mean, throw an idea out there and I will look into it. I am a little baffled at this point.

Thanks for all the help in advance !

that is hilarious. i really don’t have much to offer, but i had to comment on your remarkable understatement.

why the format_wav error? is it possible that you’re playing a bad sound file somewhere?

I have been running asterisk 1.2.x since the end of last year and had the same errors, i.e.

WARNING[18724]: format_wav.c:247 update_header: Unable to find our position
and thousands of message logs every minute after we have been live with it for a week or so.

I did find a note about the fault on the digium bug list, and it was something to do with a file getting >2 gig: asterisk would die as the code could not cope.

For me it happened when we ended up with some tmp files in the voicemail folder which were huge, i.e. 3-5 gig in size. To fix our setup we added the option in the voicemail config to limit voicemails to 3 mins, and have been happy since.
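If it helps anyone hunting for files like that, here is a minimal sketch in C (assuming a POSIX system with nftw) that walks a directory tree and lists every regular file at or above a size limit. The names count_oversized and check_entry are made up for this example and have nothing to do with asterisk itself.

```c
/* Feature-test macros must come before any system header. */
#define _XOPEN_SOURCE 700        /* exposes nftw() */
#define _FILE_OFFSET_BITS 64     /* 64-bit st_size even on a 32-bit build */

#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* nftw() offers no user-data pointer, so the callback uses file-scope state. */
static long oversized_count;
static long long size_limit;

/* Hypothetical callback: report regular files whose size reaches the limit. */
static int check_entry(const char *path, const struct stat *sb,
                       int type, struct FTW *ftw)
{
    (void)ftw;  /* unused */
    if (type == FTW_F && (long long)sb->st_size >= size_limit) {
        printf("%s: %lld bytes\n", path, (long long)sb->st_size);
        oversized_count++;
    }
    return 0;   /* 0 tells nftw() to keep walking */
}

/* Returns the number of regular files under dir with size >= limit,
 * or -1 if the walk itself failed (for example, dir does not exist). */
static long count_oversized(const char *dir, long long limit)
{
    assert(dir != NULL);
    oversized_count = 0;
    size_limit = limit;
    /* FTW_PHYS: do not follow symbolic links; 16 = max open descriptors */
    if (nftw(dir, check_entry, 16, FTW_PHYS) != 0)
        return -1;
    return oversized_count;
}
```

Calling count_oversized("/var/spool/asterisk", 1LL << 31) would list everything at or above 2 gig; that spool path is only an assumption about where your install keeps its voicemail and recordings.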

Hope the above helps.

Nick

Hi nickcol,

I checked the size of the VMs, and the entire directory takes up no more than 400 MB at any one time. I also did a little looking around in some temp directories and could not find any large files there either. Having said that, I did set the time limit on VM messages anyway, just because that seemed a reasonable policy. But alas, I am still getting these cascading errors and eventual fall-overs.

I appreciate the advice though ! It was at least something new to try !

I have run into this error too.
Source code of update_header in format_wav.c:
asterisk.org/doxygen/1.2/for … 0a182f5d82

{
   off_t cur,end;
   int datalen,filelen,bytes;

   cur = ftell(f);
   fseek(f, 0, SEEK_END);
   end = ftell(f);
   /* data starts 44 bytes in */
   bytes = end - 44;
   datalen = htoll(bytes);
   /* chunk size is bytes of data plus 36 bytes of header */
   filelen = htoll(36 + bytes);

   if (cur < 0) {
      /* format_wav.c line 247: the warning seen in the logs */
      ast_log(LOG_WARNING, "Unable to find our position\n");
      return -1;
   }
   if (fseek(f, 4, SEEK_SET)) {
      ast_log(LOG_WARNING, "Unable to set our position\n");
      return -1;
   }
   if (fwrite(&filelen, 1, 4, f) != 4) {
      ast_log(LOG_WARNING, "Unable to set write file size\n");
      return -1;
   }
   if (fseek(f, 40, SEEK_SET)) {
      ast_log(LOG_WARNING, "Unable to set our position\n");
      return -1;
   }
   if (fwrite(&datalen, 1, 4, f) != 4) {
      ast_log(LOG_WARNING, "Unable to set write datalen\n");
      return -1;
   }
   if (fseek(f, cur, SEEK_SET)) {
      ast_log(LOG_WARNING, "Unable to return to position\n");
      return -1;
   }
   return 0;
}

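The cause is that ftell returns a long, which is only 32 bits on a 32-bit system, so it fails and returns -1 once the file is bigger than 2 GB. cur is then negative, which triggers the "Unable to find our position" warning.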
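To illustrate the ftell point, here is a minimal sketch of the same position query done with ftello and fseeko, which work in off_t rather than long. This is not the actual asterisk fix, just an example. It assumes a POSIX system where defining _FILE_OFFSET_BITS as 64 makes off_t 64 bits wide, and wav_data_bytes is a hypothetical helper name. It also refuses sizes that no longer fit the 4-byte size fields of the WAV header, since those cap the file at about 4 GB regardless of how the position is read.

```c
/* Feature-test macros must come before any system header. */
#define _POSIX_C_SOURCE 200809L  /* exposes ftello()/fseeko() */
#define _FILE_OFFSET_BITS 64     /* 64-bit off_t even on a 32-bit build */

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper, not part of asterisk: works out how many bytes of
 * audio data follow the 44-byte WAV header of stream f.
 * Returns 0 on success, or -1 if the position cannot be read or the
 * sizes would not fit the 32-bit fields of the header. */
static int wav_data_bytes(FILE *f, uint32_t *bytes_out)
{
    off_t cur, end;

    assert(f != NULL && bytes_out != NULL);

    cur = ftello(f);                 /* off_t result, unlike ftell()'s long */
    if (cur < 0)
        return -1;
    if (fseeko(f, 0, SEEK_END))
        return -1;
    end = ftello(f);
    if (fseeko(f, cur, SEEK_SET))    /* put the caller's position back */
        return -1;
    if (end < 44)
        return -1;                   /* shorter than the header itself */
    /* The chunk size field holds 36 + data bytes in 32 bits. */
    if ((uint64_t)(end - 44) > UINT32_MAX - 36)
        return -1;
    *bytes_out = (uint32_t)(end - 44);
    return 0;
}
```

With this approach a file between 2 GB and 4 GB can still be measured, and anything bigger is rejected with a clean error instead of a negative position.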