Is there any way to monitor usage more proactively? I don’t want to wait until there are errors to take action, but I also don’t want to mess with a working box if there’s no issue. By monitoring usage, I can start to get a feel for what’s going on.
I believe Asterisk has an open file limit of 1024. I’m trying to figure out a bash command or some other method that can help me keep an eye on this (if possible).
If what you’re referring to is the number of file descriptors that can be passed to select, then @david551 is correct - it is limited to FD_SETSIZE, which is typically 1024. For the most part, that is what Asterisk uses (subject to various operating system details). While there are some performance limitations with select, they typically only show up when you have a large number of file descriptors you’re polling on. That’s rarely the case in Asterisk, where you generally only have a few file descriptors (the ones associated with an instance of ast_channel) that are being polled on in a single call - usually about 6 or so.
Granted, Asterisk will be calling select at some point on each channel, each of which typically has its own dedicated thread (sometimes; outbound channels are polled by an inbound channel’s thread, and even that’s not always the case… things are complex in places. ¯\_(ツ)_/¯)
All of that is probably not what you were referring to however. If what you are running into is a lack of file descriptors, that’s a limitation of your operating system configuration, not Asterisk.
You can increase the number of file descriptors available to you by using ulimit, although the new limit may not be picked up by all processes unless it is made permanent.
Generally speaking, you cannot monitor this situation in Asterisk. Asterisk is going to merrily consume file descriptors, and it doesn’t have any normal way of tracking which file descriptors it owns. If you compile Asterisk with DEBUG_FD_LEAKS, however, you will get the CLI command core show fd, which will dump all the relevant file descriptors Asterisk has opened. You will take a performance penalty for enabling this option.
You can also use standard Linux commands to figure out how many file descriptors Asterisk has opened. That’s probably a better approach to go by, if you want to have a script you can run to see how close you are getting to your system limits.
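As a rough sketch of that approach, assuming a Linux box with /proc mounted: each entry under /proc/<pid>/fd is one open descriptor, and /proc/<pid>/limits reports the "Max open files" limit the running process actually has. The helper names below are made up for illustration, and reading another user's process generally requires root or running as that same user.

```python
import os


def count_open_fds(pid):
    """Count the entries in /proc/<pid>/fd, one per open descriptor."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def soft_nofile_limit(pid):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits, or None if unlimited."""
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Columns: Max open files <soft> <hard> <units>
                soft = line.split()[3]
                return None if soft == "unlimited" else int(soft)
    raise RuntimeError("no 'Max open files' line found")


def fd_usage(pid):
    """Return (open descriptors, soft limit) for the given process."""
    return count_open_fds(pid), soft_nofile_limit(pid)


if __name__ == "__main__":
    # Demonstrated on this script's own PID; for Asterisk, substitute its PID.
    used, limit = fd_usage(os.getpid())
    print(f"{used} descriptors open, soft limit {limit}")
```

Run from cron or a watch loop, something like this lets you see how the count trends against the limit before any errors show up.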
Or just bump the limit to 64k or so and move on to another problem.
select is sensitive to the highest FD number, not (just) to the number currently being awaited. If I remember correctly, it has a fixed size bitmap of the active FDs, indexed by FD number.
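To see that the limit applies to the descriptor number rather than to how many descriptors are being waited on, here is a small sketch using Python's select module on Linux, which wraps the same call. `selectable` is a made-up helper name. CPython range-checks each number against FD_SETSIZE before making the system call, so a single out-of-range number is rejected even when nothing else is being polled.

```python
import os
import select


def selectable(fd):
    """Return True if select() will accept this descriptor number."""
    try:
        # Zero timeout: return immediately instead of waiting for activity.
        select.select([fd], [], [], 0)
    except ValueError:
        return False  # number does not fit in the fixed-size fd_set
    except OSError:
        return True   # number is in range, but it is not an open descriptor
    return True


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    print(read_end, selectable(read_end))
    # FD_SETSIZE is typically 1024, so 1023 should fit and 1024 should not.
    for number in (1023, 1024):
        print(number, selectable(number))
    os.close(read_end)
    os.close(write_end)
```

A process that leaks descriptors will eventually be handed numbers past that boundary, at which point select stops being usable for them regardless of how few it is polling at once.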
A long time ago we had problems with FD leaks, and the way the symptom presented was in terms of false wakeups because the FD numbers had wrapped modulo 1024. I think it was actually parking that used select, and we made heavy use of parking.
Thanks for the insight! From googling, it appears there are a lot of prescribed ways to do this, many of which do not appear to work. Do you have a link to instructions you can point me to that would increase the limit in such a way that Asterisk picks up the new limit? Asterisk 14/CentOS 7.
The limit on select can only be changed by rebuilding the kernel and every other component that uses the select system call.
For the softer limits, make sure that the scripting that runs Asterisk from init is using bash, and invoke ulimit before you invoke Asterisk, in the same or a superior shell level, so that the Asterisk process inherits the new limit.
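The reason the ordering matters is that resource limits are inherited by child processes at the moment they are started. A minimal sketch of that inheritance, assuming Linux and Python's resource module: `run_with_nofile` is a hypothetical helper, and the child here is a stand-in that only prints the limit it received, not Asterisk itself.

```python
import resource
import subprocess
import sys


def run_with_nofile(cmd, wanted):
    """Run cmd with the soft open-file limit set to `wanted`, capped at the hard limit."""
    def set_limit():
        # Runs in the child after fork() and before exec(),
        # much like calling ulimit -n in the script that starts the daemon.
        _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        soft = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))

    return subprocess.run(cmd, preexec_fn=set_limit, capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in child that just reports the soft limit it inherited.
    child = [sys.executable, "-c",
             "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"]
    print(run_with_nofile(child, 512).stdout.strip())
```

Changing the limit in some unrelated shell after the daemon is already running has no effect on it, which is likely why many of the recipes found online appear not to work.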
However, note that the defaults are either unlimited or, where a limit involves pre-allocating the maximum, generous. You may also find that some limits are configurable, and therefore set, in Asterisk itself; I think the stack size for pthreads is determined that way, and not by the kernel.
There are some soft limits set elsewhere, e.g. the temporary file system size on modern systems, which uses RAM and swap, is usually set in /etc/fstab. Other file system sizes are set by how you partition the disks, further limited by physical constraints.
The only limits I have hit in practice are the select one, which is architectural and so requires the complete distribution to be rebuilt, and physical ones.
Obviously you could also hit limits due to physical resources, like the total disk space.
Linux tends to take the default position that all users are trusted. Other systems may impose default file usage limits, but even then they are unlikely to apply to system daemons started by the system.
That makes sense in Parking land, as it was (maybe still is?) a single select that waits for any of the channels in Park to be serviced. I haven’t looked in a while, but I’d expect it to be similar now even though Parking uses the waiting bridge technology.
Ideally, one of these days we’d switch to epoll, but it’s a non-trivial amount of work and only has benefits in certain circumstances (like parking a bunch of calls, or maybe really large conferences).