Migrating to the newer manager.conf eventfilter: am I doing something wrong?

Trying to improve our AMI event filtering. We are migrating from the older eventfilter approach to the newer, more efficient one.

Old…
eventfilter=!Device: confbridge:
eventfilter=!Device: Local/
eventfilter=!Device: CBAnn/
eventfilter=!Device: CBRec/
eventfilter=!Variable: BRIDGEPEER
eventfilter=!Variable: CPLAYBACKOFFSET
eventfilter=!Variable: CPLAYBACKSTATUS
eventfilter=!Variable: CPLAYBACKSTOPKEY

Is this the correct way to filter using the newer action(), name(), header(), and method() syntax?
Asking because this seems to be running slightly slower than the eventfilter approach above.

eventfilter(action(exclude),name(DeviceStateChange),header(Device),method(starts_with)) = confbridge:
eventfilter(action(exclude),name(DeviceStateChange),header(Device),method(starts_with)) = Local/
eventfilter(action(exclude),name(DeviceStateChange),header(Device),method(starts_with)) = CBAnn/
eventfilter(action(exclude),name(DeviceStateChange),header(Device),method(starts_with)) = CBRec/
eventfilter(action(exclude),name(VarSet),header(Variable),method(starts_with)) = BRIDGEPEER
eventfilter(action(exclude),name(VarSet),header(Variable),method(starts_with)) = CPLAYBACKOFFSET
eventfilter(action(exclude),name(VarSet),header(Variable),method(starts_with)) = CPLAYBACKSTATUS
eventfilter(action(exclude),name(VarSet),header(Variable),method(starts_with)) = CPLAYBACKSTOPKEY

I’m guessing the starts_with is not the best way to do this. Any suggestions?
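For what it's worth, I also wondered whether the four Device rules could be collapsed into a single rule using the regex method. This is just a variant I'm considering, not something I've benchmarked:

eventfilter(action(exclude),name(DeviceStateChange),header(Device),method(regex)) = ^(confbridge:|Local/|CBAnn/|CBRec/)

I don't know whether one regex evaluation actually beats four starts_with prefix checks, though.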

I’m looking at your filters in more depth but in the meantime, what do you mean by “slower”?

We’re running a very high volume test.

In the older approach, I scanned a single second of events and found 1370 packets from Asterisk.

Using the newer approach, in a new test, I looked at a couple of different seconds of data and noticed we received fewer packets (700-800) from Asterisk.

Obviously, any second is purely a snapshot and it’s possible some other second could have higher volume.

The tests being run are automated (lots of ConfBridges for channels to communicate with each other, callers, WebSocket/SIP channels for our Web-based Agents, etc.).

It's possible I was unlucky in the samples I picked to compare. However, when I asked the person running our automated tests (after I had already gathered my own thoughts from the logs) whether he thought the newer approach was better, equal, or slower, he said he thought it was slightly slower. His impression would only be based on the audio and the screen of one of the agents answering calls.

1370 vs 700-800 events from Asterisk?

During your tests, run core show taskprocessors, look for the stasis/m:manager:core taskprocessor, and compare the counts. You can do this easily with asterisk -rx "core show taskprocessors" | grep manager
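If you want to sample it continuously during a run, something like this works (just a convenience; it assumes shell access on the Asterisk host and that watch is installed):

watch -n 1 'asterisk -rx "core show taskprocessors" | grep manager'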

You can also do a manager show eventq and see what the backlog looks like.

Oh, I wonder if you're getting fewer events because the old filtering was allowing more events to pass than it should have, whereas the new filtering is more specific.

The person running the load test gathered the data. Does this indicate the manager queue is actually fine?

stasis/m:manager:core-00000006 163419 0 27 2700 3000

stasis/m:manager:core-00000006 167030 0 27 2700 3000

stasis/m:manager:core-00000006 167725 0 27 2700 3000
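(For reference, if I'm lining these up with the core show taskprocessors header correctly, the columns should be Processor, Processed, In Queue, Max Depth, Low water, and High water, i.e.:

Processor                        Processed  In Queue  Max Depth  Low water  High water
stasis/m:manager:core-00000006      163419         0         27       2700        3000

so In Queue stayed at 0 and Max Depth peaked at 27.)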

The manager show eventq output also wasn't bad: 10 events.

Have I been looking in the wrong area for the bottleneck?

Those stats appear perfectly normal. This may not be easy or even possible given your test methodology, but could you actually capture the events in both scenarios and compare the two, to see if you're actually getting more events from the old filters because some are making it through that the new filters exclude? Actually, you don't even have to do this under load. Just run 5 calls under each scenario, ensuring the calls take the same path in both, then compare the results.
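One rough way to capture and compare (just a sketch; it assumes the AMI connection is plain TCP on the default port 5038, not TLS, and that tcpdump is available on the host):

tcpdump -i any -s 0 -w old-filters.pcap port 5038
tcpdump -r old-filters.pcap -A | grep -o 'Event: [A-Za-z]*' | sort | uniq -c | sort -rn

Repeat with a second capture for the new filters and diff the per-event-type counts. Events split across packet boundaries may be miscounted, so treat the numbers as approximate.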

Other than the number of events are you seeing any other side effects of the new filters? Are your AMI apps complaining about missing events for instance?

Good idea to check the number of events. I will have the tester run a scaled down test for 5 minutes to see how the number of events compares.

Things seem fine with the AMI app, so I don’t think there are any issues with the actual data and the new filters.

The tester did just send me another snapshot of the taskprocessors.
I pointed out to him that he allocated way too many PJSIP and WebRTC endpoints for what he needs (5000-6000 extra task processors that I was not aware of until now). Nothing would connect to these endpoints, but it's still a waste, and I asked him to clean it up.

The manager taskprocessor in this case reached a Max Depth of 204 (not terrible).
However, the bridge one seems bad. We are using lots of ConfBridges because we have to support an Agent plus anywhere from 1-35 calls that the Agent is communicating with; it can start at 1 but may grow depending on the conversation. As a side effect, we are stuck with the CBAnn channels and know this runs into the queue issue. I put in a feature request to allow ConfBridges that don't need to play announcements (AMI Bridge support doesn't have the multiple-party support we need). I suspect that CBAnn queue issue might be related to the bridge Max Depth being high.

stasis/m:bridge:all-00001ac0 623992 0 392 450 500
stasis/m:manager:core-00000006 6781264 0 204 2700 3000
stasis/p:endpoint:PJSIP/x.x.x.x-0000001e 764889 0 155 450 500

We ran a single-call-at-a-time test with the same number of calls.
The number of events matches between the old and the new eventfilter approaches.

I suspect the reason I was seeing more events with the older approach was a case of dumb luck. There is a lot happening in our load tests; there were 370 requests to Asterisk, including ConfBridge work.
The second leading up to the older eventfilter logs I was looking at had more requests sent to Asterisk, meaning they were queued, started processing, and triggered additional activity (ConfBridge + Bridge events), etc.

Closing this question.