404 - Call not found after high activity

Hi everyone,

I’m running into a critical issue with Asterisk ARI that I haven’t been able to solve despite extensive debugging.

Setup:

  • Asterisk 20.11 on CentOS 9
  • Using ARI with a Python client (via asterisk-ari lib)
  • A callbot system that handles simultaneous calls
  • Calls are managed in Stasis with recording, playback, and spying channels

Problem:

After ~10 to 20 minutes of normal operation under load, my ARI app starts failing to retrieve channels using:

client.channels.get(channelId=...)

This happens right after receiving a StasisStart event for that same channel. I log the channel ID from StasisStart, then immediately try to retrieve the channel to attach it to my internal state — and I get a 404 from the ARI API:

404 Client Error: Not Found for url: http://localhost:8088/ari/channels/<channel_id>

Even more confusing: the channel doesn’t appear in client.channels.list(), as if it never existed, even though StasisStart was just received.

This starts happening consistently after several minutes of high activity. Until then, everything works perfectly.

Observations:

  • The WebSocket still seems to be open; I never receive a disconnect event.
  • My client.run(apps="app_dialer") loop is in a background thread with auto-retry if it crashes (but I never see a crash).
  • I increased ulimit -n to 524288, so it’s not file descriptor exhaustion.
  • Logging shows ~300 channels when this happens. Mostly Snoop/ channels, some PJSIP/.
  • I even tried adding up to 5 seconds of delay before retrieving the channel from ARI after StasisStart, in case it was a race condition — it didn’t help.

Questions:

  • Is there a hard ARI or Asterisk limit on channels or events per WebSocket client?
  • Could some internal ARI channel registry be getting corrupted?
  • Could “orphaned” Snoop channels be exhausting something internally?
  • Any workaround that avoids losing control of in-progress calls?

My dialplan:

[incoming]
exten => _X.,1,NoOp(Incoming call received)
same => n,Answer()
same => n,Set(FULL_RECORD_PATH=/var/spool/asterisk/recording/full_call/${UNIQUEID}.wav)
same => n,MixMonitor(${FULL_RECORD_PATH})
same => n,Set(TALK_DETECT(set)=)
same => n,Stasis(app_dialer)
same => n,Hangup()

Any guidance or ideas would be greatly appreciated.
Thanks in advance!

You’d need an Asterisk log and ARI interaction log.

I’m trying to log the ARI interactions accurately, but none of the existing logs are helping.
I suspect the issue might be a delay between when ARI emits an event and when my Python app receives it.
I can log the timestamp when Python handles the event, but I need to know exactly when ARI sends it.

Any idea how I can log this at the source?

Run "ari set debug all on" in the Asterisk CLI to log it to the console.

Thank you for your time.

It seems the issue is indeed caused by a delay between when ARI sends the event and when my Python app receives it.

In the ARI logs:

[2025-04-24 19:14:15] VERBOSE[289430] ari/resource_events.c: <--- Sending ARI event to 127.0.0.1:34812 --->
{
  "type": "StasisStart",
  "timestamp": "2025-04-24T19:14:15.635+0200",
  "args": [],
  "channel": {
    "id": "1745514855.2727",
    "name": "PJSIP/yaniv_incoming-00000159",

In my Python logs:

2025-04-24 19:14:25,528 - INFO - 1745514855.2727 - PJSIP/yaniv_incoming-00000159 - Stasis start

Is there any reason this delay could be caused by Asterisk? Maybe some kind of internal ARI queueing?
From what I can see, the server’s performance looks fine — I don’t see where this could come from.
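
For anyone who wants to measure this gap continuously rather than by comparing two log files, here is a minimal sketch. It assumes the event arrives in Python as a dict with the same "timestamp" field shown in the ARI log above; the function name event_delay_seconds is made up for illustration and is not part of any ARI library. It also assumes the Asterisk host and the Python host have synchronized clocks.

```python
from datetime import datetime

# Matches the timestamp format seen in the ARI log above,
# e.g. "2025-04-24T19:14:15.635+0200".
ARI_TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def event_delay_seconds(event, received_at=None):
    """Return seconds between the event's own timestamp and its receipt.

    `event` is assumed to be the decoded JSON event (a dict).
    `received_at` must be a timezone-aware datetime; it defaults to now.
    """
    sent_at = datetime.strptime(event["timestamp"], ARI_TS_FORMAT)
    if received_at is None:
        received_at = datetime.now(sent_at.tzinfo)
    return (received_at - sent_at).total_seconds()
```

Logging this value at the very top of each event handler, before any other work, should show whether the delay grows steadily under load (a backlog on the client side) or appears suddenly.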

The log message occurs immediately before being written out the Websocket TCP connection. There is no queue within Asterisk at that point. Any queue would be in the underlying TCP/IP stack for the connection, or the receive buffer in the connection for the Python application, or inside of Python.
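
If the backlog is inside the Python application, one common way to keep the receive side drained is to have the WebSocket callback do nothing except put the event on a queue, and let worker threads run the slow handlers. The following is only a sketch of that pattern using the standard library; start_workers and handler are hypothetical names, and how it gets wired into a particular ARI client library is left open.

```python
import queue
import threading


def start_workers(handler, num_workers=4):
    """Start worker threads that pull events from a queue and handle them.

    Returns (events, threads). Put events on `events` from the receive
    callback; put one None per worker to shut the workers down.
    """
    events = queue.Queue()

    def worker():
        while True:
            event = events.get()
            try:
                if event is None:  # shutdown sentinel
                    return
                handler(event)
            except Exception as exc:
                # A failing handler must not kill the worker thread.
                print(f"event handler failed: {exc!r}")
            finally:
                events.task_done()

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    return events, threads
```

With this in place the receive callback is reduced to events.put(event), and events.qsize() can be logged periodically as a rough indicator of how far behind the handlers are.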

Hello, I happen to have the same issue as the OP. Do you have any tips or directions on how to debug or fix it?
Thanks!

Hi, I resolved this problem by modifying client.py in the ARI Python library to add a queue with priority-based event handling.
Also, I noticed that it’s much more efficient to run 5 ARI clients each handling 20 messages than a single client handling 100.
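
The actual change to client.py was not posted, so the following is only a guess at what a priority-based event queue might look like, not the code that was used. The class name PriorityEventQueue and the choice of which event types count as urgent are assumptions for illustration; it relies only on the standard library.

```python
import itertools
import queue

# Assumed priorities: lower number is handled first. Call lifecycle
# events are treated as urgent; everything else gets the default.
PRIORITY = {"StasisStart": 0, "StasisEnd": 0, "ChannelHangupRequest": 0}
DEFAULT_PRIORITY = 10


class PriorityEventQueue:
    """Queue that hands out urgent ARI events before the rest."""

    def __init__(self):
        self._queue = queue.PriorityQueue()
        # The sequence number breaks ties, so events of equal priority
        # come out in arrival order and dicts are never compared.
        self._seq = itertools.count()

    def put(self, event):
        priority = PRIORITY.get(event.get("type"), DEFAULT_PRIORITY)
        self._queue.put((priority, next(self._seq), event))

    def get(self, timeout=None):
        _, _, event = self._queue.get(timeout=timeout)
        return event
```

The receive loop would call put() for every incoming event and worker threads would call get(), so a StasisStart that arrives behind a burst of lower-priority events is still picked up first.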