Weird stasis_cache.c ERROR in Asterisk 13.x

Hi all

We’re using Asterisk for very primitive operation, SIP-peers and SIP-gateways. Recently we switched from 1.8.26 version to 13.0 (and later to 13.1.0, which is current)

We’re having a realtime config for sip.conf and sip peers.

As we have migrated on 13.0.0 version, we started seeing a log of error like this in our logs:

This problem does not seem to be related to call flow at all, but probably (this is only my suspection) is related to “refreshing registrations” lifecycle, although it definitely is NOT 100% correlated.

After upgrade on 13.1.0 version, this problem still persists. I’ve posted a question on asterisk-users maillist (lists.digium.com/pipermail/aster … 85494.html), but got no response at all. Maybe I’ll get more luck here, at least, somebody will tell me what am I asking wrong.

I am ready to provide any relevant information, just cannot yet figure out WHICH is relevant here.

This is result of “grep ERROR /var/log/asterisk/message” (only some lines are cited)

[2015-01-10 01:41:28] ERROR[14636] stasis_cache.c: Attempting to remove an item from the SIP/ics0002-cached cache that isn't there: ast_endpoint_snapshot_type SIP/ics0002 [2015-01-10 08:06:18] ERROR[14636] stasis_cache.c: Attempting to remove an item from the SIP/GW_Nauka-cached cache that isn't there: ast_endpoint_snapshot_type SIP/GW_Nauka [2015-01-10 08:06:18] ERROR[14636] stasis_cache.c: Attempting to remove an item from the SIP/ics0002-cached cache that isn't there: ast_endpoint_snapshot_type SIP/ics0002 [2015-01-10 09:44:10] ERROR[14636] stasis_cache.c: Attempting to remove an item from the SIP/ics0002-cached cache that isn't there: ast_endpoint_snapshot_type SIP/ics0002 [2015-01-10 10:15:05] ERROR[14636] stasis_cache.c: Attempting to remove an item from the SIP/ics0002-cached cache that isn't there: ast_endpoint_snapshot_type SIP/ics0002
ics0002*CLI> core show version
Asterisk 13.1.0 built by root @ devsrv1 on a x86_64 running Linux on 2014-12-22 19:46:15 UTC

I don’t exactly understand what Stasis cache is (was unable to find any description of it), but I suspect that an ERROR is a serious signal of “something goes wrong”, so I greatly appreciate any advice on this.

Many thanks

Kirill Marchuk

Stasis is part of the internal message passing mechanism.

You should probably report this on issues.asterisk.org/ but they will require complete logs, together with debug level logs (although you can edit them for sensitive data). As you suggest this is associated with registration, you will need to enable SIP logging (for chan_sip, use “sip set debug on”. I don’t know what the equivalent is for pjsip.

Is there a way to find out WHICH component (module, application, etc) is “Attempting to remove an item” ? maybe some debugging mode to print stack traces on ERRORs ?

just wondering…

There may well be, but you can also run it under gdb with a breakpoint that automatically outputs a backtrace then continues execution.

However this sort of bug can be difficult as the primary error could have happened some time before the fault is detected.

Well, it would be totally helpful and nice if someone could provide a link on an article where alike procedures (“run it under gdb with a breakpoint that automatically outputs a backtrace then continues execution”) over Asterisk process would be discussed.

I’ve took a quick look at gdb, but without symbol tables I can’t actually figure out how could I “catch” this precisely ERROR condition…

Also I’m in doubt whether just having run “file /usr/sbin/asterisk” in gdb console would be equal to running asterisk as a service, as it does normally

Thanks a lot !

If you built Asterisk yourself, you should have symbol tables. If you didn’t build it yourself, and can’t you will just have to wait until someone who has is able to debug this.

i am pretty sure I built it myself by running configure/make, etc

and when running “file /usr/sbin/asterisk” it displays “Reading symbol table”

but if I try to run someting like
(gdb) break res/res_config_ldap.c:1710

I get:

No source file named res/res_config_ldap.c.
Make breakpoint pending on future shared library load? (y or [n])

How can I set a break on a specific function/line of a specific module ?

Thanks

Just say yes, and when it loads the module, it will set the breakpoint.

Thanks…

I am probably missing something very obvious, but still…

What I do is:

  1. make sure asterisk is running (service asterisk status/start)
  2. ps ax | grep asterisk => now I know PID
  3. gdb
  4. attach PID => displays loaded symbols
  5. since then, if I try to register in a softphone (Zoiper, for that matter), I get eternal timeout. If I try to run !asterisk -r, it displays a “hint” (“Asterisk 13.0.0, Copyright © 1999 - 2014, Digium, Inc. and others. Created by Mark Spencer markster@digium.com”, etc) but DOES not display command prompt (by the way, I cannot even leave gdb after that - only by sending it to background by Ctrl-Z and “killall gdb” afterwards)

Evidently, something is broken (or, rather, blocked) in an asterisk process whenever gdb is attached, even if NO break- or tracepoints have been defined yet

whenever I run “detach” in gdb, everything returns to normal behaviour. 100% reproducible.

Is it possible to figure out why is this happening ?

Again thanks for your support and help,
Kirill

“continue”

I noticed that this (described above) happens even when I have no breakpoints defined, just after attaching gdb to the process ? is this ok ?

anyway, when I type “continue”, I see a “Continuing” word in the next line and gdb prompts goes off; I still can type in the console, but that looks like it’s “all in vain” (how is this situation called ? that’s exactly what i’ve come across when I’ve tried to launch “!asterisk -r” in gdb after attaching

thanks a lot…

I noticed something else

somehow I had “breakpoint” after starting gdb session like I described…
then I deleted it using “delete breakpoint” and answering “y”

this is what happens after attach process now:
(gdb) bt
#0 0x00007f064ab84223 in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00000000004590d3 in monitor_sig_flags (unused=unused@entry=0x0)
at asterisk.c:3659
#2 0x000000000043861d in main (argc=, argv=)
at asterisk.c:4638
(gdb) info breakpoints
No breakpoints or watchpoints. (then why it shows backtrace ??)
(gdb) fg
Continuing.
[New Thread 0x7f0619a7f700 (LWP 13533)]
Cannot get thread event message: debugger service failed
(gdb) detach
Detaching from program: /usr/sbin/asterisk, process 8498

only after running “detach” my “zoiper registration” immediately finishes normally; before that it hangs and hangs and hangs.

So what I conclude is

  1. after attaching to a PID, asterisk process is suspended somehow (also I dont understand why, now that i’ve deleted breaks);
  2. it is suspended on the main function
  3. continue somehow does not help (it helped before, when I pressed Ctrl-C in that mysterious “type in vain” mode)
  4. detach leaves asterisk in peace and it can proceed his activity

things are getting less so mysterious but I still can’t figure out how to solve a task I want to solve (that is: attach to a running asterisk, set breakpoint or tracepoint on a given line of a given module; on triggering that break/tracepoint dump backtrace somewhere, preferably in file, and proceed execution)

thanks a lot

I’ve got a clue of why is this happening (stasis_cache.c error) !

It happens when my webrtc-peer is UNREGISTERing (by sending REGISTER request with a Contact header containing “expires=0”

Strangely and sad, whether or not this error (stasis_cache.c) occurs, unregistration [sometimes ?] does not proceed as such, i.e. that peer is still displayed via “sip show peers” and Asterisk tries to pass calls on this peer, when it’s actually not connected already.

(we’re using chan_sip with realtime peers, rtcachepeers is enabled)

Thanks for any suggestions…

I was having this exact same problem with my system when I upgraded from 11 to 12. What it ended up being was some columns in sip_users mysql table that where set to blank instead of being set to NULL. Soon as I changed them from blank to null the errors went away. Hope that helps :smiley:

Actually I spoke to soon, I think I am still having the same problem. Anyone know how to fix this. Could it have something to do with using a different external IP?

The problem ended up being this

change
rtcachefriends=no
to
rtcachefriends=yes

in sip.conf [general]

Let me know if that fixes it for you too.

sorry for late reply

somehow, we are already having this option for quite a long time:
sip.conf general rtcachefriends yes
(this is a line of a mysql table, that is used as a realtime config instead of sip.conf)

:frowning:

and still we’re getting that stasis_cache.c issue.

on both Asterisk 13.1.0. and 13.2.0

If you re-attach the debugger, it will pause the program and print a back trace of the current state of the program.

2david55: and how do I make the process running afterwards ?

also, how can I for sure make it stop and print a backtrace when process enters given function in a given module ?

continue will make it continue to run under the debugger.

quit will delete all breakpoints and let the program run normally.

To make it stop, you must run it under the debugger with a breakpoint set on the first executable line of the function.

You will need to run Asterisk with the -f option (or -c) so that it doesn’t detach itself from the console.