Channel count in struct ast_endpoint continuously increasing and causing memory leak in case of bridge application


#1

For an endpoint, whenever a new channel is created or hung up, or the endpoint registers or unregisters, a new endpoint snapshot is created in endpoints.c and published to Stasis. Why is there a need to create a snapshot and then publish it to Stasis?

Also, why does the endpoint hold the unique_ids of channels until the Asterisk process stops, even when those channels were destroyed long ago?


#2

Snapshots are raised when underlying state changes, to inform other parts of the system that things have changed. Also, the endpoint should not hold channel ids for channels that are no longer present, so that would be a bug.


#3

I added some logging for the channel count in the ast_endpoint_snapshot_create function, and this count increases every time a channel is created.
[Dec 5 17:14:45] WARNING[7397] endpoints.c: channel count: 107069 max channel: -1
[Dec 5 17:14:45] WARNING[7397] endpoints.c: channel count: 107070 max channel: -1
[Dec 5 17:14:45] WARNING[7397] endpoints.c: channel count: 107071 max channel: -1
[Dec 5 17:14:45] WARNING[7397] endpoints.c: channel count: 107072 max channel: -1

Here, channel count corresponds to

channel_count = ao2_container_count(endpoint->channel_ids);

Also, at any given moment there are at most 400 channels.
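To make the mismatch concrete, here is a minimal standalone sketch of the invariant being discussed: the ids tracked by an endpoint should match the channels that are actually alive. This is a toy model, not Asterisk code. The names toy_endpoint, toy_endpoint_add, toy_endpoint_remove and toy_endpoint_count_stale are hypothetical, and a fixed-size array stands in for the real channel_ids container.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_IDS 8
#define TOY_ID_LEN 32

/* Hypothetical stand-in for the endpoint's channel_ids container. */
struct toy_endpoint {
    char ids[TOY_MAX_IDS][TOY_ID_LEN];
    size_t count;
};

/* Track a channel id. Returns 0 on success, -1 if the set is full
 * or the id does not fit. */
static int toy_endpoint_add(struct toy_endpoint *ep, const char *id)
{
    assert(ep != NULL && id != NULL);
    if (ep->count >= TOY_MAX_IDS || strlen(id) >= TOY_ID_LEN) {
        return -1;
    }
    strcpy(ep->ids[ep->count], id);
    ep->count++;
    return 0;
}

/* Stop tracking a channel id. Returns 0 if removed, -1 if not tracked. */
static int toy_endpoint_remove(struct toy_endpoint *ep, const char *id)
{
    assert(ep != NULL && id != NULL);
    for (size_t i = 0; i < ep->count; i++) {
        if (strcmp(ep->ids[i], id) == 0) {
            /* Fill the hole with the last entry to keep the array dense. */
            ep->count--;
            if (i != ep->count) {
                strcpy(ep->ids[i], ep->ids[ep->count]);
            }
            return 0;
        }
    }
    return -1;
}

/* Count tracked ids that do not appear in the list of live channels. */
static size_t toy_endpoint_count_stale(const struct toy_endpoint *ep,
                                       const char *const *live, size_t n_live)
{
    size_t stale = 0;

    assert(ep != NULL);
    for (size_t i = 0; i < ep->count; i++) {
        int found = 0;
        for (size_t j = 0; j < n_live; j++) {
            if (strcmp(ep->ids[i], live[j]) == 0) {
                found = 1;
                break;
            }
        }
        if (!found) {
            stale++;
        }
    }
    return stale;
}
```

In this model, a tracked count far above the number of live channels (as in the log lines above, over 107000 against roughly 400) means toy_endpoint_remove is not being reached for most destroyed channels.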


#4

Okay. As I mentioned, if they have actually gone away then they should no longer be present. If using a current version of Asterisk then issues can be reported on the issue tracker[1] but we would need a test case to ensure we reproduce it.

[1] https://issues.asterisk.org/jira


#5

Okay, I'll open this issue on the issue tracker. I'm using the asterisk-certified-13.21-cert3 version of Asterisk.


#6

Certified Asterisk only receives fixes as a result of a license agreement user encountering an issue. If you are not a license agreement holder then that branch will not receive any fix that may come out of this.


#7

But can a fix be provided for a non-certified branch, so that I can perhaps apply it as a patch?


#8

Bug fixes always go into currently supported branches where they are applicable. I also have no time frame on when this would get looked into.


#9

Okay, but can you confirm, based on my inputs, that this is not the desired behaviour, so that I can look into it myself?


#10

I believe I’ve already confirmed twice that if the channel no longer exists then it should not be present in the list of channel ids.


#11

Alright, thank you. I'll try to find the bug myself and also post this as an issue on the issue tracker.


#12

https://issues.asterisk.org/jira/browse/ASTERISK-28197


#13

@jcolp, can you please give me some direction on where I should look in the code to fix this issue?


#14

Without digging in myself I can’t really give any hints. You’d just have to see how it works, if the condition that causes it to get removed isn’t being met, and why not.


#15

I did some debugging and found that endpoint_cache_clear is the route callback for an endpoint and is responsible for cleaning up channel_ids. In the case of the Dial application it is invoked twice, while in the case of Originate and Bridge it is only invoked once.


#16

In the case of the Bridge application, when bridge_exec in features.c is invoked, it calls ast_channel_unref(current_dest_chan) to release the channel reference if the channel is not in the bridge, which gives the following function call stack.

#0 topic_remove_subscription (topic=0x7fbf2c008290, sub=0x36095a0) at stasis.c:711
#1 0x00000000005c8a22 in stasis_forward_cancel (forward=0x7fbf2c009fa8) at stasis.c:925
#2 0x00000000004cd126 in ast_channel_internal_cleanup (chan=0x7fbf2c0243d0) at channel_internal_api.c:1553
#3 0x00000000004af818 in ast_channel_destructor (obj=0x7fbf2c0243d0) at channel.c:2363
#4 0x000000000045b94b in internal_ao2_ref (user_data=0x7fbf2c0243d0, delta=-1, file=0x61cb0b "astobj2.c", line=518, func=0x61cd21 <__FUNCTION__.8693> "__ao2_ref") at astobj2.c:451
#5 0x000000000045bc2e in __ao2_ref (user_data=0x7fbf2c0243d0, delta=-1) at astobj2.c:518
#6 0x000000000050c26a in bridge_exec (chan=0x7fbf2c00a4d0, data=0x7fbfb3ffc460 "SIP/201-00000000") at features.c:1129

Finally, the call to stasis_forward_cancel removes 4 subscribers from the non-PBX channel's topic, and one of those 4 subscribers carries the endpoint_cache_clear callback, which is needed to remove channel_ids from struct ast_endpoint.
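One possible reading of the trace above is an ordering problem: if the forward is cancelled before a cache-clear message for the channel is published, the endpoint's callback never sees it and the id stays tracked. The following standalone sketch models only that ordering. It is not Asterisk code and does not confirm the root cause. All toy_* names are hypothetical, a single pointer stands in for the forward, and an integer counter stands in for channel_ids.

```c
#include <assert.h>
#include <stddef.h>

enum toy_msg { TOY_MSG_SNAPSHOT, TOY_MSG_CACHE_CLEAR };

/* Hypothetical endpoint: counts tracked channels and callback hits. */
struct toy_endpoint {
    int tracked_channels;
    int cache_clear_calls;
};

/* Hypothetical channel: forward_to is NULL once the forward is cancelled. */
struct toy_channel {
    struct toy_endpoint *forward_to;
};

/* Stand-in for the endpoint's cache-clear route callback. */
static void toy_endpoint_on_message(struct toy_endpoint *ep, enum toy_msg msg)
{
    if (msg == TOY_MSG_CACHE_CLEAR) {
        ep->cache_clear_calls++;
        if (ep->tracked_channels > 0) {
            ep->tracked_channels--;
        }
    }
}

/* Creating a channel sets up the forward and tracks it on the endpoint. */
static void toy_channel_create(struct toy_channel *chan, struct toy_endpoint *ep)
{
    assert(chan != NULL && ep != NULL);
    chan->forward_to = ep;
    ep->tracked_channels++;
}

/* Messages only reach the endpoint while the forward is in place. */
static void toy_channel_publish(struct toy_channel *chan, enum toy_msg msg)
{
    if (chan->forward_to != NULL) {
        toy_endpoint_on_message(chan->forward_to, msg);
    }
}

static void toy_forward_cancel(struct toy_channel *chan)
{
    chan->forward_to = NULL;
}

/* Teardown order A: publish the final message, then cancel the forward. */
static void toy_teardown_publish_first(struct toy_channel *chan)
{
    toy_channel_publish(chan, TOY_MSG_CACHE_CLEAR);
    toy_forward_cancel(chan);
}

/* Teardown order B: cancel the forward first, so the message is dropped. */
static void toy_teardown_cancel_first(struct toy_channel *chan)
{
    toy_forward_cancel(chan);
    toy_channel_publish(chan, TOY_MSG_CACHE_CLEAR);
}
```

With two channels created on one endpoint, tearing one down with order A leaves one tracked channel, and tearing the other down with order B leaves the count unchanged. Repeated over many calls, order B gives the same kind of ever-growing count seen in the logs earlier in the thread.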