Channel count in struct ast_endpoint continuously increases and causes a memory leak with the Bridge application

#1

For an endpoint, whenever a new channel is created or hung up, or when the endpoint registers or unregisters, a new endpoint snapshot is created in endpoints.c and published to Stasis. Why is there a need to create a snapshot and then publish it to Stasis?

Also, why does the endpoint hold the unique_ids of channels until the Asterisk process stops, even when those channels were destroyed long ago?

#2

Snapshots are raised when underlying state changes to inform parts of the system that things have changed. Also it should not have channel ids for channels that are no longer present, so that would be a bug.

#3

I added some logging to the ast_endpoint_snapshot_create function for the channel count, and this count increases every time a channel is created.
[Dec 5 17:14:45] WARNING[7397] endpoints.c: channel count: 107069 max channel: -1
[Dec 5 17:14:45] WARNING[7397] endpoints.c: channel count: 107070 max channel: -1
[Dec 5 17:14:45] WARNING[7397] endpoints.c: channel count: 107071 max channel: -1
[Dec 5 17:14:45] WARNING[7397] endpoints.c: channel count: 107072 max channel: -1

Here, the channel count corresponds to:

channel_count = ao2_container_count(endpoint->channel_ids);

Also, at any given moment there are at most 400 channels.

#4

Okay. As I mentioned, if they have actually gone away then they should no longer be present. If using a current version of Asterisk then issues can be reported on the issue tracker[1] but we would need a test case to ensure we reproduce it.

[1] https://issues.asterisk.org/jira

#5

Okay, I’ll open this issue on the issue tracker. I’m using the asterisk-certified-13.21-cert3 version of Asterisk.

#6

Certified Asterisk only receives fixes as a result of a license agreement user encountering an issue. If you are not a license agreement holder then that branch will not receive any fix that may come out of this.

#7

But can a fix be provided for a non-certified branch, so that I can maybe apply it as a patch?

#8

Bug fixes always go into currently supported branches where they are applicable. I also have no time frame on when this would get looked into.

#9

Okay, but can you confirm, based on my input, that this is not the desired behaviour, so that I can look into it myself?

#10

I believe I’ve already confirmed twice that if the channel no longer exists then it should not be present in the list of channel ids.

#11

Alright, thank you. I’ll try to find the bug myself and also post this as an issue on the issue tracker.

#12

https://issues.asterisk.org/jira/browse/ASTERISK-28197

#13

@jcolp can you please give me some direction on where I should look in the code to fix this issue?

#14

Without digging in myself I can’t really give any hints. You’d just have to see how it works, if the condition that causes it to get removed isn’t being met, and why not.

#15

I did some debugging and found that endpoint_cache_clear is the route callback for an endpoint and is responsible for cleaning up channel_ids. With the Dial application it is invoked twice, while with Originate and Bridge it is invoked only once.

#16

With the Bridge application, when bridge_exec in features.c is invoked, it calls ast_channel_unref(current_dest_chan) to drop the channel reference if the channel is not in the bridge, which gives the following call stack:

#0 topic_remove_subscription (topic=0x7fbf2c008290, sub=0x36095a0) at stasis.c:711
#1 0x00000000005c8a22 in stasis_forward_cancel (forward=0x7fbf2c009fa8) at stasis.c:925
#2 0x00000000004cd126 in ast_channel_internal_cleanup (chan=0x7fbf2c0243d0) at channel_internal_api.c:1553
#3 0x00000000004af818 in ast_channel_destructor (obj=0x7fbf2c0243d0) at channel.c:2363
#4 0x000000000045b94b in internal_ao2_ref (user_data=0x7fbf2c0243d0, delta=-1, file=0x61cb0b "astobj2.c", line=518, func=0x61cd21 <__FUNCTION__.8693> "__ao2_ref") at astobj2.c:451
#5 0x000000000045bc2e in __ao2_ref (user_data=0x7fbf2c0243d0, delta=-1) at astobj2.c:518
#6 0x000000000050c26a in bridge_exec (chan=0x7fbf2c00a4d0, data=0x7fbfb3ffc460 "SIP/201-00000000") at features.c:1129

Finally, the call to stasis_forward_cancel removes 4 subscribers from the non-PBX channel’s topic. One of those 4 subscribers carries the endpoint_cache_clear callback, which is needed to remove entries from channel_ids in struct ast_endpoint.
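
To make the effect concrete, here is a minimal standalone sketch of the situation as I understand it. It is not Asterisk code: every type and function name in it (model_endpoint, model_channel, model_cache_clear and so on) is hypothetical, and the forward is reduced to a plain pointer. It only shows that if the forward is cancelled before the channel's final message is published, the callback playing the role of endpoint_cache_clear never runs and the id stays in the endpoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_IDS 16

/* Hypothetical stand-in for struct ast_endpoint: only the set of channel ids. */
struct model_endpoint {
    const char *ids[MAX_IDS];
    size_t count;
};

/* Hypothetical stand-in for a channel plus its forward to the endpoint. */
struct model_channel {
    const char *uniqueid;
    struct model_endpoint *forward; /* NULL once the forward is cancelled */
};

/* Record a channel id in the endpoint, as happens when a channel is created. */
static void model_endpoint_add(struct model_endpoint *ep, const char *id)
{
    assert(ep->count < MAX_IDS);
    ep->ids[ep->count++] = id;
}

/* Plays the role of endpoint_cache_clear: drop the id from the endpoint. */
static void model_cache_clear(struct model_endpoint *ep, const char *id)
{
    for (size_t i = 0; i < ep->count; i++) {
        if (strcmp(ep->ids[i], id) == 0) {
            ep->ids[i] = ep->ids[--ep->count];
            return;
        }
    }
}

/* Plays the role of stasis_forward_cancel: the endpoint stops hearing the channel. */
static void model_forward_cancel(struct model_channel *chan)
{
    chan->forward = NULL;
}

/* Publish the channel's final message; it reaches the endpoint only via the forward. */
static void model_publish_final(struct model_channel *chan)
{
    if (chan->forward) {
        model_cache_clear(chan->forward, chan->uniqueid);
    }
}

/* Returns how many ids the endpoint still holds after the channel is gone. */
size_t model_leak_demo(bool cancel_forward_first)
{
    struct model_endpoint ep = { .count = 0 };
    struct model_channel chan = { "example-uniqueid", &ep };

    model_endpoint_add(&ep, chan.uniqueid);
    if (cancel_forward_first) {
        model_forward_cancel(&chan);
    }
    model_publish_final(&chan);
    return ep.count;
}
```

In this model, model_leak_demo(false) leaves the endpoint empty, while model_leak_demo(true) leaves one stale id behind, which matches the ever-growing count I see in the logs.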

#17

I tried to stop the removal of subscribers from the non-PBX channel by adding a flag to ast_channel. That stopped the channel_ids count from growing forever, but now the subscribers are never removed from the channel.
@jcolp based on the above information, can you now suggest what I should do to prevent this issue?

#18

These are now really developer questions and you may get better results on the developer mailing list or IRC channel.

#19

As I stated before without really digging in and looking through everything, I don’t know. The issue itself is in queue to get looked at.

#20

I think I found the problem. During bridging, Asterisk creates another channel (the yanked channel, or original channel) and transfers the state from the initial channel (the clone channel) to this newly created channel. It then hangs up the initial channel after swapping all the state between them.
Here our initial channel structure was created with ast_channel_alloc_with_endpoint, which populates the channel’s endpoint_forward field. This field holds the information about the endpoint topics (which carry the endpoint_cache_clear callback). The new channel created during bridging is created with ast_channel_alloc, which doesn’t populate endpoint_forward, so after the masquerade, when the initial channel hangs up, the information in its endpoint_forward field dies with it.

I tried swapping the endpoint_forward fields of the two channels in channel_do_masquerade(dest, source), and everything seems to work fine.
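
The idea behind the swap can be sketched with a small standalone model. This is not the actual patch and not Asterisk code: the sketch_ types and functions are hypothetical, the forward is a plain pointer, and the "identity" the endpoint tracks is reduced to a single flag. It only illustrates why moving the identity without moving endpoint_forward leaves a stale id, and why swapping both avoids it.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct ast_endpoint: number of channel ids it holds. */
struct sketch_endpoint {
    size_t tracked_ids;
};

/* Hypothetical stand-in for a channel. */
struct sketch_channel {
    int tracked;                              /* 1 if the endpoint holds this channel's id */
    struct sketch_endpoint *endpoint_forward; /* NULL when allocated without an endpoint */
};

/* Models the masquerade: the identity always moves from source to dest.
 * swap_forward selects whether endpoint_forward is swapped as well. */
void sketch_masquerade(struct sketch_channel *dest, struct sketch_channel *source,
                       int swap_forward)
{
    int tracked = dest->tracked;
    dest->tracked = source->tracked;
    source->tracked = tracked;

    if (swap_forward) {
        struct sketch_endpoint *fwd = dest->endpoint_forward;
        dest->endpoint_forward = source->endpoint_forward;
        source->endpoint_forward = fwd;
    }
}

/* Models channel destruction: the endpoint drops the id only if the dying
 * channel is the tracked one and still has a forward to the endpoint. */
void sketch_destroy(struct sketch_channel *chan)
{
    if (chan->tracked && chan->endpoint_forward) {
        chan->endpoint_forward->tracked_ids--;
    }
    chan->endpoint_forward = NULL;
}

/* Returns the number of ids left in the endpoint after both channels are gone. */
size_t sketch_bridge_scenario(int swap_forward)
{
    struct sketch_endpoint ep = { 1 };
    struct sketch_channel source = { 1, &ep }; /* allocated with an endpoint */
    struct sketch_channel dest = { 0, NULL };  /* allocated without an endpoint */

    sketch_masquerade(&dest, &source, swap_forward);
    sketch_destroy(&source); /* initial channel is hung up after the swap */
    sketch_destroy(&dest);   /* surviving channel is hung up later */
    return ep.tracked_ids;
}
```

In this model, sketch_bridge_scenario(0) leaves one id behind and sketch_bridge_scenario(1) leaves none. Whether the real masquerade needs anything more than the swap is something the load test should show.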

I’ll test this change under heavy load and hope for better results.