Approach to Recovering SIP Sessions After an Asterisk Crash

Hi everyone,

I wanted to share an approach I’ve been exploring in VoIPBin that might be helpful to others working with Asterisk and SIP.

VoIPBin (https://voipbin.net) is an open-source CPaaS (Communications Platform as a Service) project I’ve been developing on my own. It supports voice, SMS, email, and AI features, but in this post I’d like to focus on voice and a problem that has been on my mind for quite some time: recovering SIP sessions after an Asterisk crash.

This is not a new problem; it’s something I’ve been considering for years. After many attempts and quite a bit of trial and error, I’ve arrived at a method that seems to work in certain cases. Of course, this may not be the definitive solution, and there might be better, more robust methods out there already. If anyone knows of better approaches, I would be very grateful to learn about them.

The Problem
When Asterisk crashes during an active call, all SIP dialogs immediately disappear. From the client’s perspective, the session vanishes without any BYE or cleanup signals, which results in confusing call drops and broken media streams.

The Goal
To restore the SIP session so that the client perceives it as a continuation of the original call rather than a new one.

The Approach

  • Crash detection: VoIPBin’s sentinel-manager detects the crash.
  • Session lookup: Active call sessions handled by the crashed instance are queried from the database.
  • SIP info retrieval: Using HOMER, recent signaling data is analyzed to extract SIP headers (Call-ID, tags, etc.).
  • New SIP channel creation: A new SIP channel is created on a healthy Asterisk instance.
  • Set recovery variables: Channel variables override From/To URIs, tags, Call-ID, and CSeq to match the original dialog.
  • Send recovery INVITE: This INVITE acts like a re-INVITE, using the original dialog identifiers.
  • Session resumes: If the client accepts, the SIP session is resumed.
  • Flow continues: VoIPBin resumes the call flow, whether it’s a direct call or conference.
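To illustrate the HOMER step, here is a minimal Python sketch that pulls the dialog identifiers out of one captured SIP message. It assumes you have already fetched the raw message text from HOMER (its query API is not shown here), and `DialogInfo` and `parse_dialog` are hypothetical names. Which tag is local and which is remote depends on the direction of the captured message, so the caller still has to map From/To onto the dialog being recovered.

```python
import re
from dataclasses import dataclass


@dataclass
class DialogInfo:
    """Dialog identifiers extracted from one captured SIP message (hypothetical type)."""
    call_id: str
    from_uri: str
    from_tag: str
    to_uri: str
    to_tag: str
    cseq: int
    method: str


def _header(message: str, *names: str) -> str:
    """Return the first header value whose lowercase name is in `names` (full or compact form)."""
    for line in message.splitlines()[1:]:  # skip the request/status line
        if not line.strip():
            break  # a blank line ends the header section; the body follows
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() in names:
            return value.strip()
    raise KeyError(f"none of the headers {names} found")


def _uri_and_tag(value: str) -> tuple[str, str]:
    """Split a From/To header value into its URI and its tag parameter ("" if absent)."""
    match = re.search(r"<([^>]+)>", value)
    if match:
        uri, params = match.group(1), value[match.end():]
    else:
        uri, _, params = value.partition(";")
        params = ";" + params
    tag = re.search(r";\s*tag=([^;\s]+)", params)
    return uri.strip(), tag.group(1) if tag else ""


def parse_dialog(message: str) -> DialogInfo:
    """Extract Call-ID, From/To URIs and tags, and CSeq from a raw SIP message."""
    from_uri, from_tag = _uri_and_tag(_header(message, "from", "f"))
    to_uri, to_tag = _uri_and_tag(_header(message, "to", "t"))
    seq, method = _header(message, "cseq").split()
    return DialogInfo(
        call_id=_header(message, "call-id", "i"),
        from_uri=from_uri, from_tag=from_tag,
        to_uri=to_uri, to_tag=to_tag,
        cseq=int(seq), method=method,
    )
```

Compact header forms (`f`, `t`, `i`) are accepted because captured traffic may use them.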

To enable this, I patched the PJSIP stack in Asterisk to allow channel variables to override key SIP headers. The patch is available here:
etc/asterisk/add_pjsip_recovery.patch (main branch of voipbin/etc on GitHub)
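The patch sets these values through channel variables whose names are not reproduced here. To show which identifiers have to line up, the following hypothetical Python sketch builds the raw text of such a recovery INVITE. It assumes UDP transport, no Route set, and that `last_cseq` is the highest CSeq the crashed instance sent in this dialog; RFC 3261 requires in-dialog requests to use a strictly increasing CSeq, normally the previous value plus one. Because a re-INVITE is a target refresh request, the new `Contact` tells the endpoint to send later requests to the healthy instance.

```python
import uuid


def build_recovery_invite(
    call_id: str,
    local_uri: str,
    local_tag: str,
    remote_uri: str,
    remote_tag: str,
    last_cseq: int,
    request_uri: str,
    via_sent_by: str,
    contact_uri: str,
    sdp: str,
) -> str:
    """Build the raw text of a mid-dialog INVITE that reuses the original dialog identifiers."""
    # Each new transaction needs a fresh Via branch starting with the RFC 3261 magic cookie.
    branch = "z9hG4bK" + uuid.uuid4().hex
    # SIP bodies use CRLF line endings; normalize whatever the caller passed in.
    body = sdp.replace("\r\n", "\n").replace("\n", "\r\n")
    lines = [
        f"INVITE {request_uri} SIP/2.0",             # the endpoint's Contact (remote target)
        f"Via: SIP/2.0/UDP {via_sent_by};branch={branch}",
        "Max-Forwards: 70",
        f"From: <{local_uri}>;tag={local_tag}",       # original local tag
        f"To: <{remote_uri}>;tag={remote_tag}",       # original remote tag: marks an in-dialog request
        f"Call-ID: {call_id}",                        # original Call-ID
        f"CSeq: {last_cseq + 1} INVITE",              # one above the last local CSeq
        f"Contact: <{contact_uri}>",                  # healthy instance; acts as a target refresh
        "Content-Type: application/sdp",
        f"Content-Length: {len(body.encode('utf-8'))}",
    ]
    return "\r\n".join(lines) + "\r\n\r\n" + body
```

In the patched Asterisk the same values come from channel variables on the new channel, so a builder like this is mainly useful for reasoning about, or testing, what should appear on the wire.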

A more detailed explanation and demo can be found here:
Architecture (VoIPBin documentation)

Limitations

  • This method relies entirely on the SIP endpoint (UAC/UAS) supporting mid-dialog re-INVITE.
  • Some clients — Linphone, for example — reject such re-INVITEs, so recovery does not work in those cases.
  • If the endpoint refuses the recovery INVITE, the session cannot be restored.
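Because everything hinges on the endpoint's answer, the orchestrator needs a clear rule for each response to the recovery INVITE. This is a sketch based on RFC 3261 response semantics, not on VoIPBin's code: `RecoveryOutcome` and `classify_response` are hypothetical names, and the exact status code a rejecting client such as Linphone returns is not known here.

```python
from enum import Enum


class RecoveryOutcome(Enum):
    PENDING = "pending"                   # provisional response, keep waiting
    RESUMED = "resumed"                   # session restored
    RETRY_LATER = "retry_later"           # glare with another re-INVITE
    DIALOG_TERMINATED = "dialog_terminated"
    REJECTED = "rejected"


def classify_response(status_code: int) -> RecoveryOutcome:
    """Map a SIP response to the recovery INVITE onto a recovery outcome."""
    if 100 <= status_code < 200:
        return RecoveryOutcome.PENDING
    if 200 <= status_code < 300:
        # Accepted: the sender must still send an ACK for the 2xx.
        return RecoveryOutcome.RESUMED
    if status_code == 491:
        # Request Pending: RFC 3261 section 14.1 says to retry after a randomized delay.
        return RecoveryOutcome.RETRY_LATER
    if status_code in (408, 481):
        # Request Timeout / Call/Transaction Does Not Exist: per RFC 3261
        # section 12.2.1.2 the UAC should treat the dialog as terminated.
        return RecoveryOutcome.DIALOG_TERMINATED
    if 300 <= status_code < 700:
        return RecoveryOutcome.REJECTED   # any other final response: give up on this session
    raise ValueError(f"not a SIP response code: {status_code}")
```

A 481 is what a strictly RFC 3261 endpoint would be expected to send when it has already discarded the dialog, which is one plausible reason a client refuses the recovery INVITE.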

Closing Thoughts
This is definitely not a perfect solution, and I’m not sure how well it will work beyond the limited environments I’ve tested. I’m sharing it simply in case it provides someone else with a useful idea or a direction to explore further. If there are better or more robust solutions out there, I’d be eager to learn from them.

Thanks for reading,

Sungtae

#Asterisk #VoIP #SIP #CrashRecovery #CPaaS #OpenSource #VoIPBin


Hi @pchero, thank you for sharing this.

I’ve been pursuing crash recovery in Asterisk, and been very vocal about it, for a long time. I brought it up at several AstriDevCon meetings and many times on the forums.

In 2018, Matt Jordan demoed how he was able to migrate a live ConfBridge from one Asterisk server to another. All callers were PSTN callers via a SIP trunk, and they were migrated across different WAN addresses.

Matt’s demo here:

Back in 2021 I reached out to Matt on Twitter/X and he explained in detail how he was able to pull that off.
Unfortunately, Matt has deleted his account and that thread is gone.

But if memory serves, he explained the following.

Asterisk’s internal plumbing is built in a way that lets it be made aware of what is happening in other systems, and the PJSIP driver makes this even easier.
(Side note: Asterisk can read and publish hints between systems just fine, and it can also read dialplan from remote systems.)

He did explain some technical challenges and how he tackled them, which, unfortunately, I don’t remember.

However, I do remember him writing that if he had to rebuild it from scratch, he’d probably use something like Redis to keep track of the sessions and streams, which seems similar to the approach you took.

But I personally think that the fewer “side” applications you need to make it happen, the fewer headaches you’ll have. In other words, the more of this functionality you can get directly into Asterisk, the smoother it’ll be. But I’m probably wrong on this.

Regardless, this has been a decade-long dream of mine, and I really hope to see an official patch merged into the project.

Also, I haven’t gotten my hands too dirty with ARI yet beyond simple ARI/AI Realtime applications, but I’m curious whether two ARI sidecars would make it easier to exchange data about what is happening inside Asterisk, especially with the recent changes to WebSockets in Asterisk.

Thank you again very much!

Hi @PitzKey,

Thank you so much for your detailed and thoughtful response!

I really appreciate you sharing Matt Jordan’s work from AstriCon 2018. That live ConfBridge migration sounds like an impressive achievement and very relevant to this challenging problem. It’s a pity that the Twitter thread is gone, but his idea of using Redis or a similar store to track sessions and streams sounds close to the concept I’ve explored in VoIPBin.

In fact, the key point in my approach is adding an additional management layer on top of Asterisk’s channel layer. In VoIPBin’s case, we call it the Call layer. This layer extracts and manages session information outside of Asterisk. This effectively creates a kind of stateless architecture at the Asterisk level.

Because of this design, SIP-level recovery on its own only restores the session itself; restoring the dialplan execution state requires a different approach. That’s why VoIPBin relies heavily on ARI for managing dialplan-level logic and flow continuity.

Regarding ARI, I’ve also explored it mostly for simple real-time applications so far. Your idea of leveraging multiple ARI sidecars to synchronize internal state between Asterisk instances sounds very promising. Especially with recent WebSocket improvements, this could enable more seamless failover or state sharing. It might even help implement a distributed SIP session manager, though it would require careful handling of timing and race conditions.

I’m glad to see others who care deeply about this problem, even though I’m still learning and contributing only a little to the community. I hope that we’ll see more official support for crash recovery in Asterisk itself. Meanwhile, I’ll keep working on improving VoIPBin’s approach and would be very happy to exchange ideas with you or anyone interested.

Thanks again for your kind words and insights.

Kind regards,
Sungtae (pchero)

What phone brands have you tested with this?

To add to this: if you have a mid-registrar (or perhaps even a stateful Kamailio edge), this idea should work regardless of whether the SIP client on the other side of the “SBC” supports re-INVITEs.

I tested this using Telnyx’s SIP trunk service, and it works regardless of whether Kamailio is configured as stateful or stateless.

The key factor is whether the endpoint supports re-INVITE.

Telnyx uses Kamailio.

Regardless, I was referring to a Kamailio between Asterisk and IP phones, not necessarily trunks, since most trunking providers support these kinds of re-INVITEs.

What is the topology of all this? Telnyx → Kamailio → Asterisk → Phone? How is it all laid out? Since Asterisk is a B2BUA, you would need to move both channels and re-establish them on the other Asterisk instance.

Hi @BlazeStudios,

Thank you for the great question.

In VoIPBin, the overall topology is slightly different from the typical Asterisk B2BUA structure.
Rather than acting as a full B2BUA that manages both SIP legs, Asterisk behaves more like a trunk node responsible for media processing and logic execution.
The full signaling path looks like this:

Telnyx ↔ Kamailio ↔ Asterisk ↔ Kamailio ↔ Phone

This means the endpoint (Phone) does not connect directly to Asterisk. Instead, it connects indirectly through Kamailio.
Because Kamailio maintains the signaling path and SIP session state, it preserves the session (Call-ID, From and To tags, etc.) even if Asterisk crashes or restarts.

This architecture allows the system to attempt session recovery or continuation even after replacing or restarting the Asterisk instance.


VoIPBin’s Call Recovery Logic

In this architecture, VoIPBin performs call recovery through the following steps:

  1. The VoIPBin backend stores the state of every call session externally, typically in Redis or a similar store.
    This includes Asterisk channel UUIDs, SIP session identifiers, participant data, and the current flow execution state.
  2. When an Asterisk instance crashes or becomes unavailable, VoIPBin detects the failure, removes the affected instance from the pool, and assigns a new healthy Asterisk instance.
  3. Once the new Asterisk is assigned, VoIPBin recreates the A leg channel using the stored session information.
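A rough sketch of steps 1 to 3, assuming one JSON value per session in a Redis-like key-value store. An in-memory dict stands in for Redis so the example runs on its own; `CallSession`, `SessionStore`, `fail_over`, and the field names are hypothetical, and round-robin placement is just one possible policy.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CallSession:
    """Hypothetical shape of the per-call state kept outside Asterisk."""
    session_id: str
    asterisk_id: str      # instance currently hosting the channel
    channel_id: str       # Asterisk channel UUID
    call_id: str          # SIP Call-ID
    local_tag: str
    remote_tag: str
    flow_step: str        # last executed flow action
    bridge_id: str = ""


class SessionStore:
    """In-memory stand-in for Redis: one JSON string per session key."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def save(self, s: CallSession) -> None:
        self._data[f"session:{s.session_id}"] = json.dumps(asdict(s))

    def sessions_on(self, asterisk_id: str) -> list[CallSession]:
        sessions = (CallSession(**json.loads(v)) for v in self._data.values())
        return [s for s in sessions if s.asterisk_id == asterisk_id]


def fail_over(store: SessionStore, pool: list[str], failed: str) -> list[tuple[CallSession, str]]:
    """Remove the failed instance from the pool and reassign each of its sessions to a healthy one."""
    if failed in pool:
        pool.remove(failed)
    if not pool:
        raise RuntimeError("no healthy Asterisk instance left in the pool")
    plan = []
    for i, session in enumerate(store.sessions_on(failed)):
        target = pool[i % len(pool)]  # simple round-robin over healthy instances
        session.asterisk_id = target
        store.save(session)           # persist the new owner before recreating the channel
        plan.append((session, target))
    return plan
```

With a real Redis deployment you would probably keep a per-instance index (for example a set of session IDs) instead of scanning every key.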

Example Scenario

Suppose a user is on an active call between A leg and B leg.
If the Asterisk instance that was handling the A leg crashes, VoIPBin proceeds with recovery as follows:

  1. Using the stored session information, VoIPBin recreates the A leg channel on a new Asterisk instance.
  2. VoIPBin checks the last executed flow for the A leg.
    In this case, the most recent flow step was likely joining a Bridge that connects the A leg with the B leg.
  3. VoIPBin connects the recreated A leg channel to the same Bridge where the B leg is still present.
  4. A key detail here is that the Bridge is running on a different Asterisk instance, so it is unaffected by the crash.
    As a result, the call can resume by simply rejoining the A leg to the existing Bridge.
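The decision in steps 2 to 4 can be sketched as a small planning function. Everything here is hypothetical (`FlowState`, `plan_recovery`, and the `"bridge_join"` step name). The function returns descriptive action strings instead of issuing ARI requests, so it only shows the branching: if the bridge's host survived, the recreated A leg just rejoins it; otherwise the bridge has to be recreated as well.

```python
from dataclasses import dataclass


@dataclass
class FlowState:
    """Hypothetical snapshot of one leg's flow, as stored outside Asterisk."""
    leg: str               # e.g. "A"
    last_step: str         # last executed flow action
    bridge_id: str = ""
    bridge_host: str = ""  # Asterisk instance hosting the bridge


def plan_recovery(state: FlowState, new_host: str, alive_hosts: set[str]) -> list[str]:
    """Return the ordered recovery actions for one leg whose Asterisk instance crashed."""
    actions = [f"recreate channel for {state.leg} leg on {new_host}"]
    if state.last_step == "bridge_join":
        if state.bridge_host in alive_hosts:
            # The bridge survived on another instance: only the crashed leg rejoins it.
            actions.append(f"join {state.leg} leg to bridge {state.bridge_id} on {state.bridge_host}")
        else:
            # The bridge was lost too: recreate it; the other leg is recovered the same way.
            actions.append(f"recreate bridge {state.bridge_id} on {new_host}")
            actions.append(f"join {state.leg} leg to bridge {state.bridge_id} on {new_host}")
    else:
        actions.append(f"resume flow at step {state.last_step!r}")
    return actions
```

Keeping the plan separate from execution makes the branching easy to test without a running Asterisk.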

This allows recovery whether only one side or both sides of the call have been disrupted.


Distinguishing SIP Session Recovery from Application State Recovery

It is important to understand that this is not just SIP session recovery.
The actual logic of the call, including what step it was in and how the channels were connected, is stored and managed outside of Asterisk.

While Kamailio maintains the signaling session, VoIPBin manages the logic and state of the call using its own flow system.
This separation is what makes recovery possible, even if one media server fails.


In summary, VoIPBin enables reliable call recovery by managing session state and call flow externally, without relying entirely on Asterisk for continuity.
If you are interested, please review this document and the other sections on how VoIPBin works.