IAX2 Hairpin + music on hold = thread deadlock

I don't really need help with this issue, but I am posting the results of a problem we just spent some time troubleshooting, to see if anyone else has run into it, or to help someone who does. This may need to be submitted as a bug as well.

Our setup: We are using Asterisk SVN-branch-1.6.0-r124836. (We also had this problem with 1.6-beta9 and with another SVN revision between beta9 and what we have now.)
We are using Asterisk as a soft switch for a small telco company and are still going through the testing / tweaking phases. Our customers are using ATA devices, all on SIP with G.711u.

Issue 1: We were not able to get the CDRs to fork properly, so that when one of our customers called another one of our customers a CDR would be recorded for both of them.
Solution: (This is not the best solution and I am open to ideas, as this solution led to issue 2.) We check whether the callee is local; if so, we send the call to an IAX2 channel that hairpins back into the system and lands in our inbound context. (Example: caller's SIP ATA device --> outbound context --> IAX out to 127.0.0.1 --> IAX call in from 127.0.0.1 --> inbound context --> callee's SIP ATA device.) This gives us a full CDR for each customer, plus some extra garbage CDRs for the IAX legs. But at least we have a CDR for each customer this way.
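To make the call flow above easier to picture, here is a rough dialplan sketch of such a hairpin. It is not our actual configuration: the 555XXXX pattern standing in for "callee is local" and the unauthenticated IAX2 dial to 127.0.0.1 are assumptions for illustration, and iax.conf would still need an entry that accepts the looped-back call into the inbound context.

```
; extensions.conf (illustrative sketch only)

[outbound]
; Hypothetical pattern for numbers that belong to our own customers.
; Instead of dialing the callee directly, loop the call out over IAX2
; to ourselves so the second leg is treated as a new inbound call.
exten => _555XXXX,1,Dial(IAX2/127.0.0.1/${EXTEN}@inbound)

[inbound]
; The hairpinned call arrives here and rings the callee's SIP ATA,
; which produces a separate CDR for the callee's side.
exten => _555XXXX,1,Dial(SIP/${EXTEN})
```

Each half of the call is then a separate channel pair with its own CDR, which is also where the extra IAX records come from.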

Issue 2: When someone calls a local number and goes through the IAX hairpin, then tries to do a 3-way call, the asterisk process goes to 100% CPU utilization as soon as they flash the line (putting the first channel on hold). The person who was put on hold seems to go into limbo and cannot be bridged into the 3-way call. The owner of the 3-way call does not even need to initiate the third channel, because as soon as they put the first channel on hold the system goes into deadlock. Even after the person on hold hangs up, the system stays at 100% CPU.

After much debugging and troubleshooting we found that when this happened the IAX module would spawn 100 extra threads. When we ran "iax2 show threads", the stats always said 110 of 10 threads accounted for, with 0 dynamic threads. Granted, when the system is in this state the output from the CLI becomes very unreliable, as the system is in a deadlock and the CPU is maxed. We confirmed the thread count by using gdb to debug the threads.

Quick Solution: Do not load res_musiconhold.so.
How it was tested: After a "kill -9", restart the process and unload res_musiconhold.so. Do a local-to-local call and initiate the 3-way call: everything worked, all parties could hear each other, and after hanging up all channels were destroyed properly. Enable res_musiconhold.so and do the same call, and the system goes into a 100% CPU state and the person on hold gets stuck in limbo. Asterisk then has to be shut down via kill -9.
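For anyone who wants to try the same workaround, one way to keep the module from loading at startup is a noload line in modules.conf. This is a minimal sketch that assumes autoload is enabled, as in the sample configuration; the rest of the file will differ per install.

```
; modules.conf (sketch, assuming autoload is in use)
[modules]
autoload=yes

; Do not load music on hold at startup
noload => res_musiconhold.so
```

With this in place the module stays out after a restart, so there is no need to unload it by hand each time.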

I am looking for comments or ideas on a way to avoid this hairpin and still get the proper CDRs. But this thread deadlock does seem to be a rather interesting issue. Since we are using this for telco, we do not need the music on hold functionality and it would just be an extra load on the system, so our solution works for us, but there clearly is a bug here.

Questions, ideas, comments?
J