Hi everyone, I am hoping you can advise me on how to further troubleshoot this issue.
I’m running Asterisk 11 (FreePBX distribution) on a system with a TDM410E analogue card with two analogue trunks installed.
Every now and then, one of the trunks becomes permanently busy and the users can’t dial out. I’ve managed to reproduce the issue and it seems that the trunk is getting stuck in a ‘pre-ring’ state. I suspect the caller-id detection is getting confused somehow. I’m in the UK so CLI is presented after a line reversal but before the first ring, so this seems to correspond to the ‘pre-ring’ state. The problem appears to happen just after an incoming call is hung up, so the line reversal may be happening on call clear-down. The condition can be cleared by making another incoming call on the trunk, although sometimes it takes two attempts.
Can anyone suggest what might be going on here? Are there any settings I can tweak to mitigate the issue? Is there a way to get detailed logging of the caller-id detection and/or trunk state?
What version of DAHDI are you using? While a quick read of your problem doesn’t reconcile with my understanding of the problem, I wonder if you’re hitting a problem on the 2.6.x releases that is fixed in 2.7.0 with channels potentially getting stuck in red alarm.
This is causing us a few problems as the channel remains in this state until another in-bound call comes in on that channel. This also means that the CDR reports show the last call to be very long.
To submit a bug report, you will need much more evidence, in particular debug level traces from chan_dahdi.c and possibly also from the kernel device driver.
You will also need to identify the hardware used and the network operator and the service they are providing, and preferably you will need to provide links to documentation from that network operator describing how the events in question are signalled (BT provide a BTR giving this information for their network, but note that it actually indicates that zero or more of several methods may be used).
Lines intended for business use are more likely to provide usable disconnect supervision than ones for domestic use, and ones in remote areas least likely.
Ultimately, though, any business requiring reliable supervision, where a human doesn’t clear the call, is likely to need an ISDN connection, these days.
Finally you need a rationale as to why the documentation for Asterisk indicates that things should work in the particular case, otherwise you have a feature request, and not a bug. Feature requests should go to the developer mailing list.
Once you have all the required evidence, you need to create a login at issues.asterisk.org/jira/ and submit a formal bug report, there.
In my case the 3 lines I’m talking about are business lines and I have not seen this issue before on earlier versions of Asterisk / DAHDI. We ran a much earlier version of Asterisk for 3 years and didn’t have this problem. You are right in saying more information is needed, but I just don’t really know how to produce the logs that might help; the normal Asterisk logs don’t seem to show anything in relation to this problem. Are debug logs available for DAHDI?
Out of interest, I don’t think this issue is seen if you use the default DAHDI settings, but by doing so you will not get CLID detected on BT lines.
My non-default settings include:
cidsignalling=v23
cidstart=polarity
and
battthresh=4
I’m sure someone out there will have a much clearer understanding than me of the correct settings required in the UK for these newer versions of DAHDI, and also of the impact that using the above settings may have on DAHDI.
Trying to clear down a line when it’s in the ‘Pre-rin’ state by doing a ‘channel request hangup DAHDI/x-1’ has no effect, although it appears to try to do it - the channel just remains as it was. Removing the telephone line cable does clear the line.
Any ideas would be appreciated - otherwise I may have to revert back to my rather old system.
Regarding ISDN - I totally agree that the majority of businesses use it, but it’s also true that many smaller ones still have offices with multiple analogue lines.
Following a hardware failure I recently had to upgrade an Asterisk 1.4 Zaptel based system to a newer version (1.8.23.1 / DAHDI 2.7.0.1) and this also happens.
The analogue lines are business circuits in a rural village in the UK, connected to a British Telecom/Telent System X exchange.
Some lines provide polarity reversal on answer/clearing as well as the disconnect clear signal (CPC/kewlstart); others provide disconnect clear only, followed by NU tone. These settings can be changed, but not all providers are aware of this, nor is it an easy or quick process (the CP passes it to Openreach, who have a two day turnaround!). In any case the technical specifications state that all Terminal Equipment (such as PABXs) should be configurable to handle all of these.
The presence of 3 polarity reversals (one before CLID, one on answer and a final one on clearing) and the disconnect clear signal (CPC/kewlstart) has always confused Asterisk to some extent (for instance an outbound call is followed by Zaptel/DAHDI thinking the last PR is another incoming CLID). Trying to get Asterisk to make use of the polarity reversals doesn’t work either, and we have 3 lines and only two of ours do the polarity reversal anyway (but all 3 of them can lock up with this new version, so it’s not the polarity reversal causing it).
This latest problem appears to happen if an inbound caller clears between the pre-ring polarity reversal and the first ring. Whatever is going down the line is confusing DAHDI into waiting for a ring that never arrives, hence the channel wedges in PRE-RING until the next inbound call (where there is a ring).
Unfortunately this is a common problem with automated sales calls where multiple outbound calls are placed to keep the agents busy and those which cannot be answered are simply abandoned.
The linecard in use here is not a Digium one but I think the problem may be in chan_dahdi.c rather than the line card drivers so is most likely to affect any analogue line card used with UK caller ID (or any other country where there is a polarity reversal then caller ID sent before ringing).
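To make the wedge described above concrete, here is a minimal sketch of a pre-ring timeout: if no ring follows the polarity reversal within a set time, the trunk is released back to idle. This is a toy model only. The class name, state names and the 6 second timeout are assumptions for illustration and are not taken from chan_dahdi.c or sig_analog.c.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical state names for illustration; not taken from chan_dahdi.c.
IDLE, PRE_RING, RINGING = "idle", "pre-ring", "ringing"


@dataclass
class TrunkWatchdog:
    """Toy model of one analogue trunk with a pre-ring timeout."""
    timeout: float = 6.0                    # assumed value, in seconds
    state: str = IDLE
    pre_ring_since: Optional[float] = None

    def on_polarity_reversal(self, now: float) -> None:
        # A reversal on an idle line is taken as the start of CLI delivery.
        if self.state == IDLE:
            self.state = PRE_RING
            self.pre_ring_since = now

    def on_ring(self, now: float) -> None:
        # A real ring always moves the trunk on, ending any pre-ring wait.
        self.state = RINGING
        self.pre_ring_since = None

    def tick(self, now: float) -> None:
        # Called periodically: abandon the wait if no ring has followed.
        if self.state == PRE_RING and now - self.pre_ring_since >= self.timeout:
            self.state = IDLE
            self.pre_ring_since = None


trunk = TrunkWatchdog()
trunk.on_polarity_reversal(now=0.0)   # caller clears before the first ring
trunk.tick(now=3.0)                   # still waiting for a ring
trunk.tick(now=6.0)                   # timeout reached, trunk is released
print(trunk.state)
```

Without the `tick` check, the model would sit in `pre-ring` until the next `on_ring`, which matches the behaviour reported in this thread.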
In practice, any serious predictive dialer user would have ISDN and wouldn’t have this problem.
kewlstart is an Asterisk term for a specifically Asterisk hack. I seem to remember it is because Asterisk cannot handle earth start lines, which would normally be used for PABXes.
There are strict limits on the number of silent calls from predictive dialers in the UK.
As I recall, the single line BTR doesn’t guarantee that there will be any disconnect supervision.
[quote=“david55”]In practice, any serious predictive dialer user would have ISDN and wouldn’t have this problem.
kewlstart is an Asterisk term for a specifically Asterisk hack. I seem to remember it is because Asterisk cannot handle earth start lines, which would normally be used for PABXes.
There are strict limits on the number of silent calls from predictive dialers in the UK.
[/quote]
This is true (the ETSI standards for Europe state these limits in many countries) - but many marketing companies now dial from overseas to get past the regulations (especially since Asterisk being used by home users made it very easy to log and record calls, and CDRs have been used to make claims against UK based telemarketers).
The impact I am describing here is to those receiving such calls on Asterisk systems connected to analogue circuits. Very few humans, even if they realise they have inadvertently dialled a wrong number, are going to cause this particular problem (unless they have very quick reactions).
I’m cynical enough to believe that the organisations doing this (some of whom are criminal scammers) are well aware of their target countries having pre-ring caller ID (it’s used elsewhere in Europe, often without a polarity reversal and with DTMF digits being sent before the call), and that their engineers could, if desired, program their predictive diallers to clear within a few milliseconds, before CLID can be received, as doing so would make it even harder for the end user to identify the silent calls (an excessive amount of these is illegal just about everywhere in the EU) and/or report them to the Communications Ministry or other authorities.
There are a number of long standing issues with how CLID is detected in chan_dahdi.c, as using the polarity reversal as a “pre-ring” marker isn’t the best way anyway. There is also a marker tone preceding UK CLID which could be used instead, and this code could also be adapted for some other European countries where pre-ring CLID is sent without the polarity reversal.
Just a point from my side is that the calls that cause this issue are not from automated call centres they are just ordinary calls that seemingly end in the normal way, but Asterisk seems to determine that the polarity reversal at the end of the call is actually the start of a new call and then sits there in this ‘pre-rin’ state until a real call comes in. For this new real call the CLID will not be detected (because Dahdi has already gone past that part of its script).
I would say that this ‘fault’ occurs on at least 1 in 5 in-bound calls, and perhaps as many as 1 in 3.
I’m not a software developer, but simplistically it would seem that if Dahdi waited for say 2 seconds after the end of a call before it was allowed to start sensing a new call (or polarity switch) this issue would not occur. You might lose CLID on the next in-bound call if that call started during the 2 second wait, but this situation is unlikely to occur very often and certainly less than this current issue happens.
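The guard-time idea in the paragraph above can be sketched roughly as follows. This is an illustration only, not how DAHDI or chan_dahdi.c is actually structured: the class and method names are hypothetical, and the 2 second default is simply the figure suggested above.

```python
from typing import Optional


class PolarityGuard:
    """Ignore polarity reversals that arrive too soon after a hangup."""

    def __init__(self, guard_seconds: float = 2.0) -> None:
        self.guard_seconds = guard_seconds          # figure suggested in the post
        self.last_hangup: Optional[float] = None    # time of the last hangup, if any

    def on_hangup(self, now: float) -> None:
        # Remember when the previous call ended.
        self.last_hangup = now

    def accept_reversal(self, now: float) -> bool:
        # With no earlier hangup there is nothing to guard against.
        if self.last_hangup is None:
            return True
        # Only treat the reversal as a new call once the guard time has passed.
        return now - self.last_hangup >= self.guard_seconds


guard = PolarityGuard()
guard.on_hangup(now=100.0)
print(guard.accept_reversal(now=100.8))   # clear-down reversal, inside the guard time
print(guard.accept_reversal(now=103.0))   # later reversal, treated as a new call
```

The trade-off is the one already noted: a genuine call that starts inside the guard window would lose its CLID, because its reversal is ignored.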
For the record my asterisk logs show:
At the end of a call:
………
[2013-11-06 13:47:22] VERBOSE[2384][C-000000b3] app_macro.c: == Spawn extension (macro-dialout-trunk, s, 22) exited non-zero on 'SIP/205-0000028d' in macro 'dialout-trunk'
[2013-11-06 13:47:22] VERBOSE[2384][C-000000b3] pbx.c: == Spawn extension (from-internal, 01245216050, 5) exited non-zero on 'SIP/205-0000028d'
[2013-11-06 13:47:22] VERBOSE[2385][C-000000b3] app_mixmonitor.c: == MixMonitor close filestream (mixed)
[2013-11-06 13:47:22] VERBOSE[2385][C-000000b3] app_mixmonitor.c: == End MixMonitor Recording SIP/205-0000028d
[2013-11-06 13:47:23] VERBOSE[7557][C-000000b4] sig_analog.c: == Starting post polarity CID detection on channel 1
[2013-11-06 13:47:23] VERBOSE[2386][C-000000b4] sig_analog.c: -- Starting simple switch on 'DAHDI/1-1'
When the next call comes in on that channel:
[2013-11-06 15:15:31] NOTICE[3874][C-000000c7] chan_dahdi.c: Got event 17 (Polarity Reversal)…
[2013-11-06 15:15:31] VERBOSE[3874][C-000000c7] chan_dahdi.c: -- Detected ring pattern: 0,0,0
[2013-11-06 15:15:31] VERBOSE[3874][C-000000c7] chan_dahdi.c: -- Checking 0,0,0
[2013-11-06 15:15:31] VERBOSE[3874][C-000000c7] chan_dahdi.c: -- Ring pattern check range: 10
[2013-11-06 15:15:31] VERBOSE[3874][C-000000c7] chan_dahdi.c: -- Ring pattern matched in range: -10 to 10
[2013-11-06 15:15:31] VERBOSE[3874][C-000000c7] chan_dahdi.c: -- Ring pattern check range: 10
[2013-11-06 15:15:31] VERBOSE[3874][C-000000c7] chan_dahdi.c: -- Ring pattern matched in range: -10 to 10
[2013-11-06 15:15:31] VERBOSE[3874][C-000000c7] chan_dahdi.c: -- Ring pattern check range: 10
[2013-11-06 15:15:31] VERBOSE[3874][C-000000c7] chan_dahdi.c: -- Ring pattern matched in range: -10 to 10
[2013-11-06 15:15:31] VERBOSE[3874][C-000000c7] chan_dahdi.c: -- Distinctive Ring matched context from-analog
[2013-11-06 15:15:31] VERBOSE[3874][C-000000c7] pbx.c: -- Executing [s@from-analog:1] NoOp("DAHDI/1-1", "Entering from-dahdi with DID == ") in new stack
[2013-11-06 15:15:31] VERBOSE[3874][C-000000c7] pbx.c: -- Executing [s@from-analog:2] Ringing("DAHDI/1-1", "") in new stack
[2013-11-06 15:15:31] VERBOSE[3874][C-000000c7] pbx.c: -- Executing [s@from-analog:3] Set("DAHDI/1-1", "DID=s") in new stack
[2013-11-06 15:15:31] VERBOSE[3874][C-000000c7] pbx.c: -- Executing [s@from-analog:4] NoOp("DAHDI/1-1", "DID is now s") in new stack
[2013-11-06 15:15:31] VERBOSE[3874][C-000000c7] pbx.c: -- Executing [s@from-analog:5] GotoIf("DAHDI/1-1", "1?dahdiok:checkzap") in new stack
[2013-11-06 15:15:31] VERBOSE[3874][C-000000c7] pbx.c: -- Goto (from-analog,s,9)
[2013-11-06 15:15:31] VERBOSE[3874][C-000000c7] pbx.c: -- Executing [s@from-analog:9] NoOp("DAHDI/1-1", "Is a DAHDi Channel") in new stack
[2013-11-06 15:15:31] VERBOSE[3874][C-000000c7] pbx.c: -- Executing [s@from-analog:10] Set("DAHDI/1-1", "CHAN=1-1") in new stack
[2013-11-06 15:15:31] VERBOSE[3874][C-000000c7] pbx.c: -- Executing [s@from-analog:11] Set("DAHDI/1-1", "CHAN=1") in new stack
[2013-11-06 15:15:31] VERBOSE[3874][C-000000c7] pbx.c: -- Executing [s@from-analog:12] Macro("DAHDI/1-1", "from-dahdi-1,s,1") in new stack
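For anyone wanting to count how often this happens, a rough way to scan an Asterisk log for the pattern shown above is sketched below. It flags every "Starting post polarity CID detection" line that appears within a couple of seconds of a call teardown line. This is only a heuristic under stated assumptions: the function name, the 2 second window and the choice of teardown markers ("exited non-zero", "End MixMonitor Recording") are illustrative, are based purely on the log lines quoted in this thread, and do not prove that the teardown and the CID start belong to the same trunk.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as:
# [2013-11-06 13:47:23] VERBOSE[7557][C-000000b4] sig_analog.c: == Starting ...
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]\s+\w+\[\d+\]"
    r"(?:\[[^\]]*\])?\s+[\w.]+:\s+(.*)$"
)
CID_START_RE = re.compile(r"Starting post polarity CID detection on channel (\d+)")
TEARDOWN_MARKERS = ("exited non-zero", "End MixMonitor Recording")  # assumed markers


def find_suspect_cid_starts(lines, window_seconds=2.0):
    """Return (timestamp, channel) for CID starts seen just after a teardown."""
    window = timedelta(seconds=window_seconds)
    last_teardown = None
    suspects = []
    for raw in lines:
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skip lines that are not in the expected format
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        message = match.group(2)
        if any(marker in message for marker in TEARDOWN_MARKERS):
            last_teardown = stamp
        cid = CID_START_RE.search(message)
        if cid and last_teardown is not None and stamp - last_teardown <= window:
            suspects.append((stamp, int(cid.group(1))))
    return suspects


sample = [
    "[2013-11-06 13:47:22] VERBOSE[2385][C-000000b3] app_mixmonitor.c: == End MixMonitor Recording SIP/205-0000028d",
    "[2013-11-06 13:47:23] VERBOSE[7557][C-000000b4] sig_analog.c: == Starting post polarity CID detection on channel 1",
]
for stamp, channel in find_suspect_cid_starts(sample):
    print(stamp, "channel", channel)
```

To run it against a real log, pass an open file object instead of `sample`; the path of the log file depends on the installation.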
The first one, where BT business lines with both CLID and polarity reversal on answer/clear cause a false “pre ring state” after the call has cleared, has been around for a while; it happened with the old Zaptel. It is more of a minor annoyance (it does hold the channel for about 8-10 seconds longer than it should).
This second one, which now occurs if an inbound call is cleared very quickly after it arrives, is more serious as it prevents that channel being used for outbound calls until an inbound call arrives and clears the wedged state. I have managed to recreate it by dialling these lines from my mobile phone, so there is a chance that a “human” call can also cause it, although predictive dialler calls are far more likely to cause this issue due to the way some are placed and abruptly cleared for lack of operators at the call centre making them.
As a workaround I added a VOIP trunk to the outbound routes for outside calls so they can always be made, but as VOIP here is still less resilient it’s not an ideal solution. This installation is in a healthcare facility where 999 is dialled fairly regularly. Thankfully we have a very good VOIP provider who also has full E999 setup (so our CLID and address should still be passed correctly), and I have also installed a directly connected analogue telephone on a totally separate line, available for use in emergencies if a call via the PABX will not complete.
I wonder if this issue is occurring in other countries, as the UK is not the only nation to use caller ID before ringing, and ISDN isn’t as widespread as many may think, especially for smaller businesses outside cities and large towns.
On my system (DAHDI 2.7.0) the in-bound calls are just normal calls (not of a short duration) and the ‘pre-rin’ status does not clear after a few seconds, it continues until the next in-bound call on that line comes in.
You can not make an outbound call on that channel until it is cleared down. To do this you have to reboot or remove the telephone cable.
The problem did not occur on DAHDI 2.2.0.2-6 on the same telephone lines.
Keeping the Asterisk version the same, do you know specifically what version of DAHDI started showing the problem for you? If you’ve isolated it to a specific version of DAHDI then it should be something that I can address.
I’ve downgraded this installation to DAHDI 2.6.2 and still get the issue (the channel can wedge in either PRE-RING or RING state).
On a related note, I remember some versions of DAHDI were supposed to have an alarm on analogue channels, which were monitored for battery; if this was removed for more than a certain time (longer than a disconnect clear signal or any other short loss of line current caused by automated testing) then the channel went into alarm.
This makes sense, as there is no point signalling to a line which is disconnected. But this feature appears to have been removed at some point, perhaps due to “false alarms” (not just in UK use but elsewhere), in spite of there supposedly being a guard timer.
Another oddity is that whenever a British phone line is disconnected and connected again the voltage transient caused by this is treated as a “pre-ring” state (tying up the channel again until the CLID times out).
So it is this part of the code which is causing problems, but although CLID is a new(ish) feature on analogue lines it is a useful one, and some businesses do not want to use VOIP for business critical calls yet are too small to afford ISDN (which isn’t available everywhere).
I think it’s a bit more complex than that - not everyone uses Digium hardware anyway, and those fixing it need to not just be in the UK but in different areas (or have full remote access to a system connected to British lines). There are also two main types of telephone exchange in common use, Telent System X and Ericsson AXE10 (System Y), with very subtle differences in signalling timing.
In any case I don’t think line card drivers are the issue - I’ve had a cursory look over the kernel level code for wctdm and theoretically any kind of signalling should be passed to chan_dahdi.c and it is at this point where the code may need inspection.
These problems (and related ones in other countries using this sort of signalling) have been logged before and corrected in various versions of Asterisk but it seems that other updates undo them, or the “solution” has been to switch off caller ID detection or decide that the country requesting it is too small a market to solve the issue. There have been various other methods suggested on UK blogs of correcting these issues and locally applied, but they have not made it into the official code and therefore the patches become outdated or hard to apply.
If I had the DSP/coding skills I’d have a go at fixing it myself but unfortunately I do not. What I can do is set up a test system connected to a business circuit on System X and set it to gather whatever debug info is required, and then share it in a suitable common area with those who can perhaps fix the code.
I am also going to look over the Dutch Asterisk websites to see how (or if) they ever solved the issue of getting CLID on their analogue circuits, as on many of them this is sent pre-ring without a polarity reversal (so maybe a combination of this detection, but using v23 FSK instead of DTMF to get the number, might work?).