Failed to authenticate user - brain bender

This past week I’ve upgraded a client from version 1.2.1 to 1.4.20 (from source on a 9x version of SuSE). I’ve got Humpty Dumpty glued back together except for a strange flavor of the “Failed to authenticate user” error. I’ve scoured about every reference to the error that I can find on both the UG’s and Google, to no avail. There are quite a few references to the error, but nothing that matches all the data points of my problem…

Topology: Asterisk (“pbx”) is on a public IP. It has ~40 line assignments spread across ~12 physical clients. The client devices range from Polycom 500’s to Grandstream ATA’s and base units, as well as some soft phones for Road Warrior-ing. The majority of the phones are on a “typical” intranet behind an exposed router.

This may sound strange at first, but bear with me: I’ve “looped back” a connection from the Intranet (LAN) into a secondary NIC on the PBX server. It has the requisite routing table entries and works like a charm. As far as Asterisk is concerned the LAN resources are not NAT’d. “Reinvite” is not allowed on any of the devices, so Asterisk is unknowingly working as a bridge. (Before this raises any red flags, be aware that I’ve run/tested the server both WITH and WITHOUT the “LoopBack cable” and get identical Errors in both cases – even after making the expected nat-related parameter changes in the config files.)

So, here we are:

  • incoming calls DO work

  • outgoing calls DO work

  • extension-to-extension calls work SOMETIMES:
    1) origination from a soft phone (SJ for example) WORKS
    2) originated by dialing the ext followed by send WORKS
    3) originated by picking up the handset and then dialing DOES NOT work
    ***A couple of threads claim this last case is a Polycom issue. It might be; however, the same thing also happens when using the Grandstreams. The ATA devices and softphones have only one dialing “mode”, so I can’t really draw too many conclusions about their role.

  • attended transfers DO NOT work

  • blind transfers DO work

  • (my personal favorite) calls can be transferred into and out of the Parking Lot without any problems

All failures listed above produce a FAST-BUSY after triggering an error of the same type:

‘SIP/14108786545-081bcbf8’
[Jun 2 12:26:01] NOTICE[10759]: chan_sip.c:13969 handle_request_invite:
Failed to authenticate user "Air Works Sales"
sip:202@192.168.1.50;tag=403699EA-41AE073D
^
+------ LAN IP assignment for the “LoopBack” cable

If I disconnect my LoopBack cable and make the appropriate NAT changes in sip.cfg, the IP designated above changes to reflect the public IP assigned to the LAN router.

==============
Sip.conf snippet:

[general]
port=5060
bindaddr=0.0.0.0
srvlookup=yes
disallow=all
allow=ulaw
allow=alaw
context=from-vonage
subscribecontext=ext-local-hints
limitonpeer = yes

register => << vonage credentials >>

[vonage.net] …
[vonage-in] …

[def_tmpl](!)
type=friend
nat=no
host=dynamic
progressinband=no
context=sip-extensions
callerid="The Air Works"
dtmfmode=rfc2833
canreinvite=no
call-limit=10
qualify=yes
insecure=port,invite

<< clipped road warrior/soft phone devices >>

[101](def_tmpl)
secret=foo
fromuser=101

[102](def_tmpl)
secret=bar
fromuser=102

etc, etc…

=============
For what it’s worth, here’s a sample from the dialplan:

exten => 101,1,Dial(SIP/101) ;Extension
exten => 201,1,Dial(SIP/201) ;Sales
exten => 301,1,Dial(SIP/301) ;Service
exten => 102,1,Dial(SIP/102) ;Extension
exten => 202,1,Dial(SIP/202) ;Sales
exten => 302,1,Dial(SIP/302) ;Service
exten => 103,1,Dial(SIP/103) ;Extension
exten => 203,1,Dial(SIP/203) ;Sales
exten => 303,1,Dial(SIP/303) ;Service
exten => 104,1,Dial(SIP/104) ;Extension
exten => 204,1,Dial(SIP/204) ;Sales
exten => 304,1,Dial(SIP/304) ;Service

Each physical phone unit has three lines registered as a bundle:
x01 = primary line
x02 = sales line
x03 = service line
This strategy benefits the station+queue algorithms, but unless I’m missing something, I can’t see how it would contribute to the errors I’ve described.

In case it makes any difference, I’m running the exact same configuration now as under 1.2.1, and everything worked fine then (allowing for the modifications required by features/settings deprecated between the two versions).

Anyone got any insights? Thanks in advance.

The pickup-and-dial problem is easy with the Polycoms. You need to create a digitmap for your dial plan in your system.

The reason you’re getting a fast busy is that the phone itself doesn’t have enough information to complete the dialing event. For the most part, it’s not even trying to place the call.

The default digitmap will look something like this:

This is a typical dial plan for someone who’s using the phone in a North American numbering area.

You need to change it to match your dial plan. This will include any feature codes, outside access codes, extensions, etc…

So, the simple dial plan you’ll need will look something like this:

This digitmap will give you three-digit dialing to any extension, plus 9+international, 9+10 digits, 9+0 (operator), and 9+service codes like 411 or 911. The extensions have to start with 1, 2 or 3; the remaining two digits can be anything. The comma after the 9 restarts dial tone, so the phone sounds like it’s giving you “outside” dial tone. You can remove it if you like.

Put that string in the sip file that the Polycoms load up, and it should work for extension-to-extension dialing. You’ll probably need to edit it a bit for things like the access number to voicemail, feature codes, call parking lots, conference room numbers, etc…

All SIP devices have something like this. Check your admin manual for details about dialing and dial plans.
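
To make the idea concrete, here is a minimal Python sketch of how a phone-side digitmap decides whether the digits dialed so far are complete, still partial, or a dead end. The patterns in DIGIT_MAP are an assumption loosely based on the description above (extensions starting with 1, 2 or 3, 9+10 digits, 9+0, 9+N11); they are not the actual Polycom string, the names parse and classify are made up for illustration, and variable-length international numbers and timers are left out.

```python
# Hypothetical digit map for illustration only (not a real Polycom digitmap).
# "x" means any digit; "[...]" is a set or range of digits.
DIGIT_MAP = ["[123]xx", "9[2-9]xxxxxxxxx", "90", "9[2-9]11"]


def parse(pattern):
    """Turn a pattern into a list of sets, one set of allowed digits per position."""
    slots = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "x":
            slots.append(set("0123456789"))
            i += 1
        elif ch == "[":
            end = pattern.index("]", i)
            body = pattern[i + 1:end]
            allowed = set()
            j = 0
            while j < len(body):
                # Handle ranges such as "2-9"; otherwise take the single digit.
                if j + 2 < len(body) and body[j + 1] == "-":
                    lo, hi = int(body[j]), int(body[j + 2])
                    allowed.update(str(d) for d in range(lo, hi + 1))
                    j += 3
                else:
                    allowed.add(body[j])
                    j += 1
            slots.append(allowed)
            i = end + 1
        else:
            slots.append({ch})
            i += 1
    return slots


def classify(dialed, digit_map=DIGIT_MAP):
    """Return "complete", "partial" or "no match" for the digits dialed so far."""
    partial = False
    for pattern in digit_map:
        slots = parse(pattern)
        if len(dialed) > len(slots):
            continue
        if all(d in s for d, s in zip(dialed, slots)):
            if len(dialed) == len(slots):
                return "complete"
            partial = True
    return "partial" if partial else "no match"
```

With a map like this the phone keeps collecting digits while the result is "partial", sends the call once it is "complete", and gives up (fast busy) on "no match", which is roughly what happens when the map in the phone doesn't cover your extensions.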

Sounds simple enough. Before I start adjusting things (or making the employees adopt different calling patterns, namely 9+ dialing), I’ve got a couple of questions…

Everything was running smoothly while on v1.2.x. The bumps started with the version upgrade. I’ve confirmed that the digitmap strings within the Polys haven’t been changed, which leads me to the conclusion that 1.4.x is now handling them differently. If I remember correctly, a gateway should have the ability to either handle digit-mapping internally OR force the call agent to handle it (such as in the case of capturing phone-card data). Can you tell me if v1.4 changed its stance on digitmap handling, and if so, is there a setting within Asterisk that can revert it to the 1.2.x behavior (which I assume is “internal” processing)?

Thanks.

I can’t say without examining your environment more. But this is my best guess.

What you may have been doing is relying on the 404 response. Some phones can be set to use that. Essentially, it means that every time you press a digit, an INVITE is sent. If the digits don’t match anything (extension or feature) in the dial plan, the Asterisk box sends back a 404 (Not Found) response.

As you keep dialing, the phone keeps sending more digits to the Asterisk box, i.e.:
3
404
30
404
301
ringing 301…

It makes a lot of INVITE traffic, much of which is considered trash, but it allows you to keep the dial plan in the Asterisk box. As soon as you dial something that matches the dial plan, the call proceeds.
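
A rough Python sketch of that exchange, purely as a toy model and not real SIP: the phone resends the digits collected so far after every keypress, and the server answers 404 until the string matches an extension. The DIALPLAN set reuses a few extensions from the dialplan sample earlier in the thread; the function names respond and dial are invented for illustration.

```python
# Toy model of per-digit dialing: no real SIP messages are sent.
# Extensions borrowed from the dialplan sample above.
DIALPLAN = {"101", "201", "301"}


def respond(digits):
    """Server side: ring if the digits match an extension, otherwise 404."""
    return "180 Ringing" if digits in DIALPLAN else "404 Not Found"


def dial(keys):
    """Phone side: send the digits so far after each keypress, stop on a match."""
    log = []
    dialed = ""
    for key in keys:
        dialed += key
        response = respond(dialed)
        log.append((dialed, response))
        if not response.startswith("404"):
            break
    return log


for digits, response in dial("301"):
    print(digits, "->", response)
```

If the server stops answering the partial attempts the way the phone expects, the phone never gets as far as the matching attempt, which would look a lot like the fast busy described above.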

It’s possible you were doing things that way, and the 404 response has been broken (or changed) in the new version. I really can’t say for sure.