Article Original Creation Date: 2010-12-10
Overview
When the ACS tickles the device, the device challenges the ACS. We can follow this as two log lines in the ACS-api.log file (one for the tickle, one for the challenge).
However, when there are subsequent challenges, this is not logged anymore, the only thing we observe is a time-out error in that file, after 5s (our tickle time-out value).
We need an easy way of discriminating between an off-line modem, and one that refuses (always) the ACS, today both are logged as time-out. Only sniffing allows us to find out what really happens.
Is there a workaround for obtaining this information in our actual version? Do we have a description of this retry process? Does the ACS retry authenticating until the time-out passes or only a fixed number of times? Is that configurable? Logged somewhere?
Environment
- Solaris 10
- Oracle 10
- Tomcat 5.25
- SG4.0.12
- WL 9.2 MP1
Root Cause
The ACS will retry a failed connection request at most one time. A single retry is turned on via the CONNECTION_REQUEST_RETRY_TIMEOUT property in ACS.properties. This property sets the wait time between attempts.
During an attempt, we first make an unauthenticated ConnectionRequest to the device, with a timeout taken from the CONNECTION_REQUEST_TIMEOUT property in ACS.properties.
We then expect a 401 response with the nonce, auth method, etc, to which we can reply with auth credentials. We do so with a 2-second timeout (not configurable). This 2-second timeout explains why the timeout in the logs you attached occurs after 2.3 seconds, and not 5 seconds.
You should certainly see the 401 response in the logs when auth is attempted:
HTTP GET to CPE device was not successful with authentication, code 401
This should be a clear distinction between an auth challenge and a timeout (device not online). See the log snippet I just attached for the full context. This log includes a second attempt after setting CONNECTION_REQUEST_RETRY_TIMEOUT to 400.
Resolution
It is about the ConnectionRequestPassword and ConnectionRequestUsername setup in a device that provides problems with certain device brands firmware and the logging messages could be more clear, to better determine between an offline device and one that simply refuses these parameter settings.
The solution, for now, there seems a quirk in the Jakarta HTTP client library that causes behavior that makes it difficult to distinguish between an offline device and an auth challenge, but by looking at the logs makes it possible:
Both cases will start with a: "Trying an HTTP GET" log message, as in the attached example, but if the device is offline, the second line "auth challenge header is..." will not be seen.
If the authentication is successful, the message will be "HTTP GET to CPE device was successful with authentication". You should, therefore, be able to grep for "auth challenge header is" and generate a list of IP addresses, then subtract those IP addresses that appear in the "was successful with authentication" log message, leaving you with the list of devices that have incorrect credentials.
A fix for this issue has been targeted for the SG4.2.1 release.