Design containment into technology implementations

This is a review and assessment of an actual incident. 

Not long ago, an on-call security analyst, Tim, received a call from Customer Support.  It seemed that the tablet PCs at all satellite facilities had failed.  This was a significant business continuity event, since the users of these devices played a critical role at each of the more than 200 locations affected.  Tim was called because the salient symptom was an inability to connect to resources on the company network.

Tim immediately checked the domain account shared by all the failing tablets.  It was locked out.  A user at one of the facilities had exceeded the allowable number of failed login attempts for the tablet’s Active Directory auto-login account.  Because every company tablet used that one account, the lockout caused tablets to fail across the enterprise.

Tim reset the account and scheduled a root cause analysis.

During the analysis, the team identified two issues.  First, the failed logins occurred due to an unusual event on a tablet.  To avoid going into too much detail, I’ll just say that an operating system anomaly caused logins to fail.  Engineering took this as a remediation item.

The second issue was the ability of login failures on a single tablet to take down every tablet in every facility.  In my opinion, this was a bigger issue than the failed OS process.

The tablets are dedicated to a specific purpose and run a special application shared by several employees at each location, none of whom have network accounts.  So the tablets log in to Active Directory automatically, using a “service account”, when powered on or restarted.  It was this service account that failed.  When the tablet failed to connect the first time, the user simply kept restarting it, hoping for a different result.  (The users do not have the password.  The account authenticates automatically when the application is launched.)

When the tablets were initially rolled out to the facilities, the design team sought to keep management simple.  This included using a single account for network access, with a random 20-character password known only to Security.  The team also decided not to exempt the account from lockout, so that brute force attacks would fail; a successful brute force attack would have allowed an attacker to take control of any or all tablets.  What seemed like a good idea at the time missed one important consideration: containment of the effects of a locked account.

It’s easy to miss these types of design issues.  Teams often neglect to design system continuity into new implementations.  If someone had asked whether there were risks associated with using a single account, beyond the obvious potential security vulnerabilities, the account lockout scenario would probably have come up.

The root cause analysis process resulted in the following actions:

  • Resolve the operating system anomaly.
  • Assign tablet service accounts by location instead of one account for the entire tablet population.  If an account is locked out in the future, only one facility will be affected.  The team still hopes that setting the accounts to never lock out will not be necessary.
  • Initiate continuity design training for technical design teams, similar to existing security awareness training.  This will start people thinking about how to mitigate continuity event scenarios beyond catastrophic events.
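
The containment effect of the second action can be illustrated with a small simulation.  This is only a sketch under stated assumptions, not the team's actual tooling: the lockout threshold of 5, the account names, the site names, and the helper functions are all hypothetical, and the model ignores real Active Directory details such as lockout observation windows.  It compares how many locations lose access when one tablet repeatedly fails to log in, first with a single shared account and then with one account per location.

```python
from dataclasses import dataclass

# Assumed value for illustration; the article does not state the real threshold.
LOCKOUT_THRESHOLD = 5


@dataclass
class ServiceAccount:
    """Toy model of a directory account with a failed-login counter."""
    name: str
    failed_attempts: int = 0

    @property
    def locked(self) -> bool:
        return self.failed_attempts >= LOCKOUT_THRESHOLD

    def record_failed_login(self) -> None:
        self.failed_attempts += 1


def build_fleet(locations, per_location):
    """Map each location to the service account its tablets use."""
    if per_location:
        # One hypothetical account per facility.
        return {loc: ServiceAccount(f"svc-tablet-{loc}") for loc in locations}
    # Every facility points at the same account object.
    shared = ServiceAccount("svc-tablet-shared")
    return {loc: shared for loc in locations}


def affected_locations(fleet, failing_location, restarts):
    """Simulate one tablet failing to log in `restarts` times.

    Returns the locations whose account is locked afterwards.
    """
    account = fleet[failing_location]
    for _ in range(restarts):
        account.record_failed_login()
    return [loc for loc, acct in fleet.items() if acct.locked]


# 200 hypothetical sites, chosen to echo the scale described in the incident.
locations = [f"site{n:03d}" for n in range(1, 201)]

shared_fleet = build_fleet(locations, per_location=False)
scoped_fleet = build_fleet(locations, per_location=True)

print(len(affected_locations(shared_fleet, "site042", restarts=6)))  # 200
print(len(affected_locations(scoped_fleet, "site042", restarts=6)))  # 1
```

In this model the lockout policy is the same in both designs, so brute force protection is unchanged; only the number of tablets that depend on any one account differs.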

Copyright © 2009 IDG Communications, Inc.
