Avoiding split-brain scenarios for security solutions

When one half of your security solution is not aware of what the other half is doing.

When architecting a fault tolerant or high availability solution, one of the conditions that stymie designers is a split-brain configuration.

If you’re not familiar with the terminology, split brain occurs when two or more resources that are supposed to be synchronized somehow lose referential integrity, operate independently, and begin storing and processing information without synchronizing content. The result is a split-brain configuration in which the data differs, is not easily reconcilable, and none of the resources is the record of authority.
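To make the condition concrete, here is a minimal sketch in Python. It assumes two in-memory dictionaries standing in for replicated databases; the names `find_divergence`, `primary`, and `secondary` are hypothetical and exist only for illustration. It shows how two copies that keep accepting writes after losing synchronization end up with conflicting records and no obvious authority.

```python
def find_divergence(replica_a, replica_b):
    """Return the keys whose values differ between two replicas.

    Hypothetical helper: a missing key is reported as None on that side.
    """
    conflicts = {}
    for key in sorted(set(replica_a) | set(replica_b)):
        value_a = replica_a.get(key)
        value_b = replica_b.get(key)
        if value_a != value_b:
            conflicts[key] = (value_a, value_b)
    return conflicts


# Both replicas start from the same synchronized state.
primary = {"alice": "role=admin", "bob": "role=user"}
secondary = dict(primary)

# Synchronization is lost, but each side keeps accepting writes on its own.
primary["bob"] = "role=auditor"
secondary["bob"] = "role=disabled"
secondary["carol"] = "role=user"

# Neither copy is authoritative, so every conflict needs a judgment call.
conflicts = find_divergence(primary, secondary)
print(conflicts)
```

Detecting the differences is the easy part. Deciding which side wins for each record is where the real recovery effort goes.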

Recovering from a split-brain scenario is typically difficult. It requires either merging database records from transaction logs or making a judgment call to lose some information while the solution is recovered and placed back into a high availability state. Neither option is desirable to the end user, and the need for either points to a potential flaw in the vendor’s design or the client’s implementation. This leads us to our story of client and vendor security: how to avoid split-brain scenarios for security solutions.

Minimizing split-brain problems

The first problem we need to address is when a split-brain scenario can occur. We do not typically think of security solutions as Tier 1 applications, but modern implementations of Identity and Access Management (IAM), Privileged Access Management (PAM), firewalls, Intrusion Prevention Systems (IPS), etc., have all become just as important as any other Information Technology (IT) resource.

If they fail open, then threats can circumvent your defenses and potentially own the environment. If they fail closed, then environments can experience an unexpected outage disrupting operations. Neither scenario is acceptable. To that end, these technologies have been elevated to Tier 1 status and must be operational all the time, with minimal to no downtime, and have the infrastructure in place to stay fault tolerant and highly available. It is important to note that fault tolerance and high availability have different requirements in a deployment.
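The fail-open versus fail-closed trade-off described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any vendor's implementation: `authorize`, `unavailable_check`, and `healthy_check` are hypothetical names, and a `ConnectionError` stands in for the security service being unreachable.

```python
def authorize(policy_check, request, fail_open):
    """Run a policy check; if the check itself is unavailable, use the fail mode."""
    try:
        return policy_check(request)
    except ConnectionError:
        # The security service cannot be reached.
        # fail_open=True allows the request (threats may bypass the control).
        # fail_open=False denies it (legitimate operations are disrupted).
        return fail_open


def unavailable_check(request):
    """Hypothetical check that simulates an outage of the security service."""
    raise ConnectionError("policy service unreachable")


def healthy_check(request):
    """Hypothetical check that only permits the user 'alice'."""
    return request.get("user") == "alice"


request = {"user": "mallory"}

allowed_when_open = authorize(unavailable_check, request, fail_open=True)
allowed_when_closed = authorize(unavailable_check, request, fail_open=False)
allowed_when_healthy = authorize(healthy_check, request, fail_open=True)

print(allowed_when_open, allowed_when_closed, allowed_when_healthy)
```

The fail mode only matters while the service is down: an unauthorized user gets through when failing open, and every user is blocked when failing closed. That is why the service has to stay up in the first place.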

Therefore, as a Tier 1 service, multiple databases may be needed, disaster recovery may require special considerations, and a single point of failure anywhere from a service to a network switch cannot be the reason for an outage. These requirements must be designed in as a part of the solution, and any technology deployed must be immune from a split-brain condition.

The second problem to consider when minimizing the risk of a Split-Brain scenario is the enterprise readiness of resources. While this may seem like a “can of very expensive worms,” choosing a database design or other hardware based on a third-party vendor with no formal support is not the best decision for a Tier 1 application.

For example, with fault tolerance, if you are dependent on a server with a single power supply, then you obviously have a single point of failure. Realistically, it is cost prohibitive to cover every use case, but this is why server manufacturers typically put dual power supplies in servers. For high availability, consider a design that uses multiple database instances for replication, which is normally the root cause of split-brain scenarios outside of solutions that use file replication. Why would you consider an application that does not natively support recovery, backup, and reconciliation, or that lacks technical support in case of a critical situation?

To be blunt, many open source databases are not ideal for Tier 1 applications because of these limitations, unless someone can truly provide technical support to match or you have the in-house expertise to manage the requirements. This is where a vendor’s design and promises typically exceed the real-world expectations of the client. While a single fault can introduce a split-brain scenario, designs should consider the use cases for both fault tolerance and high availability that can lead to this predicament. The trick is to balance the two, weighing cost against resiliency in the underlying technology and its support.

Ultimately secure

Finally, security solutions themselves need to be secure from potential threats. This means that data at rest, in a lab, and live in operations needs to be protected. While this may seem unrelated to a split-brain scenario, let’s explore how it is extremely relevant. Consider a backup of a Tier 1 database.

For many solutions, the restoration of the database may cause operational issues if runtime changes stored in the database affect business continuity. For example, consider an enterprise-ready password management solution. The backup is a snapshot in time of all the current passwords, and as the backup ages, it differs from the credentials used in production. If a full restore is performed, the stored passwords would not equal the operational values, and you have a split-brain problem that needs to be reconciled.

Typically, password managers rectify this problem by changing all the passwords so they are in sync again, but the backup still reflects the problem. This implies that databases used in Tier 1 applications cannot rely on simple backups, or they can have a split-brain problem if a restoration is required. They need to replicate in real time, back up in real time, and depend on time-based versioning to avoid this problem. And, if the data is used for anything else, its encryption, protection, and prevention from misuse must be considered so that static copies, in case of a vulnerability, do not provide a source of data for threat actors.
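As a rough illustration of time-based versioning, the Python sketch below keeps every credential change with a timestamp. It can then report which accounts changed after a backup was taken, and what each value was at any point in time. The `VersionedVault` class, its method names, and the integer timestamps are all assumptions made for this example; real password managers implement this differently and would never hold secrets in plain text.

```python
from bisect import bisect_right


class VersionedVault:
    """Hypothetical store that keeps every credential version with its timestamp."""

    def __init__(self):
        # account -> list of (timestamp, secret) in ascending timestamp order
        self._history = {}

    def rotate(self, account, secret, timestamp):
        versions = self._history.setdefault(account, [])
        if versions and timestamp <= versions[-1][0]:
            raise ValueError("timestamps must increase for each account")
        versions.append((timestamp, secret))

    def current(self, account):
        """Return the most recent secret for the account."""
        return self._history[account][-1][1]

    def as_of(self, account, timestamp):
        """Return the secret that was valid at the given time, or None."""
        versions = self._history[account]
        times = [t for t, _ in versions]
        index = bisect_right(times, timestamp)
        return versions[index - 1][1] if index else None

    def stale_since(self, backup_time):
        """List accounts whose credentials changed after the backup was taken."""
        return sorted(
            account
            for account, versions in self._history.items()
            if versions[-1][0] > backup_time
        )


vault = VersionedVault()
vault.rotate("db-admin", "first-secret", timestamp=100)
vault.rotate("router", "router-secret", timestamp=100)

backup_time = 150  # a snapshot is taken here
vault.rotate("db-admin", "second-secret", timestamp=200)

stale_accounts = vault.stale_since(backup_time)
secret_in_backup = vault.as_of("db-admin", backup_time)
secret_in_production = vault.current("db-admin")
print(stale_accounts, secret_in_backup, secret_in_production)
```

With a history like this, a restore does not have to reset every password. Only the accounts reported as stale need to be reconciled against production.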

If you need an example of this, think of the breach that occurred against Uber. The production data compromised was out of date but still had plenty of relevant information that was a high risk for consumers. Yes, it was a backup database, but it was not properly protected, and all data from a Tier 1 application should be protected regardless of utilization. While this is not a split-brain problem, detaching a production Tier 1 database and testing or recovering it can certainly create one.

Failures in your high availability security solution can lead to split brains

As with any story, there is a moral. When considering Tier 1 security applications, consider the use cases that will create split-brain scenarios. They are undesirable, and the architecture and technology choices made by your business and the supplying vendor need to avoid these problems. The expectations of the client and the vendor can themselves become a split-brain problem, so those expectations need to be analyzed up front, before a real production problem occurs.

Copyright © 2018 IDG Communications, Inc.
