A year ago—perhaps a bit more—big data was just starting to take its place among the industry's most-used buzz terms. Today everyone talks about it as a potentially powerful piece of enterprise security. But there are still plenty of practitioners struggling to get the concept, much as they struggled to figure out cloud security a few years ago.
Preston Wood, Zions Bancorporation's CISO and executive VP of security, is puzzled that so many find big data such a struggle.
He's been using big data, by one name or another, to bolster his security program for decades. In recent years, Wood and his team have embarked on major overhauls to their program to better process data that moves more freely and quickly in and out of the network. By adopting such tools as Hadoop, they've greatly increased the amount of data they can analyze at one time. And they've figured out how to load and analyze that data in something close to real time, a job that used to take a full day.
This is the story of how Zions pulled it off.
What's Old Is New
Though the term "big data" is new, Zions has been applying the concept since the 1990s, when it began using its immense supply of information (its security tools and devices alone produce about 3 terabytes of data per week) to make sense of its security posture. "We had a big data strategy before it was called big data," Wood says.
The company certainly has plenty of data to draw from. It has eight banking operations and 500 physical locations throughout the western United States. It was an early adopter of security information and event management (SIEM) technology, using it to better analyze its data flow.
When it comes to big data, experts tend to focus on how it can be used to boost revenue; to a lesser extent, they may note and assess the security risks of big warehouses of (potentially) valuable business intelligence and analytics. But Zions did something different: It decided to make the big data approach a central piece of its security, rather than looking at the information as just another potential hole in its defenses.
The company's massive data stores are used to make better sense of the activity on its network. If someone on the inside or outside is poking around, trying to break into the company's systems, the clues are there, waiting to be sifted from the larger data supply.
SIEM was how Wood and company first put that data to work in the security department. Among other things, it allowed the security department to:
- aggregate data from multiple sources, including network, security, servers, databases and applications. That provided the ability to consolidate monitored data and avoid missing critical events.
- break events into smaller buckets that can be studied for similarities, which may point to attack activity.
- produce alerts the moment abnormal activity appears.
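The three capabilities above can be illustrated with a minimal sketch. It is not how any particular SIEM product works: the event fields, the sample data, the `bucket_by_ip` and `find_alerts` helpers and the alert threshold are all assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical events already aggregated from several sources.
# Real SIEM feeds carry many more fields than these three.
EVENTS = [
    {"source": "firewall", "src_ip": "10.0.0.5", "action": "deny"},
    {"source": "firewall", "src_ip": "10.0.0.5", "action": "deny"},
    {"source": "server", "src_ip": "10.0.0.5", "action": "login_failed"},
    {"source": "database", "src_ip": "10.0.0.9", "action": "query"},
    {"source": "server", "src_ip": "10.0.0.5", "action": "login_failed"},
]

# Actions treated as suspicious in this sketch (an assumption).
SUSPICIOUS = {"deny", "login_failed"}


def bucket_by_ip(events):
    """Break the combined event stream into one bucket per source IP."""
    buckets = defaultdict(list)
    for event in events:
        buckets[event["src_ip"]].append(event)
    return dict(buckets)


def find_alerts(buckets, threshold=3):
    """Return (ip, count) for IPs with at least `threshold` suspicious events."""
    flagged = []
    for ip, events in sorted(buckets.items()):
        count = sum(1 for e in events if e["action"] in SUSPICIOUS)
        if count >= threshold:
            flagged.append((ip, count))
    return flagged


if __name__ == "__main__":
    print(find_alerts(bucket_by_ip(EVENTS)))
```

Grouping by source IP is only one possible bucketing key; a real deployment would likely bucket on several attributes and use time windows rather than a fixed count.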
But by 2008, Zions hit a wall with SIEM. The data supply had become too big and complex to handle. It was now taking months and even years to piece together an actionable picture. The sheer force of data accumulation and the frequency of analysis of events had simply overwhelmed SIEM.
"It's not that SIEM was obsolete and needed to be replaced with something else," Wood says. "It's that we needed something to augment SIEM. It was great for telling the data what to do, but it couldn't tell us what to do."
The Problem of Scale
The team went looking for the missing piece of the puzzle and soon found it in Hadoop.
Open-source Hadoop technology is the engine that drives many of today's more successful big-data security programs. Companies use it to gather, share and analyze massive amounts of structured and unstructured data flowing through their networks. Wood swears by it.
"Now, SIEM is for some data sources just a feed into the security data warehouse," Wood says. Hadoop became the central ingredient in building that warehouse. The company began moving to Hadoop in 2010. Within a year, the team was using the platform exclusively. The positive results came fast and furious. Since Zions' myriad security tools and devices produce several terabytes of data per week, loading a day of logs into the system would be a daylong process. Now it's almost happening in real time.
That's crucial in a world where the bad guys have developed speedy methods of attacking company data and networks. Hadoop can process well over a hundred data sources at a time, uncovering pings on the perimeter, malware infecting parts of the network, social engineering attempts such as spear phishing, and more.
For many companies, Hadoop has also made big-data security affordable, according to Adrian Lane, CTO and security analyst at Securosis. "The cloud has made big data more accessible and affordable. Free tools like Hadoop have been a significant driver. It always comes down to money—what's cheaper," he says.
How Hadoop Works
The Apache Hadoop site describes the technology as "a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models." It's designed to scale up from single servers to thousands of machines, each offering local computation and storage. "Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, delivering a highly available service on top of a cluster of computers, each of which may be prone to failures."
Hadoop includes the following modules:
- Hadoop Common: The common utilities that support the other Hadoop modules.
- Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data.
- Hadoop YARN: A framework for job scheduling and cluster resource management.
- Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
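To give a feel for the "simple programming models" the Apache description mentions, here is a rough single-machine imitation of the MapReduce pattern in plain Python. It does not use Hadoop or any Hadoop API. The log format, the sample lines and the function names are assumptions invented for illustration, and a real job would run the map and reduce steps in parallel across a cluster.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical firewall log lines: date, source IP, verdict.
LOG_LINES = [
    "2013-01-01 10.0.0.5 DENY",
    "2013-01-01 10.0.0.9 ALLOW",
    "2013-01-01 10.0.0.5 DENY",
    "2013-01-02 10.0.0.7 DENY",
]


def map_phase(line):
    """Emit an (ip, 1) pair for each denied connection, nothing otherwise."""
    _date, ip, verdict = line.split()
    if verdict == "DENY":
        yield ip, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single total."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence on one machine."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(run_job(LOG_LINES))
```

The map step looks at one line at a time and the reduce step at one key at a time, which is what lets a framework spread the same logic over many machines.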
Other Hadoop-related projects at Apache include:
- Avro: A data serialization system.
- Cassandra: A scalable multi-master database with no single points of failure.
- Chukwa: A data-collection system for managing large distributed systems.
- HBase: A scalable, distributed database that supports structured data storage for large tables.
- Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying.
- Mahout: A scalable machine-learning and data-mining library.
- Pig: A high-level data-flow language and execution framework for parallel computation.
- ZooKeeper: A high-performance coordination service for distributed applications.
Do Your Homework
As with any technology, Hadoop adopters need to be aware of vulnerabilities in the tool itself, as well as the myriad compatibility and configuration problems that can crop up with any such tool.
"Like some of the GRC [governance, risk and compliance] installs we've seen, this can bomb enormously and be a massive money waste," says Alex Hutton, Zions' director of technology and operations risk and governance.
His advice? Do your homework before rushing in. Take all the necessary time to flesh out a detailed road map for the data you're looking to process, carefully review how Hadoop will behave with the rest of your network, and develop a clear taxonomy model and strict metrics for it to follow.
Hutton says Zions achieves that by using a combination of custom controls and the Vocabulary for Event Recording and Incident Sharing (VERIS) framework, which provides a common language for describing security incidents in a structured and repeatable manner.
"Custom controls and VERIS are our ontologies for metrics. FAIR [factor analysis of information risk] is our risk ontology. Specific metrics support the conceptual categories these ontologies describe," he says.
If you don't have these things, Hutton adds, you are not ready to use big data as a security tool.
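What a structured, repeatable incident vocabulary buys you can be sketched in a few lines. The record below borrows VERIS's four top-level categories (actor, action, asset, attribute) but is heavily simplified and is not the actual VERIS schema. The sample incidents and the `count_by` helper are hypothetical, and nothing here reflects Zions' own controls or metrics.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    """A simplified incident record; real VERIS records hold far more detail."""
    actor: str      # who caused it, e.g. "external" or "internal"
    action: str     # what they did, e.g. "hacking" or "misuse"
    asset: str      # what was affected
    attribute: str  # how it was affected, e.g. "confidentiality"


# Made-up incidents for illustration only.
INCIDENTS = [
    Incident("external", "hacking", "server", "confidentiality"),
    Incident("external", "social", "person", "integrity"),
    Incident("internal", "misuse", "database", "confidentiality"),
    Incident("external", "hacking", "server", "availability"),
]


def count_by(incidents, field):
    """Count incidents per value of one category, a basic metric."""
    return Counter(getattr(incident, field) for incident in incidents)


if __name__ == "__main__":
    print(count_by(INCIDENTS, "action"))
    print(count_by(INCIDENTS, "actor"))
```

Because every incident is described with the same fields, counts like these stay comparable from one quarter or one team to the next, which is the kind of consistency a metrics program depends on.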
Because Wood's team did all its homework before rolling out the new warehouse, Zions enjoyed a relatively smooth deployment. As long as other companies also do their homework, they can hope for similar success, Hutton says.