


Contributing writer

Big Data Investigations: Opportunity and Risk

May 17, 2013 | 14 mins
Big Data, Cybercrime, Data and Information Security

Experts say large-scale security analytics can cut through the noise to find key intelligence. But connecting the dots can lead to legal trouble.

British Telecom had a problem: The company was suffering an ongoing series of security breaches — the physical, not cyber, kind. Thieves were stealing the company’s underground copper cable.

Obviously, for a service provider like BT, the problem was not just about the cost of replacing the cable. It was also about customer relations. “It was damaging the brand,” said Bryan Fite, BT’s U.S. and Canada security & mobility portfolio manager, noting that every time there was a theft, customers lost service. A report in The Register said metal theft was costing taxpayers 700 million pounds per year.

This theft did not involve data. But it was data that solved the problem — Big Data analytics. Fite said BT had effective tools to investigate the crimes, but wasn’t using them to full advantage. It had multiple sensor networks that could tell when people were on tracks or on cables, a fault system to tell when a cable was cut and closed-circuit TV monitors as well. “But all those were isolated, stand-alone,” he said.

Big Data analytics, he said, “allowed us to throw all that into an analytics engine. We did and they (law enforcement) busted a lot of the rings.”

In one of those cases, two men were sentenced this past February to 16 months in jail after they admitted to stealing hundreds of meters of copper cabling from locations in Teddington and Sussex.

“When you overlay sensors, that’s a good use of technology,” Fite said.
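To make the idea of overlaying isolated feeds concrete, here is a minimal sketch, not BT's actual system. It assumes three hypothetical feeds (`track_sensor`, `cable_fault`, `cctv_motion`) that each report a site and a timestamp, and it flags any site where at least two different feeds fire within a short window. The event data, feed names, window size and threshold are all invented for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical events from three feeds that used to be stand-alone.
# Each tuple is (source, site, timestamp). All values are invented.
EVENTS = [
    ("track_sensor", "site-A", datetime(2013, 1, 5, 2, 10)),
    ("cable_fault",  "site-A", datetime(2013, 1, 5, 2, 14)),
    ("cctv_motion",  "site-A", datetime(2013, 1, 5, 2, 12)),
    ("track_sensor", "site-B", datetime(2013, 1, 5, 3, 0)),
    ("cable_fault",  "site-C", datetime(2013, 1, 6, 23, 45)),
]


def correlate(events, window=timedelta(minutes=10), min_sources=2):
    """Return sites where at least `min_sources` distinct feeds fired within `window`."""
    by_site = defaultdict(list)
    for source, site, ts in events:
        by_site[site].append((ts, source))

    alerts = {}
    for site, items in by_site.items():
        items.sort()  # chronological order per site
        for i, (start, _) in enumerate(items):
            # Distinct feeds that fired within `window` of this event.
            sources = {src for ts, src in items[i:] if ts - start <= window}
            if len(sources) >= min_sources:
                alerts[site] = sorted(sources)
                break
    return alerts


alerts = correlate(EVENTS)
print(alerts)
```

Any single feed on its own produces plenty of noise. The agreement of several feeds at one place and time is what narrows it to something worth handing to investigators.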

Big Data analytics were also at play in the recent conviction of two Steubenville, Ohio high-school football players for raping a 16-year-old girl. Richard A. Oppel Jr., writing in the New York Times, noted that, “The verdict came after four days of testimony that was notable for how Ohio prosecutors and criminal forensics investigators analyzed hundreds of text messages from more than a dozen cellphones and created something like a real-time accounting of the events surrounding the incident and aftermath.”

While hundreds of text messages do not amount to Big Data in terms of volume, the analytics do. Drawing connections among otherwise disparate data was not being done even a few years earlier.


Indeed, Big Data has revolutionized marketing and business operations, so it makes sense that it is also revolutionizing investigations, which are, after all, about collecting and analyzing information. Big Data analytics should make them faster, easier and more accurate, right?

Perhaps, but with some caveats. Big Data offers big opportunities to improve investigations, according to numerous CSOs and CISOs, but they say it also brings new responsibilities and big risks. As is often the case, technology tends to outrace the ability of people and systems to manage and control it, and the ability of government to regulate it effectively.


Risks you don’t see coming

Kim Jones, senior vice president and CSO of Vantiv, a payment processing firm based in Cincinnati, welcomes the ability to access, aggregate and analyze much more information, saying it should let him, “walk through the details of an incident with greater clarity and certainty than in past, and more quickly. I believe those opportunities exist, and the tool sets are available to make them happen.”

But his enthusiasm is tempered by the reality that different sets of data that used to be segregated can, when combined and aggregated, “create security, privacy and regulatory problems within our environment. Individually, items are fine, but when they’re aggregated, they’re not,” he said.

An example, he said, is different pieces of data about a person contained in multiple databases that are meant to be kept separate. “But if I have one person who has authorization for all of that data, and can pull it into an aggregator, I may create a scenario where I have data that is more sensitive than the individual parts,” he said. “HIPAA (Health Insurance Portability and Accountability Act) talks about this, where data separate is not PII (Personally Identifiable Information) but when you pull it together, it is.

“I believe 95 percent of the companies out there are not up to speed on that,” Jones said.

Not that Big Data is the newest buzzword on the block. It has been widely covered in the mainstream media for its marketing value. It has even reached the point where Svetlana Sicular, research director at Gartner, wrote in a recent blog post that according to the “Gartner Hype Cycle curve,” Big Data has passed the “peak of inflated expectations” and fallen into the “trough of disillusionment.”

This, she hastened to add, does not mean Big Data is obsolete or even has declining relevance — only that the view of its users is “maturing” to a more realistic view of its value. But when it comes to investigations, there is general agreement that the ability of enterprises and government regulators to control and manage it still has a ways to go to achieve maturity.

So far, Big Data is not a major tool, at least directly, of the federal Department of Health and Human Services’ (HHS) Office for Civil Rights (OCR), which investigates alleged violations of HIPAA.

OCR Director Leon Rodriguez said the role of his agency is to take more of a “macro” look at how breaches occur and what kind of risks and vulnerabilities led to them, rather than crunch and analyze large amounts of data.

Who has the responsibility?

Big Data analytics, Rodriguez said, is the responsibility of medical providers and/or their business associates who store and handle Protected Health Information (PHI). They are required to use certain safeguards to protect that information, and also to report breaches of 500 or more records to HHS and the media.

In the past, Rodriguez said, the main sources of information about violations were patients. “But they only have a pinhole view of what’s going on. What’s changed is that we are now getting large-scale breach reports involving millions of records. We were never in that environment before. But it is good, because it comes at a time when more and more health data is being stored electronically and aggregated,” he said.

Rodriguez said his agency needs the technical capability to understand what health providers and data custodians are doing, but, “we’re really looking at your business process rather than what was in that data that was breached.”

Still, even if some of the initial hype was overdone, Big Data has ever-expanding value.

What was considered Big two years ago would now be considered Medium, and in a few more years will be considered relatively insignificant. IBM notes that every day, “we create 2.5 quintillion bytes of data — so much that 90 percent of the data in the world today has been created in the last two years alone.”

Todd Marlin, writing on Ernst & Young’s Forensic Brief blog, observed that, “Today, an hour’s worth of business for a typical big-box retail chain can create millions of transactional records. The entirety of data from the private sector doubles every 14 months.

“Consider that when your organization leaves the league of petabytes in storage and moves to exabytes (that’s about one thousand petabytes), you are then working at an organization that stores more data than the entirety of human civilization until about 20 years ago,” he wrote.

Data where you didn’t see it coming

It is not just a lot more of the same data that has been collected for generations either. It comes from sources that did not exist even a decade ago: sensors in everything from smart cars to smart appliances, TVs and weather stations; utility smart meters; health care biosensors that can monitor everything from heart rate to the effect of medications on the body; HVAC monitors; traffic sensors; ATM transactions; posts to social media sites; geotagged digital pictures and videos; purchase transaction records; cell phone GPS signals; clickstream; log files and more.

There are tool sets, some of them open-source, like Apache Hadoop, that can gather, share and analyze the constant rush of structured and unstructured data flowing through networks — they offer speed and the ability to draw connections among seemingly random, unstructured sets of data.

And the ability to access and analyze all that data leads to intelligence. Kim Jones likes to talk about the differences among data, information and intelligence. One of his favorite examples is a seemingly random — at first — 10-digit number.

“Maybe it’s just a number in excess of three billion,” he said. “Maybe it’s an overseas telephone number. Maybe it’s a 10-digit barcode of something. Or maybe it breaks down to a U.S. telephone number, which in this case is what it is.

“If I add that to other pieces of information that may exist out there, such as the first three numbers — 301 — being the area code for Maryland and the fact that I used to live in Maryland back in the late ’90s, you might be able to do some predictive analysis and extrapolate that this is my old phone number.”
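Jones's progression from data to information to intelligence can be sketched in a few lines. This is an illustration under stated assumptions, not a real lookup service: the `AREA_CODES` table holds only the single 301-to-Maryland mapping from his example, the `interpret` function and its `known_facts` argument are hypothetical, and the sample number uses a 555-01xx line reserved for fiction.

```python
import re

# Illustrative subset only: a real lookup would need a full area-code table.
# 301 -> Maryland is the one mapping taken from the example above.
AREA_CODES = {"301": "Maryland"}


def interpret(number, known_facts=()):
    """Take a raw 10-digit string from data to information to a hedged inference."""
    if not re.fullmatch(r"\d{10}", number):
        raise ValueError("expected exactly 10 digits")
    area, exchange, line = number[:3], number[3:6], number[6:]
    result = {
        "data": number,                                 # the raw number
        "information": f"({area}) {exchange}-{line}",   # read as a U.S. phone number
        "region": AREA_CODES.get(area),                 # added context
        "inference": None,                              # filled in only if facts line up
    }
    region = result["region"]
    for person, place in known_facts:
        if region is not None and place == region:
            result["inference"] = f"possibly a former phone number of {person}"
            break
    return result


# Hypothetical outside fact: someone known to have lived in Maryland.
result = interpret("3015550100", known_facts=[("a former Maryland resident", "Maryland")])
print(result)
```

The inference is only as good as the side information supplied. Without the `known_facts` entry, the same ten digits stay at the information stage.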

Bob Rudis, director of enterprise information security and risk management at Liberty Mutual, bristles at the buzzword “Big Data,” preferring “large-scale, aggregated security analytics” instead, but said he does see organizations, “including the one I work for, embracing the potential of the advancements in security-oriented data analytics to help speed up and generally improve forensic investigations.

“Something that may have taken an organization a few hours or days to get intelligence on can take minutes with the right people, processes and technology.”

Rudis said Liberty Mutual is also, “part of a regional, cross-sector group that is working to develop a way for member organizations to share their security-oriented data into one large system that would then be able to do very large-scale analytics across organizations for one purpose — being able to share known attack indicators as well as see if there are already indicators on those networks.”

Eddie Schwartz, CISO at RSA, said Big Data turns the traditional model of investigating and defending against attacks on a network “on its head by adding new content, context and analytic methods.”

Schwartz said Big Data allows a “predictive and proactive model” that, by focusing on the entire operation of a business, including transactions, can identify or even anticipate attacks.

And insurance companies investigating an accident can now combine data from automobile sensors with weather readings and traffic data, to get a better understanding of the conditions surrounding a claim.

Simply having tools and data not enough

But those investigative advantages come with more demands and more risks.

Simply having the technology doesn’t guarantee effective use of Big Data. Stefen Smith, CSO at SecureForce, agrees with Kim Jones that most enterprises are “not up to speed” when it comes to Big Data analytics.

The tool sets now available, which besides Hadoop include EMC’s Greenplum, Teradata, HP’s Vertica and Palantir, offer plenty of value, he said, but need a significant amount of human expertise to be used effectively, since they all are different technologies that are focused on different areas.

“To find data related to an insider threat or regulatory compliance, things have to be configured to find what’s important to the organization,” he said. “Until somebody is able to deploy these disparate technologies, it’s going to be tough for organizations to achieve success.”

One vendor, Smith said, has an “awesome suite,” but on its website makes the point that it needs the expertise of data scientists. “So, you’re talking about needing people with advanced degrees who know how to find patterns and look for it and organize it,” he said.

Bob Rudis agrees. “It’s not really about the tools,” he said. “It’s about the people and processes.”

That includes, he said, backing (including money and policy directives) of senior management, smart security people who know what questions to ask, smart data analytics people who know how to ask those questions and solid governance and maintenance models in place to ensure tools and processes are kept up-to-date.

“All that,” he said, “plus storage — lots and lots of storage.”

BT’s Bryan Fite emphasizes the human element as well. “Big Data doesn’t work if you don’t have humans handling it. You can’t buy technology and get rid of humans.”

Then there are the risks and responsibilities. The fact that the tools are available to aggregate and analyze Big Data means regulators and the courts increasingly expect those involved in discovery proceedings to make use of them.

Heather Clancy, writing on Smart Planet, noted that, “analytics and ‘big data’ technology is making e-discovery software smarter, helping legal departments avoid costly fines associated with failing to produce all relevant documents related to lawsuits or other government investigations.”

But failure to use it, she wrote, “can also be a huge liability. Consider the 2008 case of Qualcomm and Broadcom, which were embroiled in a patent dispute. Along the way, things got ugly when the judge fined Qualcomm $8.5 million for withholding evidence.”

In law enforcement investigations, the reality of Big Data means collecting more than just the laptop computer of a suspect. The list also includes loose hard drives, modems, routers, digital cameras, games consoles and, of course, any smartphones or tablets.

A shifting legal strategy

Kim Jones notes that it is also changing legal strategy. “It has long been the practice when one side gets data requests for trial or prosecution, to deluge the other side with data, under the assumption that they’ll never find what they’re looking for. But Big Data means they can find it. Even worse, given the analytic capability of the tools, they might find more than you thought they would.”

“When I think about its application to investigations, it may lead to more investigations,” he said.

And then there is the risk of violating personal privacy. As experts have noted, the almost magical ability of Big Data analytics to draw connections from seemingly random, disconnected bits of data can also be a curse.

David Navetta, in a post on Information Law Group, illustrates that risk. A person who consents to have his personal information collected and used for marketing purposes may find that his information ends up in the hands of a data broker.

If that person buys a deep fryer, and that information ends up in the hands of, “a health insurance company, whose algorithms put people who purchase deep fryers into a high risk category, in the world of Big Data, the initial, relatively innocuous data disclosure (that was consented to), could suddenly serve as the basis to deny a person health care (or result in higher health care rates),” Navetta wrote.

The solution to that, according to a number of experts, is to anonymize the data. That, in fact, is among the guidelines of the Office for Civil Rights of the Department of Health and Human Services. Navetta notes in his post that HHS, “sets forth two methods to achieve de-identification under HIPAA: expert determination and ‘safe harbor’ de-identification (which involves removing 18 types of identifiers from health data).”
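A rough sketch of the safe-harbor idea follows. It is not a compliant implementation: the field names are hypothetical, `DIRECT_IDENTIFIER_FIELDS` covers only a handful of the 18 identifier categories, and the sample record is invented. It shows the two basic moves, dropping direct identifiers outright and coarsening a full date to its year.

```python
# Hypothetical field names standing in for a few of the 18 safe-harbor
# identifier categories. This subset is illustrative, not a complete list.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "street_address", "phone", "email", "ssn",
    "medical_record_number", "ip_address",
}


def deidentify(record):
    """Drop direct identifiers and reduce an ISO birth date to its year."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIER_FIELDS}
    # Assumes dates are ISO strings (YYYY-MM-DD); only the year is kept.
    if "birth_date" in cleaned:
        cleaned["birth_year"] = cleaned.pop("birth_date")[:4]
    return cleaned


# Invented sample record.
record = {
    "name": "Jane Example",
    "ssn": "000-00-0000",
    "email": "jane@example.com",
    "birth_date": "1975-04-12",
    "diagnosis": "asthma",
}

cleaned = deidentify(record)
print(cleaned)
```

What survives here is a birth year and a diagnosis. Whether that is safe depends on what other data sets it can later be joined with.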

That may not be good enough, however. Navetta wrote that, “In one infamous example, as part of a contest to create a better movie recommendation engine, Netflix released an anonymized data set containing the movie rental histories of approximately 480,000 of its customers. Researchers established that they could re-identify some of the Netflix customers at issue by accessing and analyzing publicly available information concerning movie ratings performed by such customers.”
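The mechanics of that kind of re-identification can be illustrated with a toy linkage. This is a simplified sketch, not the researchers' actual method: it assumes exact matches on (title, rating) pairs, and every ID, name and rating below is invented. The `link` function and its `min_overlap` threshold are hypothetical.

```python
# All data is invented. `anon` mimics a released data set keyed by opaque IDs;
# `public` mimics ratings posted elsewhere under real names.
anon = {
    "user_17": {("Movie A", 5), ("Movie B", 2), ("Movie C", 4)},
    "user_42": {("Movie A", 3), ("Movie D", 5), ("Movie E", 1)},
}
public = {
    "alice": {("Movie A", 5), ("Movie C", 4)},
    "bob": {("Movie F", 2)},
}


def link(anon, public, min_overlap=2):
    """Match each public profile to the anonymized ID that shares the most ratings."""
    matches = {}
    for name, ratings in public.items():
        best_id, best = None, 0
        for anon_id, anon_ratings in anon.items():
            overlap = len(ratings & anon_ratings)  # identical (title, rating) pairs
            if overlap > best:
                best_id, best = anon_id, overlap
        if best >= min_overlap:
            matches[name] = best_id
    return matches


matches = link(anon, public)
print(matches)
```

Once a named profile is tied to an opaque ID, everything else recorded under that ID, including ratings the person never posted publicly, is tied to the name as well.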

Bob Rudis appreciates the difficulty. “My organization has had legal involved since Day One of cross-organization sharing,” he said. “Any non-U.S. organization, or domestic one with international employees and customers, will have to ensure they are anonymizing well, which is really hard to do when you have so many attributes from so many systems and devices brought together.”

Rudis said he believes the risk of privacy violations, “is significant enough that any organization looking to put in large-scale security data analytics should also budget for increased insurance to cover any fines or lawsuits that emerge.”