Big Data Investigations: Opportunity and Risk

Experts say large-scale security analytics can cut through the noise to find key intelligence. But connecting the dots can also lead to legal trouble.

Kim Jones notes that it is also changing legal strategy. "It has long been the practice, when one side gets data requests for trial or prosecution, to deluge the other side with data, under the assumption that they'll never find what they're looking for. But Big Data means they can find it. Even worse, given the analytic capability of the tools, they might find more than you thought they would."

"When I think about its application to investigations, it may lead to more investigations," he said.

And then there is the risk of violating personal privacy. As experts have noted, the almost magical ability of Big Data analytics to draw connections from seemingly random, disconnected bits of data can also be a curse.

David Navetta, in a post on the Information Law Group blog, illustrates that risk. A person who consents to have his personal information collected and used for marketing purposes may find that his information ends up in the hands of a data broker.

If that person buys a deep fryer, and that information ends up in the hands of, "a health insurance company, whose algorithms put people who purchase deep fryers into a high risk category, in the world of Big Data, the initial, relatively innocuous data disclosure (that was consented to), could suddenly serve as the basis to deny a person health care (or result in higher health care rates)," Navetta wrote.

The solution to that, according to a number of experts, is to anonymize the data. That, in fact, is among the guidelines of the Office for Civil Rights of the Department of Health and Human Services. Navetta notes in his post that HHS, "sets forth two methods to achieve de-identification under HIPAA: expert determination and 'safe harbor' de-identification (which involves removing 18 types of identifiers from health data)."
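The "safe harbor" method Navetta describes amounts to stripping direct identifiers and coarsening others before data is shared. A minimal sketch of that idea, with invented field names (the actual rule enumerates 18 identifier types, including names, small geographic subdivisions, and most date elements):

```python
# Hypothetical illustration of safe-harbor-style de-identification:
# remove direct identifiers from a record and coarsen dates to the
# year before sharing. Field names here are invented for this sketch.

DIRECT_IDENTIFIERS = {
    "name", "street_address", "phone", "email", "ssn",
    "medical_record_number", "ip_address", "photo_url",
}

def deidentify(record: dict) -> dict:
    """Return a copy of the record with direct identifiers dropped
    and any full birth date reduced to a birth year."""
    clean = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "birth_date" in clean:
        # Safe-harbor practice keeps only the year of most dates.
        clean["birth_year"] = clean.pop("birth_date")[:4]
    return clean

patient = {
    "name": "Jane Doe",
    "ssn": "123-45-6789",
    "birth_date": "1961-07-04",
    "purchase": "deep fryer",
}
print(deidentify(patient))
```

The alternative method HHS describes, expert determination, has no such mechanical recipe; it relies on a statistician certifying that re-identification risk is very small.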

That may not be good enough, however. Navetta wrote that, "In one infamous example, as part of a contest to create a better movie recommendation engine, Netflix released an anonymized data set containing the movie rental histories of approximately 480,000 of its customers. Researchers established that they could re-identify some of the Netflix customers at issue by accessing and analyzing publicly available information concerning movie ratings performed by such customers."
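The Netflix result was a linkage attack: the "anonymized" release still carried quasi-identifiers (which movies a person rated) that could be joined against a public source where people rated movies under their real names. A toy sketch of the mechanism, with all names and records invented:

```python
# Toy linkage attack: match pseudonymous tokens in a released data
# set to real names in a public data set by overlapping the sets of
# rated movies. All data here is fabricated for illustration.

anonymized = [  # released data: user IDs replaced with tokens
    {"token": "u1", "rated": frozenset({"Movie A", "Movie B", "Movie C"})},
    {"token": "u2", "rated": frozenset({"Movie D", "Movie E"})},
]

public = [  # ratings posted publicly under real names
    {"name": "Alice", "rated": frozenset({"Movie A", "Movie B", "Movie C"})},
]

def reidentify(anonymized, public, threshold=3):
    """Map tokens to names wherever the rating overlap is large
    enough to be distinctive."""
    matches = {}
    for a in anonymized:
        for p in public:
            if len(a["rated"] & p["rated"]) >= threshold:
                matches[a["token"]] = p["name"]
    return matches

print(reidentify(anonymized, public))
```

In the real study, a handful of rating dates and titles was enough to single out individuals, which is why removing direct identifiers alone does not guarantee anonymity.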

Bob Rudis appreciates the difficulty. "My organization has had legal involved since Day One of cross-organization sharing," he said. "Any non-U.S. organization, or domestic one with international employees and customers, will have to ensure they are anonymizing well, which is really hard to do when you have so many attributes from so many systems and devices brought together."

Rudis said he believes the risk of privacy violations, "is significant enough that any organization looking to put in large-scale security data analytics should also budget for increased insurance to cover any fines or lawsuits that emerge."

Copyright © 2013 IDG Communications, Inc.
