Panel Says Data Mining for Terrorists Is an Exercise in Futility

Report, commissioned in part by the DHS, also warns of potential privacy problems

The kind of pattern-seeking data mining and behavioral surveillance technologies that are being used by several federal agencies to identify potential terrorists are far too unreliable to be of any real value, according to a report issued by the National Research Council.

The continued and unchecked use of such tools also poses potential privacy problems for individuals, the NRC said in its 376-page report, which was prepared at the request of the U.S. Department of Homeland Security (DHS) and the National Science Foundation.

In light of the findings, the 21-member committee that conducted the study is recommending that agencies using or planning to adopt such tools for counterterrorism purposes should first be required to thoroughly evaluate their effectiveness, lawfulness and impact on privacy. The committee also called on Congress to consider revising national privacy laws in order to ensure better protection for U.S. residents.

The NRC, together with the National Academy of Sciences, the National Academy of Engineering and the Institute of Medicine, makes up what is known as the National Academies, which advise the government on science and technology issues.

The findings detailed in the NRC's report hammer home concerns that have been voiced by many privacy advocates, said Marc Rotenberg, executive director of the Electronic Privacy Information Center (EPIC) in Washington.

"What the [NRC] has concluded is that there needs to be much more effective oversight of these programs," Rotenberg said. "It's a very timely and significant report." He noted that despite the privacy concerns, the government has gone ahead with many data mining programs in the name of countering terrorism. But the NRC's report raises questions about whether such programs really work, Rotenberg said.

As of January 2007, there were nearly 200 data mining programs planned or already operating throughout the federal government. Among them were the Automated Targeting System at the DHS for assigning "terror scores" to U.S. citizens and the Transportation Security Administration's Secure Flight program for analyzing data about airline passengers. The FBI has several data mining initiatives underway, including some that target terrorists.

One of the most controversial programs was the Total Information Awareness (TIA) initiative that was quietly launched in 2002 by the Defense Advanced Research Projects Agency but then abandoned in 2003 after Congress stopped funding for it following a public outcry.

William Perry, co-chair of the NRC committee that wrote the new report, said in a prepared statement that technology should be used as needed to combat terrorism. "However, the threat does not justify government activities that violate the law, or fundamental changes in the level of privacy protection to which Americans are entitled," he added.

The NRC committee didn't look specifically at any counterterrorism-related data mining initiatives, nor did it conduct any direct evaluations of behavioral surveillance tools being used by agencies. Instead, the report is based on a generalized study of the effectiveness of such technologies in identifying potential terrorists.

What the report highlights are the severe limitations of automated data mining techniques for counterterrorism purposes and their potential privacy impacts, said committee member Fred Cate, who is the director of the Center for Applied Cybersecurity Research at Indiana University.

Automated data mining tools typically work by searching through mountains of data in large databases for unusual patterns of activity, which are then used to predict future behavior. The tools have proved to be useful for commercial applications such as detecting payment card fraud and predicting purchasing trends, Cate said.

"We can look at 50,000 people buying television sets and know that many of them are going to be buying a DVD at the same time," Cate said. But using the same techniques to try to identify a potential terrorist is futile because there simply isn't enough historical data upon which to base any predictions, he claimed, adding that there is little information available about patterns that could reliably point to terrorist activity.
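The consumer-side pattern mining Cate describes boils down to counting co-occurrences in purchase histories. The following minimal sketch, using invented basket data and item names, shows how abundant labeled examples make such predictions possible — it is an illustration of the general technique, not of any government system discussed in the report.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets; items and counts are invented for illustration.
baskets = [
    {"tv", "dvd_player", "hdmi_cable"},
    {"tv", "dvd_player"},
    {"tv", "wall_mount"},
    {"laptop", "mouse"},
    {"tv", "dvd_player", "wall_mount"},
]

# How often each item, and each pair of items, appears across all baskets.
item_counts = Counter(item for b in baskets for item in b)
pair_counts = Counter(
    pair for b in baskets for pair in combinations(sorted(b), 2)
)

def confidence(antecedent, consequent):
    """P(consequent in basket | antecedent in basket)."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]

# Of the baskets containing a TV, what fraction also contain a DVD player?
print(confidence("tv", "dvd_player"))  # 0.75
```

With millions of real transactions instead of five toy baskets, such conditional frequencies become reliable predictors — exactly the historical depth that, per the report, is missing for terrorist activity.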

On the consumer side, "you have millions of examples of the target data you want to emulate, so you know certain patterns look like fraud," Cate said. "With terrorists, we fortunately don't have too many examples."

And unlike shoppers, terrorists are likely to make deliberate attempts to hide their activities, making it even harder to pick them out using an automated pattern-matching program, according to Cate. As a result, data mining tools generate an unacceptably high rate of false positives when used in counterterrorism applications, he said.

Such tools can prove useful in situations in which they are given specific pieces of information such as a suspect's name and asked to look for other data, such as purchases made or places visited by the suspect. That could help show if there is any basis for further action, Cate said.

There are similar problems with many behavioral surveillance tools, Cate contended. Such tools are supposed to aid counterterrorism efforts by measuring physiological and behavioral cues, including facial expressions, body temperature and body language, in order to predict terrorist activity. But there is no evidence that the tools work at all, Cate said. He recommended that, at most, they be used for preliminary screening purposes only.

On top of the technical concerns are the potential privacy implications of data mining and behavioral surveillance, the NRC said in its report. The pieces of information that are used to mine data often are compiled from numerous sources and databases, some of which could be outdated or contain poor-quality data, according to Cate.

As a result, he said, data mining is error-prone and the high rate of false positives could lead to unnecessary intrusions into personal privacy. Going forward, Cate said, there need to be safeguards to ensure that any data being collected for counterterrorism uses is fresh, compiled from reliable sources and within the scope of the inquiry being conducted.
