Threat intelligence overload

Getting through the obstacle of the big data problem

In computer forensics terms, Indicators of Compromise (IoCs) are artifacts such as IP addresses, domain names, email addresses, or URLs. When such artifacts are observed in log data and placed in a security information and event management (SIEM) system, they can be correlated with the IoCs in threat intelligence data. A match indicates a possible serious compromise and intrusion.
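The correlation described above can be illustrated with a minimal sketch. It assumes log events have already been parsed into dictionaries with a list of observed artifacts; the field names (`id`, `artifacts`), the function name `correlate`, and the sample indicator values are hypothetical and do not reflect any particular SIEM or threat feed.

```python
from typing import Iterable

# Hypothetical threat-intelligence indicators (documentation-range values only).
THREAT_IOCS = {"203.0.113.7", "bad-domain.example", "phish@mail.example"}


def correlate(log_events: Iterable[dict], iocs: set) -> list:
    """Return the log events whose observed artifacts appear in the IoC set."""
    matches = []
    for event in log_events:
        # An event may carry several artifacts (IP, domain, email, URL).
        hits = sorted(a for a in event.get("artifacts", []) if a in iocs)
        if hits:
            matches.append({"event_id": event["id"], "matched": hits})
    return matches


events = [
    {"id": 1, "artifacts": ["198.51.100.4", "intranet.example"]},
    {"id": 2, "artifacts": ["203.0.113.7", "bad-domain.example"]},
]
print(correlate(events, THREAT_IOCS))
```

A hash-set lookup like this is cheap per artifact, so the sketch says nothing about the scale problems discussed in this article; those arise when tens of millions of indicators meet hundreds of millions of events inside tools built for other workloads.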

The number of “active” IoCs seen in threat intelligence data is now 25 million and growing at a rate of 39 percent a month, according to Anomali (formerly ThreatStream) CEO Hugh Njemanze.

"Today’s security information and event management (SIEM) tools were never meant to perform correlation on this scale. Those organizations that try end up with searches that never start, never finish, affect other search and reporting capabilities, and in some cases, result in database corruption," Njemanze said.

Threat intelligence came about as a way to collect useful information about what threats are out there. Njemanze compared the actors visiting a network to the millions of people who fly every day.

"Some shouldn’t be allowed to fly, so we have 'no fly' lists.  There are a lot of actors on the Internet, and when they visit your website or network, they may have legitimate intentions. If they don’t, those IP addresses are recorded for future reference," he explained. 

This compiled list of IP addresses, email addresses, phishing signatures, and compromised computer files is essentially a no-fly list: a blacklist that security teams can use to determine whether a visitor has a nefarious track record. It's a useful concept, but one that has resulted in complete and utter overload.

"A lot of the malicious systems are innocent systems that are compromised by a bad guy. They have been repaired and are no longer malicious, but they are still on the list," Njemanze said.

Several years ago, IoCs numbered in the tens, maybe hundreds, of thousands. Njemanze said, "Last year there were 10 million and today over 70 million IoCs." Anomali projects that it will have curated more than 100 million IoCs by the end of 2016, with nearly 40 million categorized as active.

The issue for security practitioners is that security devices were not designed to inspect that many indicators. "They are having a huge problem getting the devices that are needed to leverage the threats and to do it without choking. Security practitioners are getting a lot of false positives simply because the data may have been accurate at one time even though it is now outdated," Njemanze said.
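One common way to limit false positives from outdated entries is to age indicators out of the active set. The following is a minimal sketch of that idea, not a description of how Anomali or any specific product works. It assumes each indicator carries a `last_seen` timestamp; the field names, the function name `split_by_age`, and the 90-day window are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone


def split_by_age(indicators, now, max_age_days=90):
    """Partition indicators into (active, stale) by their last_seen time."""
    # The 90-day default is an arbitrary illustrative threshold.
    cutoff = now - timedelta(days=max_age_days)
    active = [i for i in indicators if i["last_seen"] >= cutoff]
    stale = [i for i in indicators if i["last_seen"] < cutoff]
    return active, stale


now = datetime(2016, 6, 1, tzinfo=timezone.utc)
indicators = [
    {"value": "203.0.113.7", "last_seen": datetime(2016, 5, 20, tzinfo=timezone.utc)},
    {"value": "198.51.100.9", "last_seen": datetime(2015, 1, 10, tzinfo=timezone.utc)},
]
active, stale = split_by_age(indicators, now)
print([i["value"] for i in active])
print([i["value"] for i in stale])
```

Keeping the stale entries in a separate list, rather than deleting them, preserves the history in case a repaired system is compromised again.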

An attacker wants to fool someone into thinking they are looking at a legitimate website. At a glance, most users won't notice the subtle differences between what is real and what is a copy. For example, Njemanze said, "If someone were copying the Wells Fargo website, they might replace one 'l' for an 'i'. This is what happened in the Anthem breach, where Wellpoint was spelled with two '1s'."
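The character swaps Njemanze describes can be caught by comparing domains after folding easily confused characters together. The sketch below is a simplified illustration under stated assumptions: the three-entry confusables map, the `PROTECTED` list, and the function names are hypothetical, and the sample lookalike spellings are constructed from the substitutions described in the quote. A production check would need a far larger character map.

```python
# Hypothetical map of visually confusable characters, folded to one form.
CONFUSABLES = str.maketrans({"1": "l", "i": "l", "0": "o"})

# Hypothetical list of legitimate domains an organization wants to protect.
PROTECTED = ["wellpoint.com", "wellsfargo.com"]


def skeleton(domain: str) -> str:
    """Lowercase the domain and fold confusable characters together."""
    return domain.lower().translate(CONFUSABLES)


def lookalike_of(candidate: str, protected=PROTECTED):
    """Return the protected domain that the candidate imitates, or None."""
    cand = candidate.lower()
    for legit in protected:
        # Same skeleton but different spelling means a visual lookalike.
        if cand != legit and skeleton(cand) == skeleton(legit):
            return legit
    return None


for name in ["we11point.com", "weilsfargo.com", "wellsfargo.com", "example.com"]:
    print(name, "->", lookalike_of(name))
```

Both the candidate and the legitimate domain are folded before comparison, so a legitimate name that itself contains an "i" (as "wellpoint" does) still compares correctly.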

Millions of domain names are generated every day, whereas legitimate sites have typically been around for years. Signals such as domain age are indicators that a well-tuned algorithm can use to separate the wheat from the chaff, Njemanze said.
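Domain age is one such signal. As a rough sketch, assuming the registration date has already been obtained from some registry or WHOIS source (the lookup itself is out of scope here), a check might look like the following. The function name and the one-year threshold are illustrative assumptions, and a real classifier would combine this with many other signals.

```python
from datetime import date


def is_newly_registered(created: date, today: date, min_age_days: int = 365) -> bool:
    """Flag domains registered more recently than min_age_days ago."""
    # The one-year default is an arbitrary illustrative threshold.
    return (today - created).days < min_age_days


today = date(2016, 6, 1)
print(is_newly_registered(date(2016, 5, 1), today))   # registered a month ago
print(is_newly_registered(date(2005, 1, 1), today))   # registered years ago
```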

The problem affects not only practitioners but also the tools, which are likewise overloaded. Njemanze said, "They offload the processing so that they sit in front of a SIEM with hundreds of millions of transactions a day, but the threat indicators in conjunction with that are not there."

These concerns are compounded further for SMBs that receive an anomaly report. "Someone may be managing a network, but they don’t have a large or mature enough network to run and analyze a SIEM," Njemanze said.

For those who are new to security, "Collecting logs and seeing the way anomalies work could be your first week on the job," Njemanze said. For newcomers, the question that keeps them up at night is often, "How do you know you have a problem?"

In truth, entire teams of analysts respond to things that look like incidents but turn out to be false positives. When so many of these responses are wild goose chases, detecting a real breach becomes all the more difficult. By the time the problem is discovered, the malicious actor has usually been inside the network for a long time (an average of six months, according to Njemanze), and the issue is usually detected by an external party.

This article is published as part of the IDG Contributor Network.
