IT is a fast-moving field: ideas arise, reach prototype, and go to market in less time than it takes the average clinical trial to be cleared. Yet one concept within information security remains an unresolved debate: that defense requires greater visibility than can be obtained from any single network, and that to have a fighting chance we should reciprocally share data on the attacks and attackers we identify. I've heard the same refrain in information security for quite some time: "We need to share more data."
I've seen both sides of the argument play out like this:
-- Those in favor of sharing point out that although they've had some limited success, the process has been difficult to build out and integrate, and the results are mixed due to insufficient data. More data sharing seems like an excellent idea, but they can only conjecture what the return-on-investment curve would look like at higher levels of sharing.
-- Those against sharing point to a few early experiments where they publicly collaborated on data sharing, were burned by the public data being used as counter-intelligence, and promptly returned to either not sharing at all, or sharing within a very limited group.
Now, there is indeed some sharing of security data happening today, with varying degrees of scope and success.
- Public and semi-public clearinghouses such as MalwareDomainList.com, OSVDB.org and ShadowServer.org are excellent first-tier examples. Operated by volunteers and with data submitted manually by the public, they provide an excellent free source of single-scope threat intelligence. Immediately apparent, however, is how limited the data is, and organizations must construct their own processes and technology to consume it effectively.
- Private data sharing arrangements exist between large organizations, usually within a single industry vertical. Information about the scope and effectiveness of these arrangements is unavailable outside of impenetrable layers of NDAs and any comparison of effectiveness of their methods is impossible. During my career I have seen some of these data sharing systems in action. They are almost always more advanced and automated than the publicly available ones—but they still seem unambitious in comparison to what I perceive as possible within the field.
- Government bodies have an advantage over the private sector by way of mandate to share information (on paper at least, a government body has no "competitors" to use their data as leverage against them). The irony here is that while many government efforts in this field should naturally become publicly available (through subscription or otherwise), they are largely carried out under the auspices of the defense sector, leaving both the data and the data sharing technology hidden away under the cloak of classification.
- Finally we have the security vendors who provide an intelligence feed service (for those that can afford it). These vary greatly in quality and scope, from mere lists of 'suspicious IP addresses' to details on unpublished 0-day exploits the vendor has purchased from private researchers.
So, after a decade of discussion and attempts to reach critical mass in the move to a sufficiently effective level of data sharing, we still find ourselves at this impasse. I witnessed no end of major security CEOs calling out the need for more data sharing at RSA this year, yet they still make you pay for access to their intelligence services. Indeed, one notable CEO, infamously reluctant to release any information about a breach his own company suffered, had published an article just a few weeks prior voicing his support for enhanced data sharing as the only way to turn the tide of battle in the theater of risk we face today.
I think we've hit the problem right on the head: the lack of progress in security data sharing isn't down to limitations in technology or legal implications (both of which can be overcome with little effort). People don't want to share because of those old faithful standbys still gnawing at the human mind: fear and greed. Fear of how whatever we share may be used against us; greed for anything we can get for free or, better yet, monetize. We're not going to make this go away overnight; if we're going to find a middle ground, a security data sharing network that works for everyone, we are going to have to find ways to route around these two mindsets.
So far, the only answer I've arrived at for this challenge is that stressing the importance of 'enlightened self interest' may be the only useful argument in this debate—the idea that 'if I help others, it furthers my own goals' seems to be a perfectly reasonable compromise.
We certainly exist in an age where the tide is turning against publicly available information; the information age has made information valuable beyond measure, and things of value are instinctively hoarded away in private. To reach some level of effectiveness in what is increasingly an economic arms race between attackers and defenders in the security field, we're going to have to address the situation more atomically than the either/or debate has so far.
Data sharing has a long and respected history in the scientific world (and even there, the waters are muddy); information security is evolving to the point where we need real data to make real discoveries; hyperbole and anecdote have carried us as far as they can. Enterprise information security exists to protect the enterprise but unrestricted data sharing will likely never go hand-in-hand with that goal; there has to be a middle road.
Sharing doesn't mean giving things away
- Any good data sharing solution is going to result in you receiving more than you give; a system that doesn't achieve that is fundamentally broken in one or more ways.
- A data sharing solution that allows one participant to gain a fundamental business advantage over other parties is likely broken. This does not preclude a particular party from benefiting more from the sharing arrangement by virtue of making better use of the information in a more agile fashion, only that the system itself must not be demonstrably stacked in favor of a particular participant (or group of participants).
- Ideally, the data submitted should not be leveraged for an advantage against the party that contributed it; whenever possible it should not even be possible to identify the party that contributed it.
It's not all-or-nothing
- Within the security realm there are a great number of layers of data. Being selective about what is shared and to what level of detail is perfectly reasonable.
- Paranoia leads us to start from a default-deny position, and try and justify what can be opened up after the fact. For anyone who has ever filed a FOIA request, it is easy to see how this is a method that gains few results for a great deal of work. Instead we should consider the alternative of starting out from the viewpoint of 'everything is good to share' and then selectively removing the things identified as not OK to share (besides, when was the last time you went through a good data classification exercise—this should be good practice!).
- There is no requirement to dive in at the most detailed levels of sharing right away. Collaboration can begin purely with summarized statistical data and built out from there. In nearly every case, contributing something is better than nothing at all.
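As a minimal sketch of that first step, summarized statistical sharing: reduce detailed incident records to aggregate tallies before anything leaves the organization. The field names and records below are hypothetical, not drawn from any real sharing scheme.

```python
from collections import Counter

# Hypothetical raw incident records; in practice these would come from
# a ticketing system or SIEM export and contain far more detail.
incidents = [
    {"category": "malware", "vector": "email"},
    {"category": "malware", "vector": "web"},
    {"category": "dos", "vector": "network"},
]

def summarize(records):
    """Reduce detailed incident records to shareable aggregate counts.

    No hostnames, IP addresses, or timestamps are included in the
    output; only category/vector tallies leave the organization.
    """
    return {
        "total": len(records),
        "by_category": dict(Counter(r["category"] for r in records)),
        "by_vector": dict(Counter(r["vector"] for r in records)),
    }

summary = summarize(incidents)
print(summary)  # aggregate counts only, nothing host-specific
```

Even this coarse a contribution gives a sharing pool something to compare across participants, and the detail level can be ratcheted up later.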
What's the real problem?
It would be remiss of me to stand here and claim that universal public security data sharing will fix all our woes overnight (though I'm certainly saying it would give us more of a fighting chance!). There are significant hurdles to encounter and overcome in data and intelligence sharing that need to be addressed by any organization entering such an arrangement.
Valuable intelligence becomes less valuable widely distributed
The most critical piece of information leading up to any attack is how much knowledge of the attack the enemy possesses. Open information sharing networks will be infiltrated by attackers, without a doubt. This should not be construed as a failure of the system, provided the system is robust enough to absorb it. Returning to the assertion that long-term success in information security is a matter of economics: the more time attackers must spend finding staging locations for their attacks that are not already publicly known, the more of their resources we waste (as long as the system does not enable an attacker to infer detailed information about what a particular target knows, and so stay a step ahead of them).
A Practical Example: An attacker has set up a dedicated host for use against a specific target company. Some time later, information about that host appears in the public threat feed; the attacker now knows with some certainty that the target is aware of his actions from that host.
Data that cannot be acted upon is NOT worthless
The more open an intelligence source, the more generic the format it must be communicated in. Public sources of threat intelligence are published in lowest-common-denominator formats: text files of IP addresses, CSV files, and the like. Many security organizations using these feeds process the information manually, with analysts performing searches across logs.
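To make the "lowest-common-denominator" point concrete, here is a minimal sketch of what consuming such feeds typically involves: normalizing a bare IP list and a CSV file into one de-duplicated watch set. The feed contents and the `ip` column name are invented for illustration.

```python
import csv
import io
import ipaddress

# Hypothetical snippets of the two common feed formats described above.
PLAINTEXT_FEED = """\
# comment lines and blanks are common in these feeds
198.51.100.7
203.0.113.42
not-an-ip
"""

CSV_FEED = """\
ip,first_seen,tag
203.0.113.42,2013-03-01,c2
192.0.2.9,2013-03-04,scanner
"""

def parse_plaintext(text):
    """Yield valid IPs from a one-address-per-line feed, skipping junk."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        try:
            yield str(ipaddress.ip_address(line))
        except ValueError:
            continue  # malformed entries are common in volunteer feeds

def parse_csv(text, column="ip"):
    """Yield IPs from a CSV feed with a header row."""
    for row in csv.DictReader(io.StringIO(text)):
        yield row[column]

# Normalize both sources into one de-duplicated watchlist for log searches.
watchlist = set(parse_plaintext(PLAINTEXT_FEED)) | set(parse_csv(CSV_FEED))
print(sorted(watchlist))
```

Every consuming organization ends up writing some variant of this glue code itself, which is precisely the integration burden the text describes.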
Although the information security product market is awash with technologies for security management and response, their ability to consume external intelligence information ranges largely from cursory to non-existent.
Our attackers are already sharing data about us. They scour public knowledge for target information: every press release gives some insight into activities and circumstances at the targeted organization, every LinkedIn profile is a cornucopia of marks to infiltrate, every public mailing list posting another data point on what lies behind the firewall. When combined with directly acquired information from the target, detailed and directed plans of attack are easy to formulate; and they share their findings, a lot. Whatever the arguments for and against public information sharing on the defensive side, we can all agree that our own intelligence grid is still woefully inadequate: most organizations have no insight into who is currently targeting their operation beyond the mysterious 'them', 'cybercriminals', 'nation states' and other digital bogeymen.
If we are ever to build an effective communal intelligence grid between all legitimate internet-connected organizations, one that satisfactorily addresses the many pros and cons surrounding this goal, I foresee the path to its construction looking something like the following:
- Getting organizations to overcome their reluctance to share detailed information outside their borders will require incremental programs of information sharing: ones that start out with simple statistical sharing (like the Verizon VERIS framework) and then ramp up through threat agent information (information about unsuccessful attack attempts, Indicator of Compromise data from discovered compromised hosts). Full sharing of details such as breach specifics, successful threat actor attribution, et al, will logically remain within a more limited audience.
- The technology for data sharing needs to adapt to enable more complex levels and methods of sharing. More ambitious standards for communicating shared data, while not immediately necessary for the early stages of emergent sharing arrangements, serve to illustrate what is possible and encourage further expansion of sharing arrangements with the promise of more advanced security data analytics down the line. The current necessity for every organization to roll its own solution for consuming intelligence data highlights the need for more ratified standards than just plaintext and CSV for the communication and processing of data, and encourages vendors to support these standards to add another checkmark to their product comparison feature lists.
- Adoption of tokenization and anonymization techniques and standards that can be implemented without significant effort will be an important factor in allowing organizations to collaborate without undue legal or operational liability. Some level of assurance that the information shared will not (nay, cannot) be used against the contributing organization directly is a requirement only the most reckless would ignore.
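One low-effort approach along these lines is keyed pseudonymization: replace the fields that identify the contributor with HMAC tokens before submission. This is a sketch of the general technique, not a ratified standard; the record fields and key handling are hypothetical.

```python
import hashlib
import hmac

# Hypothetical per-organization secret. It never leaves the contributor,
# so recipients (or an infiltrator) cannot reverse the tokens, yet the
# contributor can still recognize and correlate its own submissions.
SECRET_KEY = b"per-organization secret, kept offline"

def tokenize(value, key=SECRET_KEY):
    """Deterministic, non-reversible token for a sensitive field."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:16]

def anonymize(record, sensitive=("victim_host", "contributor")):
    """Return a copy of the record with sensitive fields tokenized."""
    return {
        k: (tokenize(v) if k in sensitive else v)
        for k, v in record.items()
    }

record = {
    "attacker_ip": "203.0.113.42",            # the intel: shared as-is
    "victim_host": "mail01.example.internal",  # identifies the victim
    "contributor": "ExampleCorp",              # identifies the submitter
}
shared = anonymize(record)
assert shared["attacker_ip"] == record["attacker_ip"]
assert shared["victim_host"] != record["victim_host"]
```

Because the tokens are deterministic under the contributor's key, repeat sightings of the same host still correlate in the shared pool without revealing who reported them.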
- As the range of data necessary to formulate effective, adaptive intelligence grows (intelligence that can be applied automatically within the security program and fuel the predicted wave of more advanced 'Big Data' security research), formerly 'soft' concepts such as exposures, attack surfaces and threat models will likely become immersed in the field of semantic data processing (disclosure of bias: this is my own current area of research focus), with the goal of enabling some level of predictive processing to occur as security intelligence is consumed into the workflow.
The private sector information security world is continually re-treading a path taken by the defense intelligence community decades ago; but where HUMINT bears the greatest fruit in their world, SIGINT is key for us. More so, in the private sector, we have a more limited supply of actual, breathing, Human Intelligence available to us: security analysts need these force multipliers to ever stand a chance of being able to effectively cross-reference the vast number of security markers pouring out of their monitoring systems (against even the most limited of security intel sources) into a stream of directly actionable information that can keep pace with the opposition.
We've spent well over a decade now debating the need for more shared security data as the sanest way to raise the cost of entry and lower the return on investment for criminals and spies alike. In the last year, we've seen this idea go from a murmur to a party line as even the most unlikely of sources take up the rallying call. The issue is far from settled, however, and an implementation worthy of the promise has yet to be created. What is important is that efforts are now underway to improve the situation; people are being convinced to give this idea a try and see for themselves whether it succeeds or not.
"Fail Early, Fail Fast, Fail Often" is a popular idea in the Agile Of All Things nowadays; let's see that applied to more attempts at making the promises of a shared pool of security data arrive while we're all still in business to see it.
Conrad Constantine is a Research Team Engineer at AlienVault. Over the last decade and a half, Constantine has been on the front lines of defense work in telecom, medical and media corporations, not least of which being at ground zero for the 2011 RSA Breach. He is a firm believer that incident response must become an accessible and effective discipline, available to all. He's striving to bring the mysteries of open source intelligence generation, and defensive agility, to those willing to take the leap from fear to action—mostly via the medium of code (with Visio diagrams thrown in for good measure).