Autonomy and the death of CVEs?

Is the manual process of reporting bugs holding back the advent of automated tools?

How many potholes did you encounter on your way into work today?  And how many of them did you report to the city?

Vulnerability reporting works much the same way. Developers find bugs – and vulnerabilities – and don't always report them, because diagnosing and reporting each one is a manual process. And that manual process might be holding automated tools back.

Software is assembled

Software is assembled from pieces, not written from scratch. And when you build and deploy an app, you also inherit the risk of each of those pieces. For example, a 2019 Synopsys report [caution: email wall] found that 96% of the code bases it scanned included open source software, and up to 60% contained a known vulnerability.

And risks don't stop there. Open source and third-party components are heavily used when you operate software, too. For example, 44% of indexed sites use the open source Apache web server. A single exploitable vulnerability in the Apache web server would have serious consequences for all of those sites.

How do you determine if you’re using a known vulnerable building block?  You consult a database. They go by different names, but at the root of many of them is the MITRE CVE database.

Entire industries have been created just to check databases for known vulnerabilities. For example:

  • Software Component Analysis (e.g., BlackDuck, WhiteSource) tools for developers check build dependencies.
  • Container scanners (e.g., TwistLock, Anchore) can check your built Docker image for out-of-date libraries.
  • Network scanners (e.g., Nessus, Metasploit) check your deployed infrastructure for known vulnerabilities.
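
Under the hood, all of these tools boil down to the same operation: look up the components you use in a vulnerability database. Below is a rough sketch of that core lookup using NVD's public CVE API. The response handling is simplified and the keyword search is far cruder than what real tools do (they match exact package versions, not keywords), so treat it as an illustration rather than a working scanner.

```python
# A rough sketch of the core operation behind vulnerability scanners:
# query a CVE database for a component you depend on. Uses NVD's public
# CVE API (v2); response fields are simplified for illustration.
import requests

NVD_URL = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def known_cves(keyword: str, limit: int = 5) -> None:
    """Print a few CVE IDs whose descriptions match the given keyword."""
    resp = requests.get(
        NVD_URL,
        params={"keywordSearch": keyword, "resultsPerPage": limit},
        timeout=30,
    )
    resp.raise_for_status()
    for item in resp.json().get("vulnerabilities", []):
        cve = item["cve"]
        summary = cve["descriptions"][0]["value"]
        print(f'{cve["id"]}: {summary[:80]}...')

known_cves("apache struts")
```

Real tools go further: they resolve exact package versions, match them against CPE identifiers, and filter by severity. But every one of them ultimately depends on the database being populated in the first place.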

But here is the key question: where do these databases get their information?

Where vulnerability information comes from

Today, most vulnerability databases are created and maintained through huge amounts of manual effort. MITRE's CVE database is the de facto standard, but it is populated by committed men and women who research bugs, determine their severity, and follow the manual reporting guidelines for the public good.

If there's one thing we know, though, it's that human processes don't scale well. The cracks are beginning to show.

Here's the problem: automated tools like fuzzing are getting better and better at finding new bugs and vulnerabilities. And automated vulnerability discovery tools don't work well with the current manual process for triaging and indexing vulnerabilities.

Google’s automated fuzzing

Consider this: automated fuzzing farms can autonomously uncover hundreds of new vulnerabilities each year. Let’s look at Google ClusterFuzz, since their statistics are public.

In Chrome:

  • 20,442 bugs were automatically discovered by fuzzing.
  • 3,849 of them – 18.8% – were labeled as security issues.
  • 22.4% of all vulnerabilities were found by fuzzing (3,849 found by fuzzing divided by the 17,161 total security-critical bugs).

Google also runs OSS-Fuzz, which applies the same tools to open source projects. So far, OSS-Fuzz has found over 16,000 defects, with 3,345 of them labeled as security related (20%!).

Many of these security-critical bugs are never reported or given a CVE number. Why? Because it's labor intensive to file a CVE and update the database. But the fuzzer still produces an input that triggers the bug, and that input gives potential attackers what they need to both locate the bug and demonstrate how to trigger it.
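
To see why fuzzing produces these artifacts, consider what a fuzz harness looks like. Here's a minimal sketch using Atheris, Google's fuzzing engine for Python (OSS-Fuzz uses the same harness pattern, via libFuzzer, for C and C++). The parse_record function is a made-up stand-in for whatever library code is under test.

```python
# Minimal fuzz harness sketch using Atheris (Google's Python fuzzing engine).
# parse_record() is an illustrative toy, not real library code.
import sys
import atheris

def parse_record(data: bytes) -> int:
    # Toy parser with a latent bug: it trusts a length field in the input
    # and indexes past the end of the buffer for certain inputs.
    if not data:
        return 0
    length = data[0]
    return data[1 + length]  # IndexError when the input lies about its length

def TestOneInput(data: bytes) -> None:
    parse_record(data)  # any uncaught exception is reported as a crash

atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()
```

When the fuzzer hits the bug, it writes the crashing input to disk. That input is the artifact that usually never makes it into a CVE entry, even though it tells an attacker exactly where to look.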

So:

  • We have tools that can find thousands of defects a year.
  • Many are security-critical – in the two projects above, about 20%. That means hundreds of new vulnerabilities are being discovered each year.
  • There is no way to automatically report and index these bugs.
  • Yet we depend on indexes like the MITRE CVE database to tell us whether we're running known vulnerable software.

Earlier this year Alex Gaynor raised the issue of fuzzing and CVEs, with a nice summary of responses created by Jake Edge. There wasn’t a consensus on what to do, but I think Alex is pointing out an important issue.

I wouldn't be surprised if you could make a few thousand bucks a year taking Google's OSS-Fuzz feed, reproducing the results, and claiming bug bounties.

We’ve evolved before...

How we index known vulnerabilities has evolved over time. I think we can change again.

In the early 1990s, if you wanted to track responsibly disclosed vulnerabilities, you'd coordinate with CERT/CC or a similar organization. If you wanted the firehose of new disclosures, you'd subscribe to a mailing list like Bugtraq on SecurityFocus. Over time, vendors recognized the importance of cybersecurity and began creating their own vulnerability databases. It evolved to the point where system administrators and cybersecurity professionals had to monitor several different lists, which didn't scale well.

By 1999 the disjoint efforts were bursting at the seams. Different organizations would use different naming conventions and assign different identifiers to the same vulnerability. It started to become really difficult to answer whether vendor A’s vulnerability was the same as vendor B’s. You couldn’t answer the question “how many new vulnerabilities are there each year?”

In 1999, MITRE had an "aha" moment and came up with the idea of a CVE list. A CVE (Common Vulnerabilities and Exposures) entry is intended to be a unique identifier for each known vulnerability. To quote MITRE, a CVE is:

  • The de facto standard for uniquely identifying vulnerabilities
  • A dictionary of publicly known cybersecurity vulnerabilities
  • A pivot point between vulnerability scanners, vendor patch information, patch managers, and network/cyber operations

MITRE's CVE list has indeed become the standard. Companies rely on CVE information to decide how quickly they need to roll out a fix or patch. MITRE has also developed a vocabulary for describing vulnerabilities, called the Common Weakness Enumeration (CWE). We needed both, and they serve their intended purpose well: making sure everyone is speaking the same language.

CVEs can help executives and professionals alike identify and fix known vulnerabilities quickly. For example, consider Equifax. One reason Equifax was compromised was that it had deployed a known vulnerable version of Apache Struts, and that vulnerability had been listed in the CVE database 9 weeks earlier. If Equifax had consulted the CVE database, it would have discovered it was vulnerable a full 9 weeks before the attack.
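
That kind of check is mechanical once the data exists. As a sketch, here's roughly what a scanner does for the Struts flaw tied to the Equifax breach (widely reported as CVE-2017-5638). The affected version ranges below are as commonly published; treat them as illustrative and consult the advisory for authoritative data.

```python
# Sketch of a scanner-style check: is a deployed component version inside
# a CVE's affected ranges? The ranges shown for CVE-2017-5638 are as
# commonly reported and are illustrative only.

def parse_version(v: str) -> tuple:
    return tuple(int(part) for part in v.split("."))

AFFECTED_RANGES = [("2.3.5", "2.3.31"), ("2.5.0", "2.5.10")]

def is_affected(deployed: str) -> bool:
    d = parse_version(deployed)
    return any(parse_version(lo) <= d <= parse_version(hi)
               for lo, hi in AFFECTED_RANGES)

print(is_affected("2.3.30"))  # True  -> known vulnerable, patch now
print(is_affected("2.5.13"))  # False -> outside the affected ranges
```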

Cracks are widening in the CVE system

The CVE system works, but it doesn't scale to automated tools like fuzzing. These tools can identify new flaws at a dramatically larger scale. That's not hyperbole: remember that Google's OSS-Fuzz – just one company running a fuzzer – identified over 3,000 new security bugs in three years.

But many of those flaws are never reported to a CVE database. Instead, companies like Google focus on fixing the vulnerabilities, not reporting them. If you're a mature DevOps team, that's great; you just pull the latest update on your next deploy. But very few organizations have DevOps practices mature enough to upgrade all the software they depend on overnight.

I believe we're hitting an inflection point where real, known vulnerabilities are becoming invisible to automated scanning. At the beginning of this article, I mentioned that entire industries exist to scan for known vulnerabilities at all stages of the software lifecycle: development, deployment, and operations.

Companies want to find vulnerabilities, but they're also often incentivized to downplay any potential vulnerability that isn't already widely known to be critical. They want to understand the severity of an issue, but judging severity is often context-dependent.

Research hasn't quite caught up with the problem, but it needs to. There are several challenges.

First, the word "vulnerability" is really squishy, and sometimes in the eye of the beholder. Just saying "on the attack surface" isn't enough; the same program can be on the attack surface in some deployments but not in others. For example, Ghostscript is a program for interpreting PostScript and PDF files. It may not seem to be on the attack surface, but it's used in MediaWiki (the wiki software that powers Wikipedia), where it can end up processing malicious user input. How would you rate the severity of a Ghostscript vulnerability in a way that's meaningful to everyone?

Second, even the actual specification of a vulnerability is squishy. A MITRE CVE contains very little structured information, and there isn't any machine-checkable way to determine whether a newly found bug even qualifies for a CVE. It's really up to the developer, which is appropriate when developers are actively engaged and can investigate the full consequences of every bug. It's not great otherwise.
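
As a purely hypothetical illustration of what more structure could look like (none of these field names come from the actual CVE schema), a machine-oriented entry might record how a bug was found, how to reproduce it, and what an automated triager concluded:

```python
# A purely hypothetical sketch of a machine-oriented vulnerability record.
# None of these fields are part of the real CVE schema; the point is that
# a machine could verify reproducibility and consume triage labels directly.
import json

report = {
    "id": "AUTO-2019-000123",              # identifier assigned by the reporting system
    "component": "example-parser",          # affected package (illustrative name)
    "affected_versions": ["<=1.4.2"],
    "discovered_by": "fuzzer",              # fuzzer | static-analysis | human
    "reproducer": {
        "type": "crashing-input",
        "sha256": "<hash of the crashing input>",
        "harness": "parse_record_fuzzer",   # hypothetical harness name
    },
    "triage": {
        "signal": "SIGSEGV",
        "heuristic_label": "likely-exploitable",   # from a tool, not a human
        "cwe_candidates": ["CWE-787", "CWE-125"],  # out-of-bounds write / read
    },
}

print(json.dumps(report, indent=2))
```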

Third, the naming of various types of vulnerabilities – or in MITRE-speak, "weaknesses" – is squishy. CWEs were intended to become the de facto standard for how we describe a vulnerability, just as CVEs are for listing specific flaws. But today an automated tool can find and demonstrate a buffer overflow, then plausibly label it as an input validation bug, a buffer overflow, or an out-of-bounds write, and each of those labels is arguably technically correct.

Overall, I believe we need to rethink CVEs and CWEs so that automated tools can correctly assign a label, and so that automated tools can calculate a severity. Developers don't have time to investigate every bug and its possible security consequences. And they're focused on fixing the bug in front of them, not on making sure everyone using the software has the latest copy.

We also need a machine-checkable way of labeling the type of bug to replace the informal CWE definitions. Today, CWEs are written for humans to read, which is too underspecified for a machine to understand. Without this, it's going to be hard for autonomous systems to go the extra mile and hook into a public reporting system.

In addition, we need to think about how we prove whether a vulnerability is exploitable. In 2011 we started doing research into automated exploit generation, with the goal of showing whether a bug could result in a control-flow hijack. (We turned off OS-level defenses, such as ASLR, that might mitigate exploitation; the intuition is that an application's exploitability should be considered separately from whether a mitigation makes exploitation harder.) In the 2016 DARPA Cyber Grand Challenge, all competitors needed to create a "Proof of Vulnerability," such as showing you could control a certain number of bits of execution control flow. Make no mistake: this is early work and there is a lot more to be done to automatically create exploits, but it was a first step.

One question, though, is whether "Proofs of Vulnerability" serve the public good. The problem: just because you can't prove a bug is exploitable, automatically or even manually, doesn't mean the bug isn't security critical.

For example, in one of our research papers we found over 11,000 memory-safety bugs in Linux utilities and could create a control-flow hijack for 250 of them – about 2%. That doesn't mean the other 98% are unexploitable. It doesn't work that way. Automated exploit generation confirms that a bug is exploitable; it doesn't reject a bug as unexploitable. It also doesn't mean the 250 were exploitable in your particular configuration.

We saw similar results in the Cyber Grand Challenge. Mayhem could often find crashes for genuinely exploitable bugs but wasn't able to create an exploit for them. Other teams reported the same. Just because an automated tool can't prove exploitability doesn't mean the bug isn't security critical.

One proposal

I believe we need to set up a machine checkable standard for when a bug is likely of security interest. For example, Microsoft has a “!exploitable” plugin for their debugger, and there is a similar version for GDB. These tools are heuristics: they can have false positives and false negatives.
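
As a rough illustration of the kind of heuristic these tools apply (this is a simplified sketch, not the actual !exploitable logic), a triager might classify a crash from the signal, the faulting address, and whether attacker data reached the instruction pointer:

```python
# Simplified sketch of heuristic crash triage, loosely in the spirit of
# !exploitable-style tools. Real triagers inspect disassembly, registers,
# and stack state; this toy version looks at only three signals.

def classify_crash(signal: str, fault_address: int, pc_controlled: bool) -> str:
    """Return a coarse, heuristic exploitability label for a crash."""
    if pc_controlled:
        # The instruction pointer came from attacker-controlled data.
        return "likely-exploitable"
    if signal == "SIGSEGV" and fault_address < 0x1000:
        # Near-NULL dereference: usually just a crash.
        return "probably-not-exploitable"
    if signal in ("SIGSEGV", "SIGBUS", "SIGILL"):
        # Wild reads/writes or bad instructions need deeper analysis.
        return "unknown-requires-triage"
    return "probably-not-exploitable"

print(classify_crash("SIGSEGV", 0x41414141, pc_controlled=True))   # likely-exploitable
print(classify_crash("SIGSEGV", 0x0000000c, pc_controlled=False))  # probably-not-exploitable
```

As the false positives and negatives suggest, labels like these are hints for prioritization, not proofs.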

We should create a list – similar to CVEs – where fuzzers can submit their crashes, and each crash is labeled as likely exploitable or not. This may be a noisy feed. But the goal isn’t for human consumption – it’s to give a unique identifier. And those unique identifiers can be useful to autonomous systems that want to make decisions. It can also help us identify the trustworthiness of software. If a piece of software has 10 bugs that have reasonable indications that they are real vulnerabilities, but no one has proved it, would you still want to field it?
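
To make the proposal concrete, here's a hypothetical sketch of such a feed: crashes are deduplicated by hashing their stack traces, assigned a unique identifier, and stored alongside a heuristic label. Everything here, including the identifier format, is invented for illustration.

```python
# Hypothetical sketch of a machine-oriented crash feed. Nothing here is an
# existing standard; the ID scheme, fields, and dedup strategy are invented.
# The idea: autonomous tools submit and consume entries with no human in the loop.
import hashlib

class CrashFeed:
    def __init__(self):
        self.entries = {}  # dedup key -> entry

    def submit(self, component: str, stack_trace: list, label: str) -> str:
        """Register a crash; return its identifier (new or existing)."""
        key = hashlib.sha256("\n".join(stack_trace).encode()).hexdigest()[:16]
        if key not in self.entries:
            self.entries[key] = {
                "id": f"AUTOVULN-{len(self.entries) + 1:06d}",  # invented scheme
                "component": component,
                "heuristic_label": label,  # e.g., output of a triage heuristic
                "duplicates": 0,
            }
        else:
            self.entries[key]["duplicates"] += 1
        return self.entries[key]["id"]

feed = CrashFeed()
print(feed.submit("example-parser",
                  ["parse_record", "read_length", "memcpy"],
                  "likely-exploitable"))  # AUTOVULN-000001
```

A noisy, machine-generated feed like this would never replace curated CVEs for humans, but it gives autonomous systems a stable identifier to key their decisions on.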

I don’t think the answer is to bury them, but to index them.

I also don't think I'm alone. Google, Microsoft, and others are building more and more of their developer workflows around autonomous systems. It makes sense to make that information available to everyone who fields the software as well.

I started this article by asking whether autonomy will be the death of CVEs. I don't think so. But I do think that, to be effective, autonomous systems will need a separate data source from a manually curated list – something updated much faster and designed for machines.

Key takeaways

  • Executives should continue to use scanners for known vulnerabilities but understand they don’t represent the complete picture.
  • The appsec community should think hard about how to better incorporate tools like fuzzers into the workflow. We’re potentially missing out on a huge number of critical bugs and security issues.
  • One proposal:
    • Add structure to CVE and CWE databases that is machine parsable and usable.
    • Create a system where autonomous systems can report problems, and other autonomous systems can consume the information. This isn’t the same fidelity as human-verified “this is how you would exploit it in practice”, but it would help us move faster.

Agree or disagree? Let me know.
