Scrubbing the Source Data – Part 1 – NVD (National Vulnerability Database)

Apr 04, 2007 (7 mins)
Data and Information Security, IT Leadership

When I publish my analyses, I actively encourage folks to challenge my results, challenge my methodology, perform their own analysis and generally just keep me honest.  Sadly, while many may be willing to cast doubt on the results, few follow up with any analysis of their own.

In a few cases, though, someone does do a little data gathering on their own and comes up with their own results using publicly available sources such as vendor web sites or Secunia.  With that in mind, I thought it might be useful to share some of the issues I’ve found that I must allow for when performing my own analyses.

First, A Caveat

I’m going to point out some challenges in using some of the data sources out there that people should be aware of.  Please do not interpret that as disrespect for the sources or the excellent people who help accumulate the data.  I believe the Mitre team, the Secunia team and the NIST teams, for example, are all doing an excellent job in their charters.  I equally believe that some products and vendors represent unique challenges in terms of vulnerability data that make them harder to track for these teams.

For example, here is what the NIST team says on their statistics page:

Important Note: Linux distributions are often made up of a large collections of independently developed software and it is sometimes difficult to determine which software packages should be considered part of the operating system and which should be considered independent but merely included along with the operating system. In addition, some vulnerabilities occur within the Linux kernel and for those vulnerabilities we do not enumerate all of the hundreds of Linux distributions. Thus, the statistics related to Linux must be interpreted carefully. We will be working to provide better statistics for Linux distributions.

Why is it Hard?

Why accurate data on Linux distributions is challenging for these sites to acquire is not that hard to understand.  Let’s take a fictional example.  Say that someone identifies a vulnerability in the Linux 2.4 kernel.  We can identify many distributions that use the 2.4 kernel, so we might assume that those distributions are affected, but we can’t be sure.  Red Hat, for example, customizes their kernels somewhat before shipment.  Others, too, might not include the affected kernel module or may have already fixed the problem through their own process.  The only way to know for sure is to have someone check the source for each individual distribution and verify applicability or, more simply, to wait and see if a vendor fixes the issue and thereby confirms that it applies to them.

Examining NVD Data

The NVD is particularly useful because NIST incorporates the full CVE list of vulnerabilities and then works to integrate “all publicly available U.S. Government vulnerability resources and provides references to industry resources”, plus they make it available as an XML download.  One of the fields that they populate lists the products affected.  Here is their description: “Names a product that is susceptible to this vulnerability and serves as a wrapper tag for the versions of this product that are specifically affected.”

So, in theory, one could search the XML for affected products/vendors and get some results for how many vulnerabilities affected them.
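As a rough illustration of that kind of search, here is a minimal Python sketch. The feed below is a made-up, simplified stand-in for the XML download: the element names (`entry`, `vuln_soft`, `prod`), the `vendor` attribute, the vendor names and the CVE identifiers are all assumptions for illustration, not the actual NVD schema or data.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, simplified feed. Tag names, attributes and IDs are
# illustrative assumptions, not the real NVD schema or real CVEs.
SAMPLE_FEED = """
<nvd>
  <entry name="CVE-0000-0001">
    <vuln_soft>
      <prod name="example_os" vendor="vendor_a"><vers num="1.0"/></prod>
    </vuln_soft>
  </entry>
  <entry name="CVE-0000-0002">
    <vuln_soft>
      <prod name="example_os" vendor="vendor_a"><vers num="1.0"/></prod>
      <prod name="other_os" vendor="vendor_b"><vers num="2.0"/></prod>
    </vuln_soft>
  </entry>
  <entry name="CVE-0000-0003">
    <vuln_soft>
      <prod name="kernel" vendor="upstream"><vers num="2.4"/></prod>
    </vuln_soft>
  </entry>
</nvd>
"""


def count_by_vendor(feed_xml):
    """Count how many entries list at least one product from each vendor."""
    root = ET.fromstring(feed_xml)
    counts = Counter()
    for entry in root.iter("entry"):
        # Use a set so an entry with several products from the same
        # vendor is only counted once for that vendor.
        vendors = {prod.get("vendor") for prod in entry.iter("prod")}
        vendors.discard(None)
        counts.update(vendors)
    return counts


vendor_counts = count_by_vendor(SAMPLE_FEED)
for vendor, count in sorted(vendor_counts.items()):
    print(vendor, count)
```

Note that the third sample entry is attributed only to a hypothetical "upstream" vendor, so a per-vendor count like this would miss it for any distribution that ships the same code.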

I’m going to test that by going in the other direction.  First, I will go to a vendor’s security web site and gather all applicable security notices and extract the vulnerabilities that the vendor acknowledges and fixes.  Next, I will examine each of those vulnerabilities in the NVD XML and see how many of them are correctly associated with the vendor in question.
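The reverse check described above could be sketched along these lines in Python. This is only an illustration under stated assumptions, not the procedure actually used for the results below: the XML layout (`entry`, `prod`, `vendor`), the vendor names and the CVE identifiers are hypothetical placeholders, and a real feed would need its own parsing.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified feed (not the real NVD schema or real CVEs).
SAMPLE_NVD = """
<nvd>
  <entry name="CVE-0000-0101">
    <vuln_soft><prod name="distro" vendor="example_distro"/></vuln_soft>
  </entry>
  <entry name="CVE-0000-0102">
    <vuln_soft><prod name="kernel" vendor="upstream"/></vuln_soft>
  </entry>
  <entry name="CVE-0000-0103">
    <vuln_soft><prod name="libfoo" vendor="upstream"/></vuln_soft>
  </entry>
</nvd>
"""

# CVE IDs the (hypothetical) vendor acknowledged and fixed in its advisories.
VENDOR_FIXED = ["CVE-0000-0101", "CVE-0000-0102",
                "CVE-0000-0103", "CVE-0000-0104"]


def vendors_by_cve(feed_xml):
    """Map each CVE name in the feed to the set of vendors it lists."""
    root = ET.fromstring(feed_xml)
    return {
        entry.get("name"): {p.get("vendor") for p in entry.iter("prod")}
        for entry in root.iter("entry")
    }


def association_rate(fixed_cves, vendor, index):
    """Return (matched, total, percent) of the vendor's fixed CVEs that
    the feed associates with that vendor. CVEs absent from the feed
    count as not associated."""
    matched = sum(1 for cve in fixed_cves if vendor in index.get(cve, set()))
    total = len(fixed_cves)
    return matched, total, round(100.0 * matched / total, 1)


index = vendors_by_cve(SAMPLE_NVD)
matched, total, percent = association_rate(VENDOR_FIXED, "example_distro", index)
print(f"{matched} of {total} correctly associated ({percent}%)")
```

In this toy data only one of the four vendor-fixed vulnerabilities names the vendor, even though the vendor shipped fixes for all four.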

Windows Server 2003

As a baseline for Microsoft products, I used the vulnerabilities that have been fixed for Windows Server 2003 since its release in April 2003.  You could build your own list from the source:

To keep the numbers sensibly low (and thus reduce the amount of grunt work), I examined the vulnerabilities fixed in the first year after the product was released, or 45 vulnerabilities.  What I found was that 100% of the vulnerabilities in the database correctly identified a Microsoft product as affected.  You can see the details on this page.

Apple Mac OS X v10.4 (Tiger)

As a baseline for Apple products, I used the vulnerabilities that have been fixed for Mac OS X v10.4 (Tiger) since its release in April 2005.  You could build your own list from the source:

Similar to the Windows effort, I examined the vulnerabilities fixed in the first year after Mac OS X Tiger was released, or 95 vulnerabilities.  What I found was that 75.8% of the vulnerabilities in the database correctly identified an Apple product as affected, leaving 23 out of 95 not correctly associated.  You can see the details on this page.

Red Hat Enterprise Linux 4 WS

As a proxy for Linux distributions, I used the vulnerabilities that have been fixed for Red Hat Enterprise Linux 4 WS since its release in February 2005.  You could build your own list from the source:

Similar to the other efforts, I examined the vulnerabilities fixed in the first year after RHEL4WS was released, or 421 vulnerabilities.  What I found was that only 10.7% of the vulnerabilities in the database correctly identified a Red Hat product as affected, leaving 376 out of 421 not correctly associated.  You can see the details on this page.

Ubuntu 6.06 LTS

Not wanting to let my Linux distribution results rest on a single distribution, I also looked at Ubuntu 6.06 LTS, which was released in June 2006.  You could build your own vulnerability list from the source:

Ubuntu 6.06 LTS hasn’t yet been out for a year, so I took all the vulnerabilities that Ubuntu has fixed for the product in the (approximately) 10 months since it was released, or 235 vulnerabilities.  When I checked the vulnerabilities in the NVD, I found that only 2.1% correctly include an association with any Ubuntu product.  See the details on this page.

Findings from this NVD Data Scrub

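For convenience, I have summarized the findings in a table:

Product | Period examined | Vulnerabilities fixed | Correctly associated in NVD
Windows Server 2003 | First year after release | 45 | 100%
Apple Mac OS X v10.4 (Tiger) | First year after release | 95 | 75.8%
Red Hat Enterprise Linux 4 WS | First year after release | 421 | 10.7%
Ubuntu 6.06 LTS | First (approximately) 10 months after release | 235 | 2.1%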

It is not surprising to me that 100% of the Microsoft vulnerabilities had the appropriate association in the National Vulnerability Database (NVD) – after all, there is typically a lot of coverage when either a new vulnerability is disclosed or when patches are released by Microsoft.

I had no expectations on the Apple Mac data, so finding that 75% of vulnerabilities in Mac OS X Tiger had an appropriate association in the NVD wasn’t a surprise, but was informative, especially when compared with the two Linux distributions examined. 

Fundamentally, I knew going in that many of the vulnerabilities in Linux distributions would not necessarily be associated with any given vendor’s distro.  However, I was surprised by how low the accuracy was for both RHEL4WS (10.7%) and Ubuntu 6.06 LTS (2.1%).
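For readers who want to check the arithmetic, the percentages follow directly from the counts quoted in the product sections. A minimal Python sketch; the function names are just illustrative, and the Ubuntu count is inferred here from the stated 2.1% of 235 rather than taken from the text:

```python
def percent_associated(total, not_associated):
    """Percentage of vulnerabilities correctly associated with the vendor."""
    return round(100.0 * (total - not_associated) / total, 1)


def counts_matching(total, percent):
    """Whole-number counts out of `total` that round to `percent` (1 decimal)."""
    return [k for k in range(total + 1)
            if round(100.0 * k / total, 1) == percent]


windows = percent_associated(45, 0)     # all 45 associated
mac = percent_associated(95, 23)        # 72 of 95 associated
rhel = percent_associated(421, 376)     # 45 of 421 associated

# Inference, not a figure from the text: which counts out of 235
# are consistent with the reported 2.1%?
ubuntu_counts = counts_matching(235, 2.1)

print(windows, mac, rhel, ubuntu_counts)
```

The only count consistent with 2.1% of 235 is 5, so roughly 230 of the Ubuntu-fixed vulnerabilities carried no Ubuntu association.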

Given these accuracy levels for vulnerabilities that the vendor has already acknowledged and fixed, it doesn’t seem like too much of a stretch to conclude that using this data to analyze unpatched vulnerabilities would be equally challenging.

Finally, I think this exercise helps demonstrate that anyone leveraging public data sources needs to have a good understanding of both the strengths and the weaknesses that any given data source may have with respect to what one is trying to analyze or measure, and needs to include steps in their methodology that accommodate them accordingly.

Best Regards ~ Jeff

Jeff Jones is a 24-year security industry professional who has spent the last several years at Microsoft helping drive security and privacy progress as part of the Trustworthy Computing group. In this role, Jeff draws upon his security experience to work with enterprise CSOs and Microsoft's internal security teams to drive practical and measurable security improvements into Microsoft process and products. Prior to Microsoft, Jeff was the vice president of product management for security products at Network Associates where his responsibilities included PGP, Gauntlet and Cybercop products, and several improvements in the McAfee product line. These latest positions cap a career focused on security, managing risk, building custom firewalls and being involved in DARPA security research projects while part of Trusted Information Systems. Jeff is a frequent global speaker and writer on security topics ranging from the very technical to more high-level, CxO-focused topics such as Security TCO and metrics. Jeff is also a contributor to the Microsoft Security Blog and writes on a wide range of personal interests (e.g. books, poker, gaming).