4 best practices to avoid vulnerabilities in open-source code

Open-source code in public repositories might contain malware or unintentional vulnerabilities. Here's how to best manage finding and mitigating potential problems.

This year presented even more challenges for ensuring the integrity and security of open-source ecosystems. Open source has been a great boon to developers: virtually anyone can use and customize it, typically at no cost, and contribute to the community. Yet the same openness that promotes transparency, security and developer collaboration across projects has also given adversaries ways to profit from it.

As a security researcher, I came across and analysed incidents this year where over 700 typo-squatting RubyGems packages served no purpose other than mining bitcoins. Then there’s the popular case of Octopus Scanner, malware that had silently injected its tentacles into at least 26 GitHub projects. These incidents underscore the fact that any open system that is accessible to the public is also accessible to adversaries and prone to abuse.

The examples above focus on malicious components. What about legitimate open-source packages with security vulnerabilities that go unnoticed?

A vulnerable or malicious package that makes its way into popular repositories, and eventually into your software supply chain, can wreak havoc for your customers. Vulnerable and malicious components have been detected in popular open-source repositories such as npm, PyPI, NuGet and Fedora.

“In past years, we have seen that in terms of total vulnerabilities identified in open-source packages across the ecosystems, Node.js and Java have traditionally shown the greatest number of new vulnerabilities each year,” said the authors of Snyk’s State of Open Source Security Report 2020.

The report also suggests that security efforts implemented early in the software development process are responsible for there being fewer new vulnerabilities reported in 2019 than in 2018. “If this trend continues, it could be a positive sign that efforts to improve the security of open-source software are starting to pay off,” the report continued.

Here are some best practices for managing open-source code securely.

1. Know your software

The 2020 DevSecOps Community Survey conducted by Sonatype [full disclosure: Sonatype is my employer] reveals that most companies, even those with some level of DevOps practices built into their workflow, lack complete visibility into all the open-source components their software applications use and which vulnerabilities apply to them.

“When a vulnerability is announced in an open-source project, you should be asking two questions immediately: Have we ever used that open-source component, and (if yes) where is it?” said the report’s authors.

A separate Sonatype survey of over 5,000 developers showed that only 45% of organizations with mature DevOps practices keep a complete software bill of materials (SBOM) for their applications. “The findings reveal up to 74% of the organizations with ‘immature practices’ would have no means of knowing if a newly disclosed vulnerability in an open-source component is even applicable to their software,” said the report. This means that organizations with immature practices that do not keep a complete SBOM would not know whether they had used vulnerable open-source code or where to find newly announced vulnerabilities within their environments.
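To make the two questions concrete, here is a minimal Python sketch of how an SBOM lets you answer them. It is only an illustration: the application names, component names, versions and the advisory are all hypothetical, and a real SBOM would normally be generated by tooling in a standard format rather than written by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    version: str


# Hypothetical inventory: application name -> components listed in its SBOM.
SBOMS = {
    "billing-service": [Component("example-parser", "1.2.0"), Component("example-http", "4.1.0")],
    "web-frontend": [Component("example-parser", "1.3.1"), Component("example-ui", "16.0.2")],
}


def find_affected(sboms, name, vulnerable_versions):
    """Answer "have we used it, and where?" for a single advisory."""
    hits = []
    for app, components in sboms.items():
        for component in components:
            if component.name == name and component.version in vulnerable_versions:
                hits.append((app, component.version))
    return hits


# Hypothetical advisory: example-parser 1.2.0 and 1.2.1 are vulnerable.
affected = find_affected(SBOMS, "example-parser", {"1.2.0", "1.2.1"})
print(affected)  # [('billing-service', '1.2.0')]
```

Without the inventory, there is nothing to query, which is the gap the survey describes.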

Given the vast volume of vulnerabilities being published every day on NVD, GitHub and other hosting sites, developers and security professionals cannot realistically keep up with this data without some automated solution. History shows that most organizations wait until after a security incident has taken place to step up their security efforts. However, as the old saying goes, an ounce of prevention is worth a pound of cure.

Implementing security early on by adopting a “shift left” approach when it comes to your software development lifecycle can have tenfold returns and increase overall awareness for your developers.

2. Resolve dependency issues

Veracode’s 2020 State of Software Security report highlights a common software security issue. Latent risks are often introduced into your applications not by developers directly but by “interconnected dependencies,” and these risks may slip under the radar of most developers. “Our data reveals that most flawed libraries end up in code indirectly. Forty-seven percent of the flawed libraries in applications are transitive – in other words, they are not pulled in directly by developers, but are being pulled in by the first library (42% are pulled in directly, 12% are both). This means that developers are introducing much more code, and often flawed code, than they might be anticipating,” read the report.
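The three categories in that quote (direct, transitive, both) can be illustrated with a small Python sketch that walks a dependency graph. The graph and package names below are hypothetical; in practice the graph would come from your package manager's lockfile or dependency-tree output.

```python
from collections import deque

# Hypothetical dependency graph: package -> packages it declares.
DEPENDENCIES = {
    "my-app": ["web-framework", "json-parser"],
    "web-framework": ["template-engine", "json-parser"],
    "template-engine": ["string-utils"],
    "json-parser": [],
    "string-utils": [],
}


def classify_dependencies(graph, root):
    """Split a project's dependencies into direct-only, transitive-only and both."""
    direct = set(graph.get(root, []))
    indirect = set()
    # Start from the dependencies of the direct dependencies and walk outward.
    queue = deque(dep for pkg in direct for dep in graph.get(pkg, []))
    while queue:
        pkg = queue.popleft()
        if pkg in indirect:
            continue
        indirect.add(pkg)
        queue.extend(graph.get(pkg, []))
    return direct - indirect, indirect - direct, direct & indirect


direct_only, transitive_only, both = classify_dependencies(DEPENDENCIES, "my-app")
print(sorted(transitive_only))  # ['string-utils', 'template-engine']
```

Here the developer declared two packages but ships four, and two of them never appear in the project's own dependency list.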

However, correcting this problem doesn’t seem to be a major undertaking according to Veracode: “Addressing the security flaws in these libraries is most often not a significant job. Most library-introduced flaws (nearly 75%) in applications can be addressed with only a minor version update. Major library upgrades are not usually required! This data point suggests that this problem is one of discovery and tracking, not huge refactoring of code.”
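As a rough illustration of why a minor update is often enough, the Python sketch below picks the smallest fixed release that stays within the current major version. It assumes plain three-part numeric versions (MAJOR.MINOR.PATCH) with no pre-release tags, and the version numbers are made up for the example.

```python
def parse(version):
    """Turn '2.3.1' into (2, 3, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def smallest_compatible_fix(current, fixed_versions):
    """Return the lowest fixed version newer than `current` in the same major line."""
    cur = parse(current)
    candidates = sorted(
        v for v in map(parse, fixed_versions) if v[0] == cur[0] and v > cur
    )
    if not candidates:
        return None  # only a major upgrade (or nothing) would fix it
    return ".".join(str(n) for n in candidates[0])


print(smallest_compatible_fix("2.3.1", ["2.4.0", "2.3.5", "3.0.0"]))  # 2.3.5
```

When the function returns a version, the fix is an upgrade within the same major line; when it returns `None`, a larger upgrade would be needed.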

3. Automate code scanning to find unknown unknowns

The Octopus Scanner incident and other forms of open-source ecosystem abuse, such as typosquatting, have prompted repository maintainers like GitHub to enforce automatic scanning of open source projects they host. As reported this year, GitHub now integrates CodeQL-based automatic scanning of its open source repositories.

Justin Hutchings, GitHub senior product manager, told The Register, "It turns out that capability is extremely useful in security. Most security problems are bad data flow or bad data usage in one way or another."

In addition to identifying hidden vulnerabilities and bugs, scanning regularly sweeps open source projects looking for signs of data leakage, such as private keys and credentials inadvertently made public by the contributor. Since last year, a few vendors have integrated automated scanning into their products to identify malware published to legitimate open source repositories. These techniques combine behavioural analysis with machine learning to proactively hunt for “counterfeit components.”
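The data-leakage side of such scanning can be approximated with simple pattern matching. The Python sketch below is a deliberately simplified illustration, not how GitHub or any vendor implements it: the two patterns and the sample text are assumptions chosen for the example, and real scanners use far larger rule sets.

```python
import re

# Illustrative patterns only; production scanners use many more rules.
PATTERNS = {
    "private key": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    "hardcoded credential": re.compile(
        r"\b(?:password|secret|api_key)\s*=\s*['\"][^'\"]+['\"]",
        re.IGNORECASE,
    ),
}


def scan_text(text):
    """Return (line number, finding label) for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


# Made-up file contents for the example.
SAMPLE = '''\
host = "db.example.com"
password = "hunter2"
-----BEGIN RSA PRIVATE KEY-----
'''

print(scan_text(SAMPLE))  # [(2, 'hardcoded credential'), (3, 'private key')]
```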

Experimental open-source scanners (such as npm-scan) published by independent developers on a smaller scale have also emerged and serve similar purposes of detecting vulnerable components using heuristics. On the RubyGems front, the continuous monitoring and analysis efforts of ReversingLabs are what led to the discovery of the 700-plus malicious components mentioned earlier.
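One simple heuristic for the typosquatting problem is to flag package names that are nearly, but not exactly, the name of a well-known package. The Python sketch below shows the idea using the standard library's `difflib`. It is not a description of how npm-scan or ReversingLabs work; the list of popular names and the 0.8 similarity cutoff are assumptions for illustration.

```python
import difflib

# Assumed allow-list of well-known package names for the example.
POPULAR = ["requests", "numpy", "django"]


def possible_typosquat(name, popular=POPULAR, cutoff=0.8):
    """Return the popular package that `name` closely imitates, or None."""
    if name in popular:
        return None  # an exact match is the real package
    matches = difflib.get_close_matches(name, popular, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for candidate in ["reqeusts", "requests", "flask"]:
    print(candidate, "->", possible_typosquat(candidate))
```

A name-similarity check like this only produces leads for review; it says nothing about what the package actually does.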

Enabling such widespread security audits using automation tools can help address trust and integrity issues within the open-source ecosystem before the components make their way into your supply chain.

4. Beware of licensing risks

The key perk of using open-source software is the freedom offered by its permissive licenses. Should you discover a bug in an open-source package that hasn’t been fixed, you could fix it yourself rather than waiting on the vendor. You could tailor an open-source application in your project as you see fit and ship a customized version to your customers.

However, it takes a little more skill to be aware of any potential licensing conflicts that may arise from using open-source components. The 2020 Open Source Security and Risk Analysis Report published by Synopsys states:

“Declared license conflicts arise when a codebase contains open source components whose licenses appear to conflict with the overall license of the codebase. For example, code under the GNU General Public License v2.0 (GPLv2) will generally pose a conflict issue when compiled into a normally distributed piece of commercial software. But the same code is not a problem in software that is considered software-as-a-service, or SaaS.”

These conflicting terms can create confusion for developers using the same open-source applications in slightly different contexts. Some automation solutions can recognize numerous licenses and potential conflicts arising from them in addition to vulnerabilities and malicious components.
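The context-dependence described in the Synopsys quote can be expressed as a small policy table. The Python sketch below is a simplified illustration and not legal guidance: the policy, the distribution-model labels and the component names are hypothetical, and the license strings follow SPDX-style identifiers.

```python
# Hypothetical policy: licenses to flag for each way the software is delivered.
FLAGGED_LICENSES = {
    "distributed": {"GPL-2.0-only"},  # shipped to customers as commercial software
    "saas": set(),                    # hosted and never shipped, per the quote above
}

# Hypothetical component inventory: component name -> declared license.
COMPONENTS = {
    "parser-lib": "MIT",
    "compress-lib": "GPL-2.0-only",
    "http-lib": "Apache-2.0",
}


def license_conflicts(components, distribution_model):
    """List components whose declared license is flagged for this delivery model."""
    flagged = FLAGGED_LICENSES[distribution_model]
    return sorted(name for name, lic in components.items() if lic in flagged)


print(license_conflicts(COMPONENTS, "distributed"))  # ['compress-lib']
print(license_conflicts(COMPONENTS, "saas"))         # []
```

The same inventory yields different results depending on how the software is delivered, which is why the same component can be acceptable in one product and a problem in another.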

A Black Duck report found that 67% of the codebases audited in 2019 contained components with license conflicts. That percentage was much higher for some industries, such as internet and mobile apps (93%). “The GPL is one of the more popular open-source licenses, and its various versions can create license conflicts with other code in codebases. In fact, five of the top 10 licenses with conflicts were the GPL and its variants,” stated the report.

Copyright © 2020 IDG Communications, Inc.
