Dependency confusion explained: Another risk when using open-source repositories

Attackers are now using a newly discovered flaw that allows them to trick development environments into pulling malicious packages. Here's what you need to know.

What is dependency confusion?

Dependency confusion is a newly discovered logic flaw in the default way software development tools pull third-party packages from public and private repositories. Attackers can exploit it to trick a development environment into pulling a malicious package that they published in a community repository instead of the intended custom package hosted in a private repository.

The complexities of the software supply chain

According to a 2020 study from Synopsys, over 99% of commercial applications used by enterprises contain open-source code, and such code makes up at least 70% of their code base overall. This is thanks to the large ecosystem of third-party components and packages available for all programming languages. Java has the Central Repository, JavaScript has npm, Python has PyPI (Python Package Index), Ruby has RubyGems and so on. All of these are community-maintained public repositories that development tools pull packages from when those packages are declared as dependencies.

The complex relationships among packages mean that pulling one component as a dependency into an application can result in importing tens or hundreds of others. Security researchers have long warned that this can be exploited by attackers, especially since the repositories are not well policed.
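To make the fan-out concrete, here is a minimal sketch that walks a dependency graph and counts everything one application ends up importing. The graph and all package names in it are hypothetical and exist only for illustration. Real package managers resolve versions as well, which this sketch ignores.

```python
from typing import Dict, List, Set


def transitive_dependencies(package: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Return every package reachable from `package`, direct or indirect."""
    seen: Set[str] = set()
    stack = list(graph.get(package, []))
    while stack:
        dep = stack.pop()
        if dep in seen:
            continue  # already visited, which also guards against cycles
        seen.add(dep)
        stack.extend(graph.get(dep, []))
    return seen


# Hypothetical graph: each key lists the packages it depends on directly.
graph = {
    "app": ["web-framework", "logger"],
    "web-framework": ["router", "http-parser", "logger"],
    "router": ["path-matcher"],
    "logger": ["color-codes"],
}

deps = transitive_dependencies("app", graph)
print(len(graph["app"]), len(deps))  # 2 direct dependencies, 6 in total
```

Each of those indirect packages is code that runs with the application's privileges, even though the developer never named it explicitly.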

In the past, hackers have compromised developers of legitimate packages and injected malicious code into them, resulting in many other legitimate packages that had those as dependencies also being affected. Attackers also uploaded components with names that were similar to legitimate ones in the hope that developers might mistype the name when defining their application dependencies—an attack known as typosquatting.

How dependency confusion attacks work

In response to past attacks, public repository maintainers have taken additional steps including multi-factor authentication, banning certain package name variants, adding digital signatures and better policing the ecosystems. Recently, security researcher and bug hunter Alex Birsan devised a new technique that's hard to defend against.

Birsan realized that when building applications, developers pull code packages and libraries from public repositories, but also private components that were developed in-house and are hosted in their company's private repositories or local feeds. Most package managers, such as JavaScript's npm, Python's pip and Ruby's gem, as well as other development environments, allow users to define additional package sources for exactly this reason. So, the question Birsan asked himself was: How do these tools handle situations where a package with the same name exists in both a public and a local feed? Which one is chosen?

It turns out that, by default, many tools download and execute the package with the higher version number. This means that if attackers know that an application depends on a package that doesn't exist in the public community-maintained index, they can publish a poisoned package with that same name and a higher version number, potentially forcing the target's package manager client to download and install it. Because package installation commonly runs scripts bundled with the package, installing such a package can be enough to give attackers code execution on the machine doing the install.
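The selection logic described above can be sketched in a few lines. This is a simplified model, not the actual resolver of npm, pip or any other tool: it assumes plain dotted numeric versions, and the feed names and the package name acme-billing are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.2.0' into (1, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def naive_resolve(name: str, feeds: Dict[str, Dict[str, List[str]]]) -> Tuple[str, str]:
    """Pick the highest version of `name` across all feeds, ignoring its origin."""
    candidates = [
        (parse_version(version), feed_name, version)
        for feed_name, packages in feeds.items()
        for version in packages.get(name, [])
    ]
    if not candidates:
        raise LookupError(f"{name} not found in any feed")
    _, feed_name, version = max(candidates)
    return feed_name, version


# Hypothetical feeds: the internal package exists privately at 1.2.0,
# and an attacker has published the same name publicly at 99.0.0.
feeds = {
    "private": {"acme-billing": ["1.0.0", "1.2.0"]},
    "public": {"acme-billing": ["99.0.0"]},
}

print(naive_resolve("acme-billing", feeds))  # ('public', '99.0.0')
```

Because the resolver treats both feeds as equally trustworthy, the attacker only needs a larger version number to win.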

Birsan came up with the idea in the summer of 2020 when he was working with a friend on finding bugs in PayPal as part of the company's bug bounty program. Their research led them to review some code that was developed and used internally at PayPal but was hosted on a public repository.

"The code was meant for internal PayPal use, and, in its package.json file, appeared to contain a mix of public and private dependencies—public packages from npm, as well as non-public package names, most likely hosted internally by PayPal," Birsan said in a blog post that detailed the new attack technique, which he dubbed dependency confusion. "These names did not exist on the public npm registry at the time."

The researcher came up with a plan to create some packages with code that would phone back to him if they were ever executed on a server and then publish them on the npm index with the names of the private dependencies listed in the PayPal code. The code was designed to collect some basic information about the machine it got executed on, such as hostname, IP address and the username, and then exfiltrate it using DNS queries, since DNS is not filtered in many organizations.

The attack was successful and earned Birsan a $30,000 bounty from PayPal, giving him the idea to try the attack against other organizations that also have public bug bounty programs and therefore allow such testing. It turns out that obtaining private package names is not that hard and can be done by scanning websites for JavaScript files or public repositories on GitHub.

Birsan managed to use this technique to successfully execute code on servers belonging to over 35 organizations, including PayPal, Apple, Shopify, Netflix, Uber and Yelp, collecting bug bounties from many of them. "The success rate was simply astonishing," the researcher said. "From one-off mistakes made by developers on their own machines, to misconfigured internal or cloud-based build servers, to systemically vulnerable development pipelines, one thing was clear: Squatting valid internal package names was a nearly sure-fire method to get into the networks of some of the biggest tech companies out there, gaining remote code execution, and possibly allowing attackers to add backdoors during builds."

How to mitigate dependency confusion attacks

The most straightforward way to mitigate this problem is to avoid hybrid configurations where both a public and a private feed are used, and instead configure package managers to always use a single private feed that is under the control and supervision of the development organization and where every package is scrutinized and verified. Some package managers have configuration options for this. Python's pip, for example, has --index-url, which replaces the default index. It also has --extra-index-url, which only adds an index alongside the default public one and shouldn't be used.
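One way to enforce that advice is a small check that flags --extra-index-url in a pip requirements file before a build runs. The following is a minimal sketch, not a complete audit: it only looks at option lines in the requirements text it is given, and the internal index URL is a hypothetical placeholder.

```python
from typing import List


def find_extra_index_lines(requirements_text: str) -> List[str]:
    """Return the option lines that add an extra index next to the main one."""
    flagged = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and whole-line comments
        if line.startswith("--extra-index-url"):
            flagged.append(line)
    return flagged


# Hypothetical requirements file that mixes a private and a public index.
requirements = """\
--index-url https://pypi.internal.example/simple
--extra-index-url https://pypi.org/simple
requests==2.25.1
"""

print(find_extra_index_lines(requirements))
# ['--extra-index-url https://pypi.org/simple']
```

A check like this could run in continuous integration and fail the build when the list is not empty. It does not cover the same option set in pip's configuration file or in environment variables, which would need separate checks.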

The problem is that some third-party package management tools, such as JFrog Artifactory, allow mixing private and public repositories into a single virtual feed and are affected by the same behavior. In response to Birsan's report, Microsoft, which also offers a package hosting service called Azure Artifacts, released a whitepaper with mitigation instructions.

"Even with this configuration, if your feed allows public packages to override private packages, a substitution attack may still be possible," Microsoft said in the paper. "You should either ensure your feed is configured to disallow this, claim your private packages' names on the public index, or use another mitigation."

Another mitigation is to use the namespace or scope feature that some package managers allow. These are prefixes that are owned by a user or organization and are applied to all their packages published in the public repositories.

"Scopes are a way of grouping related packages together, and also affect a few things about the way npm treats the package," the npm documentation reads. "Each npm user/organization has their own scope, and only you can add packages in your scope. This means you don't have to worry about someone taking your package name ahead of you. Thus it is also a good way to signal official packages for organizations."

This means that if PayPal, for example, owned the @paypal scope on npm and all its private package dependency definitions included the scope in addition to the package name, no one would be able to hijack those names on the public npm index. Organizations might already be using scopes internally, but for this mitigation to be effective they also need to claim them on the open-source repositories. For example, Birsan registered several packages on npm under a scope privately used by Atlassian because it wasn't claimed.
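A team can check its own manifests for internal dependencies that are still declared without the organization's scope. The sketch below assumes the team already keeps a list of its internal package names. The @acme scope and every package name are hypothetical, and the check says nothing about whether the scope has actually been claimed on the public registry.

```python
import json
from typing import List, Set


def unscoped_internal_deps(package_json_text: str, internal_names: Set[str], scope: str) -> List[str]:
    """List internal dependencies that are declared without the @scope/ prefix."""
    manifest = json.loads(package_json_text)
    declared: Set[str] = set()
    for section in ("dependencies", "devDependencies"):
        declared.update(manifest.get(section, {}))
    prefix = f"@{scope}/"
    return sorted(
        name for name in declared
        if name in internal_names and not name.startswith(prefix)
    )


# Hypothetical manifest: one public package, one scoped internal package,
# and one internal package that anyone could register publicly.
package_json = """
{
  "dependencies": {
    "express": "^4.17.1",
    "@acme/auth": "1.0.0",
    "acme-billing": "1.2.0"
  }
}
"""
internal = {"@acme/auth", "acme-billing"}

print(unscoped_internal_deps(package_json, internal, "acme"))  # ['acme-billing']
```

Any name this returns is a candidate for renaming under the scope or for defensive registration on the public index.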

Supply chain attacks in the open-source ecosystem

Because the issue appears to be widespread and mitigating it requires making significant changes to the configuration of development tools, package managers and workflows, it's likely it will take a long time until all impacted organizations learn about the attack and put defenses in place. Within 48 hours of Birsan's public report, DevOps automation and open-source governance company Sonatype found over 275 copycat npm packages published by different authors that imitated the researcher's attack.

The large ecosystem of open-source software components and the complex interdependencies between them make software developers a very attractive target. According to a 2020 report by GitHub, JavaScript applications have 10 direct dependencies on average, but because those dependencies can in turn have other dependencies and so on, the median number of transitive dependencies for a JavaScript application is actually 683. Meanwhile, Sonatype observed a 430% increase in the number of upstream software supply chain attacks over the past year using various techniques.

Copyright © 2021 IDG Communications, Inc.
