PyTorch suffers supply chain attack via dependency confusion

A rogue package in the machine learning framework's nightly builds allowed the attacker to exfiltrate data, including SSH keys.

Users who deployed the nightly builds of PyTorch between Christmas and New Year's Eve likely received a rogue package as part of the installation that siphoned off sensitive data from their systems. The incident was the result of an attack called dependency confusion that continues to impact package managers and development environments if hardening steps are not taken.

"If you installed PyTorch nightly on Linux via pip between December 25, 2022, and December 30, 2022, please uninstall it and torchtriton immediately, and use the latest nightly binaries (newer than December 30, 2022)," the PyTorch maintainers said in a security advisory.

The malicious torchtriton library

PyTorch is a framework for developing machine learning applications in the fields of computer vision and natural language processing that is a continuation of the older and no longer maintained Torch library. PyTorch was originally developed by Meta AI, the artificial intelligence laboratory of Meta, Inc., but is now an open-source project maintained by the PyTorch Foundation under the Linux Foundation's umbrella.

Like most Python packages, PyTorch can be installed via pip, a package management tool and installer that uses the public PyPI (Python Package Index) as its main repository. However, like most package management tools, pip allows users to define additional repositories, a feature commonly used by organizations to host internally developed components that are used in their applications and are not meant for public release.

PyTorch's dependency chain, the additional packages that are downloaded during its installation, includes a library called torchtriton that was hosted on PyTorch's own index for nightly builds. Until December 25, there was no torchtriton library on PyPI, so pip could only find it on PyTorch's alternate repository.

However, an attacker registered the torchtriton package name on PyPI and uploaded a malicious package, which tricked the installation routine for PyTorch's nightly builds into downloading the rogue version from PyPI. The PyTorch stable builds were not affected.

"Since the PyPI index takes precedence, this malicious package was being installed instead of the version from our official repository," the PyTorch maintainers said. "This design enables somebody to register a package by the same name as one that exists in a third-party index, and pip will install their version by default."

The malicious torchtriton package was designed to collect information about the system, such as the configured nameservers, computer hostname, current username, working directory, and environment variables. It also read the contents of /etc/hosts (internally defined hosts), /etc/passwd (the local users list), files in the user's home directory, the .gitconfig file, and the .ssh directory, which includes SSH keys. All this information was then uploaded to a remote server via encrypted DNS queries, a stealthy way to exfiltrate data.

The PyTorch maintainers published a command that admins can use to scan their systems for the malicious torchtriton version. If the rogue package is found, it should be removed immediately, and other steps should be taken to change any potentially compromised credentials or keys.
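As a rough first check, and not the command the maintainers published, the short Python sketch below asks the active environment whether any distribution named torchtriton is installed. It relies only on the standard library's importlib.metadata module. The helper names are illustrative, and the check cannot tell which index a package came from, so a hit only means the environment deserves a closer look.

```python
from importlib import metadata

# Package name taken from the advisory quoted earlier in the article.
SUSPECT_DISTRIBUTION = "torchtriton"


def installed_version(dist_name):
    """Return the installed version of dist_name, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report(dist_name=SUSPECT_DISTRIBUTION):
    """Build a one-line status message for the given distribution name."""
    version = installed_version(dist_name)
    if version is None:
        return f"{dist_name} is not installed in this environment"
    return f"{dist_name} {version} is installed: review, uninstall, and rotate credentials"


if __name__ == "__main__":
    print(report())
```

The script has to be run with the same interpreter, or inside the same virtual environment, that was used to install the nightly build; other environments on the machine need to be checked separately.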

PyTorch removed its dependency on torchtriton and replaced it with pytorch-triton, a renamed version of the previous package. While this package is still served from PyTorch's own index, the project maintainers also registered the pytorch-triton name on PyPI so a similar hijack can't occur in the future.

All PyTorch builds with a torchtriton dependency have been removed from distribution, and the torchtriton package has since been removed from PyPI. However, based on automated tracking by supply chain security firm Snyk, the rogue torchtriton package was downloaded 2,717 times, with 2,500 of those downloads occurring on December 26 alone. This gives an indication of how many systems were potentially impacted.

Furthermore, the torchtriton package description on PyPI was: "This is not the real torchtriton package but uploaded here to discover dependency confusion vulnerabilities." The domain to which the data was exfiltrated has a notice that reads: "Hello, if you stumbled on this in your logs, then this is likely because your Python was misconfigured and was vulnerable to a dependency confusion attack. To identify companies that are vulnerable the script sends the metadata about the host (such as its hostname and current working directory) to me. After I've identified who is vulnerable and reported the finding all of the metadata about your server will be deleted."

While this suggests the rogue package was possibly the creation of a bug hunter looking to prove a vulnerability, exfiltrating home directory files, SSH keys, and Git configurations from systems is suspicious behavior.

Dependency confusion remains a risk

Exploiting the automated logic in package managers that decides which package version should be downloaded and from where is not new. This is an attack technique known as dependency confusion that was first disclosed by security researcher Alex Birsan in 2021 and impacts not just pip and PyPI, but also npm and potentially other package managers.

However, the issue is not as simple as the package manager prioritizing a public index over a private one. In fact, the pip developers have been discussing this issue since July 2020, before Birsan even highlighted his attack publicly, and have pointed out that pip actually treats indexes equally. What it does is look for the best match for the package across all specified indexes, weighing factors such as package version and compatibility tags. An attacker therefore only needs to publish a candidate that looks like a better match, for example one with a higher version number.
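The "best match across all indexes" behavior can be illustrated with a deliberately simplified model. The sketch below is not pip's resolver: it ignores compatibility tags, pre-releases, and everything else pip weighs, and it only handles plain dotted numeric versions. The index names, the package name example-internal, and the version numbers are all made up for the illustration.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '2.0.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def pick_candidate(package, indexes):
    """Pool candidates from every index and return (version, index_name).

    `indexes` maps an index name to {package_name: [version strings]}.
    No index gets priority here; the highest version wins.
    Returns None when no index offers the package.
    """
    candidates = [
        (parse_version(version), version, index_name)
        for index_name, packages in indexes.items()
        for version in packages.get(package, [])
    ]
    if not candidates:
        return None
    _, version, index_name = max(candidates)
    return version, index_name


if __name__ == "__main__":
    # Hypothetical state before anyone registers the name publicly.
    before = {"private": {"example-internal": ["2.0.0"]}, "public": {}}
    # Hypothetical state after a higher version appears on the public index.
    after = {
        "private": {"example-internal": ["2.0.0"]},
        "public": {"example-internal": ["3.0.0"]},
    }
    print(pick_candidate("example-internal", before))  # ('2.0.0', 'private')
    print(pick_candidate("example-internal", after))   # ('3.0.0', 'public')
```

In this toy model the private index never loses its copy of the package. It simply stops being chosen the moment a higher-numbered candidate shows up elsewhere.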

Npm addressed the issue by introducing the concept of "scope," which offers a way to bundle packages together under the same organizational namespace. Organizations can register a scope on the public npm repository to make sure only they can publish packages under that namespace and then have all their packages, public or internal, use the scope. That way, internal packages don't have to be published on npm, and no dummy entries need to be created for them there to prevent hijacking, since no one else can publish packages under that same scope.

Meanwhile, pip developers don't seem willing to introduce repository/index priorities, leaving that functionality up to external development tools such as devpi, Pipenv, or JFrog Artifactory.
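Since pip itself will not rank one index above another, a team may at least want to know where its projects mix indexes. The sketch below is one possible audit helper, not an official tool: it scans the text of a requirements file for pip's --extra-index-url option and reports the lines that use it. The function name and the sample file contents are hypothetical, and the comment handling is simplified compared with pip's own parser.

```python
def find_extra_index_lines(requirements_text):
    """Return (line_number, line) pairs that declare an extra package index."""
    findings = []
    for number, raw in enumerate(requirements_text.splitlines(), start=1):
        # Drop trailing comments, which start with whitespace followed by '#'.
        line = raw.split(" #", 1)[0].strip()
        if line.startswith("#"):
            continue  # whole-line comment
        if line.startswith("--extra-index-url"):
            findings.append((number, line))
    return findings


if __name__ == "__main__":
    # Hypothetical requirements file; the URL uses a reserved, non-resolving domain.
    sample = (
        "--extra-index-url https://packages.example.invalid/simple\n"
        "example-internal==2.0.0\n"
        "# --extra-index-url https://commented.example.invalid/simple\n"
    )
    for number, line in find_extra_index_lines(sample):
        print(f"line {number}: {line}")
```

A hit is not a vulnerability by itself. It marks a place where packages can come from more than one index, so the names installed there are worth checking against what exists on PyPI.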

Copyright © 2023 IDG Communications, Inc.
