Malicious package flood on PyPI might be sign of new attacks to come

The PyPI package flood is just the latest in a string of attacks on public repositories with the intent to plant malicious code.


Over the weekend, an attacker uploaded thousands of malicious Python packages to the public PyPI (Python Package Index) software repository. If executed on a Windows system, these packages download and install a Trojan program hosted on Dropbox.

Flooding public package repositories with malicious packages is not entirely new. Last year researchers detected a group of 186 packages from the same account on the JavaScript npm repository that were designed to install cryptomining software on Linux systems. However, according to researchers on Twitter, this new incident on PyPI was much larger in scope and involved over 5,000 packages, because the attacker kept pushing new ones while the PyPI maintainers were finding and removing those already published. This might be a sign of attacks to come.

According to software supply chain management company Sonatype, which has been tracking the attack, this seems more like a spray-and-pray attack with the intent to poison search results for various terms on PyPI rather than a targeted attack against users of existing legitimate packages. However, it's also plausible that the attackers might be flooding PyPI with a high number of packages to bury a few "potent typosquats" in the noise, Sonatype staff researcher Ax Sharma tells CSO. Typosquatting is an attack technique that is common on public repositories, where attackers will publish packages with names very similar to legitimate packages, usually including a common typo or slight variation in the name that developers are likely to fall for.

The names used by the thousands of packages uploaded so far seem randomly generated by mixing up popular terms including company names such as PayPal or Nvidia, but also common prefixes like py- for Python or lib for library. "Confusion arises in cases when a developer may 'know' offhand the name of a popular library, for example, from previously having read docs or given a library’s prominence in the industry, but then being presented with two versions of the package when navigating to PyPI," Sharma explains. "For example, a 'mydesiredtool' (real) vs. a 'py-mydesiredtool' (malicious typosquat), with the latter implicitly touting itself as the Python-specific version."
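The naming pattern Sharma describes can be checked mechanically. The following is a minimal sketch, not part of any PyPI tooling: it flags a candidate name that collapses to a known package once a common prefix is stripped. The `KNOWN_PACKAGES` set, the prefix list, and the function names are all assumptions made for illustration.

```python
# Hypothetical allowlist of packages a team actually depends on.
KNOWN_PACKAGES = {"mydesiredtool", "requests", "numpy"}

# Prefixes mentioned in the article, plus a few variants (an assumption).
SUSPICIOUS_PREFIXES = ("py-", "py", "lib-", "lib", "python-")


def normalize(name: str) -> str:
    """Lowercase and unify separators, similar in spirit to PyPI name normalization."""
    return name.lower().replace("_", "-").replace(".", "-")


def looks_like_prefixed_squat(candidate: str, known=KNOWN_PACKAGES):
    """Return the known package that `candidate` appears to imitate, or None."""
    name = normalize(candidate)
    if name in known:
        return None  # exact match: this is the real package
    for prefix in SUSPICIOUS_PREFIXES:
        if name.startswith(prefix):
            stripped = name[len(prefix):]
            if stripped in known:
                return stripped
    return None
```

A check like this only catches the prefix variant; classic typos (swapped or missing letters) would need an edit-distance comparison on top.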

Another attack technique that such a flood could be a cover for is dependency confusion. It involves attackers registering malicious packages on public repositories under the names of packages that companies use internally, which the attackers learn from other sources. Those internally developed packages are usually hosted on internal repositories and are pulled in when building applications, but they're not published on public repositories like PyPI because they're not meant for public consumption.

Package installers and management tools -- pip in the case of Python -- have their own internal package selection logic when faced with two packages of the same name from two different defined repositories and will often favor the one with the higher version number, which might be the malicious one published by attackers on PyPI. It's hard to say if any of the thousands of packages uploaded over the past few days during this campaign were named after internal packages a company uses.
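To make the risk concrete, here is a deliberately simplified model of "highest version wins" selection. It is not pip's actual resolver; the index labels, version strings, and function names are hypothetical, and the version parser only handles plain dotted numbers.

```python
def parse_version(version: str) -> tuple:
    """Parse a simple dotted numeric version such as '1.4.2' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def pick_candidate(candidates):
    """Pick the (index, version) pair with the highest version, ignoring its origin."""
    return max(candidates, key=lambda item: parse_version(item[1]))


# The same package name is available from two configured indexes.
candidates = [
    ("internal", "1.4.2"),  # hypothetical private index
    ("public", "99.0.0"),   # hypothetical attacker upload on the public index
]
chosen = pick_candidate(candidates)
```

Because the selection looks only at the version number, the attacker's inflated `99.0.0` wins over the legitimate internal release.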

Even if this package flood is not used as a distraction for a more subtle and targeted attack, the search result poisoning aspect alone is dangerous. A developer looking for a package that performs a particular task could stumble onto one of these malicious packages.

Luckily, the attackers in this case were not very clever and published the packages from the same email address and used the same package description for them all. This helped the PyPI maintainers quickly identify and remove them. "Using different, randomly generated descriptions or descriptions for each malicious package stolen from a few different legitimate packages would make this a much more convincing campaign for the victim, and a much more difficult task for defenders to track," says Sharma. "Combining this tactic with techniques such as using different (malicious) author accounts, email addresses, and IP addresses for publishing these packages makes these campaigns even harder to monitor."

The malicious payload and indicators of compromise

The file in the malicious PyPI packages used in this attack contained a payload encoded in base64 for obfuscation, which involved the execution of a PowerShell command on Windows systems. The command was designed to reach out to a Dropbox URL and download an executable file called Esquele.exe, save it locally with the name WindowsCache.exe and then execute it. While the URL that was serving the file was taken down by Dropbox, the Sonatype researchers recovered the payload from VirusTotal where someone uploaded it.
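Base64 obfuscation of this kind is easy to reverse for inspection. The sketch below, which is an illustration rather than Sonatype's method, scans source text for long base64-looking strings and decodes them so a reviewer can read what would have been executed. The 40-character threshold and the function name are assumptions.

```python
import base64
import binascii
import re

# Runs of 40+ base64 characters, optionally padded (the threshold is an assumption).
B64_RUN = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")


def find_base64_payloads(source: str):
    """Return the decoded text of each long, valid base64 string found in `source`."""
    decoded = []
    for match in B64_RUN.finditer(source):
        blob = match.group(0)
        try:
            raw = base64.b64decode(blob, validate=True)
        except (binascii.Error, ValueError):
            continue  # looked like base64 but did not decode cleanly
        # Decode for reading only; nothing is ever executed here.
        decoded.append(raw.decode("utf-8", errors="replace"))
    return decoded
```

Long hex digests or identifiers can also match the pattern, so the output is a list of leads for a human to review, not a verdict.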

According to Sonatype, it is a Windows Trojan program with information-stealing capabilities that has also been observed under the name update.exe. The attacker behind the campaign uses the moniker EsqueleSquad and even included their website in the packages.

Defending against malicious package floods

One defense when dealing with a flood of submissions from the same source is to enforce rate limiting -- putting a limit on the number of submissions over a particular length of time. However, rate limiting is not as effective if, as Sharma said, future attackers decide to use multiple author accounts, different IP addresses, and so on. Cleaning up after an attack that involves different package names, different descriptions and different metadata in general is much more time consuming, and PyPI is run by volunteers.
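As an illustration of the idea, here is a minimal sliding-window limiter keyed on the publishing account. It is a sketch under stated assumptions, not how PyPI works: the class name, the limits, and the choice to pass the clock value in as `now` are all hypothetical.

```python
from collections import defaultdict, deque


class UploadRateLimiter:
    """Allow at most `max_uploads` per account within a sliding `window_seconds`."""

    def __init__(self, max_uploads: int, window_seconds: float):
        self.max_uploads = max_uploads
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # account -> timestamps of accepted uploads

    def allow(self, account: str, now: float) -> bool:
        timestamps = self._history[account]
        # Drop accepted uploads that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_uploads:
            return False
        timestamps.append(now)
        return True
```

Because the limit is tracked per account, an attacker who rotates accounts gets a fresh allowance each time, which is exactly the weakness described above.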

According to Sharma, a better solution is to introduce something like the verified namespaces that the Maven Central repository for Java uses. For example, the namespace for the Tomcat server on Central is org.apache.tomcat -- this is the reverse of the real domain name and if anyone wants to publish any package under that namespace they have to prove control over that domain name at the DNS level, which only the Apache Software Foundation (ASF) can do. Even performing email verification for new accounts would be an important step, as would other methods that make the automatic creation of a large number of accounts more difficult.
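The reverse-domain convention can be modeled in a few lines. This is a simplified sketch, not Maven Central's implementation: it assumes the DNS proof has already happened elsewhere and that `verified_domains` holds its results. Both function names are hypothetical.

```python
def namespace_for_domain(domain: str) -> str:
    """Reverse a domain's labels: 'tomcat.apache.org' -> 'org.apache.tomcat'."""
    return ".".join(reversed(domain.lower().strip(".").split(".")))


def may_publish(namespace: str, verified_domains) -> bool:
    """True if `namespace` sits at or under a namespace derived from a verified domain."""
    for domain in verified_domains:
        root = namespace_for_domain(domain)
        # Require a full label boundary so 'org.apachefake' does not match 'org.apache'.
        if namespace == root or namespace.startswith(root + "."):
            return True
    return False
```

With `apache.org` as the only verified domain, `org.apache.tomcat` is allowed while a lookalike such as `org.apachefake.tomcat` is rejected.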

Copyright © 2023 IDG Communications, Inc.
