Open-source packages with large language model (LLM) capabilities have many dependencies that make calls to security-sensitive APIs, according to a new Endor Labs report.

As applications increasingly include prepackaged software components that take advantage of generative AI capabilities based on large language models (LLMs), the danger of vulnerabilities making their way into production looms greater than ever, according to research by cybersecurity company Endor Labs.

A new report dubbed "State of Dependency Management 2023" notes that even though application developers use only a fraction of these packages -- components such as libraries and modules designed for easy use and installation in software programs -- the packages carry numerous dependencies and may make risky API calls.

"LLMs are a great support for many day-to-day programming tasks. However, it is important for developers to verify the output provided by LLMs before including it in production code," said Henrik Plate, security researcher at Endor Labs and the author of the report.

For the research, Endor Labs used the Census II data set from the Linux Foundation and Harvard, the company's in-house API categories and vulnerability database, open-source GitHub repositories, and packages published in the npm and PyPI package repositories.

Onslaught of LLM/AI-enabled packages

Tracking newly published npm and PyPI packages that make calls to the OpenAI API, Endor Labs found more than 636 new PyPI and npm packages created to use the API since the launch of ChatGPT's API in January 2023. Additionally, 276 existing packages added support for the ChatGPT API. The research also noted that this is just a subset of the total number of ChatGPT-enabled packages, as the number of private projects experimenting with LLMs is even bigger.

When Endor Labs scanned the GitHub repositories of its Top 100 AI projects, it found that they reference 208 direct and transitive dependencies on average. Eleven percent of the projects rely on more than 500 dependencies, and 15% of the repositories contain 10 or more known vulnerabilities. The package distributed by Hugging Face Transformers (the library implementing the architecture that ChatGPT is based on) has over 200 dependencies, which include four known vulnerabilities.

Dependencies make calls to security-sensitive APIs

Fifty-five percent of applications tracked by Endor make calls to security-sensitive APIs -- programming interfaces that link to critical resources which, if compromised, could affect the security of an asset. That number grows to 95%, however, when the dependencies of software component packages are also tracked.

"Every considerable application includes dependencies that call into a big share of JCL's -- Java Class Library, which comprises the core APIs provided by the Java runtime -- sensitive APIs," Plate said. The research further revealed that 71% of Census II Java packages call five or more categories of security-sensitive APIs when all of their dependencies are considered.

"Applications often use only a small portion of the open-source components they integrate, and developers rarely understand the cascading dependencies of components," Plate added. "In order to satisfy transparency requirements while protecting brand reputation, organizations need to go beyond basic SBOMs."

Just knowing which components make it into production is no longer enough -- understanding which functions those components use is critical too, according to the report.
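The report's point about looking past the SBOM to the functions components actually call can be made concrete with a small (and deliberately simplified) check. The Python sketch below walks the packages installed in an environment and flags any whose source imports modules from a hypothetical list of security-sensitive standard-library APIs; the SENSITIVE set and the import-based heuristic are illustrative assumptions, not Endor Labs' API categories or tooling.

```python
# Minimal sketch, not Endor Labs' tooling: flag installed Python packages whose
# source imports modules from a small, hypothetical set of security-sensitive
# standard-library APIs.
import ast
from importlib import metadata
from pathlib import Path

# Illustrative stand-in for "security-sensitive API" categories.
SENSITIVE = {"subprocess", "socket", "pickle", "ctypes"}

def sensitive_imports(source: str) -> set:
    """Return the sensitive top-level modules imported by a piece of source code."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return set()
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        found.update(n.split(".")[0] for n in names if n.split(".")[0] in SENSITIVE)
    return found

for dist in metadata.distributions():
    hits = set()
    for f in dist.files or []:
        if str(f).endswith(".py"):
            path = Path(dist.locate_file(f))
            if path.is_file():
                hits |= sensitive_imports(path.read_text(errors="ignore"))
    if hits:
        print(f"{dist.metadata['Name']}: imports {sorted(hits)}")
```

A more precise analysis would also check whether those sensitive APIs are actually reachable from the application's own code, which is the kind of function-level visibility the report argues basic SBOMs lack.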
LLMs still bad at malware detection

Endor Labs used LLMs from OpenAI and Google Vertex AI to evaluate how well they can help classify malware. For the evaluation, both LLMs were presented with identical code snippets as prompts and asked to rate their malicious potential on a scale of 0 to 9.

"We were interested to learn how consistent their results were for 3,374 test cases. On considering a scoring difference of 0 or 1 to be agreement, we found they agreed on 89% of the cases," the report said.

But both models fell considerably short of effectively classifying the malware and produced a huge number of false positives. While OpenAI GPT-3.5 accurately classified 3.4% of the code snippets, Vertex AI text-bison was 7.9% accurate.

"The main culprit was minified/packed JavaScript, i.e., JavaScript code that was more or less heavily changed in order to save space/bandwidth when transmitting it to a user's browser," Plate said. "Unfortunately, minification and packing is pretty common and do not only exist in npm packages but also in Python packages that bundle some sort of UI. Very often, the LLMs classified such code as malicious just because it looks obfuscated."

"With this number of false positives, the feedback of LLMs becomes almost useless, which is a pity because the feedback for non-obfuscated code is oftentimes very good," Plate added.

He further noted that while preprocessing such code can at times reduce false positives, the obfuscation example indicates that LLMs struggle with complex programming logic. Adversaries can exploit this limitation to evade detection, which may result in false negatives -- malware that goes undetected.

"In my opinion, the biggest takeaway from this analysis is that general purpose LLMs, like GPT, shouldn't be relied upon for specialized purposes," said Katie Norton, an analyst at IDC. "It is still a tricky area for using generative AI because oftentimes when identifying malware you are looking for unknowns, which is something you can't train a model on."
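For readers who want to see what such an evaluation can look like in practice, here is a minimal sketch that asks an LLM through the OpenAI Python SDK to score a snippet on the same 0-to-9 scale and applies the report's agreement rule (a difference of 0 or 1). The prompt wording, model name, and example snippet are assumptions for illustration, not Endor Labs' actual setup; a second score from Vertex AI's text-bison would be collected the same way through Google's SDK.

```python
# Illustrative sketch only, not Endor Labs' evaluation harness.
import re
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Rate the malicious potential of the following code snippet on a scale "
    "from 0 (clearly benign) to 9 (clearly malicious). Reply with a single digit.\n\n{code}"
)

def score_snippet(code: str, model: str = "gpt-3.5-turbo"):
    """Ask the model for a 0-9 maliciousness score; return None if the reply has no digit."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(code=code)}],
        temperature=0,
    )
    match = re.search(r"\d", resp.choices[0].message.content or "")
    return int(match.group()) if match else None

def scores_agree(a: int, b: int) -> bool:
    """The report's agreement rule: a scoring difference of 0 or 1 counts as agreement."""
    return abs(a - b) <= 1

# Hypothetical obfuscated-looking snippet of the kind that triggers false positives.
snippet = "eval(__import__('base64').b64decode(payload))"
gpt_score = score_snippet(snippet)
vertex_score = 8  # placeholder for a score obtained from text-bison via Google's SDK
if gpt_score is not None:
    print(gpt_score, vertex_score, scores_agree(gpt_score, vertex_score))
```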