Researchers from software supply chain security firm Rezilion have investigated the security posture of the 50 most popular generative AI projects on GitHub. They found that the more popular and newer a generative AI open-source project is, the less mature its security is. Rezilion used the Open Source Security Foundation (OpenSSF) Scorecard to evaluate the large language model (LLM) open-source ecosystem, highlighting significant gaps in security best practices and potential risks in many LLM-based projects. The findings are published in the Expl[AI]ning the Risk report, authored by researchers Yotam Perkal and Katya Donchenko.

Generative AI technology based on LLMs has exploded in popularity, with machines now able to generate human-like text, images, and even code. The number of open-source projects integrating these technologies has grown significantly. For example, there are currently more than 30,000 open-source projects on GitHub using the GPT-3.5 family of LLMs, even though OpenAI debuted ChatGPT only seven months ago.

Despite the demand for them, generative AI/LLM technologies introduce security issues ranging from the risks of sharing sensitive business information with advanced self-learning algorithms to malicious actors using them to significantly enhance attacks. Earlier this month, the Open Worldwide Application Security Project (OWASP) published the top 10 most critical vulnerabilities often seen in LLM applications, highlighting their potential impact, ease of exploitation, and prevalence. Examples of vulnerabilities included prompt injections, data leakage, inadequate sandboxing, and unauthorized code execution.

What is the OpenSSF Scorecard?

The OpenSSF Scorecard is a tool created by the OpenSSF to assess the security of open-source projects and help improve them. It bases the assessment on facts about the repository, such as the number of vulnerabilities it has, how often it is maintained, and whether it contains binary files. Running Scorecard on a project checks different parts of its software supply chain, including the source code, build dependencies, testing, and project maintenance.

The purpose of the checks is to ensure adherence to security best practices and industry standards. Each check has an associated risk level, representing the estimated risk of not adhering to a specific best practice. Individual check scores are then compiled into a single aggregate score to gauge the overall security posture of a project.
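
To make the aggregation step concrete, here is a minimal Python sketch of a risk-weighted average. The article does not spell out the formula; the weights below (Critical 10, High 7.5, Medium 5, Low 2.5) follow what the Scorecard project's own documentation describes and should be treated as an assumption here, as should the convention that a negative score marks an inconclusive check. The function name and the example values are hypothetical.

```python
# Assumed risk-level weights, per the Scorecard project's documentation
# (not stated in this article).
RISK_WEIGHTS = {"Critical": 10.0, "High": 7.5, "Medium": 5.0, "Low": 2.5}


def aggregate_score(checks):
    """Return the risk-weighted average of (risk_level, score) pairs.

    Checks with a negative score are assumed to be inconclusive and are skipped.
    """
    total = 0.0
    weight_sum = 0.0
    for risk, score in checks:
        if score < 0:
            continue  # inconclusive check: leave it out of the average
        weight = RISK_WEIGHTS[risk]
        total += weight * score
        weight_sum += weight
    if weight_sum == 0:
        raise ValueError("no conclusive checks to aggregate")
    return round(total / weight_sum, 1)


# Made-up check results: (risk level, score out of 10).
example = [("Critical", 10), ("High", 4), ("Medium", 8), ("Low", 0), ("High", -1)]
print(aggregate_score(example))
```

Under this weighting, a failed low-risk check pulls the aggregate down far less than a failed critical one, which is why two projects with the same number of failing checks can end up with quite different overall scores.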

Currently, there are 18 checks that can be divided into three themes: holistic security practices, source code risk assessment, and build process risk assessment. Scorecard assigns each check a risk level and a score on an ordinal scale from 0 to 10. A score nearing 10 indicates a highly secure and well-maintained project, whereas a score approaching 0 represents a weak security posture with inadequate maintenance and increased vulnerability to open-source risks.
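
As an illustration of how per-check scores on the 0 to 10 scale might be read, the following Python sketch picks out the weakest checks from a result. The sample data is hypothetical and heavily abbreviated; it only assumes a result with a top-level `score` and a `checks` list of `name`/`score` entries, roughly the shape of Scorecard's JSON output. The `weakest_checks` helper and its threshold of 5 are illustrative choices, not part of the tool.

```python
import json

# Hypothetical, abbreviated result; the repository name, aggregate score,
# and check scores are made up for illustration.
SAMPLE = """
{
  "repo": {"name": "github.com/example/llm-project"},
  "score": 4.3,
  "checks": [
    {"name": "Binary-Artifacts", "score": 10},
    {"name": "Maintained", "score": 7},
    {"name": "Vulnerabilities", "score": 2},
    {"name": "Signed-Releases", "score": -1}
  ]
}
"""


def weakest_checks(report, threshold=5):
    """Return (name, score) pairs for checks scoring below threshold, lowest first.

    Scores outside 0..10 (assumed to mean "inconclusive") are ignored.
    """
    scored = [(c["name"], c["score"]) for c in report["checks"] if 0 <= c["score"] <= 10]
    return sorted((item for item in scored if item[1] < threshold), key=lambda item: item[1])


report = json.loads(SAMPLE)
print(report["repo"]["name"], report["score"])
for name, score in weakest_checks(report):
    print(f"{name}: {score}/10")
```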