The Security Risks Associated with Open-Source Large Language Models (LLMs)
A surge in the popularity of Large Language Models (LLMs) and Generative AI, such as GPT engines, has taken the AI industry by storm. Individuals and corporations alike are eager to explore and leverage this new technology. However, as LLMs gain traction in various use cases, it becomes crucial to address the security aspects and associated risks, especially when it comes to open-source LLMs.
In recent research, Rezilion, an automated software supply chain security platform, investigated this very issue, and its findings offer surprising insights. To ensure a comprehensive analysis, the researchers considered projects that met the following criteria:
- Projects created within the last eight months (approximately from November 2022 to June 2023)
- Projects related to LLM, ChatGPT, OpenAI, GPT-3.5, or GPT-4
- Projects with at least 3,000 stars on GitHub
These criteria ensured the inclusion of major projects in the research.
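The three criteria can be expressed as a simple filter over project metadata. The sketch below is illustrative only: the project names, topics, and field layout are hypothetical stand-ins for what the study would have pulled from the GitHub API.

```python
from datetime import date

# Hypothetical project metadata, not the study's actual dataset.
projects = [
    {"name": "proj-a", "created": date(2023, 1, 15), "stars": 12000, "topics": ["llm", "gpt-4"]},
    {"name": "proj-b", "created": date(2022, 5, 1), "stars": 40000, "topics": ["chatgpt"]},
    {"name": "proj-c", "created": date(2023, 3, 2), "stars": 900, "topics": ["openai"]},
]

KEYWORDS = {"llm", "chatgpt", "openai", "gpt-3.5", "gpt-4"}

def in_scope(project):
    """Apply the study's three selection criteria."""
    recent = date(2022, 11, 1) <= project["created"] <= date(2023, 6, 30)
    relevant = bool(KEYWORDS & set(project["topics"]))
    popular = project["stars"] >= 3000
    return recent and relevant and popular

selected = [p["name"] for p in projects if in_scope(p)]
print(selected)  # only proj-a satisfies all three criteria
```

Note how each criterion is independent: a very popular project is still excluded if it predates the window, which matters for the popularity-versus-maturity finding discussed later.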
For their analysis, the experts utilized a framework called OpenSSF Scorecard, developed by the Open Source Security Foundation (OSSF). This framework assesses the security of open-source projects, taking into account factors such as vulnerabilities, maintenance frequency, and the presence of binary files.
The OpenSSF Scorecard evaluates projects based on three themes: holistic security practices, source code risk assessment, and build process risk assessment. Each check within these themes is assigned a risk level, reflecting the estimated risk associated with not adhering to a specific best practice.
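To make the risk-level idea concrete, here is a sketch of how per-check scores could be combined into a risk-weighted aggregate. The real Scorecard does score each check from 0 to 10, but the weight values below are assumptions for illustration, not the tool's actual coefficients.

```python
# Illustrative weights only; the real OpenSSF Scorecard defines its own.
RISK_WEIGHTS = {"critical": 10, "high": 7.5, "medium": 5, "low": 2.5}

def aggregate(checks):
    """checks: list of (score_0_to_10, risk_level) tuples.

    Higher-risk checks pull the aggregate harder, so failing a
    critical-risk practice hurts more than failing a low-risk one.
    """
    total_weight = sum(RISK_WEIGHTS[risk] for _, risk in checks)
    weighted = sum(score * RISK_WEIGHTS[risk] for score, risk in checks)
    return round(weighted / total_weight, 1)

# A hypothetical project: well maintained, but weak on vulnerabilities.
example = [(9, "high"), (2, "critical"), (6, "medium")]
print(aggregate(example))
```

The takeaway is that an aggregate in the 4-to-5 range, like the averages reported below, usually means at least one high-risk practice is being missed, not that every check is mediocre.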
It turns out that the majority of these open-source LLMs and projects exhibit significant security concerns, which can be categorized as follows:
1. Trust Boundary Risk
This category encompasses risks related to trust boundaries. It includes issues such as inadequate sandboxing, unauthorized code execution, SSRF vulnerabilities, insufficient access controls, and prompt injections. These risks allow maliciously crafted natural-language input to cross trust boundaries, potentially compromising the entire software chain. An example is CVE-2023-29374, a prompt-injection flaw allowing arbitrary code execution in LangChain, the third most popular open-source GPT project.
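The core failure can be shown in a few lines: untrusted user text shares a channel with the system's instructions, so crafted input can override them. The function names and the naive phrase filter below are hypothetical illustrations, not any framework's real API or an adequate defense.

```python
SYSTEM_PROMPT = "Answer the user's question. Never run shell commands."

def build_prompt(user_input: str) -> str:
    # Unsafe pattern: user text is concatenated into the same channel
    # as the instructions, so "ignore previous instructions ..." can
    # cross the trust boundary and override the system rules.
    return f"{SYSTEM_PROMPT}\n\nUser: {user_input}"

def guard(user_input: str) -> str:
    # Naive mitigation sketch: reject obvious override phrases before
    # the text crosses the boundary. Real defenses need much more
    # (sandboxing the tools the model can call, output validation).
    suspicious = ("ignore previous instructions", "run the command")
    lowered = user_input.lower()
    if any(phrase in lowered for phrase in suspicious):
        raise ValueError("possible prompt injection")
    return user_input

print(build_prompt(guard("What is a trust boundary?")))
```

A phrase blocklist is trivially bypassed by rephrasing; the sturdier fix is the one the category name implies: treat model output that drives code execution as untrusted and sandbox it.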
2. Data Management Risk
Data leakage and training data poisoning fall under the data management risks category. These risks are not specific to LLMs; they apply to any machine learning system. Training data poisoning refers to the deliberate manipulation of an LLM's training data or fine-tuning procedures by an attacker to introduce vulnerabilities or biases that undermine the model's security and reliability.
3. Inherent Model Risk
Inherent model risks stem from limitations in the underlying ML model. These risks include inadequate AI alignment and overreliance on LLM-generated content.
4. Basic Security Best Practices
This category covers issues related to general security best practices, such as improper error handling and insufficient access controls. These issues are not unique to LLMs but can impact any machine learning model.
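Improper error handling is the most code-level of these categories, so a small sketch helps. The endpoint shape below is hypothetical; the pattern it shows is the standard one: log the exception details server-side and return only a generic message to the caller, so internals (paths, stack frames, config) never leak.

```python
import logging

logging.basicConfig(level=logging.ERROR)
log = logging.getLogger("model-api")

def handle_request(payload):
    """Hypothetical request handler illustrating sanitized errors."""
    try:
        return {"status": 200, "result": payload["prompt"].upper()}
    except Exception as exc:
        # Improper handling would return str(exc) to the caller,
        # leaking implementation details. Log internally instead.
        log.error("request failed: %r", exc)
        return {"status": 500, "error": "internal error"}

print(handle_request({"prompt": "hi"}))
print(handle_request({}))  # missing key: generic error, no KeyError leaked
```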
Remarkably, the security scores of these projects are less than impressive. The average score among the assessed projects was just 4.6 out of 10, with an average project age of 3.77 months and an average of 15,909 stars. Projects that gain popularity quickly are at higher risk than those developed over a longer period.
Rezilion not only identified the security issues plaguing these projects but also proposed measures to mitigate these risks and ensure long-term safety. By conducting comprehensive risk assessments and implementing robust security measures, organizations can effectively utilize open-source LLMs while safeguarding sensitive information and maintaining a secure environment.
The research conducted by Rezilion sheds light on the security risks associated with open-source Large Language Models (LLMs). It highlights the need for organizations to prioritize security protocols and take steps to mitigate these risks. By addressing the vulnerabilities and implementing security best practices, companies can harness the potential of LLMs while maintaining a secure ecosystem.