Ammann, Lukas and Ott, Sara (2024) Analysis of Risks and Mitigation Strategies in RAG. Other thesis, OST Ostschweizer Fachhochschule.
HS 2024 2025-SA-EP-Ammann-Ott-Retrieval Augmented Generation (RAG) Use-cases and Architec.pdf - Supplemental Material
Abstract
Large Language Models (LLMs) have become widely popular since the introduction of chatbots such as ChatGPT and Gemini. LLMs perform well at Natural Language Processing (NLP): they can understand and communicate in human language. However, they are limited to the knowledge used during training, so keeping them up to date or integrating domain-specific knowledge is difficult and resource-intensive. In addition, LLMs tend to hallucinate and give inaccurate answers when the required information is not available in the model.
To address these issues, Retrieval-Augmented Generation (RAG) was introduced. This approach facilitates the incorporation of up-to-date and domain-specific data while reducing LLM hallucinations by providing missing information in a targeted manner. These substantial benefits have led to the popularity of RAG.
While this approach offers significant benefits, it also introduces new security challenges for the development and operation of RAG systems that need to be addressed. Since this is a relatively new topic, getting an overview of the risks and mitigation strategies can be tedious: the information is scattered across many sources, and each risk and mitigation strategy found must be evaluated individually to determine whether it applies to one's RAG implementation.
We fill this gap by presenting a high-level framework (called a landscape) for systematically identifying and evaluating privacy- and security-related risks associated with RAG systems. The framework also outlines potential mitigation strategies tailored to these risks, thereby providing possible approaches for protecting RAG systems. By consolidating and analyzing current research and practice, we provide a risk and mitigation landscape that facilitates risk assessment and helps secure RAG pipelines, supporting the responsible use of this promising technology.
Keywords: Retrieval-Augmented Generation (RAG), Framework, Security Risks, Mitigation Strategies, Large Language Model (LLM)
| Item Type: | Thesis (Other) |
|---|---|
| Subjects: | Technologies > Frameworks and Libraries; Technologies > Security |
| Divisions: | Bachelor of Science FHO in Informatik > Student Research Project |
| Depositing User: | OST Deposit User |
| Date Deposited: | 18 Feb 2025 12:28 |
| Last Modified: | 18 Feb 2025 12:28 |
| URI: | https://eprints.ost.ch/id/eprint/1255 |
