Ammann, Lukas and Ott, Sara (2025) Development of a scalable and secure RAG-as-a-Service infrastructure. Other thesis, OST Ostschweizer Fachhochschule.
FS 2025-BA-EP-Ammann-Ott-Development of a scalable and secure RAG-as-a-Service infras.pdf - Supplemental Material
Download (3MB)
Abstract
Large Language Models (LLMs) have gained widespread popularity with the introduction of chatbots such as ChatGPT and Gemini. LLMs excel at Natural Language Processing (NLP), meaning they can interpret and communicate in human language. However, they are limited to the knowledge seen during training, which makes it difficult and resource-intensive to keep them up to date or to integrate domain-specific knowledge. In addition, LLMs tend to hallucinate and give inaccurate answers when the required information is not contained in the model.
To overcome these limitations, Retrieval-Augmented Generation (RAG) has been introduced. This approach makes it possible to incorporate up-to-date and domain-specific knowledge while reducing LLM hallucinations by supplying the missing information in a targeted manner. These substantial benefits have driven the popularity of RAG.
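The thesis abstract does not prescribe a concrete pipeline; the following is a minimal, self-contained sketch of the retrieve-then-generate idea, using a toy bag-of-words retriever and a stubbed `generate` function in place of a real embedding model and LLM (all names and documents are illustrative, not taken from the thesis).

```python
# Minimal retrieve-then-generate sketch: retrieved context is injected into the
# prompt before the (stubbed) LLM call.
import math
from collections import Counter

DOCUMENTS = [
    "The RAG platform runs on Kubernetes and processes all data locally.",
    "OAuth 2.0 and OpenID Connect protect every external endpoint.",
    "Each system's documents are stored in an isolated collection.",
]

def bow(text: str) -> Counter:
    """Bag-of-words vector; a real system would use a dense embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    q = bow(query)
    return sorted(DOCUMENTS, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]

def generate(prompt: str) -> str:
    """Placeholder for an LLM call; kept local so the sketch stays runnable."""
    return f"[LLM answer grounded in a prompt of {len(prompt)} characters]"

def rag_answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)

if __name__ == "__main__":
    print(rag_answer("How is external traffic to the platform secured?"))
```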
One of the most pressing concerns in many RAG implementations is the security and privacy of the data involved, especially when handling sensitive or classified information. Ensuring that data remains within authorized boundaries, maintaining full traceability, and preventing unauthorized data exposure are critical requirements.
To address these challenges, we propose an architectural blueprint and core functionality for a secure and scalable RAG-as-a-Service infrastructure. The design emphasizes local data processing and containment within system boundaries, enabling predictable data flows and robust privacy protection. It addresses the security risks identified in our prior research and incorporates the corresponding mitigation strategies, ensuring adaptability and resilience through a modular and customizable core framework. Furthermore, the architecture is designed for seamless scalability and for hosting multiple RAG systems on a single infrastructure, making it suitable for a wide range of use cases and deployment scenarios.
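One way to read "modular and customizable core framework" is as a set of swappable component contracts; the sketch below expresses this with Python protocols. The interface names and composition are assumptions for illustration, not the thesis's actual API.

```python
# Illustrative component contracts for a pluggable RAG core: swapping the
# retriever or generator never touches the rest of the pipeline.
from typing import Protocol

class Retriever(Protocol):
    def retrieve(self, query: str, k: int) -> list[str]: ...

class Generator(Protocol):
    def generate(self, prompt: str) -> str: ...

class RagPipeline:
    """Composes pluggable components behind a single answer() entry point."""
    def __init__(self, retriever: Retriever, generator: Generator) -> None:
        self.retriever = retriever
        self.generator = generator

    def answer(self, question: str, k: int = 3) -> str:
        context = "\n".join(self.retriever.retrieve(question, k))
        return self.generator.generate(f"Context:\n{context}\n\nQuestion: {question}")
```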
The system's core components were developed using a microservice-based design and deployed via Kubernetes to ensure scalability and adaptability. Security was a central concern throughout the implementation process. In addition to encrypting all external traffic, we integrated a modern authentication solution based on the OAuth 2.0 and OpenID Connect standards to safeguard our RAG system. The resulting platform is fully operational and will be used during our hands-on workshop at the IEEE Swiss Conference on Data Science (SDS2025) on June 26, 2025, at the Circle Convention Center, Zurich Airport. Additional steps included comprehensive system testing and thorough preparation for the upcoming workshop.
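The abstract names OAuth 2.0 and OpenID Connect but not a specific library or identity provider. The sketch below shows how a RAG microservice might validate an incoming bearer token, assuming PyJWT and a Keycloak-style JWKS endpoint; the issuer URL, audience, and claim names are hypothetical, and the thesis's actual deployment may delegate this check to an API gateway or ingress instead.

```python
# Sketch of OAuth 2.0 / OpenID Connect bearer-token validation in a service.
import jwt  # PyJWT
from jwt import PyJWKClient

ISSUER = "https://id.example.org/realms/rag"          # hypothetical identity provider
JWKS_URI = f"{ISSUER}/protocol/openid-connect/certs"  # Keycloak-style JWKS endpoint (assumed)
AUDIENCE = "rag-api"                                  # hypothetical client/audience

_jwks = PyJWKClient(JWKS_URI)

def verify_bearer_token(token: str) -> dict:
    """Return the verified claims or raise jwt.PyJWTError on invalid tokens."""
    signing_key = _jwks.get_signing_key_from_jwt(token)  # fetch the matching public key
    return jwt.decode(
        token,
        signing_key.key,
        algorithms=["RS256"],
        audience=AUDIENCE,
        issuer=ISSUER,
    )

# Usage inside a request handler (illustrative):
#   raw = request.headers["Authorization"].removeprefix("Bearer ")
#   claims = verify_bearer_token(raw)
#   system_id = claims.get("system")  # hypothetical claim for per-system isolation
```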
Keywords: Retrieval-Augmented Generation (RAG), RAG Security, RAG-as-a-Service, Data Security, Privacy, Scalable Architecture, Secure AI Systems, Local RAG Pipeline, Large Language Models (LLMs), Natural Language Processing (NLP), Microservice Architecture, Docker, Kubernetes, Workshop, SDS2025, IEEE
| Item Type: | Thesis (Other) |
|---|---|
| Subjects: | Topics > Security; Area of Application > Business oriented; Area of Application > Web based; Technologies > Programming Languages > Python; Technologies > Virtualization > Docker |
| Divisions: | Bachelor of Science FHO in Informatik > Bachelor Thesis |
| Depositing User: | OST Deposit User |
| Date Deposited: | 29 Sep 2025 10:47 |
| Last Modified: | 29 Sep 2025 10:47 |
| URI: | https://eprints.ost.ch/id/eprint/1297 |
