Offline Multimodal AI Redaction of Sensitive Data in Audio and Software Artifacts

Kaiser, Etienne and Heiniger, Nico (2025) Offline Multimodal AI Redaction of Sensitive Data in Audio and Software Artifacts. Other thesis, OST Ostschweizer Fachhochschule.

Supplemental material: HS 2025 2026-SA-EP-Kaiser-Heiniger-KI-gestützte Erkennung von sensiblen Informationen in Source.pdf (PDF, 1MB)

Abstract

This research investigates the feasibility of an offline, open-source system for automatically detecting and redacting sensitive data in both audio transcriptions and software artifacts. Cloud-based Speech-to-Text (STT) transcription and data redaction services raise privacy concerns for organizations handling personally identifiable information (PII) or authentication secrets. The prototype combines Faster Whisper for speech recognition with Microsoft Presidio and custom pattern recognizers for entity recognition and redaction. Two experiments evaluate system performance: (1) a comparison of PII redaction accuracy between text-only and STT-transcribed inputs, and (2) an assessment of secret detection in log and code files. PII redaction reaches 97% recall on text inputs and 99% on audio-transcribed content, with the STT pipeline introducing a slight masking effect through transcription variations. Secret redaction reaches 90% recall, with high-entropy tokens such as API keys remaining difficult to detect. The findings confirm the viability of offline redaction pipelines while identifying domain-specific limitations that motivate further research into fine-tuned machine-learning detection models.
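The abstract mentions custom pattern recognizers, span-level recall, and the difficulty of high-entropy tokens such as API keys. The following standard-library Python sketch illustrates one common way to combine these ideas: fixed regex recognizers for structured values, plus a Shannon-entropy heuristic for opaque tokens, followed by placeholder redaction and a recall score (detected gold spans divided by all gold spans). It is a hypothetical illustration, not the thesis's implementation, which uses Microsoft Presidio; the regexes, the 20-character token rule, the 4.0 bits/character threshold, and the names `detect`, `redact`, `recall`, and `shannon_entropy` are all assumptions.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    entity: str  # label used as the redaction placeholder
    start: int   # inclusive character offset
    end: int     # exclusive character offset


# Hypothetical custom pattern recognizers (not the thesis's actual rule set).
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "AWS_ACCESS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

# Assumed candidate rule for opaque tokens: 20+ characters of a key-like alphabet.
TOKEN = re.compile(r"[A-Za-z0-9_\-]{20,}")


def shannon_entropy(s: str) -> float:
    """Bits per character of the empirical character distribution of s."""
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def detect(text: str, entropy_threshold: float = 4.0) -> list[Finding]:
    """Run the regex recognizers, then flag remaining random-looking tokens."""
    findings = [
        Finding(entity, m.start(), m.end())
        for entity, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    for m in TOKEN.finditer(text):
        # Skip tokens already covered by a structured pattern match.
        if any(f.start < m.end() and m.start() < f.end for f in findings):
            continue
        # Flag only tokens whose character distribution looks random.
        if shannon_entropy(m.group()) >= entropy_threshold:
            findings.append(Finding("HIGH_ENTROPY_TOKEN", m.start(), m.end()))
    return sorted(findings, key=lambda f: f.start)


def redact(text: str, findings: list[Finding]) -> str:
    """Replace each finding with an <ENTITY> placeholder."""
    parts, last = [], 0
    for f in findings:
        if f.start < last:  # ignore spans overlapping an earlier redaction
            continue
        parts += [text[last:f.start], f"<{f.entity}>"]
        last = f.end
    parts.append(text[last:])
    return "".join(parts)


def recall(findings: list[Finding], gold: set[tuple[int, int]]) -> float:
    """Share of gold-standard (start, end) spans that were detected exactly."""
    detected = {(f.start, f.end) for f in findings}
    return len(detected & gold) / len(gold) if gold else 1.0


if __name__ == "__main__":
    text = ("Contact alice@example.com, key AKIAIOSFODNN7EXAMPLE, "
            "token=9fK2xQ7LmP4vR8sT1wZ3yB6nH0cJ5dGa, var user_session_identifier")
    print(redact(text, detect(text)))
```

In this example the 32-character token has 5.0 bits/character and is redacted, while the long but repetitive identifier `user_session_identifier` (about 3.2 bits/character) is left alone. Raising the threshold to 5.5 would miss the token and drop recall on the three secrets to 2/3, which is one way the trade-off around high-entropy API keys can surface in practice.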

Item Type: Thesis (Other)
Subjects: Topics > Internet Technologies and Applications > Voice Recognition
Topics > Communication Systems
Area of Application > Security
Technologies > Programming Languages > Python
Metatags > INS (Institute for Networked Solutions)
Divisions: Bachelor of Science FHO in Informatik > Student Research Project
Depositing User: OST Deposit User
Date Deposited: 26 Feb 2026 09:04
Last Modified: 26 Feb 2026 09:04
URI: https://eprints.ost.ch/id/eprint/1369
