Kaiser, Etienne and Heiniger, Nico (2025) Offline Multimodal AI Redaction of Sensitive Data in Audio and Software Artifacts. Other thesis, OST Ostschweizer Fachhochschule.
HS 2025 2026-SA-EP-Kaiser-Heiniger-KI-gestützte Erkennung von sensiblen Informationen in Source.pdf - Supplemental Material
Abstract
This research investigates the feasibility of an offline, open-source system for automatic detection and redaction of sensitive data in both audio transcriptions and software artifacts. Cloud-based services for Speech-to-Text (STT) transcription and data redaction raise privacy concerns for organizations handling personally identifiable information (PII) or authentication secrets. The developed prototype combines Faster Whisper for speech recognition with Microsoft Presidio and custom pattern recognizers for entity recognition and redaction. Two experiments evaluate system performance: (1) comparing PII redaction accuracy between text-only and STT-transcribed inputs, and (2) assessing secret detection in log and code files. Results demonstrate 97% recall for text-based PII redaction and 99% for audio-transcribed content, with the STT pipeline introducing a slight masking effect through transcription variations. Secret redaction achieves 90% recall, with challenges in detecting high-entropy tokens such as API keys. The findings confirm the viability of offline redaction pipelines while identifying domain-specific limitations that motivate further research into fine-tuned machine-learned detection models.
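As a rough illustration of the kind of pattern-based secret redaction and recall measurement the abstract describes, the following is a minimal standard-library sketch. It is not the thesis prototype: it does not use Faster Whisper or Microsoft Presidio, and the patterns, the entropy threshold of 4.0 bits per character, and the function names (`shannon_entropy`, `find_spans`, `redact`, `recall`) are all assumptions made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical patterns for illustration only; not taken from the thesis.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "AWS_ACCESS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

# Candidate tokens for the entropy heuristic: long runs of key-like characters.
TOKEN = re.compile(r"[A-Za-z0-9+/_-]{20,}")


def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def find_spans(text, entropy_threshold=4.0):
    """Return (start, end, label) for every pattern or high-entropy match."""
    spans = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), label))
    for m in TOKEN.finditer(text):
        # Assumed threshold; a real system would tune this on labeled data.
        if shannon_entropy(m.group()) >= entropy_threshold:
            spans.append((m.start(), m.end(), "HIGH_ENTROPY"))
    return spans


def redact(text, entropy_threshold=4.0):
    """Replace each detected span with a <LABEL> placeholder."""
    out = []
    cursor = 0
    for start, end, label in sorted(find_spans(text, entropy_threshold)):
        if start < cursor:
            continue  # skip spans that overlap one already redacted
        out.append(text[cursor:start])
        out.append(f"<{label}>")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


def recall(expected, redacted_text):
    """Fraction of expected sensitive strings that no longer appear verbatim."""
    if not expected:
        return 1.0
    hits = sum(1 for secret in expected if secret not in redacted_text)
    return hits / len(expected)
```

A fixed entropy threshold is one simple way to approach the high-entropy tokens the abstract identifies as difficult. Short or partly structured keys can fall below such a threshold, which is consistent with the abstract's suggestion that learned detection models deserve further study.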
| Item Type: | Thesis (Other) |
|---|---|
| Subjects: | Topics > Internet Technologies and Applications > Voice Recognition; Topics > Communication Systems; Area of Application > Security; Technologies > Programming Languages > Python; Metatags > INS (Institute for Networked Solutions) |
| Divisions: | Bachelor of Science FHO in Informatik > Student Research Project |
| Depositing User: | OST Deposit User |
| Date Deposited: | 26 Feb 2026 09:04 |
| Last Modified: | 26 Feb 2026 09:04 |
| URI: | https://eprints.ost.ch/id/eprint/1369 |
