Hate Speech Detection with LLM

Peng, Kailing and Derungs, Lukas (2024) Hate Speech Detection with LLM. Other thesis, OST Ostschweizer Fachhochschule.

HS 2024 2025-SA-EP-Derungs-Peng-Hate Speech Detection with LLM.pdf - Supplemental Material

Download (2MB)

Abstract

The growing normalization of hateful expression, particularly on the internet and social media, has significantly amplified the presence of online hate speech. This phenomenon harms psychological health and can incite violence, making effective detection methods necessary to prevent such content from being posted or shared. A recent trend leverages the large context windows of new Large Language Models (LLMs) for In-Context Learning (ICL), in which correct examples are showcased directly in the prompt. Studies have demonstrated the ability of LLMs to detect hate speech with few-shot strategies, comparing their performance to that of fine-tuned models. However, bias is a prevalent issue in pre-trained LLMs and requires identification and mitigation specific to the task or dataset.
In this study, an experiment is conducted on text-based hate speech detection using ICL for two LLMs (GPT-4o mini and Llama 3.1 8B) alongside a fine-tuned BERT model, combined into a voting ensemble. The objective is to assess whether bagging with majority voting can balance the strengths and weaknesses of the individual models, thereby mitigating model-specific biases and achieving a fairer, more accurate categorization of hate speech, offensive language, and normal language.
The results show that, while the models exhibit different biases toward the various labels (e.g., normal language being marked as offensive), majority voting effectively reduces bias and improves accuracy in categories where the participating models were close in performance. However, where the models' performance differs drastically, as in the offensive category in this study, majority voting does not outperform the best single model. Furthermore, the voting ensemble encountered draws in about 5.4% of the 13'229 data entries. These ambiguous cases were handled separately: either excluded from the evaluation or replaced with the best model's decision. Excluding ambiguous entries improved accuracy and reduced misclassification bias but did not fully achieve the task's goal of a comprehensive result. This study underscores the potential of ensemble approaches for enhancing the fairness and accuracy of hate speech detection systems.
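The voting scheme described above (three models, three labels, draws resolved by exclusion or by deferring to the best single model) can be sketched as follows. This is a minimal illustration, not the thesis' actual implementation; the function name, label strings, and tie-breaking parameter are assumptions for the example.

```python
from collections import Counter


def ensemble_vote(predictions, best_model_index=None):
    """Majority vote over per-model labels for one text.

    predictions: one label per model (e.g. from GPT-4o mini,
    Llama 3.1 8B, and a fine-tuned BERT).
    best_model_index: if given, break draws with that model's label;
    if None, return None to mark the entry as ambiguous (excluded).
    """
    counts = Counter(predictions)
    top_label, top_count = counts.most_common(1)[0]
    # A draw occurs when no label strictly outnumbers the others;
    # with three models and three labels this means all three disagree.
    is_draw = sum(1 for c in counts.values() if c == top_count) > 1
    if not is_draw:
        return top_label
    if best_model_index is not None:
        return predictions[best_model_index]
    return None  # ambiguous case, excluded from the evaluation


print(ensemble_vote(["hate", "hate", "normal"]))          # clear majority
print(ensemble_vote(["hate", "offensive", "normal"]))     # draw, excluded
print(ensemble_vote(["hate", "offensive", "normal"], 2))  # draw, tie-break
```

Under this scheme an entry only becomes ambiguous when every model predicts a different label, which matches the roughly 5.4% draw rate reported for the three-way classification.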

Item Type: Thesis (Other)
Subjects: Topics > Other
Area of Application > Statistics
Technologies > Communication
Divisions: Bachelor of Science FHO in Informatik > Student Research Project
Depositing User: OST Deposit User
Contributors: Politze, Daniel (Thesis advisor), email UNSPECIFIED
Date Deposited: 18 Feb 2025 12:29
Last Modified: 18 Feb 2025 12:29
URI: https://eprints.ost.ch/id/eprint/1265
