ToxiShield
Collaborators: Md. Awsaf Alam Anindya, Dr. Anindya Iqbal, Dr. Amiangshu Bosu, Dr. Jaydeb Sarker
Accepted to ACM International Conference on the Foundations of Software Engineering, 2026
ToxiShield is a real-time framework that detects and classifies toxicity in GitHub pull request comments and mitigates it by suggesting a semantically equivalent, non-toxic alternative. We fine-tuned lightweight models such as BERT-base-uncased to build our toxicity detector. For classification into toxicity subcategories and for detoxification, we used large language models (LLMs). Using a teacher-student framework, we reaped the benefits of proprietary LLMs such as GPT-4o by distilling their knowledge into much smaller open-source models such as Llama 3.2. Our results showed that the smaller open-source student models outperformed their teacher models across multiple text style transfer metrics. To make the framework usable in practice, we condensed the entire workflow into a simple browser extension that works seamlessly with GitHub.
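As a rough illustration (not the released implementation), the three-stage pipeline described above — detect, classify, detoxify — can be sketched as follows. The model calls are stubbed placeholders standing in for the fine-tuned BERT detector, the LLM subcategory classifier, and the distilled Llama 3.2 detoxifier; all names and heuristics here are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModerationResult:
    toxic: bool
    category: Optional[str]    # e.g. "insult"; None if non-toxic
    suggestion: Optional[str]  # semantically equivalent, non-toxic rewrite

def detect_toxicity(comment: str) -> bool:
    """Stub standing in for the fine-tuned BERT-base-uncased detector."""
    # Placeholder keyword heuristic in place of the classifier's prediction.
    return "stupid" in comment.lower()

def classify_toxicity(comment: str) -> str:
    """Stub standing in for the LLM-based subcategory classifier."""
    return "insult"

def detoxify(comment: str) -> str:
    """Stub standing in for the distilled detoxification model."""
    return comment.replace("stupid", "questionable")

def moderate(comment: str) -> ModerationResult:
    # Stage 1: lightweight detection, so clean comments skip the LLM stages.
    if not detect_toxicity(comment):
        return ModerationResult(toxic=False, category=None, suggestion=None)
    # Stage 2: subcategory classification; Stage 3: suggested rewrite.
    return ModerationResult(toxic=True,
                            category=classify_toxicity(comment),
                            suggestion=detoxify(comment))
```

Gating the LLM stages behind the lightweight detector mirrors the design choice in the paragraph above: the cheap BERT model handles every comment, and the expensive generation step runs only on the small fraction flagged as toxic.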