ToxiShield
Collaborators: Md. Awsaf Alam Anindya, Dr. Anindya Iqbal, Dr. Amiangshu Bosu, Dr. Jaydeb Sarker
Accepted to ACM International Conference on the Foundations of Software Engineering, 2026
ToxiShield is a real-time framework that detects and classifies toxicity in GitHub pull request comments and mitigates it by suggesting a semantically equivalent, non-toxic alternative. We fine-tuned lightweight models such as BERT-base-uncased to build our toxicity detector. For classification into toxicity subcategories and for detoxification, we used large language models (LLMs). Using a teacher-student framework, we reaped the benefits of proprietary LLMs such as GPT-4o by distilling their knowledge into much smaller open-source models such as Llama 3.2. Our results showed that the smaller open-source student models outperformed their teacher models across multiple text style transfer metrics. To make the framework usable in practice, we condensed the entire workflow into a simple browser extension that works seamlessly with GitHub.
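As a rough illustration (not the released implementation), the three-stage pipeline described above — detect, classify, detoxify — can be sketched as follows. The model calls are stubbed placeholders standing in for the fine-tuned BERT detector, the LLM subcategory classifier, and the distilled Llama 3.2 detoxifier; all names and heuristics here are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModerationResult:
    toxic: bool
    category: Optional[str]    # e.g. "insult"; None if non-toxic
    suggestion: Optional[str]  # semantically equivalent, non-toxic rewrite

def detect_toxicity(comment: str) -> bool:
    """Stub standing in for the fine-tuned BERT-base-uncased detector."""
    # Placeholder keyword heuristic in place of the classifier's prediction.
    return "stupid" in comment.lower()

def classify_toxicity(comment: str) -> str:
    """Stub standing in for the LLM-based subcategory classifier."""
    return "insult"

def detoxify(comment: str) -> str:
    """Stub standing in for the distilled detoxification model."""
    return comment.replace("stupid", "questionable")

def moderate(comment: str) -> ModerationResult:
    # Stage 1: lightweight detection, so clean comments skip the LLM stages.
    if not detect_toxicity(comment):
        return ModerationResult(toxic=False, category=None, suggestion=None)
    # Stage 2: subcategory classification; Stage 3: suggested rewrite.
    return ModerationResult(toxic=True,
                            category=classify_toxicity(comment),
                            suggestion=detoxify(comment))
```

Gating the LLM stages behind the lightweight detector mirrors the design choice in the paragraph above: the cheap BERT model handles every comment, and the expensive generation step runs only on the small fraction flagged as toxic.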