ToxiShield
Aug 2024 – Present
Collaborators: Dr. Amiangshu Bosu, Dr. Anindya Iqbal, Dr. Jaydeb Sarker, Awsaf Alam
Under Review for Publication at FSE 2026
ToxiShield is a real-time framework that detects and classifies toxicity in GitHub pull request comments and mitigates it by suggesting a semantically equivalent, non-toxic alternative. We trained lightweight models such as BERT-base-uncased to serve as the toxicity detector. For classifying toxicity into subcategories and for detoxification, we used large language models (LLMs). Using a teacher-student framework, we distilled the knowledge of proprietary LLMs such as GPT-4o into much smaller open-source models such as Llama 3.2. Our results showed that the smaller open-source models outperformed their teacher models across multiple text style transfer metrics. To make the framework usable, we packaged the entire workflow into a browser extension that works seamlessly with GitHub.
- software engineering
- prompt design
- large language models
- nlp
ColocEM (Undergraduate Thesis)
Aug 2023 – Present
Collaborators: Dr. Mohammad Saifur Rahman, Dr. Md. Abul Hassan Samee
We are working with Stereo-seq data, which provides high-resolution transcriptomics data in both genome and cellular space, to model and explain the complex cell-cell communication mechanisms occurring in the brain tissue of an injured axolotl telencephalon. Our work leverages the degree of spatial coexistence of several cell types, along with ligand-receptor analysis, to predict gene expression using a neural network model.
- bioinformatics
- cell-cell communication