research

past and ongoing research work

ToxiShield

Aug 2024 – Mar 2026

Collaborators: Md. Awsaf Alam Anindya, Dr. Anindya Iqbal, Dr. Amiangshu Bosu, Dr. Jaydeb Sarker

Accepted to the ACM International Conference on the Foundations of Software Engineering, 2026

ToxiShield is a real-time framework that detects and classifies toxicity in GitHub pull request comments and mitigates it by suggesting a semantically equivalent, non-toxic alternative. We trained lightweight models such as BERT-base-uncased to build our toxicity detector. For classifying toxicity into subcategories and for detoxification, we used large language models (LLMs). Using a teacher-student framework, we distilled the knowledge of proprietary LLMs such as GPT-4o into much smaller open-source models such as Llama 3.2. Our results showed that the smaller open-source models outperformed their teacher models across multiple text style transfer metrics. To make the framework usable, we packaged the entire workflow into a simple browser extension that works seamlessly with GitHub.
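The detect, classify, and detoxify stages described above can be pictured as a short pipeline. The sketch below is only an illustration of that control flow, not the ToxiShield implementation: `Review`, `review_comment`, and the three `toy_*` functions are hypothetical names, and the keyword-based stand-ins take the place of the trained detector and the LLM-based classifier and detoxifier.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Review:
    """Outcome of running one comment through the pipeline."""
    toxic: bool
    category: Optional[str] = None
    suggestion: Optional[str] = None


def review_comment(
    comment: str,
    detect: Callable[[str], bool],
    classify: Callable[[str], str],
    detoxify: Callable[[str], str],
) -> Review:
    """Run detect -> classify -> detoxify; clean comments skip the later stages."""
    if not detect(comment):
        return Review(toxic=False)
    return Review(
        toxic=True,
        category=classify(comment),
        suggestion=detoxify(comment),
    )


# Hypothetical stand-ins: a real system would call trained models here.
def toy_detect(text: str) -> bool:
    return "stupid" in text.lower()


def toy_classify(text: str) -> str:
    return "insult"


def toy_detoxify(text: str) -> str:
    return text.replace("stupid", "problematic")


result = review_comment(
    "This stupid patch breaks the build.",
    toy_detect,
    toy_classify,
    toy_detoxify,
)
print(result)
```

Running the cheap detector first means the more expensive classification and rewriting stages are only invoked for comments flagged as toxic.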

ColocEM (Undergraduate Thesis)

Aug 2023 – Dec 2025

Collaborators: Dr. Mohammad Saifur Rahman, Dr. Md. Abul Hassan Samee

We worked with Stereo-Seq data, which provides high-resolution transcriptomics data in both genome and cellular space, to model and explain the complex cell-cell communication mechanisms occurring in the brain tissue of an injured axolotl telencephalon. Our work leverages the degree of spatial coexistence of several cell types, along with ligand-receptor analysis, to predict gene expression using a neural network model.
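One simple way to picture "degree of spatial coexistence" is to count how often cells of two types lie within a fixed distance of each other. The sketch below does exactly that and nothing more. It is an assumption made for illustration, not the measure used in the thesis: the function name `colocalization_counts`, the radius, and the cell-type labels are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import dist


def colocalization_counts(cells, radius):
    """Count cell pairs within `radius`, grouped by their (sorted) type pair.

    `cells` is a list of (x, y, cell_type) tuples.
    """
    counts = Counter()
    for (x1, y1, t1), (x2, y2, t2) in combinations(cells, 2):
        if dist((x1, y1), (x2, y2)) <= radius:
            counts[tuple(sorted((t1, t2)))] += 1
    return counts


# Toy coordinates with made-up cell-type labels "A" and "B".
cells = [(0, 0, "A"), (1, 0, "B"), (0, 1, "B"), (10, 10, "A")]
print(colocalization_counts(cells, radius=1.5))
```

Counts like these could serve as input features for a downstream model. The all-pairs loop is quadratic in the number of cells, so a spatial index would be needed for real tissue sections.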