research

past and ongoing research work

ToxiShield

Aug 2024 – Present

ToxiShield is a real-time framework that detects and classifies toxicity in GitHub pull request comments and mitigates it by suggesting a semantically equivalent, non-toxic alternative. We trained lightweight models such as BERT-base-uncased to serve as our toxicity detector. For classification into subcategories of toxicity and for detoxification, we used large language models (LLMs). Using a teacher-student framework, we distilled the knowledge of proprietary LLMs such as GPT-4o into much smaller open-source models such as Llama 3.2. Our results showed that the smaller open-source models outperformed their teacher models across multiple text style transfer metrics. To make the framework usable in practice, we packaged the entire workflow into a simple browser extension that works seamlessly with GitHub.
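As a rough illustration of the three-stage flow described above (detect, then classify, then detoxify), here is a minimal sketch. It is not the ToxiShield implementation: `review_comment`, `Review`, the `threshold` value, and the three `toy_*` stand-ins are all hypothetical names introduced for this example. In the real system the detector would be a trained model and the other two stages would be LLM-backed.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Review:
    toxic: bool
    category: Optional[str] = None    # toxicity subcategory, if flagged
    suggestion: Optional[str] = None  # non-toxic rewrite, if flagged


def review_comment(
    comment: str,
    detect: Callable[[str], float],   # returns a toxicity score in [0, 1]
    classify: Callable[[str], str],   # returns a subcategory label
    detoxify: Callable[[str], str],   # returns a rewritten comment
    threshold: float = 0.5,           # assumed cut-off, not from the source
) -> Review:
    # Stage 1: the lightweight detector gates everything else.
    if detect(comment) < threshold:
        return Review(toxic=False)
    # Stages 2 and 3: only flagged comments reach the costlier steps.
    return Review(True, classify(comment), detoxify(comment))


# Toy stand-ins so the sketch runs without any model (hypothetical).
def toy_detect(text: str) -> float:
    return 1.0 if "stupid" in text.lower() else 0.0


def toy_classify(text: str) -> str:
    return "insult"


def toy_detoxify(text: str) -> str:
    return text.replace("stupid", "unclear")
```

Passing the stages in as callables keeps the gate logic independent of which model backs each stage, so a distilled student model could replace a teacher without touching the flow.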

ColocEM (Undergraduate Thesis)

Aug 2023 – Present

Collaborators: Dr. Mohammad Saifur Rahman, Dr. Md. Abul Hassan Samee

We are working with Stereo-seq data, which provides high-resolution transcriptomics data in both genome and cellular space, to model and explain the complex cell-cell communication mechanisms occurring in the brain tissue of an injured axolotl telencephalon. Our work leverages the degree of spatial coexistence of several cell types, along with ligand-receptor analysis, to predict gene expression using a neural network model.
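One simple way to quantify a "degree of spatial coexistence" between cell types is the fraction of cells of one type that have at least one cell of another type within a fixed radius. The sketch below assumes that definition purely for illustration; it is not necessarily the measure used in ColocEM, and `coexistence_matrix`, `radius`, and the toy coordinates are hypothetical.

```python
import numpy as np


def coexistence_matrix(coords, labels, radius):
    """Return (types, scores), where scores[i, j] is the fraction of
    type-i cells with at least one type-j cell within `radius`."""
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    types = np.unique(labels)

    # Dense pairwise distances: fine for a toy example, but O(n^2) memory.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    near = dist <= radius
    np.fill_diagonal(near, False)  # a cell is not its own neighbor

    scores = np.zeros((len(types), len(types)))
    for i, a in enumerate(types):
        rows = near[labels == a]
        for j, b in enumerate(types):
            scores[i, j] = rows[:, labels == b].any(axis=1).mean()
    return types, scores


# Toy layout: two A-B pairs close together and one isolated A cell.
coords = [(0, 0), (1, 0), (10, 0), (11, 0), (20, 0)]
labels = ["A", "B", "A", "B", "A"]
types, scores = coexistence_matrix(coords, labels, radius=1.5)
```

The matrix is asymmetric by construction: here every B cell has an A neighbor, but only two of the three A cells have a B neighbor. At real Stereo-seq scale the dense distance matrix would not fit in memory, so a spatial index would be needed instead.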