Dhruv Madhwal ☕️

Graduate Student

About Me

I’m a grad student at ASU’s CoRAL Lab, building agentic AI and retrieval systems. I’ve worked on ML and data science projects at Samsung, Carelon, and several fast-paced startups, tackling problems across domains like healthcare analytics, recommendation systems, and consumer-focused technologies. Before ASU, I earned an MSc in Physics and BE in Electronics from BITS Pilani Goa, blending analytical rigor with technical expertise. I’m interested in both foundational AI research and its practical applications, and I’m actively looking for opportunities that bridge the two.

Download CV
Interests
  • Information Retrieval
  • AI Agents
  • Large Language Models
  • Artificial Intelligence
  • Deep Learning
Education
  • MS Computer Science

    Arizona State University

  • MSc Physics

    Birla Institute of Technology and Science, Goa Campus

  • BE Electronics and Instrumentation

    Birla Institute of Technology and Science, Goa Campus

📚 My Research

Hi! I’m a Graduate Researcher at the CoRAL Lab at ASU. My work focuses on information retrieval, agentic LLM architectures, and large-scale information synchronization.

Multi‑Hop Reasoning Agent — CoRAL Lab, ASU Feb 2025–Present
Building a model‑agnostic multi‑hop QA agent for open‑ and closed‑book settings, integrating RAG, question decomposition, inference‑time scaling, and self‑verification. A custom LLM‑as‑Judge replaces EM/ROUGE and off‑the‑shelf LLM graders, directly assessing factual grounding, logical consistency, and chain coherence, and revealing where current benchmarks under-measure reasoning quality (a minimal grading call is sketched after the list below).
  • Stack: LangGraph, AutoGen, LangChain
  • Techniques: RAG, question decomposition, inference‑time scaling, self‑verification loops
  • Datasets: FanOutQA, MuSiQue, FRAMES, QUEST
  • Goal: beat SOTA while exposing where current benchmarks fall short
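A minimal sketch of the LLM-as-Judge call, assuming a generic `call_llm` chat-completion client; the prompt and rubric fields here are illustrative placeholders, not the exact grader used in the project:

```python
import json
from typing import Callable

JUDGE_PROMPT = """You are grading a multi-hop QA answer.
Question: {question}
Retrieved evidence: {evidence}
Reasoning chain: {chain}
Final answer: {answer}

Rate each dimension from 1 (poor) to 5 (excellent) and reply with JSON only:
{{"factual_grounding": 1-5, "logical_consistency": 1-5, "chain_coherence": 1-5}}"""


def judge_answer(call_llm: Callable[[str], str], question: str,
                 evidence: str, chain: str, answer: str) -> dict:
    """Rubric-based grading in place of EM/ROUGE string matching."""
    reply = call_llm(JUDGE_PROMPT.format(
        question=question, evidence=evidence, chain=chain, answer=answer))
    return json.loads(reply)  # assumes the grader complies with the JSON spec
```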

InfoboxIQ: Text-to-Infobox Synchronization (Wikipedia) — CoRAL Lab, ASU Feb 2025–Present
A multi-stage LLM pipeline for information synchronization (text‑to‑table): given a Wikipedia article and its infobox template, it produces an updated, evidence‑grounded infobox. We also introduce an evaluation suite that verifies the synchronized table is faithful, complete, and non‑hallucinatory (a deliberately collapsed sketch of the flow follows the list below).
  • Pipeline (six stages): preprocessing, key/property breakdown, QA‑SRL extraction, KG triple generation, KG merge and conflict resolution, infobox creation.
  • Dataset: ~90K article–infobox pairs across ~40 Wikipedia categories with manually annotated key schemas.
  • Evaluation: per‑key accuracy, coverage/completeness, hallucination rate, overall text‑to‑table sync quality.
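To make the data flow concrete, here is a deliberately collapsed sketch that folds the staged pipeline into a single per-key extraction loop; `llm` and the prompt are placeholders, and the real system runs QA-SRL extraction, KG triple generation, and conflict resolution between these steps:

```python
from typing import Callable

def fill_infobox(article: str, template_keys: list[str],
                 llm: Callable[[str], str]) -> dict[str, str]:
    """Collapse the staged pipeline into one per-key extraction pass."""
    infobox: dict[str, str] = {}
    for key in template_keys:                     # key/property breakdown
        prompt = (f"Using only the article below, give the value for the "
                  f"infobox key '{key}', or UNKNOWN if it is not stated.\n\n"
                  f"{article}")
        value = llm(prompt).strip()               # evidence-grounded extraction
        if value.upper() != "UNKNOWN":
            infobox[key] = value                  # omit keys with no evidence
    return infobox
```
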
🛠️ Technical Skills

Programming Languages & Frameworks: Python, C/C++, MATLAB, Flask, FastAPI

Machine Learning: PyTorch, TensorFlow, Keras, scikit-learn, Transformers, Hugging Face, OpenCV, pandas, NumPy

LLM/Agent & RAG Stack: LangChain/LangGraph, AutoGen, ChromaDB, Pinecone, FAISS

Data Engineering & Databases: Kafka, Spark, Airflow, MySQL, Postgres, MongoDB

Cloud & DevOps: AWS, Docker, Git, CI/CD, MLflow

🎯 Selected Projects

🧠 Machine Unlearning in Small Language Models

Teaching small LMs (~3–4B params) to forget specific facts without retraining from scratch while preserving general ability. Used lightweight procedures such as gradient-ascent updates and random-label fine-tuning to make compact LMs “forget on demand” with minimal collateral damage. The models also support quantized inference to run efficiently on commodity GPUs.
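A minimal sketch of one gradient-ascent unlearning step, assuming a Hugging Face causal LM; real runs interleave retain-set batches and track forget/retain metrics rather than ascending blindly:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def unlearn_step(fact_text: str) -> float:
    """Maximize (rather than minimize) the LM loss on a forget-set fact."""
    batch = tok(fact_text, return_tensors="pt")
    out = model(**batch, labels=batch["input_ids"])
    loss = -out.loss                  # negate the loss: gradient ascent
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return out.loss.item()            # rising loss means the fact is fading
```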

Key Features:

  • Targeted forgetting: Gradient Ascent unlearning and Random-Labeling + SFT procedures.
  • Model preservation: Retain general QA performance while removing specific facts.
  • Robust eval suite: Automated BLEU / ROUGE-L / BERTS scores, per-fact unlearn/retain tagging, ablations (label similarity), and spot manual validation.
  • Quantization & efficiency: FP16 plus 4-bit / 8-bit inference and LoRA/PEFT adapters to reduce VRAM and speed up experimentation (loading sketch below).
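A hedged sketch of the 4-bit + LoRA setup from the last bullet; the rank, target modules, and model choice are illustrative defaults, not the project's exact configuration:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb = BitsAndBytesConfig(load_in_4bit=True,
                         bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-mini-instruct", quantization_config=bnb)
model = prepare_model_for_kbit_training(model)       # cast/freeze for k-bit
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["qkv_proj"])
model = get_peft_model(model, lora)                  # train adapters only
model.print_trainable_parameters()
```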

Technologies: PyTorch, Hugging Face Transformers, LoRA/PEFT, bitsandbytes (4-/8-bit), Small LMs (Llama-3.2-3B-Instruct, Phi-3.5-mini-instruct, Nemotron-Mini-4B-Instruct)


🔍 InQuery ML: SQL-Native ID Image Fraud Detection

Built an end-to-end ID-image fraud detector with a lightweight CNN, achieving ~92% accuracy on held-out data. The model is exposed inside SQL via a PostgreSQL PL/Python UDF, so analysts can score images for fraud using only SQL—no Python or separate service calls required.
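A hedged sketch of the UDF pattern (the function name matches the query shown below, but the model path, labels, and preprocessing are illustrative); `SD` is PL/Python's per-session cache, so the CNN loads once per connection rather than per row:

```sql
CREATE OR REPLACE FUNCTION predict_fraud(image_b64 text)
RETURNS TABLE(label text, confidence real) AS $$
    import base64, io
    import torch
    from PIL import Image
    from torchvision import transforms

    -- cache the model across calls within this session
    if "model" not in SD:
        SD["model"] = torch.jit.load("/models/fraud_cnn.pt").eval()

    img = Image.open(io.BytesIO(base64.b64decode(image_b64))).convert("RGB")
    x = transforms.ToTensor()(img).unsqueeze(0)
    with torch.no_grad():
        probs = torch.softmax(SD["model"](x), dim=1)[0]
    conf, idx = torch.max(probs, dim=0)
    return [(["genuine", "fraud"][idx.item()], conf.item())]
$$ LANGUAGE plpython3u;
```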

Key Features:

  • SQL-only workflow: Fraud scoring happens in a query (SELECT label, confidence FROM predict_fraud(image_b64)), enabling analysts to operationalize ML without leaving SQL
  • Postgres-native inference: PL/Python UDF returns (class, confidence) for in-database predictions and auditability at the DB layer
  • TorchServe integration: Custom handler (base64 → tensor → prediction → JSON) for portable, production-style serving (sketched after this list)
  • Analyst-ready queries: Views and filters to flag newly issued IDs predicted fraudulent and identify repeat submitters
  • Performance benchmarking: Documented trade-offs—row-wise UDF calls are simple but slower; batched inference is preferred for high volume
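A hedged sketch of the handler shape (class name, labels, and request parsing are illustrative; TorchServe's `BaseHandler` supplies the surrounding `handle`/`inference` plumbing):

```python
import base64, io
import torch
from PIL import Image
from torchvision import transforms
from ts.torch_handler.base_handler import BaseHandler

class FraudHandler(BaseHandler):
    """base64 image in -> (label, confidence) JSON out."""

    def preprocess(self, data):
        # TorchServe hands the handler a list of request dicts.
        payload = data[0].get("body") or data[0].get("data")
        img = Image.open(io.BytesIO(base64.b64decode(payload))).convert("RGB")
        return transforms.ToTensor()(img).unsqueeze(0)

    def postprocess(self, output):
        probs = torch.softmax(output, dim=1)[0]
        conf, idx = torch.max(probs, dim=0)
        return [{"label": ["genuine", "fraud"][idx.item()],
                 "confidence": round(conf.item(), 4)}]
```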

Technologies: PyTorch, TorchServe, PostgreSQL (PL/Python), Computer Vision


📡 Edge-to-Cloud Face Recognition on AWS

Real-time face recognition for a camera stream using AWS IoT. Faces are detected at the edge with MTCNN (via a Greengrass component), only cropped faces are sent to the cloud, and FaceNet (in Lambda) returns identity + confidence. This reduces bandwidth and keeps raw frames local (a minimal edge loop is sketched below).
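A minimal sketch of the edge loop, assuming facenet-pytorch's MTCNN and an MQTT connection from the AWS IoT Device SDK v2; the topic name and payload shape are illustrative:

```python
import base64, json
import cv2
from awscrt import mqtt
from facenet_pytorch import MTCNN

detector = MTCNN(keep_all=True)          # face detection stays on the device

def publish_face_crops(frame, connection: mqtt.Connection,
                       topic: str = "faces/crops") -> None:
    boxes, _ = detector.detect(frame[:, :, ::-1])      # BGR -> RGB
    if boxes is None:
        return                                         # no faces this frame
    for x1, y1, x2, y2 in boxes.astype(int):
        ok, jpg = cv2.imencode(".jpg", frame[y1:y2, x1:x2])
        if ok:
            connection.publish(                        # only the crop leaves
                topic=topic,
                payload=json.dumps(
                    {"face_jpg_b64": base64.b64encode(jpg.tobytes()).decode()}),
                qos=mqtt.QoS.AT_LEAST_ONCE)            # delivery guarantee
```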

Key Features:

  • Edge detection, cloud recognition: Low-latency loop that keeps heavy vision local and runs identity matching in the cloud
  • Privacy & efficiency: Only face crops leave the device, raw video never leaves the edge
  • Reliable messaging: IoT messaging with delivery guarantees and request/response correlation
  • Operational visibility: Metrics and logs for end-to-end health checks and troubleshooting

Technologies: AWS IoT Core, Greengrass v2, AWS Lambda, Amazon SQS, CloudWatch, MTCNN, FaceNet, Python

Check out my work experience!