Projects

Dishcovery

Sun, 26 Oct 2025 00:00:00 +0000

Dishcovery started from a simple truth: I love food, and I’m lucky enough to be an adventurous eater with zero dietary restrictions. But I quickly learned that coordinating meals with a group is a completely different story.

As an international student at ASU, I kept running into the same problem when trying to grab food with friends. Everyone has constraints—someone needs vegetarian or Jain options, someone is lactose intolerant, someone is on a strict budget, and no one wants to travel too far. Finding a place that satisfies all of those constraints before everyone gets hangry is incredibly frustrating.

Existing apps help you find restaurants, but they don’t help you find the exact dish that works for everyone at the table. So I built Dishcovery to let you search the way you actually think.

Search the Way You Actually Think

Instead of browsing menus manually, you can search for exactly what you need:

“vegan ramen under $15 within 2 miles”

“gluten-free dessert with no dairy”

You get specific dishes with prices, dietary tags, and locations.
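One way to turn queries like these into structured filters is a small rule-based parser. This is a minimal sketch, not Dishcovery’s actual parser: the keyword table, the `DishQuery` fields, and the regex patterns are all hypothetical, and a production version might hand this step to an LLM instead.

```python
import re
from dataclasses import dataclass, field
from typing import Optional, Set

# Hypothetical keyword-to-tag table; the app's real tag vocabulary is not shown in this post.
DIETARY_TERMS = {
    "vegan": "vegan",
    "vegetarian": "vegetarian",
    "jain": "jain",
    "gluten-free": "gluten_free",
    "no dairy": "dairy_free",
    "dairy-free": "dairy_free",
}

@dataclass
class DishQuery:
    text: str                            # leftover free text, e.g. "ramen"
    max_price: Optional[float] = None    # dollars
    max_miles: Optional[float] = None
    dietary: Set[str] = field(default_factory=set)

def parse_query(query: str) -> DishQuery:
    q = query.lower()
    max_price = max_miles = None
    # "under $15" -> price ceiling
    m = re.search(r"under \$(\d+(?:\.\d+)?)", q)
    if m:
        max_price = float(m.group(1))
        q = q.replace(m.group(0), " ")
    # "within 2 miles" -> distance ceiling
    m = re.search(r"within (\d+(?:\.\d+)?) miles?", q)
    if m:
        max_miles = float(m.group(1))
        q = q.replace(m.group(0), " ")
    # Naive substring matching for dietary phrases.
    dietary = set()
    for term, tag in DIETARY_TERMS.items():
        if term in q:
            dietary.add(tag)
            q = q.replace(term, " ")
    # Whatever survives (minus filler words) is treated as the dish keyword.
    words = [w for w in q.split() if w not in {"with", "and"}]
    return DishQuery(" ".join(words), max_price, max_miles, dietary)
```

With this sketch, `parse_query("vegan ramen under $15 within 2 miles")` yields `text="ramen"`, `max_price=15.0`, `max_miles=2.0`, and `dietary={"vegan"}`. Substring matching is crude (it would tag “non-vegan” as vegan), which is one reason an LLM-based parser is attractive for this step.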

How It Works

Under the hood, it combines:

Menu scraping & OCR to digitize restaurant offerings
LLM-based parsing to identify ingredients and map dietary tags
Structured data + natural language search to connect users to the right meal
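Once menus are digitized and tagged, the search side can be as simple as filtering structured dish records. The sketch below assumes a hypothetical `Dish` record with a price, a set of dietary tags, and coordinates, and uses the haversine formula for straight-line distance; real walking or driving distance would need a routing service.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

@dataclass(frozen=True)
class Dish:
    name: str
    restaurant: str
    price: float                 # dollars
    tags: FrozenSet[str]         # dietary tags from the tagging step
    lat: float
    lon: float

def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points, in miles."""
    earth_radius_mi = 3958.8
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * earth_radius_mi * math.asin(math.sqrt(a))

def search_dishes(
    dishes: List[Dish],
    keyword: str,
    user_loc: Tuple[float, float],
    required_tags: FrozenSet[str] = frozenset(),
    max_price: Optional[float] = None,
    max_miles: Optional[float] = None,
) -> List[Tuple[Dish, float]]:
    """Return (dish, distance) pairs that satisfy every constraint, cheapest first."""
    results = []
    for d in dishes:
        if keyword and keyword.lower() not in d.name.lower():
            continue
        if not required_tags <= d.tags:        # every required tag must be present
            continue
        if max_price is not None and d.price > max_price:
            continue
        dist = haversine_miles(user_loc[0], user_loc[1], d.lat, d.lon)
        if max_miles is not None and dist > max_miles:
            continue
        results.append((d, dist))
    results.sort(key=lambda pair: (pair[0].price, pair[1]))
    return results
```

Hard filters keep results explainable: every returned dish satisfies every stated constraint. Sorting by price and then distance is just one possible ranking.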

Current Scope

The current MVP is intentionally limited in scope: it covers ~30 restaurants around ASU as a focused pilot rather than pretending to be a polished production platform. Right now, the bigger goal is to validate the idea, improve the quality of the results, and learn where automation matters most. I’m actively working on expanding restaurant coverage and making more of the ingestion, extraction, and tagging pipeline automatic and reliable.

If you want to try it yourself, check it out. And if you’re curious about the inner workings, you can take a closer look.

Multi-Hop Reasoning Agent

Sun, 01 Jun 2025 00:00:00 +0000

Large language models can answer many questions directly, but they often struggle when a question requires several connected steps. Research shows that breaking complex questions into smaller sub-questions can improve performance, especially when each step is supported by retrieved evidence instead of relying only on model memory ( , , ).

Why Multi-Hop Reasoning?

This project builds on that idea by creating a multi-agent question answering system for complex, multi-step questions. Instead of sending one large prompt to a model, the system decomposes the task, plans intermediate steps, retrieves evidence when needed, and synthesizes a final answer.

The goal is to produce answers that are grounded and traceable rather than purely guessed from model memory. This is useful for questions that require combining facts across domains such as sports, geography, literature, or current events.

For example, consider the question:

What country is home to the city where the author of Pride and Prejudice was born?

Answering it requires several connected steps:

Identify the author of Pride and Prejudice.
Answer: Jane Austen.
Find where Jane Austen was born.
Answer: Steventon, Hampshire.
Determine which country Steventon is in.
Answer: England.
Synthesize the final answer.
Final answer: England.

This kind of question is simple for a person, but it illustrates why multi-hop reasoning matters: the answer is not found in a single fact. The system has to connect several pieces of information in the right order.
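The key property of this chain is that each hop’s answer becomes the subject of the next hop. A toy sketch of that dependency, where a hypothetical `FACTS` dictionary stands in for retrieval and a fixed list of relations stands in for the planner’s decomposition (both assumptions, not the agent’s real interfaces):

```python
from typing import Callable, Dict, List, Tuple

# Toy evidence store mirroring the example above; a real system would retrieve this.
FACTS: Dict[Tuple[str, str], str] = {
    ("author_of", "Pride and Prejudice"): "Jane Austen",
    ("birthplace_of", "Jane Austen"): "Steventon, Hampshire",
    ("country_of", "Steventon, Hampshire"): "England",
}

def lookup(relation: str, entity: str) -> str:
    return FACTS[(relation, entity)]

def answer_chain(
    start: str,
    relations: List[str],
    retrieve: Callable[[str, str], str] = lookup,
) -> Tuple[str, List[Tuple[str, str, str]]]:
    """Resolve relations in order, feeding each answer into the next hop."""
    trace = []
    current = start
    for relation in relations:
        answer = retrieve(relation, current)
        trace.append((relation, current, answer))   # keep every hop for traceability
        current = answer
    return current, trace
```

Calling `answer_chain("Pride and Prejudice", ["author_of", "birthplace_of", "country_of"])` returns `"England"` together with a three-entry trace. That trace is what makes the answer inspectable: if a hop is wrong, you can see exactly which one.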

How the Agent Works

The agent coordinates several specialized components: a planner that decides how to approach the question, a retrieval step that gathers supporting information, a code execution step for calculations or data processing, and a synthesis step that produces the final answer.

It supports both closed-book reasoning and retrieval-augmented answering, drawing evidence from a RAG pipeline and live web search. It also includes state tracking and stall recovery, allowing the agent to continue through longer reasoning chains without getting stuck.
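The control flow described above can be sketched as a loop over shared state. This is a plain-Python illustration rather than the project’s actual orchestration code: `AgentState`, the `plan` callback, the tool names, and the stall-recovery policy (stop after `stall_limit` consecutive steps that add no new evidence, then synthesize) are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class AgentState:
    question: str
    evidence: List[str] = field(default_factory=list)
    history: List[str] = field(default_factory=list)   # actions taken, for tracing
    answer: Optional[str] = None
    stalled_steps: int = 0                             # consecutive steps with no new evidence

def run_agent(
    state: AgentState,
    plan: Callable[[AgentState], str],
    tools: Dict[str, Callable[[AgentState], Optional[str]]],
    synthesize: Callable[[AgentState], str],
    max_steps: int = 10,
    stall_limit: int = 2,
) -> AgentState:
    for _ in range(max_steps):
        action = plan(state)                 # e.g. "retrieve", "run_code", or "synthesize"
        state.history.append(action)
        if action == "synthesize":
            break
        new_info = tools[action](state)
        if new_info and new_info not in state.evidence:
            state.evidence.append(new_info)
            state.stalled_steps = 0
        else:
            state.stalled_steps += 1
            if state.stalled_steps >= stall_limit:
                # Stall recovery (one possible policy): stop looping, answer with what we have.
                state.history.append("recover")
                break
    state.answer = synthesize(state)
    return state
```

In a graph-based framework, these pieces map naturally onto nodes that read and update a shared state, with the planner’s output choosing the next edge. A real recovery step might replan or switch tools instead of stopping.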

Implementation

I implemented the system in Python using LangGraph and LangChain, with structured outputs through Pydantic and asynchronous execution using asyncio. I also built a Streamlit interface that shows the agent’s intermediate steps, making it easier to inspect how the final answer was reached.
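Asynchronous execution pays off when several retrieval calls are independent of each other. A minimal sketch with `asyncio.gather`, where `fake_search` is a hypothetical stand-in for a web-search or vector-store call and the timeout value is arbitrary:

```python
import asyncio
from typing import Dict, List

async def fake_search(query: str, delay: float = 0.05) -> str:
    """Hypothetical stand-in for an I/O-bound retrieval call."""
    await asyncio.sleep(delay)
    return f"evidence for: {query}"

async def retrieve_all(sub_questions: List[str], timeout: float = 1.0) -> Dict[str, str]:
    # Launch every retrieval at once; each one gets its own timeout.
    tasks = [asyncio.wait_for(fake_search(q), timeout) for q in sub_questions]
    # return_exceptions=True: a timeout or error becomes a result instead of propagating.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    out: Dict[str, str] = {}
    for q, r in zip(sub_questions, results):
        out[q] = r if isinstance(r, str) else f"<failed: {type(r).__name__}>"
    return out
```

Run it with `asyncio.run(retrieve_all([...]))`. Dependent hops, like the Pride and Prejudice chain, still have to run in order; only sub-questions that don’t need each other’s answers can be fanned out like this, and `return_exceptions=True` keeps one slow source from sinking the rest.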

Overall, this project was an experiment in making LLM-based question answering more reliable by combining decomposition, retrieval, tool use, and multi-agent orchestration.

If you want to try it yourself, check out the demo. And if you’d like to take a closer look at the implementation, you can find the code.

Machine Unlearning in Small Language Models

Wed, 25 Dec 2024 00:00:00 +0000

Large language models can memorize facts during training, but removing a specific piece of knowledge after training is not straightforward. Retraining a model from scratch is expensive, and deleting the original data does not guarantee that the model will stop producing what it learned.

Why Unlearning Matters

Machine unlearning studies how to make a model intentionally forget targeted information while preserving its general abilities. This matters because models may need to forget private data, copyrighted material, outdated facts, or unsafe responses.

Prior work such as showed that approximate unlearning can reduce a model’s ability to recall specific content while keeping much of its general performance intact. Other recent work, such as , has explored unlearning as a way to remove harmful behavior, copyrighted content, and memorized knowledge from language models.

What I Explored

In this project, I explored machine unlearning for small language models in the 3B-4B parameter range. I focused on two techniques: random labeling and gradient ascent.

With random labeling, the model is fine-tuned on incorrect or randomized answers for the fact it should forget.
With gradient ascent, the training objective is reversed: instead of minimizing the loss on the targeted examples, the model takes steps that increase it, so it becomes worse at recalling the targeted information.
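The difference between the two techniques is easiest to see on a toy model: a single softmax over candidate answers for one prompt, trained with cross-entropy. This is a pure-Python illustration, not the training code used in the project; the logits, learning rate, and step count are made up.

```python
import math
import random
from typing import List

def softmax(z: List[float]) -> List[float]:
    m = max(z)                                   # subtract max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def ce_grad(z: List[float], label: int) -> List[float]:
    """Gradient of the cross-entropy loss -log softmax(z)[label] w.r.t. the logits z."""
    p = softmax(z)
    return [pi - (1.0 if i == label else 0.0) for i, pi in enumerate(p)]

def gradient_ascent_step(z: List[float], target: int, lr: float) -> List[float]:
    # Step *up* the loss on the fact to forget, making the target answer less likely.
    g = ce_grad(z, target)
    return [zi + lr * gi for zi, gi in zip(z, g)]

def random_label_step(z: List[float], target: int, lr: float, rng: random.Random) -> List[float]:
    # Ordinary gradient *descent*, but toward a randomly chosen wrong answer.
    wrong = rng.choice([i for i in range(len(z)) if i != target])
    g = ce_grad(z, wrong)
    return [zi - lr * gi for zi, gi in zip(z, g)]

z0 = [4.0, 0.0, 0.0, 0.0]    # toy model that confidently recalls answer 0
```

Repeating either step drives the target’s probability down. Plain gradient ascent has no floor, though: in a real network the same unbounded objective also disturbs shared weights, which is why forgetting has to be measured alongside general ability.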

Efficiency Through Lightweight Adaptation

To keep the process efficient, I used parameter-efficient fine-tuning (PEFT) with LoRA adapters on a quantized model. This let me change the model’s behavior by training a small set of adapter weights rather than retraining the full model.
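LoRA keeps the pretrained weight matrix W frozen and learns a low-rank update, so the effective weight is W + (alpha / r) · B A, with B of shape d_out × r and A of shape r × d_in. The sketch below shows why that is cheap; the 3072 × 3072 layer size and rank 8 are illustrative values, not the configuration used here, and quantization of the frozen base weights is not shown. In the Hugging Face `peft` library, `r` and `lora_alpha` are the corresponding `LoraConfig` arguments.

```python
from typing import List, Tuple

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def lora_param_counts(d_out: int, d_in: int, r: int) -> Tuple[int, int]:
    """Trainable parameters for one layer: full fine-tuning vs. a rank-r LoRA adapter."""
    full = d_out * d_in            # every entry of W
    lora = r * (d_out + d_in)      # B is d_out x r, A is r x d_in
    return full, lora

def lora_effective_weight(W: Matrix, B: Matrix, A: Matrix, alpha: float, r: int) -> Matrix:
    """W' = W + (alpha / r) * (B @ A). W stays frozen; only A and B are trained."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]
```

`lora_param_counts(3072, 3072, 8)` gives 9,437,184 versus 49,152, so the adapter trains roughly 0.5% as many parameters for that layer. LoRA initializes B to zero, so training starts exactly at the pretrained weights, and the learned update can later be merged into W or kept as a separate, swappable adapter.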

Evaluation

A major challenge was making the model forget the target fact without damaging its broader question-answering ability. To evaluate this, I tested whether the model stopped producing the targeted answer while also measuring general performance using BLEU, ROUGE-L, BERTScore, and , a benchmark designed to measure whether models produce truthful answers rather than imitating common falsehoods.
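Of the metrics above, ROUGE-L is simple enough to compute by hand: it is based on the longest common subsequence (LCS) between a generated answer and a reference. A minimal sketch using whitespace tokenization and an F1 score, paired with a crude string check for whether the target fact still appears; the tokenization and the `forgot` helper are assumptions, and in practice a package such as Google’s `rouge-score` would handle tokenization and optional stemming.

```python
from typing import List

def lcs_length(a: List[str], b: List[str]) -> int:
    # Classic dynamic programming over token sequences.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(candidate: str, reference: str) -> float:
    c, r = candidate.lower().split(), reference.lower().split()
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

def forgot(output: str, target_answer: str) -> bool:
    """Crude forget check: the target string no longer appears in the output."""
    return target_answer.lower() not in output.lower()
```

The pattern to look for is the forget check passing on the targeted prompts while ROUGE-L on unrelated questions stays roughly where it was before unlearning.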