<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>PyTorch |</title><link>https://dhruvmadhwal.github.io/tags/pytorch/</link><atom:link href="https://dhruvmadhwal.github.io/tags/pytorch/index.xml" rel="self" type="application/rss+xml"/><description>PyTorch</description><generator>HugoBlox Kit (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Wed, 25 Dec 2024 00:00:00 +0000</lastBuildDate><image><url>https://dhruvmadhwal.github.io/media/icon.svg</url><title>PyTorch</title><link>https://dhruvmadhwal.github.io/tags/pytorch/</link></image><item><title>Machine Unlearning in Small Language Models</title><link>https://dhruvmadhwal.github.io/projects/scikit/</link><pubDate>Wed, 25 Dec 2024 00:00:00 +0000</pubDate><guid>https://dhruvmadhwal.github.io/projects/scikit/</guid><description>&lt;p&gt;Large language models can memorize facts during training, but removing a specific piece of knowledge after training is not straightforward. Retraining a model from scratch is expensive, and deleting the original data does not guarantee that the model will stop producing what it learned.&lt;/p&gt;
&lt;h2 id="why-unlearning-matters"&gt;Why Unlearning Matters&lt;/h2&gt;
&lt;p&gt;Machine unlearning studies how to make a model intentionally forget targeted information while preserving its general abilities. This matters because models may need to forget private data, copyrighted material, outdated facts, or unsafe responses.&lt;/p&gt;
&lt;p&gt;Prior work has shown that approximate unlearning can reduce a model’s ability to recall specific content while keeping much of its general performance intact. Other recent work has explored unlearning as a way to remove harmful behavior, copyrighted content, and memorized knowledge from language models.&lt;/p&gt;
&lt;h2 id="what-i-explored"&gt;What I Explored&lt;/h2&gt;
&lt;p&gt;In this project, I explored machine unlearning for small language models in the 3B-4B parameter range. I focused on two techniques: random labeling and gradient ascent.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;With random labeling, the model is fine-tuned on incorrect or randomized answers for the fact it should forget.&lt;/li&gt;
&lt;li&gt;With gradient ascent, the training objective is reversed so the model becomes worse at recalling the targeted information.&lt;/li&gt;
&lt;/ul&gt;
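&lt;p&gt;As a rough illustration of the two objectives, here is a toy sketch in plain Python. It is not the project’s training code: the four-entry logit vector stands in for a model’s scores over candidate answers, and the function names, learning rate, and step count are all assumptions made for the example.&lt;/p&gt;

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_grad(logits, label):
    """Gradient of -log softmax(logits)[label] with respect to the logits."""
    probs = softmax(logits)
    return [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]


def gradient_ascent_step(logits, forget_label, lr=0.5):
    # Reversed objective: step *along* the loss gradient for the fact to forget.
    grad = cross_entropy_grad(logits, forget_label)
    return [x + lr * g for x, g in zip(logits, grad)]


def random_label_step(logits, forget_label, rng, lr=0.5):
    # Ordinary descent, but toward a randomly chosen incorrect answer.
    wrong = rng.choice([i for i in range(len(logits)) if i != forget_label])
    grad = cross_entropy_grad(logits, wrong)
    return [x - lr * g for x, g in zip(logits, grad)]


rng = random.Random(0)
start = [3.0, 0.5, 0.2, 0.1]  # toy "model" that strongly prefers answer 0
ga = start
rl = start
for _ in range(20):
    ga = gradient_ascent_step(ga, forget_label=0)
    rl = random_label_step(rl, forget_label=0, rng=rng)

p_start = softmax(start)[0]  # probability of the targeted answer before unlearning
p_ga = softmax(ga)[0]        # after gradient ascent
p_rl = softmax(rl)[0]        # after random labeling
print(p_start, p_ga, p_rl)
```

&lt;p&gt;In this toy setting both updates lower the probability of answer 0. Random labeling moves probability toward specific wrong answers, while gradient ascent only pushes away from the original one.&lt;/p&gt;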
&lt;h2 id="efficiency-through-lightweight-adaptation"&gt;Efficiency Through Lightweight Adaptation&lt;/h2&gt;
&lt;p&gt;To keep the process efficient, I used parameter-efficient fine-tuning (PEFT) with LoRA adapters, combined with quantization. This allowed me to change the model’s behavior by training a small set of adapter weights instead of updating every parameter.&lt;/p&gt;
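&lt;p&gt;To make the idea behind LoRA concrete, the sketch below shows the low-rank update in plain Python. It is a minimal illustration rather than the project’s configuration: the layer sizes, rank, and scaling value are assumed for the example, the helper names are hypothetical, and quantization is not modeled.&lt;/p&gt;

```python
def lora_trainable_params(d_in, d_out, rank):
    """Parameters in the two low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, rank):
    """y = W x + (alpha / rank) * B (A x); W stays frozen, only A and B train."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


# Illustrative sizes only, not taken from the project.
d_in, d_out, rank = 3072, 3072, 8
full = d_in * d_out                               # weights in one dense layer
lora = lora_trainable_params(d_in, d_out, rank)   # weights the adapter trains
fraction = lora / full

# With B initialised to zeros, the adapted layer starts out identical to the base layer.
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[0.5, -0.5]]
B = [[0.0], [0.0]]
y = lora_forward(W, A, B, [1.0, 1.0], alpha=16, rank=1)
print(full, lora, fraction, y)
```

&lt;p&gt;For the assumed sizes, the adapter trains well under one percent of the weights in that layer, which is where the efficiency comes from.&lt;/p&gt;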
&lt;h2 id="evaluation"&gt;Evaluation&lt;/h2&gt;
&lt;p&gt;A major challenge was making the model forget the target fact without damaging its broader question-answering ability. To evaluate this, I tested whether the model stopped producing the targeted answer while also measuring general performance using BLEU, ROUGE-L, BERTScore, and TruthfulQA, a benchmark designed to measure whether models produce truthful answers rather than imitating common falsehoods.&lt;/p&gt;</description></item></channel></rss>