Figure 1: Schematic overview of unveiling the coreset effect for LLM unlearning.
This is the official code repository for the paper LLM Unlearning Reveals a Stronger-Than-Expected Coreset Effect in Current Benchmarks.
We provide the indices obtained from different coreset selection methods for WMDP and MUSE.
- WMDP (RANDOM): `wmdp/data_indices/{split}_random_indices_{percentage}_{seed}.json`
- WMDP (others): `wmdp/data_indices/{split}_{method}_indices_{percentage}.json`
- MUSE (RANDOM): `muse/muse_bench/baselines/data_indices/{split}/random_indices_{percentage}_{seed}.json`
- MUSE (others): `muse/muse_bench/baselines/data_indices/{split}/{method}_{percentage}_{seed}.json`
Note the following options:
- `{split}` refers to the specific dataset split (`bio` or `cyber` for WMDP; `news` or `books` for MUSE).
- `{method}` refers to the coreset selection method, which can be `grand`, `moderate`, or `mink`.
- `{percentage}` is the fraction of data selected from the original forget set (`0.01`, `0.05`, `0.1`, `1`).
- `{seed}` specifies the seed used in RANDOM selection. Options are `38270`, `42362`, `58771`, `34176`, `20971`.
You can navigate to these folders to find the selected indices.
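For instance, to peek at one of these files (assuming, as a sketch, that each file stores a flat JSON array of integer indices into the forget set), you could run:

```bash
# Print the first five selected indices (assumes the file is a flat JSON array)
jq '.[:5]' wmdp/data_indices/bio_grand_indices_0.1.json
```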
For RMU unlearning on WMDP, navigate to the folder `wmdp` and run:

```bash
CUDA_VISIBLE_DEVICES=0,1 python forget_rmu.py seed=$seed percentage=$percent epochs=$epochs forget_split=$split
```
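For example, an illustrative invocation for a 10% random coreset on the bio split (the `epochs` value below is a placeholder; use the hyperparameters reported in the paper):

```bash
CUDA_VISIBLE_DEVICES=0,1 python forget_rmu.py seed=42362 percentage=0.1 epochs=5 forget_split=bio
```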
For RMU unlearning on MUSE, navigate to the folder `muse/muse_bench/baselines` and run:

```bash
bash scripts/unlearn_{split}_rmu.sh
```
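For example, for the `news` split:

```bash
bash scripts/unlearn_news_rmu.sh
```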
For NPO unlearning on WMDP, navigate to the folder `wmdp` and run:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python forget_nr.py seed=$seed percentage=$percent forget_loss=npo_grad_diff forget_split=$split
```
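For example, an illustrative invocation for a 5% random coreset on the cyber split:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python forget_nr.py seed=38270 percentage=0.05 forget_loss=npo_grad_diff forget_split=cyber
```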
For NPO unlearning on MUSE, navigate to the folder `muse/muse_bench/baselines` and run:

```bash
bash scripts/unlearn_{split}_npo.sh
```
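For example, for the `books` split:

```bash
bash scripts/unlearn_books_npo.sh
```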
Note that the options in these commands are the same as described above. For MUSE, you can change other hyperparameters inside the bash script. Set `epochs` and `percent` according to the hyperparameters mentioned in the paper. Additionally, for non-random coreset selection, please change the variables `path` and `index_file`, which specify the path for saving the unlearned models and the path to the coreset selection index files, respectively.
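A hypothetical sketch of how these two variables might look once edited (both values are placeholders; point them at your own save location and index file):

```bash
# Hypothetical values; adjust to your setup
path="./unlearned_models/npo_grand_0.1"               # where the unlearned model is saved
index_file="data_indices/news/grand_0.1_38270.json"   # coreset selection indices to use
```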
To find the linear mode connectivity (LMC) between coreset-unlearned models and models unlearned on the full forget set, i.e., to evaluate models along the linear interpolation path between the two sets of weights, first save the unlearned models to the path specified in the unlearning scripts and then run:

```bash
accelerate launch --num_processes=4 --main_process_port=13211 --gpu_ids=$gpu_ids evals/check_LMC_npo_dist.py percentage=${percent} seed=${seed} forget_split=${split} method=${method}
```

The above is for NPO-unlearned models. For RMU-unlearned models, simply use `check_LMC_rmu_dist.py` instead.
Set the following options:
- `gpu_ids`: the GPU IDs to use (e.g., `0,1,2,3`)
- `percent`: the coreset selection ratio from which the coreset-unlearned model was obtained (`0.01`, `0.05`, `0.1`)

Other options are the same as specified before.
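For example, an illustrative invocation (all values are placeholders drawn from the options above):

```bash
accelerate launch --num_processes=4 --main_process_port=13211 --gpu_ids=0,1,2,3 evals/check_LMC_npo_dist.py percentage=0.1 seed=42362 forget_split=bio method=grand
```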
For the retraining attack, navigate to the folder `retrain_attack/T-Vaccine/` and run `retrain_wmdp.sh`. You can change `model_path` to specify the path to the unlearned model to attack.
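A hypothetical sketch of that variable inside the script (the checkpoint path is a placeholder):

```bash
# Hypothetical: point this at the unlearned checkpoint you want to attack
model_path="./unlearned_models/rmu_bio_0.1"
```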
For the enhanced GCG attack, navigate to the folder `unlearning-vs-safety/` and run `enhanced_gcg_wmdp.sh`. You can change variables inside the bash script to specify the unlearned model.
If you find our code or paper helpful, please cite our work:
```bibtex
@article{pal2025llm,
  title={LLM Unlearning Reveals a Stronger-Than-Expected Coreset Effect in Current Benchmarks},
  author={Pal, Soumyadeep and Wang, Changsheng and Diffenderfer, James and Kailkhura, Bhavya and Liu, Sijia},
  journal={arXiv preprint arXiv:2504.10185},
  year={2025}
}
```