Figure 1: Schematic overview of unveiling the coreset effect for LLM unlearning.
This is the official code repository for the paper LLM Unlearning Reveals a Stronger-Than-Expected Coreset Effect in Current Benchmarks.
We provide the indices obtained from different coreset selection methods for WMDP and MUSE.
- WMDP (RANDOM): `wmdp/data_indices/{split}_random_indices_{percentage}_{seed}.json`
- WMDP (others): `wmdp/data_indices/{split}_{method}_indices_{percentage}.json`
- MUSE (RANDOM): `muse/muse_bench/baselines/data_indices/{split}/random_indices_{percentage}_{seed}.json`
- MUSE (others): `muse/muse_bench/baselines/data_indices/{split}/{method}_{percentage}_{seed}.json`
Note the following options:
- `{split}` refers to the specific dataset split (`bio` or `cyber` for WMDP; `news` or `books` for MUSE).
- `{method}` refers to the coreset selection method, which can be `grand`, `moderate`, or `mink`.
- `{percentage}` is the fraction of data selected from the original forget set (`0.01`, `0.05`, `0.1`, `1`).
- `{seed}` specifies the seed used in RANDOM selection. Options are `38270`, `42362`, `58771`, `34176`, `20971`.
You can navigate to these folders to find the selected indices.
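For instance, to peek at one of these files (assuming, as a sketch, that each file stores a flat JSON array of integer indices into the forget set), you could run:

```bash
# Print the first five selected indices (assumes the file is a flat JSON array)
jq '.[:5]' wmdp/data_indices/bio_grand_indices_0.1.json
```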
For RMU unlearning on WMDP, navigate to the folder `wmdp` and run:

```bash
CUDA_VISIBLE_DEVICES=0,1 python forget_rmu.py seed=$seed percentage=$percent epochs=$epochs forget_split=$split
```
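For example, an illustrative invocation for a 10% random coreset on the bio split (the `epochs` value below is a placeholder; use the hyperparameters reported in the paper):

```bash
CUDA_VISIBLE_DEVICES=0,1 python forget_rmu.py seed=42362 percentage=0.1 epochs=5 forget_split=bio
```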
For RMU unlearning on MUSE, navigate to the folder `muse/muse_bench/baselines` and run:

```bash
bash scripts/unlearn_{split}_rmu.sh
```
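For example, for the `news` split:

```bash
bash scripts/unlearn_news_rmu.sh
```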
For NPO unlearning on WMDP, navigate to the folder `wmdp` and run:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python forget_nr.py seed=$seed percentage=$percent forget_loss=npo_grad_diff forget_split=$split
```
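For example, an illustrative invocation for a 5% random coreset on the cyber split:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python forget_nr.py seed=38270 percentage=0.05 forget_loss=npo_grad_diff forget_split=cyber
```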
For NPO unlearning on MUSE, navigate to the folder `muse/muse_bench/baselines` and run:

```bash
bash scripts/unlearn_{split}_npo.sh
```
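For example, for the `books` split:

```bash
bash scripts/unlearn_books_npo.sh
```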
Note that the options in these commands are the same as described above. For MUSE, you can change other hyperparameters inside the bash script. Set `epochs` and `percent` according to the hyperparameters mentioned in the paper. Additionally, for non-random coreset selection, please change the variables `path` and `index_file`, which specify the path for saving the unlearned models and the path to the coreset selection index files, respectively.
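A hypothetical sketch of how these two variables might look once edited (both values are placeholders; point them at your own save location and index file):

```bash
# Hypothetical values; adjust to your setup
path="./unlearned_models/npo_grand_0.1"               # where the unlearned model is saved
index_file="data_indices/news/grand_0.1_38270.json"   # coreset selection indices to use
```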
To find the linear mode connectivity (LMC) between coreset-unlearned models and models unlearned on the full forget set, i.e., to evaluate models along the linear interpolation path between the two sets of weights, first save the unlearned models to the path specified in the unlearning scripts and then run:

```bash
accelerate launch --num_processes=4 --main_process_port=13211 --gpu_ids=$gpu_ids evals/check_LMC_npo_dist.py percentage=${percent} seed=${seed} forget_split=${split} method=${method}
```

The above is for NPO-unlearned models. For RMU-unlearned models, simply use `check_LMC_rmu_dist.py` instead.
Set the following options:
- `gpu_ids`: the GPU IDs to use (e.g., `0,1,2,3`)
- `percent`: the coreset selection ratio from which the coreset-unlearned model was obtained (`0.01`, `0.05`, `0.1`)

Other options are the same as specified before.
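For example, an illustrative invocation (all values are placeholders drawn from the options above):

```bash
accelerate launch --num_processes=4 --main_process_port=13211 --gpu_ids=0,1,2,3 evals/check_LMC_npo_dist.py percentage=0.1 seed=42362 forget_split=bio method=grand
```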
For the retraining attack, navigate to the folder `retrain_attack/T-Vaccine/` and run `retrain_wmdp.sh`. You can change `model_path` to specify the path to the unlearned model to attack.
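A hypothetical sketch of that variable inside the script (the checkpoint path is a placeholder):

```bash
# Hypothetical: point this at the unlearned checkpoint you want to attack
model_path="./unlearned_models/rmu_bio_0.1"
```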
For the enhanced GCG attack, navigate to the folder `unlearning-vs-safety/` and run `enhanced_gcg_wmdp.sh`. You can change variables inside the bash script to specify the unlearned model.
If you find our code or paper helpful, please cite our work:
```bibtex
@article{pal2025llm,
  title={LLM Unlearning Reveals a Stronger-Than-Expected Coreset Effect in Current Benchmarks},
  author={Pal, Soumyadeep and Wang, Changsheng and Diffenderfer, James and Kailkhura, Bhavya and Liu, Sijia},
  journal={arXiv preprint arXiv:2504.10185},
  year={2025}
}
```