
LLM Unlearning Reveals a Stronger-Than-Expected Coreset Effect in Current Benchmarks

Figure 1: Schematic overview of unveiling the coreset effect for LLM unlearning.

This is the official code repository for the paper LLM Unlearning Reveals a Stronger-Than-Expected Coreset Effect in Current Benchmarks.

Coreset Selection Indices

We provide the indices obtained from different coreset selection methods for WMDP and MUSE.

  • WMDP (RANDOM): wmdp/data_indices/{split}_random_indices_{percentage}_{seed}.json
  • WMDP (others): wmdp/data_indices/{split}_{method}_indices_{percentage}.json
  • MUSE (RANDOM): muse/muse_bench/baselines/data_indices/{split}/random_indices_{percentage}_{seed}.json
  • MUSE (others): muse/muse_bench/baselines/data_indices/{split}/{method}_{percentage}_{seed}.json

Note the following options:

  • {split} refers to the specific dataset split (bio or cyber for WMDP; news or books for MUSE).
  • {method} refers to the coreset selection method, one of grand, moderate, or mink.
  • {percentage} is the fraction of the original forget set that is selected (0.01, 0.05, 0.1, or 1).
  • {seed} specifies the seed used in RANDOM selection. Options are 38270, 42362, 58771, 34176, 20971.

You can navigate to these folders to find the selected indices.
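For example, assuming the placeholders substitute verbatim into the filenames above, the 10% RANDOM indices for the WMDP bio split under seed 42362, and the 5% mink indices for the MUSE news split under the same seed, would live at:

wmdp/data_indices/bio_random_indices_0.1_42362.json
muse/muse_bench/baselines/data_indices/news/mink_0.05_42362.json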

Unlearn Models

RMU

For unlearning on WMDP, navigate to the folder wmdp and run:

CUDA_VISIBLE_DEVICES=0,1 python forget_rmu.py seed=$seed percentage=$percent epochs=$epochs forget_split=$split
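For example, a single run on the bio split with a 10% random coreset might look as follows (the epoch count here is illustrative; use the values reported in the paper):

CUDA_VISIBLE_DEVICES=0,1 python forget_rmu.py seed=42362 percentage=0.1 epochs=5 forget_split=bio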

For unlearning on MUSE, navigate to the folder muse/muse_bench/baselines and run:

bash scripts/unlearn_{split}_rmu.sh
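For example, for the news split:

bash scripts/unlearn_news_rmu.sh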

NPO

For unlearning on WMDP, navigate to the folder wmdp and run:

CUDA_VISIBLE_DEVICES=0,1,2,3 python forget_nr.py seed=$seed percentage=$percent forget_loss=npo_grad_diff forget_split=$split
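For example, on the cyber split with a 5% coreset (seed value illustrative, chosen from the list above):

CUDA_VISIBLE_DEVICES=0,1,2,3 python forget_nr.py seed=38270 percentage=0.05 forget_loss=npo_grad_diff forget_split=cyber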

For unlearning on MUSE, navigate to the folder muse/muse_bench/baselines and run:

bash scripts/unlearn_{split}_npo.sh
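For example, for the books split:

bash scripts/unlearn_books_npo.sh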

Note that the options in these commands are the same as described above.
For MUSE, you can change other hyperparameters inside the bash scripts. Set epochs and percent according to the hyperparameters reported in the paper.

Additionally, for non-random coreset selection, change the variables path and index_file, which specify where the unlearned models are saved and where the coreset selection index files are located, as sketched below.
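A minimal sketch of these two variables (the path value is illustrative and depends on your own checkpoint layout; the index_file value follows the WMDP naming scheme above):

# Illustrative values only -- adjust path to your own checkpoint directory.
path=models/wmdp_bio_rmu_mink_0.1
index_file=wmdp/data_indices/bio_mink_indices_0.1.json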

Mode Connectivity

To measure the linear mode connectivity between coreset-unlearned models and models unlearned on the full forget set, first save the unlearned models to the paths specified in the unlearning scripts, then run:

accelerate launch --num_processes=4 --main_process_port=13211 --gpu_ids=$gpu_ids evals/check_LMC_npo_dist.py percentage=${percent} seed=${seed} forget_split=${split} method=${method}

The above is for NPO-unlearned models. For RMU-unlearned models, use check_LMC_rmu_dist.py instead.

Set the following options:

  • gpu_ids: the GPU IDs to use (e.g., 0,1,2,3)
  • percent: the coreset selection ratio used to obtain the coreset-unlearned model (0.01, 0.05, or 0.1)

Other options are the same as specified above; a concrete invocation is sketched below.
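For example, to check LMC for an NPO model unlearned on a 5% mink coreset of the bio split (the seed value is illustrative):

gpu_ids=0,1,2,3
accelerate launch --num_processes=4 --main_process_port=13211 --gpu_ids=$gpu_ids evals/check_LMC_npo_dist.py percentage=0.05 seed=42362 forget_split=bio method=mink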

Relearning Attacks

Navigate to the folder retrain_attack/T-Vaccine/ and run retrain_wmdp.sh. You can change model_path to specify the unlearned model to attack; an example is sketched below.
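A minimal sketch, run from the repository root (exactly how model_path is set inside the script may differ):

cd retrain_attack/T-Vaccine/
# edit model_path inside retrain_wmdp.sh to point at the unlearned checkpoint first
bash retrain_wmdp.sh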

Enhanced GCG

Navigate to the folder unlearning-vs-safety/ and run enhanced_gcg_wmdp.sh. You can change the variables inside the bash script to specify the unlearned model.
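A minimal sketch, run from the repository root:

cd unlearning-vs-safety/
# edit the variables inside the script to point at the unlearned model first
bash enhanced_gcg_wmdp.sh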

Cite This Work

If you found our code or paper helpful, please cite our work:

@article{pal2025llm,
  title={LLM Unlearning Reveals a Stronger-Than-Expected Coreset Effect in Current Benchmarks},
  author={Pal, Soumyadeep and Wang, Changsheng and Diffenderfer, James and Kailkhura, Bhavya and Liu, Sijia},
  journal={arXiv preprint arXiv:2504.10185},
  year={2025}
}
