[ICML 2025] SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning

This is an official implementation of the paper "SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning", accepted by ICML 2025. 📝 Paper 🤗 CoIN-ASD Benchmark

Installation

Our environment is set up with CUDA 12.1. To ensure a smooth installation, it is recommended to also use CUDA 12.1.

conda create -n sefe python=3.10 -y
conda activate sefe
pip install --upgrade pip
pip install -e .
pip install -e ".[train]"
pip install flash-attn==2.6.3 --no-build-isolation

Data Organization and Structure

To obtain the original images and annotation data for CoIN, please refer to the official CoIN repository. We organize the downloaded files in the following directory structure:

./playground/data/CoIN
├── ScienceQA
│   └── [Original Data of ScienceQA]
├── TextVQA
│   └── [Original Data of TextVQA]
├── ImageNet
│   └── [Original Data of ImageNet]
├── GQA
│   └── [Original Data of GQA]
├── VizWiz
│   └── [Original Data of VizWiz]
├── COCO
│   └── [Original Data of COCO]
├── OCRVQA
│   └── [Original Data of OCRVQA]
└── annotations
    ├── ScienceQA
    │   ├── train.json
    │   └── test.json
    ├── TextVQA
    │   ├── train.json
    │   └── test.json
    ├── ImageNet
    │   ├── train.json
    │   └── test.json
    ├── GQA
    │   ├── train.json
    │   └── test.json
    ├── VizWiz
    │   ├── train.json
    │   └── test.json
    ├── Grounding
    │   ├── train.json
    │   └── test.json
    ├── VQAv2
    │   ├── train.json
    │   └── test.json
    └── OCRVQA
        ├── train.json
        └── test.json

Notes:

Original Data Directories: The placeholders [Original Data of XXX] represent the datasets (primarily images) downloaded directly from benchmarks such as ScienceQA and TextVQA. These are maintained in their default directory structures.
COCO Folder: Although the CoIN benchmark does not directly include COCO, the Grounding and VQAv2 tasks utilize images from the COCO dataset. Therefore, a COCO folder is included.
Annotations: The train.json and test.json files within the annotations directory contain annotations provided by CoIN or modified by our ASD. For consistency, all test sets originally named val.json in the CoIN repository have been renamed to test.json.

CoIN-ASD

The CoIN-ASD/prompts directory contains all prompts used to create the CoIN-ASD benchmark. The created annotations for CoIN-ASD can be downloaded from our HuggingFace page. After downloading, please organize the data according to the directory structure described in the "Data Organization and Structure" section above.

Note that for training data, we provide multiple versions with different values of hyperparameter $X$. For example, when $X$ is set to $20$, the corresponding JSON file is named train_x20.json. To use a specific version, modify the --data_path parameter in the corresponding training script (.sh file) under ./scripts/Train/ directory.

Pre-trained Weights

Before starting the training process, you need to download three pre-trained models:

We organize the downloaded models in the following directory structure:

./pretrained_weights
├── vicuna-7b-v1.5
├── clip-vit-large-patch14-336
└── llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5

Training and Evaluation

Once the data transformation is complete and structured correctly, you can initiate training by running ./scripts/Train/Train_all.sh. This script will automatically invoke ./scripts/Eval/Eval_all.sh after training each task to evaluate all learned tasks. For further details, please refer to the corresponding files.

Citation

@inproceedings{chen2025sefe,
  title={SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning},
  author={Chen, Jinpeng and Cong, Runmin and Zhao, Yuzhi and Yang, Hongzheng and Hu, Guangneng and Ip, Horace Ho Shing and Kwong, Sam},
  booktitle={ICML},
  year={2025}
}

Acknowledgement

This repository is built upon the LLaVA and CoIN projects. We would like to express our gratitude to the authors for their contributions to the community.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
CoIN-ASD/prompts		CoIN-ASD/prompts
llava		llava
scripts		scripts
LICENSE		LICENSE
README.md		README.md
cog.yaml		cog.yaml
key_elements_identification.py		key_elements_identification.py
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

[ICML 2025] SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning

Installation

Data Organization and Structure

CoIN-ASD

Pre-trained Weights

Training and Evaluation

Citation

Acknowledgement

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

jinpeng0528/SEFE

Folders and files

Latest commit

History

Repository files navigation

[ICML 2025] SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning

Installation

Data Organization and Structure

CoIN-ASD

Pre-trained Weights

Training and Evaluation

Citation

Acknowledgement

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages