Shangpin Peng*1, Senqiao Yang*2, Li Jiang3, Zhuotao Tian1✉️
1Harbin Institute of Technology, Shenzhen
2The Chinese University of Hong Kong
3The Chinese University of Hong Kong, Shenzhen
*Equal contribution
✉️Corresponding author: [email protected]
- [2025.07.30] 🔍 Our work has been featured and explained by 52CV, check it out here.
- [2025.07.21] 📖 All code, data, and models are released!
- [2025.06.26] 🎉 Our SENTINEL is accepted by ICCV 2025!
SENTINEL introduces an automatic, sentence‑level early intervention strategy to prevent and mitigate object hallucinations in multimodal large language models (MLLMs). Key advantages:
- Annotation-free: No human labeling required.
- Model-agnostic: Compatible with any MLLM architecture.
- Efficient: Lightweight LoRA fine-tuning.
- 🔑 Key Features
- 📚 Dataset
- 📦 Model Weights
- 💻 Environment Setup
- 🔨 Data Generation
- ⚙️ Training
- 📈 Evaluation
- 📝 Citation
- Early intervention halts hallucination propagation. We find that hallucinations of MLLMs predominantly arise in early sentences and propagate through the rest of the output. SENTINEL interrupts this chain early to maximize mitigation.
- In-domain contextual preference learning without human labels. SENTINEL constructs hallucinated/factual samples via detector cross-validation and builds context-aware preference data without relying on proprietary LLMs or manual annotations.
- Context matters: rich coherence drives robustness. By prioritizing context-coherent positive samples over hallucinated ones, SENTINEL significantly boosts generalization.
- Iterative contextual bootstrapping for diverse hallucination-free contexts. Our pipeline dynamically grows non-hallucinated contexts and expands coverage across varied scenes, improving robustness across generations.
- State-of-the-art results across benchmarks. SENTINEL achieves up to 92% reduction in hallucinations and outperforms prior SOTA methods across Object HalBench, AMBER, and HallusionBench, while maintaining or improving general task performance.
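The README does not spell out the training objective behind the preference data, but the `"y_win"`/`"y_lose"`/`"context"` fields described later suggest a DPO-style loss in which both continuations are scored conditioned on the same hallucination-free context. The sketch below is a simplified, illustrative version under that assumption; the function name, tensor layout, and `beta` value are placeholders, not the repository's actual C-DPO implementation.

```python
import torch
import torch.nn.functional as F

def contextual_dpo_loss(policy_logp_win, policy_logp_lose,
                        ref_logp_win, ref_logp_lose, beta=0.1):
    """Simplified, context-conditioned DPO-style loss (illustrative sketch only).

    Each tensor holds per-sample log-probabilities of the winning (non-hallucinated)
    or losing (hallucinated) continuation, already conditioned on the image, the
    question, and the shared hallucination-free context.
    """
    # Log-ratio of the trainable policy vs. a frozen reference model.
    win_ratio = policy_logp_win - ref_logp_win
    lose_ratio = policy_logp_lose - ref_logp_lose
    # Push the model to prefer the context-coherent, non-hallucinated continuation.
    return -F.logsigmoid(beta * (win_ratio - lose_ratio)).mean()
```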
We present the SENTINEL Dataset, an in-domain multimodal preference dataset for mitigating object hallucination, constructed without human annotation.
Dataset details
The SENTINEL dataset records the preference pairs of the LLaVA-v1.5, LLaVA-v1.6, Qwen2-VL, and Qwen2.5-VL families, enabling robust and scalable hallucination mitigation without the need for external supervision.
It contains the following components:
- `image_data.jsonl` file

  This file contains a selection of open-source images extracted from the Visual Genome dataset. It includes only three fields: `image_id`, `image_path`, and `question`, and is used to construct preference training data for hallucination suppression in image captioning tasks.

  Note: If you want to use the data from this file, please make sure to replace the `image_path` field with the path to your local copy of the Visual Genome dataset.

- `<model_name>.json` files

  These files represent the preference training datasets generated after the training data construction step, with each file corresponding to a specific model. They include the necessary fields for C-DPO training, such as `"question"`, `"context"`, `"y_win"`, and `"y_lose"`.
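As a quick sanity check, the snippet below sketches how one might load `image_data.jsonl` and re-point its `image_path` field at a local Visual Genome copy, as the note above requires. The file and field names come from the description above; the local directory path is a placeholder, and the path-joining logic assumes a flat image folder.

```python
import json
from pathlib import Path

VG_ROOT = Path("/your/path/to/visual_genome")  # placeholder: local Visual Genome copy

records = []
with open("image_data.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)  # fields: image_id, image_path, question
        # Re-root the image path onto the local Visual Genome directory
        # (assumes images sit directly under VG_ROOT; adjust to your layout).
        record["image_path"] = str(VG_ROOT / Path(record["image_path"]).name)
        records.append(record)

print(f"Loaded {len(records)} images; example question: {records[0]['question']}")
```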
We provide the model weights mentioned in our paper, all of which are trained using LoRA. These weights can be seamlessly plugged into the corresponding base models for inference or further fine-tuning.
| Base Model | Training Data Size | LoRA | Download |
| --- | --- | --- | --- |
| LLaVA-v1.5-7B | 8.6K | ✅ | 🤗 Base / 📄 Data / 🤗 SENTINEL |
| LLaVA-v1.5-13B | 7.0K | ✅ | 🤗 Base / 📄 Data / 🤗 SENTINEL |
| LLaVA-v1.6-Vicuna-7B | 7.0K | ✅ | 🤗 Base / 📄 Data / 🤗 SENTINEL |
| LLaVA-v1.6-Vicuna-13B | 7.0K | ✅ | 🤗 Base / 📄 Data / 🤗 SENTINEL |
| Qwen2-VL-2B-Instruct | 12K | ✅ | 🤗 Base / 📄 Data / 🤗 SENTINEL |
| Qwen2-VL-7B-Instruct | 7.0K | ✅ | 🤗 Base / 📄 Data / 🤗 SENTINEL |
| Qwen2.5-VL-7B-Instruct | 7.0K | ✅ | 🤗 Base / 📄 Data / 🤗 SENTINEL |
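Since the released weights are LoRA adapters, they can be attached to the corresponding base model with standard tooling. The snippet below is a minimal sketch using Hugging Face `transformers` and `peft` for a Qwen2-VL base model; the repository ID and adapter path are placeholders, and the project's own inference scripts may load the weights differently.

```python
import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from peft import PeftModel

BASE_ID = "Qwen/Qwen2-VL-7B-Instruct"          # placeholder: base model repo ID
ADAPTER_PATH = "/your/path/to/sentinel-lora"   # placeholder: downloaded SENTINEL LoRA weights

processor = AutoProcessor.from_pretrained(BASE_ID)
base = Qwen2VLForConditionalGeneration.from_pretrained(
    BASE_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

# Attach the LoRA adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(base, ADAPTER_PATH)
model.eval()
```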
- Clone this repository and navigate to the SENTINEL folder

  ```bash
  git clone https://github.com/pspdada/SENTINEL.git --depth=1
  cd SENTINEL
  ```

- Install packages

  ```bash
  conda create -y -n SENTINEL python=3.10
  conda activate SENTINEL
  pip install -r requirements.txt
  pip install -U flash-attn --no-build-isolation
  ```
- Install additional necessary packages

  Details

  Download the necessary NLTK packages:

  ```python
  import nltk

  nltk.download("wordnet")
  nltk.download("punkt_tab")
  nltk.download("cmudict")
  nltk.download("averaged_perceptron_tagger_eng")
  ```

  Download the necessary spaCy packages:

  ```bash
  pip install -U pip setuptools wheel
  pip install 'spacy[cuda12x]==3.8.0'
  python -m spacy download en_core_web_md   # Needed for generating training data
  python -m spacy download en_core_web_trf  # Needed for Object HalBench evaluation
  ```

  For the use of the YOLO model:

  ```bash
  pip install git+https://github.com/openai/CLIP.git
  ```
Skip this step if you only want to use our released dataset.
- (Optional) Check the `.env` file

  You can check the `.env` file to configure environment variables. This file is automatically loaded at runtime. Most entries are commented out by default, and you can modify them as needed.

- Select the model to generate data

  You need to choose an MLLM to generate training data specifically tailored for it. We have implemented support for the LLaVA-v1.5, LLaVA-v1.6, and Qwen-VL families.

  You can switch the model by modifying the `--model` parameter in `setup_utils.py`. For more details, please refer to the `generator` directory.
- Download Visual Genome for the images. Download the dataset for data generation and put it in `dataset`.
- Generate training data

  You can use the following command to generate training data. The generated data will be saved in the `./results` directory.

  ```bash
  python main.py
  ```
- Finish generating

  Generated Data Details

  The generated data includes two `.jsonl` files:

  - One is `<model_name>.jsonl`, which is an auxiliary file used for analysis and not for constructing preference pairs. Each line corresponds to the result for one image and includes:
    - `sentences_cnt`: the total number of sentences describing the image
    - `hallu_objects`: the total number of hallucinated objects generated during model sampling
    - `uncertain_objects`: uncertain objects
    - `nonhallu_objects`: non-hallucinated objects
  - The other file is `<model_name>_data_pair.jsonl`, where each line is a preference sample pair. It includes essential fields such as `"image_path"`, `"context"`, `"y_win"`, and `"y_lose"`, as well as additional fields for analysis.
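For a quick look at the auxiliary statistics file, the snippet below sketches how one might aggregate the per-image counts listed above. The field names follow the description in this README; the file path is a placeholder for your chosen model, and list-valued fields are handled defensively in case objects are stored by name rather than as counts.

```python
import json

STATS_FILE = "results/<model_name>.jsonl"  # placeholder: auxiliary statistics file

totals = {"sentences_cnt": 0, "hallu_objects": 0,
          "uncertain_objects": 0, "nonhallu_objects": 0}
num_images = 0

with open(STATS_FILE, "r", encoding="utf-8") as f:
    for line in f:
        stats = json.loads(line)  # one record per image
        num_images += 1
        for key in totals:
            value = stats.get(key, 0)
            # Accept either an integer count or a list of object names.
            totals[key] += len(value) if isinstance(value, list) else value

print(f"Images: {num_images}")
print(f"Hallucinated objects per image: {totals['hallu_objects'] / max(num_images, 1):.2f}")
```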
- Convert Training Samples into the Required Format

  - If you want to train LLaVA-v1.5, use `get_llava_v15_data_pair.py` to perform the conversion and stay consistent with the original repository.
  - If you want to use LLaVA-v1.6, Qwen2-VL, or Qwen2.5-VL, you need to convert the training data into the LLaMA-Factory format. You can use `get_llama_factory_data_pair.py` for the conversion.
- Prepare data

  - Training Data

    - If you want to reproduce our experiments, you can use the SENTINEL Dataset that we constructed.
    - If you prefer to build your own dataset, you can use the data generated in the previous section.

  - Image Data

    We use images from the Visual Genome dataset for model training. You can download them from Visual Genome and note where you store them.
- Training

  - LLaVA-v1.5

    We modified the code based on the HA-DPO library, which itself is based on the official LLaVA-v1.5 implementation. This choice allows for a fair and convenient comparison with prior work.

    Here, we provide a training script to train the model using LoRA. Run the following command to start LoRA training.

    ```bash
    export INPUT_MODEL=/your/path/to/llava-v1.5-7b/or/13b
    export TRAINING_DATA_PATH=/your/path/to/training/data/file
    export OUTPUT_NAME=/the/name/of/directory/to/save
    export VISUAL_GENOME_PATH=/your/path/to/visual/genome
    bash "train/models/dpo_llava.sh"
    ```

    The final weights will be saved in the `./train/results/${OUTPUT_NAME}` directory.

  - Other models

    For LLaVA-v1.6, Qwen2-VL, or Qwen2.5-VL, we adopt the widely used fine-tuning framework LLaMA-Factory to implement our method, aiming for broader applicability across various scenarios.

    Please refer to Train SENTINEL via LLaMA-Factory for the training process.
We strictly follow the official LLaVA evaluation settings to ensure a fair comparison. You can refer to the official guide for details.

For more information, please see our Evaluation README file.
- LLaVA: LLaVA-v1.5 is an excellent open-source project on MLLMs.
- HA-DPO: Our code for the LLaVA-v1.5 part is based on HA-DPO, an influential work in the field of object hallucination in MLLMs. It provided us with valuable inspiration.
- LLaMA-Factory: A unified and efficient fine-tuning framework of LLMs. Our implementations for LLaVA-v1.6, Qwen2-VL, and Qwen2.5-VL are based on this framework.
If you find our model/code/data/paper helpful, please consider citing our paper 📝 and starring us ⭐️!
```bibtex
@article{peng2025mitigating,
  title={Mitigating Object Hallucinations via Sentence-Level Early Intervention},
  author={Peng, Shangpin and Yang, Senqiao and Jiang, Li and Tian, Zhuotao},
  journal={arXiv preprint arXiv:2507.12455},
  year={2025}
}
```
If you have any questions, comments, or suggestions, please do not hesitate to submit an issue or PR to help advance research in this area.