Conversation

@FabianSchuetze


Fixes issue #2660. The DPO trainer doesn't work for Qwen2.5-VL models because:

To solve the issue, I have:

  • Added a kwarg that allows skipping pre-processing of the data, so users can write their own `collate_fn`, as is common practice with the SFT trainer.
  • Passed the `image_grid_thw` value to the model kwargs, as is done in the GRPO trainer.
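The second bullet can be sketched as follows. This is only a minimal illustration of the idea, not the actual diff: the keys `pixel_values` and `image_grid_thw` follow the Qwen2-VL processor convention, and `pixel_attention_mask` is an assumed optional key included to show that absent vision features are simply skipped.

```python
def build_model_kwargs(batch):
    """Collect the extra vision kwargs that Qwen2-VL-style models expect.

    Illustrative sketch: forward `image_grid_thw` alongside
    `pixel_values` to the model, mirroring what the GRPO trainer does,
    while leaving text-only batches untouched.
    """
    model_kwargs = {}
    for key in ("pixel_values", "pixel_attention_mask", "image_grid_thw"):
        # Only forward vision features that are actually present.
        if batch.get(key) is not None:
            model_kwargs[key] = batch[key]
    return model_kwargs


# Toy batch: one example with a 1x16x16 (temporal, height, width) patch grid.
batch = {
    "input_ids": [[1, 2, 3]],
    "pixel_values": [[0.1, 0.2]],
    "image_grid_thw": [[1, 16, 16]],
}
model_kwargs = build_model_kwargs(batch)
print(sorted(model_kwargs))  # ['image_grid_thw', 'pixel_values']
```

Text-only inputs yield an empty dict, so the same call site works for both modalities.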

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline,
    Pull Request section?
  • [x] Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • [ ] Did you make sure to update the documentation with your changes?
  • [ ] Did you write any new necessary tests?

Who can review?

@baichuanzhou raised the issue initially, and I hit the same problem when using DPO for Qwen-VL.

@FabianSchuetze
Author

@qgallouedec I saw you worked on a few DPO patches. Could you take a look at the PR?

@defdet

defdet commented Oct 20, 2025

Thank you for the PR, it would solve my problem as well. However, doesn't your PR still leave the first bug unfixed?

The pre-processing function only ingests tokens' worth of one "row" of an image, but the image encoder is 2-D.

The line responsible for the bug is the same: https://github.com/FabianSchuetze/trl/blob/adb34f339895e2f92a33d47195cc46c3c1f3bda9/trl/trainer/dpo_trainer.py#L766

@FabianSchuetze
Author

Thank you for the PR, it would solve my problem as well. However, doesn't your PR still leave the first bug unfixed?

The pre-processing function only ingests tokens' worth of one "row" of an image, but the image encoder is 2-D.

The line responsible for the bug is the same: https://github.com/FabianSchuetze/trl/blob/adb34f339895e2f92a33d47195cc46c3c1f3bda9/trl/trainer/dpo_trainer.py#L766

Yes and no. I deliberately introduced the `skip_prepare_dataset` keyword. When it is set, the dataset is not processed and you can use your own `collate_fn`. That's exactly the same behavior as in the SFTTrainer.
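A custom `collate_fn` in the spirit described above might look like the sketch below. This is hypothetical and uses plain Python lists for clarity; the field names (`prompt_input_ids`, `chosen_input_ids`, `rejected_input_ids`, `pixel_values`, `image_grid_thw`) are assumptions about how a user might pre-tokenize a VLM preference dataset, and a real collator would pad sequences and convert everything to tensors with the model's processor.

```python
def vlm_preference_collate_fn(examples):
    """Hypothetical collator for a pre-tokenized VLM preference dataset.

    Each example is assumed to already carry chosen/rejected token ids
    plus its vision features; we simply gather them into batch lists so
    they pass through untouched when dataset preparation is skipped.
    """
    keys = (
        "prompt_input_ids",
        "chosen_input_ids",
        "rejected_input_ids",
        "pixel_values",
        "image_grid_thw",
    )
    return {key: [example[key] for example in examples] for key in keys}


# Two toy examples with dummy token ids and a 1x4x4 patch grid each.
examples = [
    {
        "prompt_input_ids": [1, 2],
        "chosen_input_ids": [3],
        "rejected_input_ids": [4],
        "pixel_values": [[0.0]],
        "image_grid_thw": [1, 4, 4],
    },
    {
        "prompt_input_ids": [5, 6],
        "chosen_input_ids": [7],
        "rejected_input_ids": [8],
        "pixel_values": [[1.0]],
        "image_grid_thw": [1, 4, 4],
    },
]
collated = vlm_preference_collate_fn(examples)
```

With the skip-preparation keyword set, such a collator would be passed to the trainer's data-collator argument, keeping full control over how vision features reach the model.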

@VietHoang1512

Hi @FabianSchuetze, I am wondering if you could provide an example training script for Qwen-VL with your DPOTrainer?
