Conversation

@FabianSchuetze


Fixes issue #2660. The DPO trainer doesn't work for Qwen2.5-VL models because:

To solve the issue, I have:

  • Added a kwarg that allows skipping pre-processing of the data, so users can write their own `collate_fn`, as is common practice with the SFT trainer.
  • Passed the `image_grid_thw` value to the model kwargs, as is done in the GRPO trainer.
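The second bullet can be sketched as follows. This is only a minimal illustration of the idea, not the actual diff: the keys `pixel_values` and `image_grid_thw` follow the Qwen2-VL processor convention, and `pixel_attention_mask` is an assumed optional key included to show that absent vision features are simply skipped.

```python
def build_model_kwargs(batch):
    """Collect the extra vision kwargs that Qwen2-VL-style models expect.

    Illustrative sketch: forward `image_grid_thw` alongside
    `pixel_values` to the model, mirroring what the GRPO trainer does,
    while leaving text-only batches untouched.
    """
    model_kwargs = {}
    for key in ("pixel_values", "pixel_attention_mask", "image_grid_thw"):
        # Only forward vision features that are actually present.
        if batch.get(key) is not None:
            model_kwargs[key] = batch[key]
    return model_kwargs


# Toy batch: one example with a 1x16x16 (temporal, height, width) patch grid.
batch = {
    "input_ids": [[1, 2, 3]],
    "pixel_values": [[0.1, 0.2]],
    "image_grid_thw": [[1, 16, 16]],
}
model_kwargs = build_model_kwargs(batch)
print(sorted(model_kwargs))  # ['image_grid_thw', 'pixel_values']
```

Text-only inputs yield an empty dict, so the same call site works for both modalities.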

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline,
    Pull Request section?
  • [x] Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • [ ] Did you make sure to update the documentation with your changes?
  • [ ] Did you write any new necessary tests?

Who can review?

@baichuanzhou raised the issue initially, and I hit the same problem when using DPO for Qwen-VL.

@FabianSchuetze
Author

@qgallouedec I saw you worked on a few DPO patches. Could you take a look at the PR?

@defdet

defdet commented Oct 20, 2025

Thank you for the PR, it would solve my problem as well. However, doesn't your PR still leave the first bug unfixed?

The pre-processing function only ingests tokens' worth of one "row" of an image, but the image encoder is 2-D.

The line responsible for the bug is the same: https://github.com/FabianSchuetze/trl/blob/adb34f339895e2f92a33d47195cc46c3c1f3bda9/trl/trainer/dpo_trainer.py#L766

@FabianSchuetze
Author

Thank you for the PR, it would solve my problem as well. However, doesn't your PR still leave the first bug unfixed?

The pre-processing function only ingests tokens' worth of one "row" of an image, but the image encoder is 2-D.

The line responsible for the bug is the same: https://github.com/FabianSchuetze/trl/blob/adb34f339895e2f92a33d47195cc46c3c1f3bda9/trl/trainer/dpo_trainer.py#L766

Yes and no. I deliberately introduced the `skip_prepare_dataset` keyword. When it is set, the dataset is not processed and you can use your own `collate_fn`. That's exactly the same behavior as in the SFTTrainer.
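A custom `collate_fn` in the spirit described above might look like the sketch below. This is hypothetical and uses plain Python lists for clarity; the field names (`prompt_input_ids`, `chosen_input_ids`, `rejected_input_ids`, `pixel_values`, `image_grid_thw`) are assumptions about how a user might pre-tokenize a VLM preference dataset, and a real collator would pad sequences and convert everything to tensors with the model's processor.

```python
def vlm_preference_collate_fn(examples):
    """Hypothetical collator for a pre-tokenized VLM preference dataset.

    Each example is assumed to already carry chosen/rejected token ids
    plus its vision features; we simply gather them into batch lists so
    they pass through untouched when dataset preparation is skipped.
    """
    keys = (
        "prompt_input_ids",
        "chosen_input_ids",
        "rejected_input_ids",
        "pixel_values",
        "image_grid_thw",
    )
    return {key: [example[key] for example in examples] for key in keys}


# Two toy examples with dummy token ids and a 1x4x4 patch grid each.
examples = [
    {
        "prompt_input_ids": [1, 2],
        "chosen_input_ids": [3],
        "rejected_input_ids": [4],
        "pixel_values": [[0.0]],
        "image_grid_thw": [1, 4, 4],
    },
    {
        "prompt_input_ids": [5, 6],
        "chosen_input_ids": [7],
        "rejected_input_ids": [8],
        "pixel_values": [[1.0]],
        "image_grid_thw": [1, 4, 4],
    },
]
collated = vlm_preference_collate_fn(examples)
```

With the skip-preparation keyword set, such a collator would be passed to the trainer's data-collator argument, keeping full control over how vision features reach the model.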

@VietHoang1512

Hi @FabianSchuetze, I am wondering if you could provide an example training script for Qwen-VL with your DPOTrainer?
