Issues with DPOTrainer and Qwen2-VL processor #2660

@baichuanzhou

Description

Hey guys, I am digging into the DPO implementation for VLMs and encountered this issue.
Here, in the `process_row` function:

```python
processor, tokenizer = processing_class, processing_class.tokenizer  # the processing class is a processor
processed_features = processor(images=features["images"], text=features["prompt"], add_special_tokens=False)

prompt_input_ids = processed_features["input_ids"][0]
pixel_values = processed_features["pixel_values"][0]
```

images are turned into `pixel_values` by indexing the first element of the returned `pixel_values`. (I assume this is because the dataset format requires that each input contain only one image.) However, after playing with Qwen2-VL's processor, I found that it always returns a 2D tensor, which makes indexing the first element essentially select only the first row of the pixel values.

If that is the case, I don't think `pixel_values` should be handled this way.

Here's the code I used to test Qwen2-VL's processor:

```python
from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained('Qwen/Qwen2-VL-7B-Instruct')
image = Image.open('some image')
outputs = processor(images=[image, image], text="<image><image>", return_tensors="pt")
print(outputs.pixel_values.size())
# always a 2D tensor; in my case it was torch.Size([3128, 1176])
```

And here I checked Qwen2-VL's preprocessing logic; it seems the processor always returns a flattened 2D tensor of patch rows.

Wouldn't indexing a 2D image tensor be a problem?
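To make the concern concrete, here is a small shape-only sketch (NumPy arrays stand in for torch tensors, and the 4D "per-image" layout is an assumption about how other image processors batch their outputs, not taken from the TRL code):

```python
import numpy as np

# A per-image processor typically returns a 4D batch:
# (num_images, channels, height, width), so [0] selects the first whole image.
standard_pixel_values = np.zeros((2, 3, 336, 336))
print(standard_pixel_values[0].shape)  # (3, 336, 336) - one complete image

# Qwen2-VL's processor instead flattens all images into patch rows:
# (total_num_patches, patch_dim), e.g. (3128, 1176) as observed above.
qwen_pixel_values = np.zeros((3128, 1176))
print(qwen_pixel_values[0].shape)  # (1176,) - a single patch row, not an image
```

With the flattened layout, `pixel_values[0]` silently discards everything after the first patch row instead of selecting an image.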

Labels: 🏋 DPO (Related to DPO), 🐛 bug (Something isn't working)
