
Conversation

@kaixuanliu
Contributor

When we run the test case `pytest -rA tests/test_sft_trainer.py::TestSFTTrainer::test_train_vlm_gemma_3n`, it fails on both CUDA and Intel XPU. Further investigation shows there are two reasons:

  1. The audio tower does not update its weights during fine-tuning.
  2. With the bf16 dtype, small changes to the model weights are rounded off.

This PR fixes this bug.
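
To make the second point concrete, here is a minimal sketch (not from this PR) of how a small weight update can vanish under bf16 rounding, using only standard PyTorch:

```python
import torch

# bf16 keeps only 8 significand bits, so its machine epsilon is 2**-7.
print(torch.finfo(torch.bfloat16).eps)  # 0.0078125

# A tiny update (e.g. lr * grad ~ 1e-4) applied to a weight of magnitude 1.0
# rounds back to the original value, so a before/after comparison sees no change.
w = torch.tensor(1.0, dtype=torch.bfloat16)
print(w + 1e-4)  # tensor(1., dtype=torch.bfloat16) -- the update is lost
```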

@kaixuanliu kaixuanliu marked this pull request as draft October 15, 2025 06:56
@kaixuanliu kaixuanliu marked this pull request as ready for review October 15, 2025 07:23
@yao-matrix
Contributor

@kashif, please help review, thanks very much.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@albertvillanova albertvillanova left a comment

Thanks for the catch and the fix.

I confirm that this PR fixes the test:

PASSED tests/test_sft_trainer.py::TestSFTTrainer::test_train_vlm_gemma_3n

Maybe we could add the reason why the vision/audio towers do not update, e.g. a note that they are frozen during training?
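
As a quick way to verify which towers are actually trainable, something like the following generic PyTorch sketch works (not code from this PR; module prefixes such as `audio_tower` vary by model):

```python
from collections import defaultdict

import torch

def trainable_params_by_module(model: torch.nn.Module) -> dict:
    """Map each top-level sub-module to (trainable, total) parameter counts."""
    stats = defaultdict(lambda: [0, 0])
    for name, param in model.named_parameters():
        prefix = name.split(".")[0]  # e.g. "vision_tower", "audio_tower"
        stats[prefix][1] += param.numel()
        if param.requires_grad:
            stats[prefix][0] += param.numel()
    return {prefix: tuple(counts) for prefix, counts in stats.items()}

# Usage (model loading elided):
# for module, (trainable, total) in trainable_params_by_module(model).items():
#     print(f"{module}: {trainable:,} / {total:,} trainable")
```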

Member

@qgallouedec qgallouedec left a comment

I'd prefer keeping bf16, as it's usually the precision used in practice. Instead, we can simply increase the learning rate so the weight updates are large enough to survive bf16 rounding.

```diff
     per_device_train_batch_size=1,
     gradient_checkpointing=True,
-    model_init_kwargs={"dtype": "bfloat16"},
+    model_init_kwargs={"dtype": "float16"},
```
Member

Suggested change:

```diff
-    model_init_kwargs={"dtype": "float16"},
+    model_init_kwargs={"dtype": "bfloat16"},
```

```python
# Initialize the trainer
training_args = SFTConfig(
    output_dir=self.tmp_dir,
    max_length=None,
```
Member

Suggested change:

```diff
-    max_length=None,
+    learning_rate=0.1,  # increase lr to ensure updates are not lost due to bf16 rounding
+    max_length=None,
```
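
For context on why the higher learning rate helps, here is a sketch of the before/after check such a test typically performs (an assumed pattern, not the exact TRL test code; the `audio_tower` prefix is illustrative):

```python
import torch

def assert_weights_updated(model, previous, skip_prefixes=("model.audio_tower",)):
    """Assert parameters changed after training, skipping known-frozen/unused towers.

    `previous` is a snapshot taken before trainer.train():
        previous = {n: p.clone() for n, p in model.named_parameters()}
    With bf16 weights, updates smaller than the bf16 spacing round away and the
    assertion fails spuriously; a larger lr keeps each update representable.
    """
    for name, param in model.named_parameters():
        if name.startswith(skip_prefixes):
            continue  # e.g. a tower that receives no gradient on this data
        assert not torch.equal(param, previous[name]), f"{name} did not update"
```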

@qgallouedec qgallouedec changed the title fix CI issue for vlm_gemma_3n model Fix CI issue for vlm_gemma_3n model Oct 28, 2025
@qgallouedec qgallouedec merged commit a9d33d0 into huggingface:main Oct 28, 2025
8 of 10 checks passed