
Conversation

@wtomin (Collaborator) commented Oct 22, 2025

What does this PR do?

Fixes # (issue)

  • The bug in the text encoder's returned tuple index;
  • The missing GroupNorm in diffusers.

Adds # (feature)

  • Updated the MindSpore version in README.md;
  • Updated the inference performance in README.md.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes? E.g. record bug fixes or new features in What's New. Here are the
    documentation guidelines
  • Did you build and run the code without any errors?
  • Did you report the running environment (NPU type/MS version) and performance in the doc? (better record it for data loading, model inference, or training tasks)
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@SamitHuang @vigo999 @Cui-yshoho

@gemini-code-assist (Contributor)

Summary of Changes

Hello @wtomin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on enhancing the hunyuanvideo-i2v example's compatibility and stability by updating its MindSpore version support to 2.6.0 and 2.7.0. It includes critical bug fixes, such as correcting text encoder output parsing and introducing a custom GroupNorm implementation. Additionally, it ensures the project relies on up-to-date dependencies and provides refreshed performance benchmarks for the new MindSpore environments.

Highlights

  • MindSpore Version Update: The project now officially supports MindSpore versions 2.6.0 and 2.7.0, including updates to the recommended Ascend Driver, Firmware, and CANN toolkit/kernel versions in the documentation.
  • Text Encoder Bug Fix: A bug in the text encoder's output handling was resolved by correcting an incorrect tuple index access, ensuring proper retrieval of hidden states.
  • GroupNorm Implementation: A custom GroupNorm class has been implemented and integrated, addressing a previously missing normalization component crucial for the diffusers model.
  • Dependency Updates: Key dependencies tokenizers and transformers have been updated to their latest compatible versions (0.21.4 and 4.50.0 respectively).
  • Performance Benchmarks: New inference performance benchmarks for MindSpore 2.6.0 and 2.7.0 have been added to the README, providing up-to-date performance metrics.

@wtomin requested review from Cui-yshoho and vigo999 on Oct 22, 2025, 03:25
@wtomin added the documentation (Improvements or additions to documentation) and bug (Something isn't working) labels on Oct 22, 2025
@gemini-code-assist bot left a comment

Code Review

This PR updates the MindSpore version in the README, fixes a bug in the text encoder, addresses a missing GroupNorm in diffusers, and updates inference performance in the README. The code changes involve modifications to the README, text encoder initialization, utils, and vae blocks.

Comment on lines 321 to +322

        else:
    -       last_hidden_state = outputs[2][-(hidden_state_skip_layer + 1)]
    +       last_hidden_state = outputs[1][-(hidden_state_skip_layer + 1)]

Severity: high

The index 1 is used here to access the hidden states from the text encoder's output when model_return_dict is False. It's important to ensure that this index is correct for all supported text encoder types. Add a comment explaining why index 1 is chosen and what it represents in the context of the outputs tuple.

If the structure of the outputs tuple changes in future versions of the transformers library, this index might become invalid, leading to runtime errors. Consider adding a check to validate the length of the outputs tuple and raise an error if it doesn't match the expected length.

Suggested change

    -       last_hidden_state = outputs[1][-(hidden_state_skip_layer + 1)]
    +       last_hidden_state = outputs[1][-(hidden_state_skip_layer + 1)]  # outputs[1] represents the hidden states
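The length check the reviewer asks for could be sketched as below. This is a hypothetical standalone helper, and the (logits, hidden_states) tuple layout is an assumption for illustration, not confirmed by the PR:

```python
# Hedged sketch of validating the encoder output tuple before indexing.
# The (logits, hidden_states) layout is an assumption for illustration only.
def select_hidden_state(outputs, hidden_state_skip_layer):
    if len(outputs) < 2:
        raise ValueError(
            f"expected >= 2 elements in encoder outputs, got {len(outputs)}; "
            "was output_hidden_states=True set on the text encoder?"
        )
    hidden_states = outputs[1]  # outputs[1] holds the per-layer hidden states
    return hidden_states[-(hidden_state_skip_layer + 1)]

# Toy stand-in: four "layers" of hidden states as plain lists.
outputs = ("logits", [[0], [1], [2], [3]])
print(select_hidden_state(outputs, 1))  # skips the last layer -> [2]
```

A check like this fails loudly with a clear message instead of an opaque IndexError if the transformers output layout ever changes.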

        else:
            last_hidden_state = outputs[self.key_idx]
    -       outputs_hidden_states = outputs[2] if len(outputs) >= 3 else None  # TODO: double-check if use t5
    +       outputs_hidden_states = outputs[1] if len(outputs) >= 2 else None  # TODO: double-check if use t5

Severity: high

Similar to the previous comment, the index 1 is used here to access the hidden states. Add a comment explaining why index 1 is chosen and what it represents in the context of the outputs tuple. Also, consider adding a check to validate the length of the outputs tuple.

Suggested change

    -       outputs_hidden_states = outputs[1] if len(outputs) >= 2 else None  # TODO: double-check if use t5
    +       outputs_hidden_states = outputs[1] if len(outputs) >= 2 else None  # outputs[1] represents the hidden states

        self.conv2 = CausalConv3d(out_channels, conv_3d_out_channels, kernel_size=3, stride=1)

    -       self.nonlinearity = get_activation(non_linearity)()
    +       self.nonlinearity = get_activation(non_linearity)
Severity: high

The nonlinearity attribute is assigned the get_activation(non_linearity) function itself rather than an instance of the activation function, so the activation is never instantiated; this could lead to unexpected behavior during the forward pass. Instantiate it by calling get_activation(non_linearity)().

Suggested change

    -       self.nonlinearity = get_activation(non_linearity)
    +       self.nonlinearity = get_activation(non_linearity)()
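The class-vs-instance distinction the reviewer flags can be shown with a toy get_activation. This stand-in is assumed to return an activation class, mirroring the reviewer's reading of the code; the real helper in the codebase may behave differently:

```python
import math

class SiLU:
    """Toy activation: silu(x) = x * sigmoid(x)."""
    def __call__(self, x):
        return x / (1.0 + math.exp(-x))

def get_activation(name):
    # Assumed behavior for illustration: returns the class, not an instance.
    return {"silu": SiLU}[name]

nonlinearity = get_activation("silu")    # the class object itself
# Calling nonlinearity(1.0) here would invoke SiLU(1.0), i.e. try to
# construct an instance with an unexpected argument, raising TypeError
# instead of applying the activation.

nonlinearity = get_activation("silu")()  # an instance, callable in forward()
print(round(nonlinearity(1.0), 4))       # x * sigmoid(x) at x = 1.0 -> 0.7311
```

If get_activation instead already returned an instance, the extra `()` would be the bug; which form is correct depends on the helper's contract, which is exactly why the reviewer asks for the instantiation to be explicit.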

Comment on lines +44 to +86

    class GroupNorm(nn.Cell):
        r"""Applies Group Normalization over a mini-batch of inputs.

        This layer implements the operation as described in
        the paper `Group Normalization <https://arxiv.org/abs/1803.08494>`__

        .. math::
            y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta

        The input channels are separated into :attr:`num_groups` groups, each containing
        ``num_channels / num_groups`` channels. :attr:`num_channels` must be divisible by
        :attr:`num_groups`. The mean and standard-deviation are calculated
        separately over each group. :math:`\gamma` and :math:`\beta` are learnable
        per-channel affine transform parameter vectors of size :attr:`num_channels` if
        :attr:`affine` is ``True``.

        This layer uses statistics computed from input data in both training and
        evaluation modes.

        Args:
            num_groups (int): number of groups to separate the channels into
            num_channels (int): number of channels expected in input
            eps: a value added to the denominator for numerical stability. Default: 1e-5
            affine: a boolean value that when set to ``True``, this module
                has learnable per-channel affine parameters initialized to ones (for weights)
                and zeros (for biases). Default: ``True``.

        Shape:
            - Input: :math:`(N, C, *)` where :math:`C=\text{num\_channels}`
            - Output: :math:`(N, C, *)` (same shape as input)

        Examples::
            >>> input = mint.randn(20, 6, 10, 10)
            >>> # Separate 6 channels into 3 groups
            >>> m = GroupNorm(3, 6)
            >>> # Separate 6 channels into 6 groups (equivalent with InstanceNorm)
            >>> m = GroupNorm(6, 6)
            >>> # Put all 6 channels into a single group (equivalent with LayerNorm)
            >>> m = GroupNorm(1, 6)
            >>> # Activating the module
            >>> output = m(input)
        """

Severity: medium

The docstring for the GroupNorm class includes example usage with mint.randn. It would be helpful to clarify that mint refers to mindspore.mint to avoid confusion.
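For readers without a MindSpore environment, the normalization math in the docstring can be checked against a plain NumPy reference. This is a sketch of the formula only: the eps default and the (N, C, *) shape convention follow the docstring, and the affine gamma/beta parameters are omitted:

```python
import numpy as np

def group_norm_ref(x, num_groups, eps=1e-5):
    """NumPy reference for GroupNorm on (N, C, *) inputs, without affine terms."""
    n, c = x.shape[:2]
    assert c % num_groups == 0, "num_channels must be divisible by num_groups"
    g = x.reshape(n, num_groups, -1)        # flatten each group of channels
    mean = g.mean(axis=-1, keepdims=True)
    var = g.var(axis=-1, keepdims=True)     # biased variance, as in the paper
    y = (g - mean) / np.sqrt(var + eps)
    return y.reshape(x.shape)

x = np.random.randn(20, 6, 10, 10)
y = group_norm_ref(x, num_groups=3)         # 6 channels -> 3 groups of 2
stats = y.reshape(20, 3, -1)
print(np.allclose(stats.mean(-1), 0.0, atol=1e-6),
      np.allclose(stats.var(-1), 1.0, atol=1e-3))  # -> True True
```

Each group comes out approximately zero-mean and unit-variance, matching the docstring's formula; a custom GroupNorm cell could be unit-tested against a reference like this.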


    1. Install
    -   [CANN 8.0.RC3.beta1](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.RC3.beta1)
    +   [8.1.RC1](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.1.RC1) or [8.2.RC1](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.2.RC1)
Add the CANN prefix.


Labels

bug (Something isn't working) · documentation (Improvements or additions to documentation)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants