
Conversation

@wtomin (Collaborator) commented Oct 22, 2025

What does this PR do?

Fixes # (issue)

  • The bug in the text encoder's returned tuple index;
  • The missing GroupNorm in diffusers.

Adds # (feature)

  • Updated the MindSpore version in README.md;
  • Updated the inference performance in README.md.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes? E.g. record bug fixes or new features in What's New. Here are the
    documentation guidelines
  • Did you build and run the code without any errors?
  • Did you report the running environment (NPU type/MS version) and performance in the doc? (better record it for data loading, model inference, or training tasks)
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@SamitHuang @vigo999 @Cui-yshoho

@gemini-code-assist (Contributor)

Summary of Changes

Hello @wtomin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on enhancing the hunyuanvideo-i2v example's compatibility and stability by updating its MindSpore version support to 2.6.0 and 2.7.0. It includes critical bug fixes, such as correcting text encoder output parsing and introducing a custom GroupNorm implementation. Additionally, it ensures the project relies on up-to-date dependencies and provides refreshed performance benchmarks for the new MindSpore environments.

Highlights

  • MindSpore Version Update: The project now officially supports MindSpore versions 2.6.0 and 2.7.0, including updates to the recommended Ascend Driver, Firmware, and CANN toolkit/kernel versions in the documentation.
  • Text Encoder Bug Fix: A bug in the text encoder's output handling was resolved by correcting an incorrect tuple index access, ensuring proper retrieval of hidden states.
  • GroupNorm Implementation: A custom GroupNorm class has been implemented and integrated, addressing a previously missing normalization component crucial for the diffusers model.
  • Dependency Updates: Key dependencies tokenizers and transformers have been updated to their latest compatible versions (0.21.4 and 4.50.0 respectively).
  • Performance Benchmarks: New inference performance benchmarks for MindSpore 2.6.0 and 2.7.0 have been added to the README, providing up-to-date performance metrics.

@wtomin requested review from Cui-yshoho and vigo999 on Oct 22, 2025, 03:25
@wtomin added the documentation (Improvements or additions to documentation) and bug (Something isn't working) labels on Oct 22, 2025
@gemini-code-assist bot left a comment

Code Review

This PR updates the MindSpore version in the README, fixes a bug in the text encoder, addresses a missing GroupNorm in diffusers, and updates inference performance in the README. The code changes involve modifications to the README, text encoder initialization, utils, and vae blocks.

Comment on lines 321 to +322

        else:
    -       last_hidden_state = outputs[2][-(hidden_state_skip_layer + 1)]
    +       last_hidden_state = outputs[1][-(hidden_state_skip_layer + 1)]

Severity: high

The index 1 is used here to access the hidden states from the text encoder's output when model_return_dict is False. It's important to ensure that this index is correct for all supported text encoder types. Add a comment explaining why index 1 is chosen and what it represents in the context of the outputs tuple.

If the structure of the outputs tuple changes in future versions of the transformers library, this index might become invalid, leading to runtime errors. Consider adding a check to validate the length of the outputs tuple and raise an error if it doesn't match the expected length.

Suggested change

    -       last_hidden_state = outputs[1][-(hidden_state_skip_layer + 1)]
    +       last_hidden_state = outputs[1][-(hidden_state_skip_layer + 1)]  # outputs[1] represents the hidden states
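The length check the reviewer asks for could be sketched as below. This is a hypothetical standalone helper, and the (logits, hidden_states) tuple layout is an assumption for illustration, not confirmed by the PR:

```python
# Hedged sketch of validating the encoder output tuple before indexing.
# The (logits, hidden_states) layout is an assumption for illustration only.
def select_hidden_state(outputs, hidden_state_skip_layer):
    if len(outputs) < 2:
        raise ValueError(
            f"expected >= 2 elements in encoder outputs, got {len(outputs)}; "
            "was output_hidden_states=True set on the text encoder?"
        )
    hidden_states = outputs[1]  # outputs[1] holds the per-layer hidden states
    return hidden_states[-(hidden_state_skip_layer + 1)]

# Toy stand-in: four "layers" of hidden states as plain lists.
outputs = ("logits", [[0], [1], [2], [3]])
print(select_hidden_state(outputs, 1))  # skips the last layer -> [2]
```

A check like this fails loudly with a clear message instead of an opaque IndexError if the transformers output layout ever changes.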

        else:
            last_hidden_state = outputs[self.key_idx]
    -       outputs_hidden_states = outputs[2] if len(outputs) >= 3 else None  # TODO: double-check if use t5
    +       outputs_hidden_states = outputs[1] if len(outputs) >= 2 else None  # TODO: double-check if use t5

Severity: high

Similar to the previous comment, the index 1 is used here to access the hidden states. Add a comment explaining why index 1 is chosen and what it represents in the context of the outputs tuple. Also, consider adding a check to validate the length of the outputs tuple.

Suggested change

    -       outputs_hidden_states = outputs[1] if len(outputs) >= 2 else None  # TODO: double-check if use t5
    +       outputs_hidden_states = outputs[1] if len(outputs) >= 2 else None  # outputs[1] represents the hidden states

        self.conv2 = CausalConv3d(out_channels, conv_3d_out_channels, kernel_size=3, stride=1)

    -       self.nonlinearity = get_activation(non_linearity)()
    +       self.nonlinearity = get_activation(non_linearity)
Severity: high

The nonlinearity attribute is assigned the get_activation(non_linearity) function itself rather than an instance of the activation function, so the activation is never instantiated; this could lead to unexpected behavior during the forward pass. Instantiate it by calling get_activation(non_linearity)().

Suggested change

    -       self.nonlinearity = get_activation(non_linearity)
    +       self.nonlinearity = get_activation(non_linearity)()
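The class-vs-instance distinction the reviewer flags can be shown with a toy get_activation. This stand-in is assumed to return an activation class, mirroring the reviewer's reading of the code; the real helper in the codebase may behave differently:

```python
import math

class SiLU:
    """Toy activation: silu(x) = x * sigmoid(x)."""
    def __call__(self, x):
        return x / (1.0 + math.exp(-x))

def get_activation(name):
    # Assumed behavior for illustration: returns the class, not an instance.
    return {"silu": SiLU}[name]

nonlinearity = get_activation("silu")    # the class object itself
# Calling nonlinearity(1.0) here would invoke SiLU(1.0), i.e. try to
# construct an instance with an unexpected argument, raising TypeError
# instead of applying the activation.

nonlinearity = get_activation("silu")()  # an instance, callable in forward()
print(round(nonlinearity(1.0), 4))       # x * sigmoid(x) at x = 1.0 -> 0.7311
```

If get_activation instead already returned an instance, the extra `()` would be the bug; which form is correct depends on the helper's contract, which is exactly why the reviewer asks for the instantiation to be explicit.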

Comment on lines +44 to +86

    class GroupNorm(nn.Cell):
        r"""Applies Group Normalization over a mini-batch of inputs.

        This layer implements the operation as described in
        the paper `Group Normalization <https://arxiv.org/abs/1803.08494>`__

        .. math::
            y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta

        The input channels are separated into :attr:`num_groups` groups, each containing
        ``num_channels / num_groups`` channels. :attr:`num_channels` must be divisible by
        :attr:`num_groups`. The mean and standard-deviation are calculated
        separately over each group. :math:`\gamma` and :math:`\beta` are learnable
        per-channel affine transform parameter vectors of size :attr:`num_channels` if
        :attr:`affine` is ``True``.

        This layer uses statistics computed from input data in both training and
        evaluation modes.

        Args:
            num_groups (int): number of groups to separate the channels into
            num_channels (int): number of channels expected in input
            eps: a value added to the denominator for numerical stability. Default: 1e-5
            affine: a boolean value that when set to ``True``, this module
                has learnable per-channel affine parameters initialized to ones (for weights)
                and zeros (for biases). Default: ``True``.

        Shape:
            - Input: :math:`(N, C, *)` where :math:`C=\text{num\_channels}`
            - Output: :math:`(N, C, *)` (same shape as input)

        Examples::
            >>> input = mint.randn(20, 6, 10, 10)
            >>> # Separate 6 channels into 3 groups
            >>> m = GroupNorm(3, 6)
            >>> # Separate 6 channels into 6 groups (equivalent with InstanceNorm)
            >>> m = GroupNorm(6, 6)
            >>> # Put all 6 channels into a single group (equivalent with LayerNorm)
            >>> m = GroupNorm(1, 6)
            >>> # Activating the module
            >>> output = m(input)
        """

Severity: medium

The docstring for the GroupNorm class includes example usage with mint.randn. It would be helpful to clarify that mint refers to mindspore.mint to avoid confusion.
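For readers without a MindSpore environment, the normalization math in the docstring can be checked against a plain NumPy reference. This is a sketch of the formula only: the eps default and the (N, C, *) shape convention follow the docstring, and the affine gamma/beta parameters are omitted:

```python
import numpy as np

def group_norm_ref(x, num_groups, eps=1e-5):
    """NumPy reference for GroupNorm on (N, C, *) inputs, without affine terms."""
    n, c = x.shape[:2]
    assert c % num_groups == 0, "num_channels must be divisible by num_groups"
    g = x.reshape(n, num_groups, -1)        # flatten each group of channels
    mean = g.mean(axis=-1, keepdims=True)
    var = g.var(axis=-1, keepdims=True)     # biased variance, as in the paper
    y = (g - mean) / np.sqrt(var + eps)
    return y.reshape(x.shape)

x = np.random.randn(20, 6, 10, 10)
y = group_norm_ref(x, num_groups=3)         # 6 channels -> 3 groups of 2
stats = y.reshape(20, 3, -1)
print(np.allclose(stats.mean(-1), 0.0, atol=1e-6),
      np.allclose(stats.var(-1), 1.0, atol=1e-3))  # -> True True
```

Each group comes out approximately zero-mean and unit-variance, matching the docstring's formula; a custom GroupNorm cell could be unit-tested against a reference like this.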


    1. Install
    -   [CANN 8.0.RC3.beta1](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.RC3.beta1)
    +   [8.1.RC1](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.1.RC1) or [8.2.RC1](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.2.RC1)
Add the CANN prefix.


Labels

bug (Something isn't working) · documentation (Improvements or additions to documentation)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants