
Conversation

@sergiopaniego
Member

What does this PR do?

Updated the Reducing Memory Consumption guide with some missing techniques and some additional context (linking to transformers resources).

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@albertvillanova @qgallouedec @kashif

Contributor

Copilot AI left a comment


Pull Request Overview

This PR enhances the "Reducing Memory Consumption" guide by expanding it from a placeholder into a comprehensive documentation resource. The update adds missing memory optimization techniques, provides clearer guidance on when each technique applies, and links to external resources for additional strategies.

Key Changes:

  • Added new documentation sections for padding sequences and gradient checkpointing (a short configuration sketch follows this list)
  • Improved organization by consolidating repeated code snippets and adding clearer technique prerequisites
  • Enhanced existing sections with more precise language and better cross-references
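
A minimal sketch of how the two techniques above map onto trainer configuration. Assumptions: `gradient_checkpointing` is inherited from `transformers.TrainingArguments`, `pad_to_multiple_of` is exposed on `SFTConfig` as the updated guide describes, and the model/dataset names are illustrative placeholders rather than the guide's exact example:

```python
# Minimal sketch, not the guide's exact example.
# Assumptions: `gradient_checkpointing` comes from transformers.TrainingArguments,
# `pad_to_multiple_of` is an SFTConfig option as the updated guide describes,
# and the model/dataset names are illustrative only.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")

training_args = SFTConfig(
    output_dir="Qwen2.5-0.5B-SFT",
    gradient_checkpointing=True,  # recompute activations in the backward pass to save memory
    pad_to_multiple_of=64,        # pad each batch up to a multiple of 64 tokens
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```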


@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs remain available for 30 days after the last update.

Member

@albertvillanova albertvillanova left a comment


Great! Thanks. 🤗

Member

@qgallouedec qgallouedec left a comment


Cool! Just a small comment

## Padding Sequences to a Multiple

> [!TIP]
> This technique is supported for **SFT** and **Reward** trainers, and for setups using **FlashAttention** (and its variants).

I don't think you need FA for this one
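
For context on the note above: padding to a multiple is applied by the tokenizer/data collator, so it does not require FlashAttention. A minimal sketch using the standard transformers collator (the model name and the multiple of 64 are illustrative assumptions, not taken from the guide):

```python
# Sketch only: padding to a multiple happens in the data collator,
# independent of the attention backend. Model name and multiple are illustrative.
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # make sure a pad token exists

collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False,              # causal language modeling
    pad_to_multiple_of=64,  # every batch is padded up to a multiple of 64 tokens
)

batch = collator([tokenizer("Hello world"), tokenizer("A slightly longer example sentence")])
print(batch["input_ids"].shape)  # the sequence dimension is a multiple of 64
```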

@sergiopaniego sergiopaniego merged commit 2a138c7 into main Oct 27, 2025
3 checks passed
@sergiopaniego sergiopaniego deleted the memory-usage-g-update branch October 27, 2025 09:26
