Update Reducing Memory Consumption guide with more details #4332
Conversation
Pull Request Overview
This PR enhances the "Reducing Memory Consumption" guide by expanding it from a placeholder into a comprehensive documentation resource. The update adds missing memory optimization techniques, provides clearer guidance on when each technique applies, and links to external resources for additional strategies.
Key Changes:
- Added new documentation sections for padding sequences and gradient checkpointing (a brief sketch of the latter follows this list)
- Improved organization by consolidating repeated code snippets and adding clearer technique prerequisites
- Enhanced existing sections with more precise language and better cross-references
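To make the gradient checkpointing addition concrete, here is a minimal sketch of how it is typically enabled in a TRL SFT run; the model and dataset names are placeholders, and `gradient_checkpointing` comes from `transformers.TrainingArguments`, which `SFTConfig` extends.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Gradient checkpointing trades compute for memory: activations are recomputed
# during the backward pass instead of being stored for every layer.
training_args = SFTConfig(
    output_dir="sft-output",
    gradient_checkpointing=True,
)

dataset = load_dataset("trl-lib/Capybara", split="train")  # placeholder dataset

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # placeholder model id
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```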
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Great! Thanks. 🤗
Cool! Just a small comment
docs/source/reducing_memory_usage.md
Outdated:

> ## Padding Sequences to a Multiple
>
> > [!TIP]
> > This technique is supported for **SFT** and **Reward** trainers, and for setups using **FlashAttention** (and its variants).
I don't think you need FA for this one
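For reference, padding to a multiple usually comes down to a single `pad_to_multiple_of` argument on the tokenizer or data collator and works with regular attention as well; a minimal sketch using plain transformers components (the model id is a placeholder, and no particular TRL config field is assumed):

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")  # placeholder model id
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Rounding the padded length up to a multiple of 8 keeps tensor shapes aligned
# with what tensor cores handle efficiently and limits the number of distinct shapes.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False,
    pad_to_multiple_of=8,
)

features = [tokenizer("a short example"), tokenizer("a somewhat longer example sentence")]
batch = collator(features)
print(batch["input_ids"].shape)  # sequence dimension is rounded up to a multiple of 8
```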
Co-authored-by: Kashif Rasul <[email protected]>
What does this PR do?
Updated the Reducing Memory Consumption guide with some missing techniques and some additional context (linking to transformers resources).
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Who can review?
@albertvillanova @qgallouedec @kashif