tjruwase commented Mar 1, 2022

bf16_optimizer implementing optimizer state sharding (a.k.a. ZeRO stage 1)
Integration with pipeline parallelism
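
To make "optimizer state sharding" concrete: in a ZeRO stage 1 style scheme, each data-parallel rank keeps optimizer state (and, for bf16 training, typically the fp32 master weights) only for its own slice of the flattened parameters. The sketch below shows just the partitioning arithmetic. It is an illustration under assumed names (`partition_bounds`, `numel`, and `world_size` are hypothetical), not the code in this PR, and it ignores details such as padding and alignment.

```python
from typing import List, Tuple


def partition_bounds(numel: int, world_size: int) -> List[Tuple[int, int]]:
    """Split a flat buffer of `numel` elements into contiguous [start, end)
    shards, one per data-parallel rank (hypothetical helper, not DeepSpeed API)."""
    shard = (numel + world_size - 1) // world_size  # ceiling division
    return [
        (min(rank * shard, numel), min((rank + 1) * shard, numel))
        for rank in range(world_size)
    ]


# Example: 10 parameter elements sharded across 4 ranks.
bounds = partition_bounds(numel=10, world_size=4)

# Each rank would allocate optimizer state only for its own shard.
state_elems_per_rank = [end - start for start, end in bounds]
```

Under this scheme the per-rank optimizer memory shrinks roughly in proportion to the number of data-parallel ranks, while every rank still holds the full bf16 model weights.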

Contributor Author

tjruwase commented Mar 1, 2022

@stas00, FYI

self.__check_params(self.module, torch.bfloat16)
if self.zero_optimization_stage() == 0 and not self.pipeline_parallelism:
    raise NotImplementedError(
        "When not running ZeRO, BF16 training support is only supported for Pipeline parallelism"
    )
Contributor

Hello, I wonder why BF16 is only supported with pipeline parallelism or ZeRO stages 1 to 3, since there was no such limit in the prior version.

Contributor Author

@kisseternity, apologies for the confusion here. This is a new bf16 + pipeline parallelism code path that was written at the last minute for BLOOM model training. The existing restrictions on combining it with ZeRO are temporary. We plan to harmonize these combinations and eliminate the confusion.

Contributor

Thanks for replying. In that case, I think bf16 can be used without the bf16 optimizer or ZeRO as before.

Contributor Author

Yes, these changes do not affect the previous support for bf16+ZeRO.
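
To make the combinations discussed in this thread concrete, here is a minimal standalone sketch of the guard quoted in the diff above. The function names `check_bf16_support` and `is_supported` are hypothetical and exist only for illustration; the sketch mirrors just the quoted condition, not the rest of the engine, and the restriction it encodes is described above as temporary.

```python
def check_bf16_support(zero_stage: int, pipeline_parallelism: bool) -> None:
    """Mirror of the quoted condition (hypothetical standalone helper)."""
    if zero_stage == 0 and not pipeline_parallelism:
        raise NotImplementedError(
            "When not running ZeRO, BF16 training support is only supported for Pipeline parallelism"
        )


def is_supported(zero_stage: int, pipeline_parallelism: bool) -> bool:
    """Return True if the guard lets this combination through."""
    try:
        check_bf16_support(zero_stage, pipeline_parallelism)
    except NotImplementedError:
        return False
    return True


# Enumerate ZeRO stages 0-3 with and without pipeline parallelism.
supported = {
    (stage, pp): is_supported(stage, pp)
    for stage in range(4)
    for pp in (False, True)
}

for (stage, pp), ok in sorted(supported.items()):
    print(f"zero_stage={stage} pipeline_parallelism={pp} -> {'ok' if ok else 'rejected'}")
```

Under this condition the only rejected combination is ZeRO stage 0 without pipeline parallelism; any ZeRO stage from 1 to 3, or stage 0 with pipeline parallelism, passes the guard.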

@mrwyattii mrwyattii deleted the olruwase/bf16-updates branch July 7, 2023 02:39

7 participants