Skip to content

Conversation

@chenguolin
Copy link
Contributor

What does this PR do?

I do not understand for a long time why use_ema_warmup is forced to be True in EMAModel.__init__(). It is really inconvenient for really proferming fixed-decay EMA.

And it is also unreasonable to still use warmup by cur_decay_value = (1 + step) / (10 + step) when use_ema_warmup=False.

Before submitting

Who can review?

General functionalities: @sayakpaul @yiyixuxu @DN6

I do not understand for a long time why `use_ema_warmup` is forced to be `True` in `EMAModel.__init__()`.And it is also unreasonable to still use warmup by `cur_decay_value = (1 + step) / (10 + step)` when `use_ema_warmup=False`.
@sayakpaul
Copy link
Member

I think this becomes a breaking change for folks using this class. It's better to deprecate the bevahiour.

@chenguolin
Copy link
Contributor Author

Maybe adding a deprecating logging? The current decay behavior is unreasonable for me:

  1. we can't set self. use_ema_warmup in the EMAModel initialization, which means input arg use_ema_warmup is useless;
  2. EMA decay is still warmup when self.use_ema_warmup=False.

@sayakpaul
Copy link
Member

Deprecation is the way to go. We cannot introduce a breaking change like this.

@chenguolin chenguolin closed this by deleting the head repository Nov 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants