UNet Flax with FlaxModelMixin #502
Conversation
Mimic the structure of the PyTorch files. The model classes themselves need work, depending on what we do about configuration and initialization.
For some reason the configuration is not correctly applied; the signature of the `__init__` method does not contain all the parameters by the time it's inspected in `extract_init_dict`.
The documentation is not available anymore as the PR was closed or merged.
patrickvonplaten left a comment:
Looks very nice! I think it's just about deleting some dead code and maybe fixing the dropouts everywhere :-)
Note that weights were exported with the old names, so we need to be careful.
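Since the PyTorch attribute names changed but the exported weights still use the old {to_q, to_k, to_v, to_out} names, "being careful" here may mean remapping parameter paths during conversion. A minimal sketch of such a remap, assuming the old/new name pairs below; the helper name and mapping are illustrative, not the PR's actual conversion code:

```python
# Illustrative sketch: remap old attention weight names in a Flax params tree.
# The RENAMES mapping and helper name are assumptions, not code from this PR.
from flax.traverse_util import flatten_dict, unflatten_dict

RENAMES = {"to_q": "query", "to_k": "key", "to_v": "value", "to_out": "proj_attn"}

def rename_attention_params(params):
    flat = flatten_dict(params)  # keys become tuples of path components
    renamed = {
        tuple(RENAMES.get(part, part) for part in path): value
        for path, value in flat.items()
    }
    return unflatten_dict(renamed)
```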
Looks good in general! Left some comments, more specifically:
- We should add the `dropout` layers; I didn't add them in the original repo as it was just for inference.
- Make sure that the module and weight names match 1:1 with PyTorch. This is required as we need to provide interoperability with PT and Flax models.
# Weights were exported with old names {to_q, to_k, to_v, to_out}
self.query = nn.Dense(inner_dim, use_bias=False, dtype=self.dtype, name="to_q")
self.key = nn.Dense(inner_dim, use_bias=False, dtype=self.dtype, name="to_k")
self.value = nn.Dense(inner_dim, use_bias=False, dtype=self.dtype, name="to_v")

self.proj_attn = nn.Dense(self.query_dim, dtype=self.dtype, name="to_out")
(nit) Since we are using `setup` here, we could just use `self.to_q = nn.Dense(...)` instead of passing `name`. This will also make it easier to compare the Flax and PT code when reading.
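For reference, a minimal sketch of that suggestion (only `setup` is shown): in Flax, an attribute assigned in `setup` uses the attribute name as its parameter scope, so the explicit `name=` argument becomes unnecessary. The `inner_dim` field here is an illustrative stand-in for `dim_head * heads`:

```python
import flax.linen as nn
import jax.numpy as jnp

class FlaxAttentionBlock(nn.Module):
    query_dim: int
    inner_dim: int                      # illustrative; derived from dim_head * heads in the PR
    dtype: jnp.dtype = jnp.float32

    def setup(self):
        # the attribute name doubles as the parameter name, matching the exported weights
        self.to_q = nn.Dense(self.inner_dim, use_bias=False, dtype=self.dtype)
        self.to_k = nn.Dense(self.inner_dim, use_bias=False, dtype=self.dtype)
        self.to_v = nn.Dense(self.inner_dim, use_bias=False, dtype=self.dtype)
        self.to_out = nn.Dense(self.query_dim, dtype=self.dtype)
```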
Yes, the original name was `self.to_q`; I changed it here to match the renamed PyTorch version but kept the same weight names.
import jax.numpy as jnp

class FlaxAttentionBlock(nn.Module):
We should use the same names as the PyTorch modules
Suggested change:
- class FlaxAttentionBlock(nn.Module):
+ class FlaxCrossAttention(nn.Module):
query_dim: int
heads: int = 8
dim_head: int = 64
dropout: float = 0.0
`dropout` is not used; we should add the dropout layer here.
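A minimal sketch of wiring the currently unused `dropout` field into an `nn.Dropout` layer, assuming the block takes a `deterministic` flag at call time; only the dropout plumbing is shown, not the attention math:

```python
import flax.linen as nn

class FlaxAttentionBlock(nn.Module):
    query_dim: int
    heads: int = 8
    dim_head: int = 64
    dropout: float = 0.0

    def setup(self):
        # ... projection layers as in the diff above ...
        self.dropout_layer = nn.Dropout(rate=self.dropout)

    def __call__(self, hidden_states, context=None, deterministic=True):
        # ... attention computation producing hidden_states ...
        # dropout is only active when deterministic=False (training)
        return self.dropout_layer(hidden_states, deterministic=deterministic)
```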
dim: int
n_heads: int
d_head: int
dropout: float = 0.0
Let's add the dropout layer
self.self_attn = FlaxAttentionBlock(self.dim, self.n_heads, self.d_head, self.dropout, dtype=self.dtype)
# cross attention
self.cross_attn = FlaxAttentionBlock(self.dim, self.n_heads, self.d_head, self.dropout, dtype=self.dtype)
The names should match the PT version for auto-conversion.
Suggested change:
- self.self_attn = FlaxAttentionBlock(self.dim, self.n_heads, self.d_head, self.dropout, dtype=self.dtype)
- # cross attention
- self.cross_attn = FlaxAttentionBlock(self.dim, self.n_heads, self.d_head, self.dropout, dtype=self.dtype)
+ self.attn1 = FlaxAttentionBlock(self.dim, self.n_heads, self.d_head, self.dropout, dtype=self.dtype)
+ # cross attention
+ self.attn2 = FlaxAttentionBlock(self.dim, self.n_heads, self.d_head, self.dropout, dtype=self.dtype)
# 1. time
t_emb = self.time_proj(timesteps)
t_emb = self.time_embedding(t_emb)
This expects that `timesteps` is an array; it might not work if it's a scalar. We should check this and handle the scalar-to-array conversion.
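A minimal sketch of the kind of guard being asked for, assuming a scalar timestep should be broadcast to the whole batch; the helper name is illustrative:

```python
import jax.numpy as jnp

def ensure_timestep_array(timesteps, batch_size):
    # accept Python ints/floats as well as JAX arrays
    timesteps = jnp.asarray(timesteps)
    if timesteps.ndim == 0:
        # scalar -> 1D array, then broadcast to one timestep per batch element
        timesteps = jnp.expand_dims(timesteps, 0)
    return jnp.broadcast_to(timesteps, (batch_size,))
```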
if self.add_downsample:
    self.downsample = FlaxDownsample2D(self.out_channels, dtype=self.dtype)
This should be a list, same as in PT
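A minimal sketch of the list form, mirroring PyTorch's `nn.ModuleList`; Flax accepts plain Python lists of submodules assigned in `setup`. The class here is reduced to just the downsampling part and assumes the PR's `FlaxDownsample2D` is in scope:

```python
import flax.linen as nn
import jax.numpy as jnp

class FlaxDownBlock2D(nn.Module):
    out_channels: int
    add_downsample: bool = True
    dtype: jnp.dtype = jnp.float32

    def setup(self):
        # ... resnet blocks elided ...
        if self.add_downsample:
            # a list, even with a single element, to mirror PyTorch's ModuleList
            self.downsamplers = [FlaxDownsample2D(self.out_channels, dtype=self.dtype)]

    def __call__(self, hidden_states):
        if self.add_downsample:
            for downsampler in self.downsamplers:
                hidden_states = downsampler(hidden_states)
        return hidden_states
```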
if self.add_downsample:
    self.downsample = FlaxDownsample2D(self.out_channels, dtype=self.dtype)
same comment as above.
if self.add_upsample:
    self.upsample = FlaxUpsample2D(self.out_channels, dtype=self.dtype)
same comment as above.
if self.add_upsample:
    self.upsample = FlaxUpsample2D(self.out_channels, dtype=self.dtype)
same comment as above.
Thanks a lot for the review here @patil-suraj - you're 100% right. To move fast, I'd say we merge this PR though and solve the conversion/weight naming in a new PR (opening an issue for this), as well as the dropout layers. As discussed offline, feel free to merge @pcuenca and we'll adapt in a future PR according to @patil-suraj's comments here :-)
Opened two issues here for future PRs :-)
* First UNet Flax modeling blocks. Mimic the structure of the PyTorch files. The model classes themselves need work, depending on what we do about configuration and initialization.
* Remove FlaxUNet2DConfig class.
* ignore_for_config non-config args.
* Implement `FlaxModelMixin`
* Use new mixins for Flax UNet. For some reason the configuration is not correctly applied; the signature of the `__init__` method does not contain all the parameters by the time it's inspected in `extract_init_dict`.
* Import `FlaxUNet2DConditionModel` if flax is available.
* Rm unused method `framework`
* Update src/diffusers/modeling_flax_utils.py (Co-authored-by: Suraj Patil <[email protected]>)
* Indicate types in flax.struct.dataclass as pointed out by @mishig25 (Co-authored-by: Mishig Davaadorj <[email protected]>)
* Fix typo in transformer block.
* make style
* some more changes
* make style
* Add comment
* Update src/diffusers/modeling_flax_utils.py (Co-authored-by: Patrick von Platen <[email protected]>)
* Rm unneeded comment
* Update docstrings
* correct ignore kwargs
* make style
* Update docstring examples
* Make style
* Style: remove empty line.
* Apply style (after upgrading black from pinned version)
* Remove some commented code and unused imports.
* Add init_weights (not yet in use until huggingface#513).
* Trickle down deterministic to blocks.
* Rename q, k, v according to the latest PyTorch version. Note that weights were exported with the old names, so we need to be careful.
* Flax UNet docstrings, default props as in PyTorch.
* Fix minor typos in PyTorch docstrings.
* Use FlaxUNet2DConditionOutput as output from UNet.
* make style

Co-authored-by: Mishig Davaadorj <[email protected]>
Co-authored-by: Suraj Patil <[email protected]>
Co-authored-by: Patrick von Platen <[email protected]>
This is an alternative to #485 that incorporates #493.
We'll probably close #485, but I'm having a strange bug where the configuration is not correctly applied. The JSON file is correctly read in `from_pretrained`, but the signature of the `__init__` method does not contain all the parameters by the time it's inspected in `extract_init_dict`. Thanks to @mishig25, the aforementioned bug was resolved.
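For context on that symptom, and purely as a generic Python illustration (not necessarily what happened in this PR): a decorator that replaces `__init__` without `functools.wraps` makes `inspect.signature` see only `(*args, **kwargs)`, so any code that inspects the init signature loses the real parameter names. The decorator and class names below are hypothetical:

```python
import inspect

def config_decorator(init):                 # hypothetical decorator, for illustration only
    def wrapper(self, *args, **kwargs):     # note: no @functools.wraps(init)
        return init(self, *args, **kwargs)
    return wrapper

class Model:
    @config_decorator
    def __init__(self, sample_size=32, in_channels=4):
        pass

print(inspect.signature(Model.__init__))    # (self, *args, **kwargs) - real parameters hidden
# Decorating the wrapper with functools.wraps(init) preserves the original signature.
```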