[ldm3d] first PR #3668
Changes from 47 commits
<!--Copyright 2023 The Intel Labs Team Authors and HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# LDM3D

LDM3D was proposed in [LDM3D: Latent Diffusion Model for 3D](https://arxiv.org/abs/2305.10853) by Gabriela Ben Melech Stan, Diana Wofk, Scottie Fox, Alex Redden, Will Saxton, Jean Yu, Estelle Aflalo, Shao-Yen Tseng, Fabio Nonato, Matthias Muller, and Vasudev Lal.

The abstract of the paper is the following:

*This research paper proposes a Latent Diffusion Model for 3D (LDM3D) that generates both image and depth map data from a given text prompt, allowing users to generate RGBD images from text prompts. The LDM3D model is fine-tuned on a dataset of tuples containing an RGB image, depth map and caption, and validated through extensive experiments. We also develop an application called DepthFusion, which uses the generated RGB images and depth maps to create immersive and interactive 360-degree-view experiences using TouchDesigner. This technology has the potential to transform a wide range of industries, from entertainment and gaming to architecture and design. Overall, this paper presents a significant contribution to the field of generative AI and computer vision, and showcases the potential of LDM3D and DepthFusion to revolutionize content creation and digital experiences. A short video summarizing the approach can be found at [this url](https://t.ly/tdi2).*

*Overview*:

| Pipeline | Tasks | Colab | Demo |
|---|---|:---:|:---:|
| [pipeline_stable_diffusion_ldm3d.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_ldm3d.py) | *Text-to-Image Generation* | - | - |

## Tips

- LDM3D generates both an image and a depth map from a given text prompt, unlike existing text-to-image diffusion models such as [Stable Diffusion](./stable_diffusion/overview), which generate only an image.
- With almost the same number of parameters, LDM3D learns a latent space that can compress both the RGB images and the depth maps.
Running LDM3D is straightforward with the [`StableDiffusionLDM3DPipeline`]:

```python
>>> from diffusers import StableDiffusionLDM3DPipeline

>>> pipe_ldm3d = StableDiffusionLDM3DPipeline.from_pretrained("Intel/ldm3d")
>>> prompt = "A picture of some lemons on a table"
>>> output = pipe_ldm3d(prompt)
>>> rgb_image, depth_image = output.rgb, output.depth
>>> rgb_image[0].save("lemons_ldm3d_rgb.jpg")
>>> depth_image[0].save("lemons_ldm3d_depth.png")
```
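The saved depth map can then be reloaded for downstream use. A minimal sketch, assuming the depth output is a single-channel 16-bit PNG (the exact bit depth and any scaling to metric depth are model-specific, so the conversion below is illustrative only):

```python
import numpy as np
from PIL import Image

# Stand-in depth image for illustration; in practice this would be the file
# saved from the pipeline output (e.g. "lemons_ldm3d_depth.png").
depth_img = Image.fromarray(np.arange(16, dtype=np.uint16).reshape(4, 4))

# Convert to a float array for further processing. Any scaling from raw
# pixel values to metric depth is model-specific and not assumed here.
depth = np.asarray(depth_img, dtype=np.float32)
print(depth.shape)  # (4, 4)
```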

## StableDiffusionPipelineOutput
[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput
	- all
	- __call__

## StableDiffusionLDM3DPipeline
[[autodoc]] StableDiffusionLDM3DPipeline
	- all
	- __call__
The accompanying diff to the Stable Diffusion pipelines `__init__` module adds the new output class alongside the existing `StableDiffusionPipelineOutput` (its docstring here documents the `rgb` and `depth` fields the class actually defines):

```python
@dataclass
class LDM3DPipelineOutput(BaseOutput):
    """
    Output class for the LDM3D Stable Diffusion pipeline.

    Args:
        rgb (`List[PIL.Image.Image]` or `np.ndarray`)
            List of denoised PIL images of length `batch_size` or numpy array of shape `(batch_size, height, width,
            num_channels)`.
        depth (`List[PIL.Image.Image]` or `np.ndarray`)
            List of denoised PIL depth maps of length `batch_size` or numpy array of shape `(batch_size, height,
            width, num_channels)`.
        nsfw_content_detected (`List[bool]`)
            List of flags denoting whether the corresponding generated image likely represents "not-safe-for-work"
            (nsfw) content, or `None` if safety checking could not be performed.
    """

    rgb: Union[List[PIL.Image.Image], np.ndarray]
    depth: Union[List[PIL.Image.Image], np.ndarray]
    nsfw_content_detected: Optional[List[bool]]
```
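The behavior of this output class can be illustrated with a dependency-free stand-in. This is a simplified sketch: the real class inherits from diffusers' `BaseOutput`, which additionally supports dict-style and tuple-style access, while the stand-in below only mirrors the fields.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class LDM3DPipelineOutputSketch:
    """Simplified stand-in: parallel lists of RGB images and depth maps,
    plus optional per-sample NSFW flags."""
    rgb: Union[List[object], object]
    depth: Union[List[object], object]
    nsfw_content_detected: Optional[List[bool]] = None


# The RGB image and depth map at the same index belong to the same sample.
out = LDM3DPipelineOutputSketch(rgb=["rgb_0"], depth=["depth_0"])
print(out.rgb[0], out.depth[0])
```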
The diff also registers the new pipeline import behind the existing optional-dependency guard:

```python
try:
    if not (is_transformers_available() and is_torch_available()):
        raise OptionalDependencyNotAvailable()
# ...
    from .pipeline_stable_diffusion_inpaint_legacy import StableDiffusionInpaintPipelineLegacy
    from .pipeline_stable_diffusion_instruct_pix2pix import StableDiffusionInstructPix2PixPipeline
    from .pipeline_stable_diffusion_latent_upscale import StableDiffusionLatentUpscalePipeline
    from .pipeline_stable_diffusion_ldm3d import StableDiffusionLDM3DPipeline
    from .pipeline_stable_diffusion_model_editing import StableDiffusionModelEditingPipeline
    from .pipeline_stable_diffusion_panorama import StableDiffusionPanoramaPipeline
    from .pipeline_stable_diffusion_sag import StableDiffusionSAGPipeline
```
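The availability checks gating these imports follow a common pattern: probe whether an optional package can be imported before exposing features that need it. A minimal, self-contained sketch of that pattern (the helper name here is illustrative, not diffusers' actual API):

```python
import importlib.util


def is_package_available(name: str) -> bool:
    """Return True if `name` can be imported, without actually importing it."""
    return importlib.util.find_spec(name) is not None


# Enable an optional feature only when its dependency is present,
# mirroring how the pipeline module guards on torch/transformers.
feature_enabled = is_package_available("json")  # stdlib module, always present
print(feature_enabled)  # True
```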