Merged
63 commits
471e5c1
fixed typo
Jun 18, 2023
7f5edc5
updated doc to be consistent in naming
Jun 19, 2023
5dddd0e
make style/quality
Jun 19, 2023
ec18756
preprocessing for 4 channels and not 6
Jun 22, 2023
b6f1a7f
make style
Jun 22, 2023
dc12b27
Merge branch 'main' of https://github.com/huggingface/diffusers into …
Jun 22, 2023
d7348d6
test for 4c
Jun 26, 2023
ef923aa
make style/quality
Jun 26, 2023
7d58eb0
Merge branch 'main' of https://github.com/huggingface/diffusers into …
Jun 26, 2023
3b242c3
fixed test on cpu
Jun 29, 2023
e21f8e3
Merge branch 'main' of https://github.com/huggingface/diffusers into …
Jul 26, 2023
a7249bf
fixed doc typo
Jul 26, 2023
1e5cabb
changed default ckpt to 4c
Jul 26, 2023
5d0b7e6
Update pipeline_stable_diffusion_ldm3d.py
estelleafl Aug 1, 2023
2da08c6
Merge branch 'main' of https://github.com/huggingface/diffusers into …
Aug 1, 2023
9d8bd99
convert file
Aug 1, 2023
a82d42f
Merge remote-tracking branch 'upstream/main' into convert_ckpt_file
Sep 6, 2023
5ce1e7e
ldm3d upscaler first commit
Oct 5, 2023
1a3b3fd
temp commit
Oct 26, 2023
1eab17d
ckpt conversion fix
Oct 30, 2023
4aa1440
upscaler from ldm3d
Nov 1, 2023
6ab7729
added ldm3d-hr
Nov 5, 2023
40ef851
merge to upstream
Nov 5, 2023
442d604
convert ckpts
Nov 5, 2023
65fc7f2
doc ldm3d upscaler
Nov 5, 2023
ef2d1e0
updated hr to sr
Nov 5, 2023
a24bd3c
make style
Nov 5, 2023
00f0524
make style
Nov 5, 2023
0546099
variable not defined bug
Nov 5, 2023
717654b
make style
Nov 6, 2023
483b69d
Merge branch 'main' of https://github.com/huggingface/diffusers into …
Nov 6, 2023
809be15
rm files
Nov 6, 2023
9e392b9
removed conversion ckpt
Nov 6, 2023
b3ffcab
updated image processor
Nov 6, 2023
97432b2
tests
Nov 6, 2023
4f4b281
make style
Nov 6, 2023
9ee448c
documentation updated with arxiv ref
Nov 7, 2023
a669d1d
Merge branch 'main' of https://github.com/huggingface/diffusers into …
Nov 7, 2023
6687501
fixed bug in doc
Nov 7, 2023
9eb04d4
fixed bug in test
Nov 7, 2023
85e1525
fixed a copied from
Nov 7, 2023
95518b2
fixed make copies
Nov 7, 2023
0caa7d5
fixed make-copies issue
Nov 8, 2023
af97695
community
Nov 20, 2023
3a29405
fixed import
Nov 20, 2023
ebdd0f2
fixed import
Nov 20, 2023
e733953
fixed import
Nov 20, 2023
3a56a63
readme update
Nov 20, 2023
cd9401d
update readme
Nov 20, 2023
3080a9e
merge
Nov 20, 2023
3423807
update doc
Nov 20, 2023
8141647
fixed
Nov 20, 2023
66097e3
Merge branch 'main' of https://github.com/huggingface/diffusers into …
Nov 21, 2023
35c0e16
revert change
Nov 21, 2023
55079df
Update src/diffusers/image_processor.py
estelleafl Nov 22, 2023
1a11b35
Update src/diffusers/image_processor.py
estelleafl Nov 22, 2023
58aafad
numpy to pil depth >to depth:
Nov 22, 2023
1b9a5f3
Merge branch 'ldm3d_upscaler_community' of https://github.com/estelle…
Nov 22, 2023
7d874cf
merge to upstream
Nov 22, 2023
bd4047b
Merge branch 'main' into ldm3d_upscaler_community
yiyixuxu Nov 27, 2023
b47311c
make style/quality
Nov 28, 2023
dd2ba34
Merge branch 'main' of https://github.com/huggingface/diffusers into …
Nov 28, 2023
6d78bd1
Merge branch 'ldm3d_upscaler_community' of https://github.com/estelle…
Nov 28, 2023
2 changes: 1 addition & 1 deletion docs/source/en/_toctree.yml
@@ -334,7 +334,7 @@
- local: api/pipelines/stable_diffusion/upscale
title: Super-resolution
- local: api/pipelines/stable_diffusion/ldm3d_diffusion
title: LDM3D Text-to-(RGB, Depth)
title: LDM3D Text-to-(RGB, Depth), Text-to-(RGB-pano, Depth-pano), LDM3D Upscaler
- local: api/pipelines/stable_diffusion/adapter
title: Stable Diffusion T2I-Adapter
- local: api/pipelines/stable_diffusion/gligen
2 changes: 1 addition & 1 deletion docs/source/en/api/pipelines/overview.md
@@ -53,7 +53,7 @@ The table below lists all the pipelines currently available in 🤗 Diffusers an
| [Kandinsky 2.2](kandinsky_v22) | text2image, image2image, inpainting |
| [Latent Consistency Models](latent_consistency_models) | text2image |
| [Latent Diffusion](latent_diffusion) | text2image, super-resolution |
| [LDM3D](stable_diffusion/ldm3d_diffusion) | text2image, text-to-3D |
| [LDM3D](stable_diffusion/ldm3d_diffusion) | text2image, text-to-3D, text-to-pano, upscaling |
| [MultiDiffusion](panorama) | text2image |
| [MusicLDM](musicldm) | text2audio |
| [Paint by Example](paint_by_example) | inpainting |
20 changes: 19 additions & 1 deletion docs/source/en/api/pipelines/stable_diffusion/ldm3d_diffusion.md
@@ -14,6 +14,11 @@ specific language governing permissions and limitations under the License.

LDM3D was proposed in [LDM3D: Latent Diffusion Model for 3D](https://huggingface.co/papers/2305.10853) by Gabriela Ben Melech Stan, Diana Wofk, Scottie Fox, Alex Redden, Will Saxton, Jean Yu, Estelle Aflalo, Shao-Yen Tseng, Fabio Nonato, Matthias Muller, and Vasudev Lal. Unlike existing text-to-image diffusion models such as [Stable Diffusion](./overview), which only generate an image, LDM3D generates both an image and a depth map from a given text prompt. With almost the same number of parameters, LDM3D creates a latent space that can compress both the RGB images and the depth maps.

Two checkpoints are available for use:
- [ldm3d-original](https://huggingface.co/Intel/ldm3d). The original checkpoint used in the [paper](https://arxiv.org/pdf/2305.10853.pdf).
- [ldm3d-4c](https://huggingface.co/Intel/ldm3d-4c). A newer version of LDM3D that uses 4-channel inputs instead of 6-channel inputs and is fine-tuned on higher-resolution images. A minimal usage sketch is shown below.
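
A minimal usage sketch, assuming the `ldm3d-4c` checkpoint above; the prompt and file names are illustrative, and the `rgb`/`depth` output fields follow the community example later in this PR:

```py
from diffusers import StableDiffusionLDM3DPipeline

# Load the 4-channel checkpoint listed above ("Intel/ldm3d" for the original)
pipe = StableDiffusionLDM3DPipeline.from_pretrained("Intel/ldm3d-4c")
pipe.to("cuda")

# A single call returns aligned RGB and depth outputs (lists of PIL images)
output = pipe("A picture of some lemons on a table")
rgb_image, depth_image = output.rgb[0], output.depth[0]
rgb_image.save("lemons_rgb.jpg")
depth_image.save("lemons_depth.png")
```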


The abstract from the paper is:

*This research paper proposes a Latent Diffusion Model for 3D (LDM3D) that generates both image and depth map data from a given text prompt, allowing users to generate RGBD images from text prompts. The LDM3D model is fine-tuned on a dataset of tuples containing an RGB image, depth map and caption, and validated through extensive experiments. We also develop an application called DepthFusion, which uses the generated RGB images and depth maps to create immersive and interactive 360-degree-view experiences using TouchDesigner. This technology has the potential to transform a wide range of industries, from entertainment and gaming to architecture and design. Overall, this paper presents a significant contribution to the field of generative AI and computer vision, and showcases the potential of LDM3D and DepthFusion to revolutionize content creation and digital experiences. A short video summarizing the approach can be found at [this url](https://t.ly/tdi2).*
@@ -26,12 +31,25 @@ Make sure to check out the Stable Diffusion [Tips](overview#tips) section to lea

## StableDiffusionLDM3DPipeline

[[autodoc]] StableDiffusionLDM3DPipeline
[[autodoc]] pipelines.stable_diffusion.pipeline_stable_diffusion_ldm3d.StableDiffusionLDM3DPipeline
- all
- __call__


## LDM3DPipelineOutput

[[autodoc]] pipelines.stable_diffusion.pipeline_stable_diffusion_ldm3d.LDM3DPipelineOutput
- all
- __call__

# Upscaler

[LDM3D-VR](https://arxiv.org/pdf/2311.03226.pdf) is an extended version of LDM3D.

The abstract from the paper is:
*Latent diffusion models have proven to be state-of-the-art in the creation and manipulation of visual outputs. However, as far as we know, the generation of depth maps jointly with RGB is still limited. We introduce LDM3D-VR, a suite of diffusion models targeting virtual reality development that includes LDM3D-pano and LDM3D-SR. These models enable the generation of panoramic RGBD based on textual prompts and the upscaling of low-resolution inputs to high-resolution RGBD, respectively. Our models are fine-tuned from existing pretrained models on datasets containing panoramic/high-resolution RGB images, depth maps and captions. Both models are evaluated in comparison to existing related methods.*

Two checkpoints are available for use:
- [ldm3d-pano](https://huggingface.co/Intel/ldm3d-pano). This checkpoint enables the generation of panoramic images and is used with the `StableDiffusionLDM3DPipeline`; a sketch follows this list.
- [ldm3d-sr](https://huggingface.co/Intel/ldm3d-sr). This checkpoint enables the upscaling of RGB and depth images. It can be used in cascade after the original LDM3D pipeline, using the `StableDiffusionUpscaleLDM3DPipeline` community pipeline.
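
A minimal sketch of panoramic generation, assuming the `ldm3d-pano` checkpoint above; the 2:1 `width`/`height` values are an assumption chosen for panoramic output, not a documented requirement:

```py
from diffusers import StableDiffusionLDM3DPipeline

# ldm3d-pano loads with the same pipeline class as the base checkpoints
pipe = StableDiffusionLDM3DPipeline.from_pretrained("Intel/ldm3d-pano")
pipe.to("cuda")

# width/height are assumed here; 2:1 suits a panoramic field of view
output = pipe("360 view of a forest at dawn", width=1024, height=512)
output.rgb[0].save("forest_pano_rgb.jpg")
output.depth[0].save("forest_pano_depth.png")
```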

8 changes: 7 additions & 1 deletion docs/source/en/api/pipelines/stable_diffusion/overview.md
@@ -121,10 +121,16 @@ The table below summarizes the available Stable Diffusion pipelines, their suppo
<td class="px-4 py-2 text-gray-700">
<a href="./ldm3d_diffusion">StableDiffusionLDM3D</a>
</td>
<td class="px-4 py-2 text-gray-700">text-to-rgb, text-to-depth</td>
<td class="px-4 py-2 text-gray-700">text-to-rgb, text-to-depth, text-to-pano</td>
<td class="px-4 py-2"><a href="https://huggingface.co/spaces/r23/ldm3d-space"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue"/></a>
</td>
</tr>
<tr>
<td class="px-4 py-2 text-gray-700">
<a href="./ldm3d_diffusion">StableDiffusionUpscaleLDM3D</a>
</td>
<td class="px-4 py-2 text-gray-700">ldm3d super-resolution</td>
</tr>
</tbody>
</table>
</div>
1 change: 1 addition & 0 deletions docs/source/ja/index.md
@@ -96,3 +96,4 @@ specific language governing permissions and limitations under the License.
| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Dual Image and Text Guided Generation |
| [vq_diffusion](./api/pipelines/vq_diffusion) | [Vector Quantized Diffusion Model for Text-to-Image Synthesis](https://arxiv.org/abs/2111.14822) | Text-to-Image Generation |
| [stable_diffusion_ldm3d](./api/pipelines/stable_diffusion/ldm3d_diffusion) | [LDM3D: Latent Diffusion Model for 3D](https://arxiv.org/abs/2305.10853) | Text to Image and Depth Generation |
| [stable_diffusion_upscaler_ldm3d](./api/pipelines/stable_diffusion/ldm3d_diffusion) | [LDM3D-VR: Latent Diffusion Model for 3D VR](https://arxiv.org/pdf/2311.03226) | Image and Depth Upscaling |
44 changes: 43 additions & 1 deletion examples/community/README.md
@@ -48,7 +48,8 @@ prompt-to-prompt | change parts of a prompt and retain image structure (see [pap
| Latent Consistency Pipeline | Implementation of [Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference](https://arxiv.org/abs/2310.04378) | [Latent Consistency Pipeline](#latent-consistency-pipeline) | - | [Simian Luo](https://github.com/luosiallen) |
| Latent Consistency Img2img Pipeline | Img2img pipeline for Latent Consistency Models | [Latent Consistency Img2Img Pipeline](#latent-consistency-img2img-pipeline) | - | [Logan Zoellner](https://github.com/nagolinc) |
| Latent Consistency Interpolation Pipeline | Interpolate the latent space of Latent Consistency Models with multiple prompts | [Latent Consistency Interpolation Pipeline](#latent-consistency-interpolation-pipeline) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1pK3NrLWJSiJsBynLns1K1-IDTW9zbPvl?usp=sharing) | [Aryan V S](https://github.com/a-r-r-o-w) |

| LDM3D-sr (LDM3D upscaler) | Upscale low resolution RGB and depth inputs to high resolution | [StableDiffusionUpscaleLDM3D Pipeline](https://github.com/estelleafl/diffusers/tree/ldm3d_upscaler_community/examples/community#stablediffusionupscaleldm3d-pipeline) | - | [Estelle Aflalo](https://github.com/estelleafl) |

To load a custom pipeline, pass the `custom_pipeline` argument to `DiffusionPipeline`, naming one of the files in `diffusers/examples/community`. Feel free to send a PR with your own pipelines; we will merge them quickly.
```py
…
```

@@ -2344,6 +2345,47 @@ images = pipe(
```py
assert len(images) == (len(prompts) - 1) * num_interpolation_steps
```

### StableDiffusionUpscaleLDM3D Pipeline
[LDM3D-VR](https://arxiv.org/pdf/2311.03226.pdf) is an extended version of LDM3D.

The abstract from the paper is:
*Latent diffusion models have proven to be state-of-the-art in the creation and manipulation of visual outputs. However, as far as we know, the generation of depth maps jointly with RGB is still limited. We introduce LDM3D-VR, a suite of diffusion models targeting virtual reality development that includes LDM3D-pano and LDM3D-SR. These models enable the generation of panoramic RGBD based on textual prompts and the upscaling of low-resolution inputs to high-resolution RGBD, respectively. Our models are fine-tuned from existing pretrained models on datasets containing panoramic/high-resolution RGB images, depth maps and captions. Both models are evaluated in comparison to existing related methods.*

Two checkpoints are available for use:
- [ldm3d-pano](https://huggingface.co/Intel/ldm3d-pano). This checkpoint enables the generation of panoramic images and is used with the `StableDiffusionLDM3DPipeline`.
- [ldm3d-sr](https://huggingface.co/Intel/ldm3d-sr). This checkpoint enables the upscaling of RGB and depth images. It can be used in cascade after the original LDM3D pipeline, using the `StableDiffusionUpscaleLDM3DPipeline`, as in the example below.

```py
from PIL import Image
from diffusers import StableDiffusionLDM3DPipeline, DiffusionPipeline

# Generate an RGB/depth output from LDM3D
pipe_ldm3d = StableDiffusionLDM3DPipeline.from_pretrained("Intel/ldm3d-4c")
pipe_ldm3d.to("cuda")

prompt = "A picture of some lemons on a table"
output = pipe_ldm3d(prompt)
rgb_image, depth_image = output.rgb, output.depth
rgb_image[0].save("lemons_ldm3d_rgb.jpg")
depth_image[0].save("lemons_ldm3d_depth.png")

# Upscale the previous output to a resolution of (1024, 1024)
pipe_ldm3d_upscale = DiffusionPipeline.from_pretrained(
    "Intel/ldm3d-sr", custom_pipeline="pipeline_stable_diffusion_upscale_ldm3d"
)
pipe_ldm3d_upscale.to("cuda")

low_res_img = Image.open("lemons_ldm3d_rgb.jpg").convert("RGB")
low_res_depth = Image.open("lemons_ldm3d_depth.png").convert("L")
outputs = pipe_ldm3d_upscale(
    prompt="high quality high resolution uhd 4k image",
    rgb=low_res_img,
    depth=low_res_depth,
    num_inference_steps=50,
    target_res=[1024, 1024],
)

upscaled_rgb, upscaled_depth = outputs.rgb[0], outputs.depth[0]
upscaled_rgb.save("upscaled_lemons_rgb.png")
upscaled_depth.save("upscaled_lemons_depth.png")
```

### ControlNet + T2I Adapter Pipeline
This pipeline combines both ControlNet and T2IAdapter into a single pipeline, where the forward pass is executed once.
It receives `control_image` and `adapter_image`, as well as `controlnet_conditioning_scale` and `adapter_conditioning_scale`, for the ControlNet and Adapter modules, respectively. Whenever `adapter_conditioning_scale = 0` or `controlnet_conditioning_scale = 0`, it will act as a full ControlNet module or as a full T2IAdapter module, respectively.