
Commit 06bb1db

Style adapted to the documentation of pix2pix
1 parent 5e16d13 commit 06bb1db

File tree

1 file changed: 20 additions, 25 deletions

docs/source/en/api/pipelines/stable_diffusion/controlnet.mdx

Lines changed: 20 additions & 25 deletions
@@ -12,26 +12,32 @@ specific language governing permissions and limitations under the License.
 
 # Text-to-Image Generation with ControlNet Conditioning
 
-## StableDiffusionControlNetPipeline
+## Overview
 
-ControlNet by [@lllyasviel](https://huggingface.co/lllyasviel) is a neural network structure to control diffusion models by adding extra conditions.
-
-There are 8 pre-trained ControlNet models that were trained to condition the original Stable Diffusion model on different inputs,
-such as edge detection, scribbles, depth maps, semantic segmentations and more.
+[Adding Conditional Control to Text-to-Image Diffusion Models](https://arxiv.org/abs/2302.05543) by Lvmin Zhang and Maneesh Agrawala.
 
 Using the pretrained models we can provide control images (for example, a depth map) to control Stable Diffusion text-to-image generation so that it follows the structure of the depth image and fills in the details.
 
-The original codebase/paper can be found here:
-- [Code](https://github.com/lllyasviel/ControlNet)
-- [Paper](https://arxiv.org/abs/2302.05543)
+The abstract of the paper is the following:
+
+*We present a neural network structure, ControlNet, to control pretrained large diffusion models to support additional input conditions. The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on a personal device. Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data. We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc. This may enrich the methods to control large diffusion models and further facilitate related applications.*
+
+Resources:
+
+* [Paper](https://arxiv.org/abs/2302.05543)
+* [Original Code](https://github.com/lllyasviel/ControlNet)
+
+## Available Pipelines:
 
+| Pipeline | Tasks | Demo
+|---|---|:---:|
+| [StableDiffusionControlNetPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py) | *Text-to-Image Generation with ControlNet Conditioning* | [Colab Example](https://colab.research.google.com/drive/1AiR7Q-sBqO88NCyswpfiuwXZc7DfMyKA?usp=sharing) |
 
+<!-- TODO: add space -->
 ## Available checkpoints
 
 ControlNet requires a *control image* in addition to the text-to-image *prompt*.
 Each pretrained model is trained using a different conditioning method that requires different images for conditioning the generated outputs. For example, Canny edge conditioning requires the control image to be the output of a Canny filter, while depth conditioning requires the control image to be a depth map. See the overview and image examples below to know more.
-Each pretrained model is trained using a different conditioning method that requires different conditioning images. For example, Canny edge conditioning requires the control image to be the output of a Canny filter, while depth conditioning requires the control image to be a depth map.
-See the overview and image examples.
 
 All checkpoints are converted from [lllyasviel/ControlNet](https://huggingface.co/lllyasviel/ControlNet).
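
As background for the conditioning requirement described above, here is a minimal sketch of producing a Canny-edge control image with OpenCV; this preprocessing is not part of the file being changed, and the input path and thresholds are arbitrary choices.

```python
import cv2
import numpy as np
from PIL import Image

# Any RGB photo can serve as the source; "input.png" is a placeholder path.
source = np.array(Image.open("input.png").convert("RGB"))

# Detect edges; the two thresholds are arbitrary and usually tuned per image.
edges = cv2.Canny(source, 100, 200)

# The checkpoints expect a 3-channel image: white edges on a black background.
edges = np.concatenate([edges[:, :, None]] * 3, axis=2)
control_image = Image.fromarray(edges)
control_image.save("control.png")  # pass this as the `image` argument of the pipeline
```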

@@ -48,12 +54,6 @@ All checkpoints are converted from [lllyasviel/ControlNet](https://huggingface.c
 |[takuma104/control_sd15_scribble](https://huggingface.co/takuma104/control_sd15_scribble)<br/> *Trained with human scribbles* |A hand-drawn monochrome image with white outlines on a black background.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_vermeer_scribble.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_vermeer_scribble.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_vermeer_scribble_0.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_vermeer_scribble_0.png"/></a> |
 |[takuma104/control_sd15_seg](https://huggingface.co/takuma104/control_sd15_seg)<br/>*Trained with semantic segmentation* |An [ADE20K](https://groups.csail.mit.edu/vision/datasets/ADE20K/)'s segmentation protocol image.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_room_seg.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_room_seg.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_room_seg_1.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_room_seg_1.png"/></a> |
 
-
-## Resources
-
-- [Colab Notebook Example](https://colab.research.google.com/drive/1AiR7Q-sBqO88NCyswpfiuwXZc7DfMyKA?usp=sharing)
-- [controlnet_hinter](https://github.com/takuma104/controlnet_hinter): Image Preprocess Library for ControlNet
-
 ## Usage example
 
 - Basic Example (Canny Edge)
@@ -62,6 +62,8 @@ The conditioning image is an outline of the image edges, as detected by a Canny
 
 ![White on black edges detected on Vermeer's Girl with a Pearl Earring portrait](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/vermeer_canny_edged.png)
 
+In the following example, note that the text prompt does not make any reference to the structure or contents of the image we are generating. Stable Diffusion interprets the control image as an additional input that controls what to generate.
+
 ```python
 from diffusers import StableDiffusionControlNetPipeline
 from diffusers.utils import load_image
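
The diff elides the middle of this snippet. For reference, a rough reconstruction of the complete basic example follows; the checkpoint name `takuma104/control_sd15_canny` and the `.images[0]` indexing are assumptions based on the checkpoint family and pipeline conventions shown above, not lines visible in this diff.

```python
from diffusers import StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Canny-edge control image referenced by the documentation.
canny_edged_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/vermeer_canny_edged.png"
)

# Checkpoint name is an assumption following the takuma104/control_sd15_* naming above.
pipe = StableDiffusionControlNetPipeline.from_pretrained("takuma104/control_sd15_canny").to("cuda")

image = pipe(prompt="best quality, extremely detailed", image=canny_edged_image).images[0]
image.save("generated.png")
```
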
@@ -73,7 +75,6 @@ image = pipe(prompt="best quality, extremely detailed", image=canny_edged_image)
 image.save("generated.png")
 ```
 
-Note that the text prompt does not make any reference to the structure or contents of the image we are generating. Stable Diffusion interprets the control image as an additional input that controls what to generate.
 - Controlling custom Stable Diffusion 1.5 models
 
 In the following example we use PromptHero's [Openjourney model](https://huggingface.co/prompthero/openjourney), which was fine-tuned from the base Stable Diffusion v1.5 model on images from Midjourney. This model has the same structure as Stable Diffusion 1.5 but is capable of producing outputs in a different style.
@@ -83,7 +84,7 @@ from diffusers import StableDiffusionControlNetPipeline, AutoencoderKL, UNet2DCo
 from diffusers.utils import load_image
 
 # Canny edged image for control
-canny_edged_image = load_image("https://huggingface.co/takuma104/controlnet_dev/resolve/main/vermeer_canny_edged.png")
+canny_edged_image = load_image("https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/vermeer_canny_edged.png")
 
 base_model_id = "prompthero/openjourney" # an example: openjourney model
 vae = AutoencoderKL.from_pretrained(base_model_id, subfolder="vae").to("cuda")
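
The remainder of this example falls outside the hunk. A sketch of how it plausibly continues is shown below, loading the Openjourney UNet as well and overriding the corresponding components of the ControlNet pipeline; the exact set of swapped components and the `takuma104/control_sd15_canny` checkpoint are assumptions.

```python
from diffusers import StableDiffusionControlNetPipeline, AutoencoderKL, UNet2DConditionModel
from diffusers.utils import load_image

canny_edged_image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/vermeer_canny_edged.png"
)

base_model_id = "prompthero/openjourney"
vae = AutoencoderKL.from_pretrained(base_model_id, subfolder="vae").to("cuda")
unet = UNet2DConditionModel.from_pretrained(base_model_id, subfolder="unet").to("cuda")

# Swap the base model components into the ControlNet pipeline (checkpoint name assumed).
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "takuma104/control_sd15_canny", vae=vae, unet=unet
).to("cuda")

image = pipe(prompt="best quality, extremely detailed", image=canny_edged_image).images[0]
image.save("generated.png")
```
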
@@ -96,10 +97,4 @@ image.save("generated.png")
 
 [[autodoc]] StableDiffusionControlNetPipeline
 - all
-- __call__
-- enable_attention_slicing
-- disable_attention_slicing
-- enable_vae_slicing
-- disable_vae_slicing
-- enable_xformers_memory_efficient_attention
-- disable_xformers_memory_efficient_attention
+- __call__
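
The methods dropped from the autodoc list above are the standard diffusers memory and attention toggles; removing them from the generated reference does not remove them from the pipeline object. A small sketch of their typical use, with the checkpoint name assumed:

```python
from diffusers import StableDiffusionControlNetPipeline

# Checkpoint name is an assumption; any ControlNet checkpoint loads the same way.
pipe = StableDiffusionControlNetPipeline.from_pretrained("takuma104/control_sd15_canny")

pipe.enable_attention_slicing()                    # lower peak VRAM at some speed cost
pipe.enable_vae_slicing()                          # decode latents in slices
pipe.enable_xformers_memory_efficient_attention()  # requires xformers to be installed

# ... generate images ...

pipe.disable_xformers_memory_efficient_attention()
pipe.disable_vae_slicing()
pipe.disable_attention_slicing()
```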
