<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Performing inference with LCM-LoRA

Latent Consistency Models (LCMs) enable high-quality image generation in typically 2-4 steps, making it possible to use diffusion models in almost real-time settings.

From the [official website](https://latent-consistency-models.github.io/):

> LCMs can be distilled from any pre-trained Stable Diffusion (SD) in only 4,000 training steps (~32 A100 GPU Hours) for generating high quality 768 x 768 resolution images in 2~4 steps or even one step, significantly accelerating text-to-image generation. We employ LCM to distill the Dreamshaper-V7 version of SD in just 4,000 training iterations.

For a more technical overview of LCMs, refer to [the paper](https://huggingface.co/papers/2310.04378).

However, for latent consistency distillation, each model needs to be distilled separately. The core idea of LCM-LoRA is to train just a small number of adapters, known as LoRA layers, instead of the full model. The resulting LoRAs can then be applied to any fine-tuned version of the model without having to distill it separately. Additionally, the LoRAs can be applied to other tasks, such as image-to-image generation, controlnet/t2iadapter, inpainting, and animatediff. The LCM-LoRA can also be combined with other style LoRAs to generate styled images in very few steps (4-8).

This guide shows how to perform inference with LCM-LoRAs for:
- text-to-image
- image-to-image
- combined with style LoRAs
- controlnet/t2iadapter
- inpainting
- animatediff

## Text-to-image

You'll use the [`StableDiffusionXLPipeline`] with the [`LCMScheduler`] and then load the LCM-LoRA. Together with the LCM-LoRA and the scheduler, the pipeline enables a fast inference workflow, overcoming the slow iterative nature of diffusion models.

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    scheduler=LCMScheduler.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="scheduler"),
    variant="fp16",
    torch_dtype=torch.float16
).to("cuda")

pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"

generator = torch.manual_seed(0)
image = pipe(
    prompt=prompt, num_inference_steps=4, generator=generator, guidance_scale=1.0
).images[0]
```

![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/lcm/lcm_i2i.png)

Notice that we use only 4 steps for generation, which is far fewer than what's typically used for standard SDXL.

<Tip>

You may have noticed that we set `guidance_scale=1.0`, which disables classifier-free guidance. This is because the LCM-LoRA is trained with guidance, so the batch size does not have to be doubled in this case. This leads to faster inference, with the drawback that negative prompts don't have any effect on the denoising process.

You can also use guidance with LCM-LoRA, but due to the nature of its training the model is very sensitive to the `guidance_scale` values, and high values can lead to artifacts in the generated images. In our experiments, we found that the best values are in the range of [1.0, 2.0].

</Tip>
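
For example, here is a minimal sketch of enabling guidance with the pipeline loaded above; the negative prompt and the `guidance_scale` value are illustrative, not tuned recommendations:

```python
# Reusing `pipe` and `prompt` from the snippet above. A guidance_scale in
# [1.0, 2.0] keeps classifier-free guidance enabled (so the negative prompt
# takes effect) while staying in the range the LCM-LoRA tolerates.
generator = torch.manual_seed(0)
image = pipe(
    prompt=prompt,
    negative_prompt="blurry, low quality",  # illustrative; only active when guidance_scale > 1.0
    num_inference_steps=4,
    generator=generator,
    guidance_scale=1.5,
).images[0]
```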

### Inference with a fine-tuned model

As mentioned above, the LCM-LoRA can be applied to any fine-tuned version of the model without having to distill it separately. Let's look at how we can perform inference with a fine-tuned model:

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipe = DiffusionPipeline.from_pretrained(
    "Linaqruf/animagine-xl",
    scheduler=LCMScheduler.from_pretrained("Linaqruf/animagine-xl", subfolder="scheduler"),
    variant="fp16",
    torch_dtype=torch.float16
).to("cuda")

pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

prompt = "face focus, cute, masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, night, turtleneck"

generator = torch.manual_seed(0)
image = pipe(
    prompt=prompt, num_inference_steps=4, generator=generator, guidance_scale=1.0
).images[0]
```

![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/lcm/lcm_i2i_finetuned.png)


## Image-to-image

LCM-LoRA can be applied to image-to-image tasks too. Let's look at how we can perform image-to-image generation with LCMs. For this example we'll use the [dreamshaper-7](https://huggingface.co/Lykon/dreamshaper-7) model, a fine-tuned version of SD-v1-5, and the LCM-LoRA for SD-v1-5.

```python
import torch
from diffusers import AutoPipelineForImage2Image, LCMScheduler
from diffusers.utils import make_image_grid, load_image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "Lykon/dreamshaper-7",
    scheduler=LCMScheduler.from_pretrained("Lykon/dreamshaper-7", subfolder="scheduler"),
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png"
init_image = load_image(url)
prompt = "Astronauts in a jungle, cold color palette, muted colors, detailed, 8k"

# pass prompt and image to pipeline
image = pipe(prompt, image=init_image, num_inference_steps=4, guidance_scale=1, strength=0.6).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
```

![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/lcm/lcm_i2i.png)


<Tip>

Based on your prompt and the image you provide, you can get different results. To get the best results, we recommend trying different values for the `num_inference_steps`, `strength`, and `guidance_scale` parameters and choosing the best one.

</Tip>
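
If you want to compare settings systematically, here is a minimal sketch of such a sweep, reusing `pipe`, `prompt`, and `init_image` from above; the value grid is an arbitrary starting point, not a recommendation:

```python
# Sweep a few strength values and collect the results in a grid for
# side-by-side comparison; all values are illustrative.
images = []
for strength in [0.4, 0.6, 0.8]:
    image = pipe(
        prompt,
        image=init_image,
        num_inference_steps=4,
        guidance_scale=1,
        strength=strength,
    ).images[0]
    images.append(image)
make_image_grid(images, rows=1, cols=3)
```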


## Combined with style LoRAs

LCM-LoRA can be combined with other style LoRAs to generate styled images in very few steps (4-8). In the following example, we'll use the LCM-LoRA with the [papercut LoRA](https://huggingface.co/TheLastBen/Papercut_SDXL).

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    scheduler=LCMScheduler.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="scheduler"),
    variant="fp16",
    torch_dtype=torch.float16
).to("cuda")

pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl", adapter_name="lcm")
pipe.load_lora_weights("TheLastBen/Papercut_SDXL", weight_name="papercut.safetensors", adapter_name="papercut")

pipe.set_adapters(["lcm", "papercut"], adapter_weights=[1.0, 0.8])

prompt = "papercut, a cute fox"
image = pipe(prompt, num_inference_steps=4, guidance_scale=1).images[0]
image
```

![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/lcm/lcm_i2i_papercut.png)
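
The balance between the two adapters is controlled by `adapter_weights`. If the style comes through too strongly or too weakly, you can call `set_adapters` again with different weights; the values below are illustrative:

```python
# Lower the papercut weight for a subtler style; the weights are illustrative.
pipe.set_adapters(["lcm", "papercut"], adapter_weights=[1.0, 0.5])
image = pipe(prompt, num_inference_steps=4, guidance_scale=1).images[0]
```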


## Controlnet/t2iadapter

LCM-LoRA can be used with controlnet/t2iadapter. Let's look at how we can perform inference with controlnet/t2iadapter and LCM-LoRA.

### Controlnet with SD-v1-5 and LCM-LoRA

For this example we'll use the SD-v1-5 model and the LCM-LoRA for SD-v1-5 with a canny controlnet.

```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, LCMScheduler
from diffusers.utils import load_image, make_image_grid
from PIL import Image
import cv2
import numpy as np

image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
).resize((512, 512))

image = np.array(image)

low_threshold = 100
high_threshold = 200

image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)
canny_image

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    scheduler=LCMScheduler.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="scheduler"),
    controlnet=controlnet,
    torch_dtype=torch.float16,
    safety_checker=None,
    variant="fp16"
).to("cuda")

pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

image = pipe(
    "the mona lisa", image=canny_image, num_inference_steps=4, guidance_scale=1.5, controlnet_conditioning_scale=0.8, cross_attention_kwargs={"scale": 1},
).images[0]
make_image_grid([canny_image, image], rows=1, cols=2)
```

![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/lcm/lcm_i2i_controlnet.png)


<Tip>

The inference parameters in this example might not work for all examples, so we recommend trying different values for the `num_inference_steps`, `guidance_scale`, `controlnet_conditioning_scale`, and `cross_attention_kwargs` parameters and choosing the best ones.

</Tip>
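
As a starting point for such a search, here is a minimal sketch that varies only `controlnet_conditioning_scale`, reusing `pipe` and `canny_image` from above; the values are illustrative:

```python
# Compare a few controlnet conditioning strengths side by side; values are illustrative.
images = [
    pipe(
        "the mona lisa",
        image=canny_image,
        num_inference_steps=4,
        guidance_scale=1.5,
        controlnet_conditioning_scale=scale,
    ).images[0]
    for scale in [0.5, 0.8, 1.0]
]
make_image_grid(images, rows=1, cols=3)
```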

### T2iadapter with SDXL and LCM-LoRA

```python
from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter, LCMScheduler
from diffusers.utils import load_image, make_image_grid
from controlnet_aux.canny import CannyDetector
import torch

# load adapter
adapter = T2IAdapter.from_pretrained("TencentARC/t2i-adapter-canny-sdxl-1.0", torch_dtype=torch.float16, variant="fp16").to("cuda")

pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    adapter=adapter,
    scheduler=LCMScheduler.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="scheduler"),
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")
canny_detector = CannyDetector()

pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

url = "https://huggingface.co/Adapter/t2iadapter/resolve/main/figs_SDXLV1.0/org_canny.jpg"
image = load_image(url)

# Detect the canny map in low resolution to avoid high-frequency details
canny_image = canny_detector(image, detect_resolution=384, image_resolution=1024)

prompt = "Mystical fairy in real, magic, 4k picture, high quality"
negative_prompt = "extra digit, fewer digits, cropped, worst quality, low quality, glitch, deformed, mutated, ugly, disfigured"

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=canny_image,
    num_inference_steps=4,
    guidance_scale=1.5,
    adapter_conditioning_scale=0.8,
    adapter_conditioning_factor=1
).images[0]
make_image_grid([canny_image, image], rows=1, cols=2)
```

![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/lcm/lcm_i2i_t2iadapter.png)


## Inpainting

LCM-LoRA can be used for inpainting as well. Let's look at how we can perform inpainting with LCM-LoRA.

```python
import torch
from diffusers import AutoPipelineForInpainting, LCMScheduler
from diffusers.utils import load_image, make_image_grid

pipe = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
    scheduler=LCMScheduler.from_pretrained("runwayml/stable-diffusion-inpainting", subfolder="scheduler"),
    variant="fp16",
).to("cuda")

pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

generator = torch.Generator("cuda").manual_seed(92)
prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipe(
    prompt=prompt,
    image=init_image,
    mask_image=mask_image,
    generator=generator,
    num_inference_steps=4,
    guidance_scale=4,
).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
```

![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/lcm/lcm_inpainting.png)


## Animatediff

LCM-LoRA can also be applied to animatediff, generating animations in just a few inference steps per frame. Let's look at how we can perform inference with animatediff and LCM-LoRA.

```python
import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, LCMScheduler
from diffusers.utils import export_to_gif

adapter = MotionAdapter.from_pretrained("diffusers/animatediff-motion-adapter-v1-5")
pipe = AnimateDiffPipeline.from_pretrained(
    "frankjoshua/toonyou_beta6",
    scheduler=LCMScheduler.from_pretrained("Lykon/dreamshaper-7", subfolder="scheduler"),
    motion_adapter=adapter,
).to("cuda")

pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5", adapter_name="lcm")
pipe.load_lora_weights("guoyww/animatediff-motion-lora-zoom-in", weight_name="diffusion_pytorch_model.safetensors", adapter_name="motion-lora")

pipe.set_adapters(["lcm", "motion-lora"], adapter_weights=[0.55, 1.2])

prompt = "best quality, masterpiece, 1girl, looking at viewer, blurry background, upper body, contemporary, dress"
output = pipe(prompt=prompt, num_inference_steps=5, guidance_scale=1.25, cross_attention_kwargs={"scale": 1}, num_frames=24)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```
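
If you prefer a video file over a GIF, `diffusers.utils` also provides an `export_to_video` helper; a minimal sketch, assuming OpenCV is installed (the helper requires it):

```python
from diffusers.utils import export_to_video

# Write the same frames as above to an mp4 instead of a GIF.
export_to_video(frames, "animation.mp4")
```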