Description
Source - https://github.com/openai/consistencydecoder
OpenAI has open-sourced its Consistency Decoder, which decodes latents like a pro, making images more consistent and less distorted.

But right now I see no way to properly decode pipeline latents with the code they provide. Maybe some solution already exists, but this seems like something you could add to diffusers as officially supported.
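For reference, the upstream README decodes latents roughly like this (a lightly adapted sketch; the input file name is a placeholder, and it targets SD 1.5, whose VAE latent space the decoder was built around):

```python
import torch
from diffusers import StableDiffusionPipeline
from consistencydecoder import ConsistencyDecoder, save_image, load_image

# Encode an image with the SD 1.5 VAE, then decode it both ways.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
decoder_consistency = ConsistencyDecoder(device="cuda:0")  # ~2.49 GB model

image = load_image("input.png", size=(256, 256), center_crop=True)
latent = pipe.vae.encode(image.half().cuda()).latent_dist.sample()

sample_gan = pipe.vae.decode(latent).sample.detach()  # regular VAE decoder
save_image(sample_gan, "gan.png")

sample_consistency = decoder_consistency(latent)  # consistency decoder
save_image(sample_consistency, "consistency.png")
```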
The Consistency Decoder weights come in .pt format. Possibly they can be converted to .safetensors and loaded as a regular VAE.
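A minimal conversion sketch, assuming the checkpoint is the TorchScript archive the upstream loader downloads and that its parameters can be lifted out via state_dict(); the file names here are hypothetical, and diffusers would still need a matching VAE class to actually load the result:

```python
import torch
from safetensors.torch import save_file

# Hypothetical local path; the upstream loader fetches the checkpoint itself.
jit_model = torch.jit.load("consistency_decoder.pt", map_location="cpu")

# safetensors requires contiguous, non-aliased tensors, so clone each one.
state_dict = {k: v.clone().contiguous() for k, v in jit_model.state_dict().items()}
save_file(state_dict, "consistency_decoder.safetensors")
```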
Also, it is memory-heavy, meaning that to decode images quickly there is a real need to unload the pipeline/model from VRAM, at least to the CPU.
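As an aside, diffusers already ships a built-in form of this offloading (assuming accelerate is installed):

```python
# Move each pipeline component to the GPU only while it is actually running,
# instead of manually shuffling modules between devices.
pipe.enable_model_cpu_offload()
```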
Here's my code, which results in garbage-grade images (specifically from the SDXL pipeline):
```python
import torch
from diffusers import StableDiffusionXLPipeline
from consistencydecoder import ConsistencyDecoder, save_image

# model_path and prompt are defined elsewhere.
seed = 42
generator = torch.Generator(device="cuda").manual_seed(seed)

print("Loading pipe")
pipe = StableDiffusionXLPipeline.from_single_file(
    model_path, torch_dtype=torch.float16, use_safetensors=True
).to("cuda")

# Generate latents only; skip the pipeline's built-in VAE decode.
latents = pipe(
    prompt=prompt,
    width=1024,
    height=1024,
    num_inference_steps=20,
    generator=generator,
    output_type="latent",
).images

# Free VRAM before loading the ~2.5 GB consistency decoder.
pipe.to("cpu")
torch.cuda.empty_cache()

print("Decoding latent")
decoder_consistency = ConsistencyDecoder(device="cuda:0")
# The pipeline returns scaled latents; undo the scaling its own VAE decode
# would apply before handing them to the consistency decoder.
sample = decoder_consistency(latents / pipe.vae.config.scaling_factor)
save_image(sample, "output.png")
```
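For what it's worth, the garbage output may not just be a decoding bug: the consistency decoder was trained on the SD 1.x VAE's latent space, while SDXL uses a retrained VAE with a different scaling factor, so SDXL latents are likely simply incompatible with it.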