2 changes: 1 addition & 1 deletion .github/FUNDING.yml
@@ -10,4 +10,4 @@ liberapay: # Replace with a single Liberapay username
issuehunt: # Replace with a single IssueHunt username
otechie: # Replace with a single Otechie username
lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cloud-foundry
custom: ['https://paypal.me/basuj']
custom: [ 'https://paypal.me/basuj' ]
92 changes: 64 additions & 28 deletions README.md
@@ -1,35 +1,53 @@
# Update: v0.8
# Update: v0.9 neon optimization edition

[Added support for inpainting](#inpainting)

<h1 align="center">Optimized Stable Diffusion</h1>
<p align="center">
<img src="https://img.shields.io/github/last-commit/basujindal/stable-diffusion?logo=Python&logoColor=green&style=for-the-badge"/>
<img src="https://img.shields.io/github/issues/basujindal/stable-diffusion?logo=GitHub&style=for-the-badge"/>
<img src="https://img.shields.io/github/stars/basujindal/stable-diffusion?logo=GitHub&style=for-the-badge"/>
<img src="https://img.shields.io/github/last-commit/neonsecret/stable-diffusion?logo=Python&logoColor=green&style=for-the-badge"/>
<img src="https://img.shields.io/github/issues/neonsecret/stable-diffusion?logo=GitHub&style=for-the-badge"/>
<img src="https://img.shields.io/github/stars/neonsecret/stable-diffusion?logo=GitHub&style=for-the-badge"/>
<a href="https://colab.research.google.com/github/neonsecret/stable-diffusion/blob/main/optimized_colab.ipynb">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
</p>

This repo is a modified version of the Stable Diffusion repo, optimized to use less VRAM than the original by sacrificing inference speed.
This repo is a modified version of the Stable Diffusion repo, optimized to use less VRAM than the original by
sacrificing inference speed.

To achieve this, the stable diffusion model is fragmented into four parts which are sent to the GPU only when needed. After the calculation is done, they are moved back to the CPU. This allows us to run a bigger model while requiring less VRAM.
To achieve this, the stable diffusion model is fragmented into four parts which are sent to the GPU only when needed.
After the calculation is done, they are moved back to the CPU. This allows us to run a bigger model while requiring less
VRAM.

<h1 align="center">Installation</h1>

All the modified files are in the [optimizedSD](optimizedSD) folder, so if you have already cloned the original repository you can just download and copy this folder into the original instead of cloning the entire repo. You can also clone this repo and follow the same installation steps as the original (mainly creating the conda environment and placing the weights at the specified location).
All the modified files are in the [optimizedSD](optimizedSD) folder, so if you have already cloned the original
repository you can just download and copy this folder into the original instead of cloning the entire repo. You can also
clone this repo and follow the same installation steps as the original (mainly creating the conda environment and
placing the weights at the specified location). <br>
So run: <br>
`conda env create -f environment.yaml` <br>
`conda activate ldm`

Alternatively, if you prefer to use Docker, you can do the following:
1. Install [Docker](https://docs.docker.com/engine/install/), [Docker Compose plugin](https://docs.docker.com/compose/install/), and [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker)

1. Install [Docker](https://docs.docker.com/engine/install/)
, [Docker Compose plugin](https://docs.docker.com/compose/install/),
and [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker)
2. Clone this repo to, e.g., `~/stable-diffusion`
3. Put your downloaded `model.ckpt` file into `~/sd-data` (it's a relative path, you can change it in `docker-compose.yml`)
3. Put your downloaded `model.ckpt` file into `~/sd-data` (it's a relative path, you can change it
in `docker-compose.yml`)
4. `cd` into `~/stable-diffusion` and execute `docker compose up --build`

This will launch gradio on port 7860 with txt2img. You can also use `docker compose run` to execute other Python scripts.
This will launch gradio on port 7860 with txt2img. You can also use `docker compose run` to execute other Python
scripts.

<h1 align="center">Usage</h1>

## img2img

- `img2img` can generate _512x512 images from a prior image and prompt on a 4GB VRAM GPU in under 20 seconds per image_ on an RTX 2060.
- `img2img` can generate _512x512 images from a prior image and prompt on a 4GB VRAM GPU in under 20 seconds per image_
on an RTX 2060.

- The maximum size that can fit on 6GB GPU (RTX 2060) is around 576x768.

@@ -47,45 +65,57 @@ This will launch gradio on port 7860 with txt2img. You can also use `docker comp

## inpainting

- `inpaint_gradio.py` can fill masked parts of an image based on a given prompt. It can inpaint 512x512 images while using under 4GB of VRAM.
- `inpaint_gradio.py` can fill masked parts of an image based on a given prompt. It can inpaint 512x512 images while
using under 4GB of VRAM.

- To launch the gradio interface for inpainting, run `python optimizedSD/inpaint_gradio.py`. The mask for the image can be drawn on the selected image using the brush tool.
- To launch the gradio interface for inpainting, run `python optimizedSD/inpaint_gradio.py`. The mask for the image can
be drawn on the selected image using the brush tool.

- The results are not yet perfect but can be improved by using a combination of prompt weighting, prompt engineering and testing out multiple values of the `--strength` argument.
- The results are not yet perfect but can be improved by using a combination of prompt weighting, prompt engineering and
testing out multiple values of the `--strength` argument.

- _Suggestions to improve the inpainting algorithm are most welcome_.

<h1 align="center">Using the Gradio GUI</h1>

- You can also use the built-in gradio interface for `img2img`, `txt2img` & `inpainting` instead of the command line interface. Activate the conda environment and install the latest version of gradio using `pip install gradio`.
- You can also use the built-in gradio interface for `img2img`, `txt2img` & `inpainting` instead of the command line
interface. Activate the conda environment and install the latest version of gradio using `pip install gradio`.

- Run img2img using `python optimizedSD/img2img_gradio.py`, txt2img using `python optimizedSD/txt2img_gradio.py` and inpainting using `python optimizedSD/inpaint_gradio.py`.
- Run img2img using `python optimizedSD/img2img_gradio.py`, txt2img using `python optimizedSD/txt2img_gradio.py` and
inpainting using `python optimizedSD/inpaint_gradio.py`.

- img2img_gradio.py has a feature to crop input images. Look for the pen symbol in the image box after selecting the image.
- img2img_gradio.py has a feature to crop input images. Look for the pen symbol in the image box after selecting the
image.

<h1 align="center">Arguments</h1>

## `--seed`

**Seed for image generation**, can be used to reproduce previously generated images. Defaults to a random seed if unspecified.
**Seed for image generation**, can be used to reproduce previously generated images. Defaults to a random seed if
unspecified.

- The code will give the seed number along with each generated image. To generate the same image again, just specify the seed using the `--seed` argument. By default, each image is saved with its seed number in its file name.
- The code will give the seed number along with each generated image. To generate the same image again, just specify the
seed using the `--seed` argument. By default, each image is saved with its seed number in its file name.

- For example, if the seed number for an image is `1234` and it's the 55th image in the folder, the image will be named `seed_1234_00055.png`.
- For example, if the seed number for an image is `1234` and it's the 55th image in the folder, the image will be
named `seed_1234_00055.png`.
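
The naming scheme described above can be sketched as follows. This is a minimal illustration, assuming the index is zero-padded to five digits as in the example; `seed_filename` is a hypothetical helper, not a function from this repo.

```python
def seed_filename(seed: int, index: int, ext: str = "png") -> str:
    """Build a file name such as seed_1234_00055.png (hypothetical helper)."""
    # The index is zero-padded to five digits, matching the README example.
    return f"seed_{seed}_{index:05d}.{ext}"


print(seed_filename(1234, 55))  # seed_1234_00055.png
```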

## `--n_samples`

**Batch size/amount of images to generate at once.**

- To get the lowest inference time per image, use the maximum batch size `--n_samples` that can fit on the GPU. Inference time per image decreases as the batch size increases, but the required VRAM increases.
- To get the lowest inference time per image, use the maximum batch size `--n_samples` that can fit on the GPU.
Inference time per image decreases as the batch size increases, but the required VRAM increases.

- If you get a CUDA out of memory error, try reducing the batch size `--n_samples`. If that doesn't work, the other option is to reduce the image width `--W` or height `--H` or both.
- If you get a CUDA out of memory error, try reducing the batch size `--n_samples`. If that doesn't work, the other
option is to reduce the image width `--W` or height `--H` or both.

## `--n_iter`

**Run _x_ amount of times**

- Equivalent to running the script `n_iter` times. The only difference is that the model is loaded only once for all `n_iter` iterations. Unlike `n_samples`, reducing it has no effect on the VRAM required or on inference time.
- Equivalent to running the script `n_iter` times. The only difference is that the model is loaded only once for all
`n_iter` iterations. Unlike `n_samples`, reducing it has no effect on the VRAM required or on inference time.
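
As a rough sketch of how `--n_samples` and `--n_iter` combine, assuming a single prompt and that each iteration produces one batch of `n_samples` images: the total output is their product, while only `n_samples` affects peak VRAM. `total_images` is a hypothetical helper for illustration, not part of this repo.

```python
def total_images(n_samples: int, n_iter: int) -> int:
    """Images produced for one prompt: n_iter batches of n_samples each (assumption)."""
    if n_samples < 1 or n_iter < 1:
        raise ValueError("n_samples and n_iter must be positive")
    return n_samples * n_iter


# Two plans that yield the same number of images.
# The second uses a smaller batch, so it should need less VRAM per batch.
print(total_images(n_samples=8, n_iter=1))  # 8
print(total_images(n_samples=2, n_iter=4))  # 8
```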

## `--H` & `--W`

@@ -97,19 +127,23 @@ This will launch gradio on port 7860 with txt2img. You can also use `docker comp

**Increases inference speed at the cost of extra VRAM usage.**

- Using this argument increases the inference speed by using around 1GB of extra GPU VRAM. It is especially effective when generating a small batch of images (around 1 to 4). It takes under 25 seconds for txt2img and 15 seconds for img2img (on an RTX 2060, excluding the time to load the model). Use it with larger batch sizes if enough GPU VRAM is available.
- Using this argument increases the inference speed by using around 1GB of extra GPU VRAM. It is especially effective
when generating a small batch of images (around 1 to 4). It takes under 25 seconds for txt2img and 15 seconds for
img2img (on an RTX 2060, excluding the time to load the model). Use it with larger batch sizes if enough GPU VRAM is
available.

## `--precision autocast` or `--precision full`

**Whether to use `full` or `mixed` precision**

- Mixed precision is enabled by default. If you don't have a GPU with tensor cores (for example, any GTX 10 series card), you may not be able to use mixed precision. Use the `--precision full` argument to disable it.
- Mixed precision is enabled by default. If you don't have a GPU with tensor cores (for example, any GTX 10 series
card), you may not be able to use mixed precision. Use the `--precision full` argument to disable it.

## `--format png` or `--format jpg`

**Output image format**

- The default output format is `png`. While `png` is lossless, it takes up a lot of space (unless large portions of the image happen to be a single colour). Use lossy `jpg` to get smaller image file sizes.
- The default output format is `png`. While `png` is lossless, it takes up a lot of space (unless large portions of the
image happen to be a single colour). Use lossy `jpg` to get smaller image file sizes.

## `--unet_bs`

@@ -124,13 +158,15 @@ This will launch gradio on port 7860 with txt2img. You can also use `docker comp
- Prompts can also be weighted to put relative emphasis on certain words.
e.g. `--prompt tabby cat:0.25 white duck:0.75 hybrid`.

- The number following the colon represents the weight given to the words before the colon. The weights can be either fractions or integers.
- The number following the colon represents the weight given to the words before the colon. The weights can be either
fractions or integers.
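
To make the `text:weight` syntax concrete, here is a minimal parsing sketch. It is not the parser used by this repo; `parse_weighted_prompt` is a hypothetical helper, and giving trailing text without an explicit weight a weight of `1.0` is an assumption made for illustration.

```python
import re
from typing import List, Tuple


def parse_weighted_prompt(prompt: str) -> List[Tuple[str, float]]:
    """Split 'text:weight text:weight rest' into (text, weight) pairs.

    Assumption: trailing text without an explicit weight gets weight 1.0.
    """
    # Words, then a colon, then an integer or decimal weight,
    # followed by whitespace or the end of the string.
    pattern = re.compile(r"(.+?):\s*(\d+(?:\.\d+)?)(?:\s+|$)")
    parts = []
    pos = 0
    for match in pattern.finditer(prompt):
        parts.append((match.group(1).strip(), float(match.group(2))))
        pos = match.end()
    rest = prompt[pos:].strip()
    if rest:
        parts.append((rest, 1.0))
    return parts


print(parse_weighted_prompt("tabby cat:0.25 white duck:0.75 hybrid"))
# [('tabby cat', 0.25), ('white duck', 0.75), ('hybrid', 1.0)]
```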

## Changelog

- v0.8: Added gradio interface for inpainting.
- v0.7: Added support for logging, jpg file format
- v0.6: Added support for using weighted prompts. (based on @lstein's [repo](https://github.com/lstein/stable-diffusion))
- v0.6: Added support for using weighted prompts. (based on
@lstein's [repo](https://github.com/lstein/stable-diffusion))
- v0.5: Added support for using gradio interface.
- v0.4: Added support for specifying image seed.
- v0.3: Added support for using mixed precision.