Merged
12 changes: 8 additions & 4 deletions docs/features/TEXTUAL_INVERSION.md
@@ -1,6 +1,8 @@
# **Personalizing Text-to-Image Generation**

You may personalize the generated images to provide your own styles or objects by training a new LDM checkpoint and introducing a new vocabulary to the fixed model.
You may personalize the generated images to provide your own styles or objects by training a new LDM checkpoint and introducing a new vocabulary to the fixed model as a (.pt) embeddings file. Alternatively, you may use or train HuggingFace Concepts embeddings files (.bin) from https://huggingface.co/sd-concepts-library and its associated notebooks.

**Training**

To train, prepare a folder that contains images sized at 512x512 and execute the following:

@@ -26,9 +28,11 @@ On a RTX3090, the process for SD will take ~1h @1.6 iterations/sec.

_Note_: According to the associated paper, the optimal number of images is 3-5. Your model may not converge if you use more images than that.

Training will run indefinately, but you may wish to stop it before the heat death of the universe, when you find a low loss epoch or around ~5000 iterations.
Training will run indefinitely, but you may wish to stop it before the heat death of the universe, when you find a low loss epoch or around ~5000 iterations.

**Running**

Once the model is trained, specify the trained .pt file when starting dream using
Once the model is trained, specify the trained .pt or .bin file when starting dream using

```
(ldm) ~/stable-diffusion$ python3 ./scripts/dream.py --embedding_path /path/to/embedding.pt --full_precision
@@ -46,7 +50,7 @@ This also works with image2image
dream> "waterfall and rainbow in the style of *" --init_img=./init-images/crude_drawing.png --strength=0.5 -s100 -n4
```

It's also possible to train multiple token (modify the placeholder string in `configs/stable-diffusion/v1-finetune.yaml`) and combine LDM checkpoints using:
For .pt files it's also possible to train multiple tokens (modify the placeholder string in `configs/stable-diffusion/v1-finetune.yaml`) and combine LDM checkpoints using:

```
(ldm) ~/stable-diffusion$ python3 ./scripts/merge_embeddings.py \
27 changes: 22 additions & 5 deletions ldm/modules/embedding_manager.py
@@ -24,9 +24,9 @@ def get_clip_token_for_string(tokenizer, string):
return_tensors='pt',
)
tokens = batch_encoding['input_ids']
assert (
""" assert (
torch.count_nonzero(tokens - 49407) == 2
), f"String '{string}' maps to more than a single token. Please use another string"
), f"String '{string}' maps to more than a single token. Please use another string" """

return tokens[0, 1]
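
For context on the check that this hunk comments out: the assertion counts the entries that differ from 49407 and requires exactly 2, which corresponds to one start token plus one content token in a padded row. The sketch below reproduces that arithmetic on plain Python lists rather than tensors. The helper names, the start-token id 49406, the sample ids, and the short padding length are all assumptions for illustration and do not come from the PR.

```python
# Dependency-free sketch of the single-token check; a plain list stands in
# for one row of batch_encoding['input_ids'].
END_OR_PAD_ID = 49407  # id subtracted in the original assertion
START_ID = 49406       # assumed start-of-text id (not shown in the diff)


def is_single_token(token_ids):
    """Mirror torch.count_nonzero(tokens - 49407) == 2 for one padded row."""
    non_padding = sum(1 for t in token_ids if t != END_OR_PAD_ID)
    return non_padding == 2  # start token + exactly one content token


def first_content_token(token_ids):
    """Mirror tokens[0, 1]: the id immediately after the start token."""
    return token_ids[1]


# Hypothetical ids, padded to a short length for readability.
single = [START_ID, 265] + [END_OR_PAD_ID] * 6
multi = [START_ID, 283, 2368, 285] + [END_OR_PAD_ID] * 4

print(is_single_token(single), first_content_token(single))
print(is_single_token(multi), first_content_token(multi))
```

With the assertion disabled, a row like `multi` no longer raises. The function simply returns the id after the start token (283 in this made-up row), so any further tokens of the string are ignored.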

@@ -57,8 +57,9 @@ def __init__(
):
super().__init__()

self.string_to_token_dict = {}
self.embedder = embedder

self.string_to_token_dict = {}
self.string_to_param_dict = nn.ParameterDict()

self.initial_embeddings = (
@@ -217,12 +218,28 @@ def save(self, ckpt_path):

def load(self, ckpt_path, full=True):
ckpt = torch.load(ckpt_path, map_location='cpu')
self.string_to_token_dict = ckpt["string_to_token"]
self.string_to_param_dict = ckpt["string_to_param"]

# Handle .pt textual inversion files
if 'string_to_token' in ckpt and 'string_to_param' in ckpt:
self.string_to_token_dict = ckpt["string_to_token"]
self.string_to_param_dict = ckpt["string_to_param"]

# Handle .bin textual inversion files from Huggingface Concepts
# https://huggingface.co/sd-concepts-library
else:
for token_str in list(ckpt.keys()):
token = get_clip_token_for_string(self.embedder.tokenizer, token_str)
self.string_to_token_dict[token_str] = token
ckpt[token_str] = torch.nn.Parameter(ckpt[token_str])

self.string_to_param_dict.update(ckpt)

if not full:
for key, value in self.string_to_param_dict.items():
self.string_to_param_dict[key] = torch.nn.Parameter(value.half())

print(f'Added terms: {", ".join(self.string_to_param_dict.keys())}')

def get_embedding_norms_squared(self):
all_params = torch.cat(
list(self.string_to_param_dict.values()), axis=0
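
The two branches added to `load` amount to a format check on the checkpoint's top-level keys. Below is a minimal, dependency-free sketch of that dispatch, using plain dicts and lists in place of `torch.load` output and `nn.Parameter`. `normalize_checkpoint`, `fake_lookup`, and the sample data are hypothetical; `token_lookup` stands in for `get_clip_token_for_string`, and the flat layout of .bin files is assumed from the `else` branch in the diff.

```python
def normalize_checkpoint(ckpt, token_lookup):
    """Return (string_to_token, string_to_param) for either file layout."""
    # .pt layout: both lookup tables are stored under fixed keys.
    if "string_to_token" in ckpt and "string_to_param" in ckpt:
        return dict(ckpt["string_to_token"]), dict(ckpt["string_to_param"])

    # .bin layout (assumed flat): each top-level key is a placeholder string
    # mapped directly to its embedding.
    string_to_token = {name: token_lookup(name) for name in ckpt}
    return string_to_token, dict(ckpt)


def fake_lookup(name):
    """Placeholder for the tokenizer lookup; returns a made-up id."""
    return 1000 + len(name)


# Hypothetical data: vectors are short lists rather than tensors.
pt_style = {
    "string_to_token": {"*": 265},
    "string_to_param": {"*": [0.1, 0.2]},
}
bin_style = {"<cat-toy>": [0.3, 0.4]}

print(normalize_checkpoint(pt_style, fake_lookup))
print(normalize_checkpoint(bin_style, fake_lookup))
```

One difference from the diff: there, the .bin branch updates the existing `string_to_token_dict` and `string_to_param_dict` in place, while this sketch returns fresh dicts to stay side-effect free.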