Description
I haven't fully fixed the script, but I'm really not sure how anyone has had success with it (e.g. the blog post). There are many show-stoppers when trying to get Textual Inversion working.
To debug, I started peeling away at the main Flux training script and have fixed all of the following bugs, BUT it still doesn't train well. With any hyperparams, it fails to learn a new concept, or even to overfit on it, and quickly degrades. So the following fixes are necessary, but not sufficient, to get this working.
- On a 1-node-2-gpu setup, `accelerator` wraps everything in a `DistributedSomething` class, so all the calls to `model.dtype`/`model.config` etc. error out; since they're wrapped, you actually need something like `model.module.dtype`
- It requires an `--instance-prompt` even if you're using a dataset with a custom text column
- The updated tokenizers are not getting passed into the pipeline
- `text_embeddings` get saved weirdly; it looks like a single copy gets saved no matter the checkpoint number?
- In `log_validation` there's a dtype-mismatch runtime error when training in `fp16`/`bf16`, and you need to autocast
- Also in `log_validation`, the `Generator` doesn't seem to do anything. Every generation gets performed with a different seed, so you can't watch the evolution/performance on the same samples. This is remedied if you save the RNG state of torch/torch_cuda/random, then use the RNG `Generator`, then reset the random state afterward
- `t5` training doesn't work. On this line it unfreezes `token_embedding`, which doesn't exist in t5; `shared` needs to be unfrozen instead
- Just a weird one: pulling the embeddings toward `std**0.1` makes intuitive sense, but this kind of thing should definitely come with a justification. Was this done in a paper somewhere?
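For the wrapped-model issue, a sketch of the workaround (the `Wrapper` class below is a stand-in I made up for the `DistributedSomething` wrapper; in practice `accelerator.unwrap_model(model)` does the same thing and is the cleaner fix):

```python
import torch
import torch.nn as nn

def unwrap(model: nn.Module) -> nn.Module:
    # DDP-style wrappers expose the original model as .module;
    # reading .dtype/.config on the wrapper itself errors out.
    return model.module if hasattr(model, "module") else model

class Wrapper(nn.Module):  # stand-in for the DistributedSomething wrapper
    def __init__(self, module: nn.Module):
        super().__init__()
        self.module = module

inner = nn.Linear(4, 4)
wrapped = Wrapper(inner)
assert unwrap(wrapped) is inner   # reaches through the wrapper
assert unwrap(inner) is inner     # no-op on an unwrapped model
print(unwrap(wrapped).weight.dtype)  # torch.float32
```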
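For the `log_validation` dtype mismatch, a minimal sketch of the autocast fix, assuming the error comes from fp32 activations hitting fp16/bf16 weights (the `Linear` here is just a stand-in for the half-precision pipeline):

```python
import torch

# Stand-in for half-precision model weights:
linear = torch.nn.Linear(4, 4).to(torch.bfloat16)
x = torch.randn(2, 4)  # fp32 input, as the validation path produces

# Outside autocast this matmul raises a dtype-mismatch RuntimeError;
# inside, inputs are cast to the autocast dtype automatically.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = linear(x)

print(y.dtype)  # torch.bfloat16
```

On GPU the same pattern applies with `device_type="cuda"` and the training `weight_dtype`.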
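The RNG bookkeeping for reproducible validation samples can be sketched like this (names are illustrative, not the script's actual code): snapshot the global RNG state, run generation with a fixed-seed `Generator`, then restore so training randomness is untouched.

```python
import random
import torch

def seeded_validation(seed: int, fn):
    # Snapshot every RNG stream the pipeline might touch.
    cpu_state = torch.get_rng_state()
    py_state = random.getstate()
    cuda_states = torch.cuda.get_rng_state_all() if torch.cuda.is_available() else None
    try:
        generator = torch.Generator().manual_seed(seed)
        return fn(generator)
    finally:
        # Restore, so validation never perturbs the training RNG.
        torch.set_rng_state(cpu_state)
        random.setstate(py_state)
        if cuda_states is not None:
            torch.cuda.set_rng_state_all(cuda_states)

# Same seed -> same samples at every validation step:
a = seeded_validation(0, lambda g: torch.randn(3, generator=g))
b = seeded_validation(0, lambda g: torch.randn(3, generator=g))
assert torch.equal(a, b)
```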
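For the t5 unfreezing bug, one architecture-agnostic fix is to go through `get_input_embeddings()`, which Hugging Face models implement regardless of whether the table is called `token_embedding` or `shared` (the `FakeT5` below is a toy stand-in, not the real encoder):

```python
import torch.nn as nn

def unfreeze_input_embeddings(text_encoder) -> None:
    # get_input_embeddings() resolves to the right table per architecture,
    # e.g. `shared` on T5, so no CLIP-specific attribute path is needed.
    text_encoder.get_input_embeddings().weight.requires_grad_(True)

class FakeT5(nn.Module):  # toy stand-in mimicking T5's accessor shape
    def __init__(self):
        super().__init__()
        self.shared = nn.Embedding(10, 4)
        self.shared.weight.requires_grad_(False)

    def get_input_embeddings(self):
        return self.shared

enc = FakeT5()
unfreeze_input_embeddings(enc)
assert enc.shared.weight.requires_grad
```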
Some nice-to-haves that I wish this did out of the box (but I'd be happy if the simple path above "just worked"):
- save latent cache to disk
- aspect ratio bucketing
- 8bit backbone
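The latent-cache wish can be sketched in a few lines (hypothetical helper, not anything in diffusers): encode each image once, key the result by a content hash, and reload from disk on later epochs or runs.

```python
import hashlib
import tempfile
from pathlib import Path

import torch

def cached_latents(cache_dir: Path, key: bytes, encode_fn) -> torch.Tensor:
    # One .pt file per unique input; encode_fn stands in for the VAE encode.
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / (hashlib.sha256(key).hexdigest() + ".pt")
    if path.exists():
        return torch.load(path)
    latents = encode_fn()
    torch.save(latents, path)
    return latents

# First call encodes and writes; second call never invokes encode_fn.
cache = Path(tempfile.mkdtemp())
first = cached_latents(cache, b"sample-0", lambda: torch.randn(4, 8, 8))
second = cached_latents(cache, b"sample-0", lambda: torch.zeros(1))
assert torch.equal(first, second)
```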
Honestly, at this point I'm tempted to give up on diffusers and use something else for this current client. After sweating over fixing the above, I still haven't gotten it to work, and I still haven't been able to stress-test things like the dreambooth functionality or the whole-text-encoder finetuning feature.