This repository contains an implementation of Stable Diffusion inspired by the paper 'High-Resolution Image Synthesis with Latent Diffusion Models'. Since the project is for educational purposes, I implemented the VQ-GAN, CLIP encoder, and U-Net models from scratch. The model was trained on the Pixel Art dataset, which contains small images that are nevertheless high-quality. I chose Stable Diffusion because it is a good vehicle for learning about generative models: it combines GANs, variational autoencoders, attention, transformer encoders (for CLIP), and diffusion.
The classes used to generate this output are, from left to right: Human_Front, Fruit, Animal, Human_Front, Item.
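To make the overall flow concrete, here is a minimal sketch of how the pieces could fit together at sampling time. The module names and interfaces (`VQGAN`, `CLIPEncoder`, `UNet`, `scheduler`) are illustrative assumptions, not the exact API of this repository.

```python
# Illustrative sketch only: module names and signatures are assumptions,
# not the actual code in this repository.
import torch

@torch.no_grad()
def sample(vqgan, clip_encoder, unet, scheduler, class_tokens, latent_shape, device="cpu"):
    """Denoise random latents conditioned on class tokens, then decode to pixels."""
    cond = clip_encoder(class_tokens)               # conditioning, e.g. "Human_Front", "Fruit", ...
    z = torch.randn(latent_shape, device=device)    # start from pure noise in latent space
    for t in reversed(range(scheduler.num_steps)):  # reverse diffusion loop
        t_batch = torch.full((latent_shape[0],), t, device=device, dtype=torch.long)
        noise_pred = unet(z, t_batch, cond)         # U-Net predicts the noise at step t
        z = scheduler.step(noise_pred, t, z)        # remove a little of that noise
    return vqgan.decode(z)                          # VQ-GAN decoder maps latents back to images
```

Running the diffusion in the VQ-GAN's latent space rather than in pixel space is what keeps training and sampling cheap enough for small images and modest hardware.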
As this project is primarily for educational purposes, I wanted to briefly reflect on the new technologies I learned.
This was the first time I used uv, which I used to manage the project's dependencies and Python environments. Before that I had always used Mambaforge, but uv is faster and easier to manage, since it drives the entire project from a single pyproject.toml file, which I found to be a much nicer approach. That it pulls packages from the regular PyPI index instead of the conda channels is an added bonus.
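As a rough illustration, a uv-managed project declares its metadata and dependencies in pyproject.toml along these lines; the names and versions below are placeholders, not the actual contents of this repository's file.

```toml
# Hypothetical pyproject.toml sketch; the real file lists different
# dependencies and versions.
[project]
name = "stable-diffusion-from-scratch"
version = "0.1.0"
requires-python = ">=3.10"
dependencies = [
    "torch",
    "lightning",
]
```

With a file like this in place, `uv sync` creates the environment and installs the dependencies, and `uv run` executes commands inside it.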
I attempted to train my VQ-GAN on larger images on a machine with limited storage capacity. The training run did not complete successfully, and all progress was lost. This made me realize it would be useful if PyTorch Lightning's checkpoint callbacks could save a checkpoint when an exception is raised, so I took the initiative to contribute this feature.
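For context, one way to approximate this behaviour with a custom callback looks roughly like the sketch below. This illustrates the idea rather than the actual contribution, and only relies on the standard Lightning callback hooks.

```python
# Rough illustration of saving a checkpoint when training raises an exception.
# Not the contributed code itself, just the general idea.
import lightning.pytorch as pl

class SaveOnException(pl.Callback):
    """Write a checkpoint if an exception interrupts training."""

    def __init__(self, ckpt_path: str = "on_exception.ckpt"):
        self.ckpt_path = ckpt_path

    def on_exception(self, trainer, pl_module, exception):
        # Persist the current model/optimizer state before the run dies,
        # so a crashed run can be resumed instead of starting from scratch.
        trainer.save_checkpoint(self.ckpt_path)

# Usage: trainer = pl.Trainer(callbacks=[SaveOnException("vqgan_crash.ckpt")])
```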