Comparing 5 Recent SD Distillation Methods for Low-VRAM Users: SSD, LCM, Turbo

Comparing 5 Recent SD Distillation Methods for Low-VRAM Users: SSD, LCM, Turbo

SD-Turbo scores significantly higher on aesthetics, the boost to SD-21 is remarkable

For context: I have an RTX2060 6GB and I was interested in getting usable sub-10 second generations. I'd previously made an optimised SD-21 pipe prior to SDXL. The prompts for the images below are taken from Microsoft Image Creator (which were presumably chosen to show off image models breadth of ability).

https://thekitchenscientist.github.io/dalle-3_examples.txt

For text2img you can see how much better SDXL is at grasping the prompt concepts but for those on older hardware, SD-turbo looks usable for near-real time painting using a tool like Krita. It combines nicely with Koyha's DeepShrink and FreeU_V2 to produce 768x1024 images without artefacts in under 5 seconds. If you use the LCM sampler you can also push it past 2 steps without frying the image - it just progressively simplifies it until you have a very simple vector image.

At 4 steps (which what is used below) it fixes most of the issues with malformed limbs that occur with only 2 steps. Some very interesting stuff happens with complex prompts as you start to push SD-Turbo to 7+ steps with the LCM sampler.

r/StableDiffusion - Comparing 5 recent SD distillation methods SSD/LCM/Turbo to find the best option for low-VRAM users (images and statistical analysis included). SD-Turbo scores significantly higher on aesthetics, the boost to SD-21 is remarkable

SD-Turbo

r/StableDiffusion - Comparing 5 recent SD distillation methods SSD/LCM/Turbo to find the best option for low-VRAM users (images and statistical analysis included). SD-Turbo scores significantly higher on aesthetics, the boost to SD-21 is remarkable

SDXL-Turbo

r/StableDiffusion - Comparing 5 recent SD distillation methods SSD/LCM/Turbo to find the best option for low-VRAM users (images and statistical analysis included). SD-Turbo scores significantly higher on aesthetics, the boost to SD-21 is remarkable

SXDL Base

r/StableDiffusion - Comparing 5 recent SD distillation methods SSD/LCM/Turbo to find the best option for low-VRAM users (images and statistical analysis included). SD-Turbo scores significantly higher on aesthetics, the boost to SD-21 is remarkable

SDXL-LCM

r/StableDiffusion - Comparing 5 recent SD distillation methods SSD/LCM/Turbo to find the best option for low-VRAM users (images and statistical analysis included). SD-Turbo scores significantly higher on aesthetics, the boost to SD-21 is remarkable

SSD-1B LCM

r/StableDiffusion - Comparing 5 recent SD distillation methods SSD/LCM/Turbo to find the best option for low-VRAM users (images and statistical analysis included). SD-Turbo scores significantly higher on aesthetics, the boost to SD-21 is remarkable

SSD-1B

MethodSeconds per image on RTX2060 6GB ComfyUI
SDXL - uni_pc_bh230
SDXL LCM Lora (it seems I need to use the merge)60
SDXL Turbo13
SSD-1B18
SSD-1B LCM Lora10
SD-Turbo1.5
SD-2.13

I ranked all the 2135 images I generated using the simulacra aesthetic model. For each prompt I calculated the average aesthetic across all the methods and then subtracted that from the score of each image in that group. The way SSD-1B scores higher than SDXL makes me think the simulacra aesthetic model or similar was used in the distillation process.

r/StableDiffusion - Comparing 5 recent SD distillation methods SSD/LCM/Turbo to find the best option for low-VRAM users (images and statistical analysis included). SD-Turbo scores significantly higher on aesthetics, the boost to SD-21 is remarkable

The average score for each prompt was subtracted from the score for each image

I used seed 1000000007, lcm sampler and sge_uniform scheduler. For turbo it was 4 steps and LCM it was 6 steps. The base images were generated with uni_pc_bh2 and 12 steps. The other two groups of prompts are available here:

https://thekitchenscientist.github.io/dalle-2_examples.txt

https://thekitchenscientist.github.io/artist-space_examples.txt

The image space examples are 244 prompts based on: https://docs.google.com/spreadsheets/d/14xTqtuV3BuKDNhLotB_d1aFlBGnDJOY0BRXJ8-86GpA/edit#gid=0 I ran 10k samples from this list using SSD-1B and then analysed the image composition and colours to sample a spreadout/diverse/representative set of artist prompts from the infinity of latent space.

Bonus plot showing the spread of scores across each group:

r/StableDiffusion - Comparing 5 recent SD distillation methods SSD/LCM/Turbo to find the best option for low-VRAM users (images and statistical analysis included). SD-Turbo scores significantly higher on aesthetics, the boost to SD-21 is remarkable

SD Turbo for ideation on people and landscapes; hybrid sampling for furniture, sculpture and architecture; prompt delay for keeping the same composition in multiple styles; The main thing is only using a single seed. If I want control use IPadaptor, img2img or controlNet. Some seeds are biased to split subject, etc so once I found a reliable seed, I've been using it for a year now.

There is a couple more tricks I've not mentioned here, that are not all available in Comfyui yet, that help the weaker models resolve better :

latent jitter
set-up Stable Diffusion with minimal dependencies and a single multi-function pipe - thekitchenscientist/sd_lite

One that you can try now is to use a slow sampler for the first 15% of steps then switch to a fast method like LCM for the rest. I've found for architecture, furniture and sculpture in SSD-1B this gives much better results in only 10 steps (4/14slow+6/6LCM)

Originally posted on Reddit @thkitchenscientist

Read more