Comparing 5 Recent SD Distillation Methods for Low-VRAM Users: SSD, LCM, Turbo
![Comparing 5 Recent SD Distillation Methods for Low-VRAM Users: SSD, LCM, Turbo](/content/images/size/w2000/2023/12/4tr6q7oigj4c1.webp)
SD-Turbo scores significantly higher on aesthetics; the boost over its SD-2.1 base is remarkable
For context: I have an RTX 2060 6GB and I was interested in getting usable sub-10-second generations. I'd previously built an optimised SD-2.1 pipeline prior to SDXL. The prompts for the images below are taken from Microsoft Image Creator (and were presumably chosen to show off image models' breadth of ability):
https://thekitchenscientist.github.io/dalle-3_examples.txt
For text2img you can see how much better SDXL is at grasping the prompt concepts, but for those on older hardware, SD-Turbo looks usable for near-real-time painting using a tool like Krita. It combines nicely with Kohya's Deep Shrink and FreeU_V2 to produce 768x1024 images without artefacts in under 5 seconds. If you use the LCM sampler you can also push it past 2 steps without frying the image - it just progressively simplifies the output until you have a very simple vector-style image.
At 4 steps (which is what is used below) it fixes most of the issues with malformed limbs that occur at only 2 steps. Some very interesting things happen with complex prompts as you push SD-Turbo to 7+ steps with the LCM sampler.
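For reference, here is a minimal diffusers sketch of the SD-Turbo setup described above. The model id and loading pattern follow the official `stabilityai/sd-turbo` model card; the prompt is a placeholder, and the 4-step setting matches the grids below. Note this needs a CUDA GPU and a model download, so treat it as a starting point rather than the exact pipeline used for these benchmarks:

```python
import torch
from diffusers import AutoPipelineForText2Image

# SD-Turbo is a distilled SD-2.1; it is trained to run *without*
# classifier-free guidance, hence guidance_scale=0.0.
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sd-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

image = pipe(
    prompt="a cinematic photo of a lighthouse at dusk",  # placeholder prompt
    num_inference_steps=4,  # 4 steps fixes most malformed-limb issues
    guidance_scale=0.0,
).images[0]
image.save("turbo.png")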
![SD-Turbo results](https://preview.redd.it/4tr6q7oigj4c1.jpg?width=2560&format=pjpg&auto=webp&s=9b876faf8d9d953ef0df15237e076c6c7e987d19)
SD-Turbo
![SDXL-Turbo results](https://preview.redd.it/knpi6aofgj4c1.jpg?width=2560&format=pjpg&auto=webp&s=53b138f5d194b9ed213418f0c6c1bdeb7e70da10)
SDXL-Turbo
![SDXL Base results](https://preview.redd.it/hqcwp4zomj4c1.jpg?width=2560&format=pjpg&auto=webp&s=5ec53ab29d8c1506908137c0ac181c0e996136d7)
SDXL Base
![SDXL-LCM results](https://preview.redd.it/ifz0hyealj4c1.jpg?width=2560&format=pjpg&auto=webp&s=664623a696fd84ec5c304982e939452418ec5c62)
SDXL-LCM
![SSD-1B LCM results](https://preview.redd.it/ein3fdalpj4c1.jpg?width=2560&format=pjpg&auto=webp&s=676a3050afac39c622803ce1d6f05f395c81d20c)
SSD-1B LCM
![SSD-1B results](https://preview.redd.it/meea61cahj4c1.jpg?width=2560&format=pjpg&auto=webp&s=b3640db04bd2631b5e8decac472761960910b70b)
SSD-1B
Method | Seconds per image on RTX 2060 6GB (ComfyUI)
---|---
SDXL - uni_pc_bh2 | 30
SDXL LCM LoRA (it seems I need to use the merge) | 60
SDXL Turbo | 13
SSD-1B | 18
SSD-1B LCM LoRA | 10
SD-Turbo | 1.5
SD-2.1 | 3
I ranked all 2,135 images I generated using the simulacra aesthetic model. For each prompt I calculated the average aesthetic score across all the methods and then subtracted it from the score of each image in that group. The fact that SSD-1B scores higher than SDXL makes me think the simulacra aesthetic model, or something similar, was used in the distillation process.
![r/StableDiffusion - Comparing 5 recent SD distillation methods SSD/LCM/Turbo to find the best option for low-VRAM users (images and statistical analysis included). SD-Turbo scores significantly higher on aesthetics, the boost to SD-21 is remarkable](https://preview.redd.it/vsdtd962hj4c1.png?width=602&format=png&auto=webp&s=bec7eae0118b21b5fd5c180205b631051a067b72)
The average score for each prompt was subtracted from the score for each image
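The per-prompt normalisation is easy to reproduce: average the aesthetic scores across all methods for each prompt, then subtract that mean from every image's score in the group. A minimal sketch with made-up scores (the simulacra aesthetic model itself is not included here):

```python
from collections import defaultdict

def centre_by_prompt(scores):
    """scores: list of (prompt, method, aesthetic_score) tuples.
    Returns the same tuples with each score centred on its
    prompt's mean across all methods."""
    by_prompt = defaultdict(list)
    for prompt, _, s in scores:
        by_prompt[prompt].append(s)
    means = {p: sum(v) / len(v) for p, v in by_prompt.items()}
    return [(p, m, s - means[p]) for p, m, s in scores]

# Hypothetical scores for one prompt across three methods:
raw = [("castle", "SD-Turbo", 6.5),
       ("castle", "SDXL", 6.0),
       ("castle", "SSD-1B", 5.5)]
print(centre_by_prompt(raw))
# mean is 6.0, so the centred scores are +0.5, 0.0 and -0.5
```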
I used seed 1000000007, the LCM sampler and the sgm_uniform scheduler. For the Turbo models it was 4 steps and for LCM 6 steps. The base images were generated with uni_pc_bh2 and 12 steps. The other two groups of prompts are available here:
https://thekitchenscientist.github.io/dalle-2_examples.txt
https://thekitchenscientist.github.io/artist-space_examples.txt
The artist space examples are 244 prompts based on: https://docs.google.com/spreadsheets/d/14xTqtuV3BuKDNhLotB_d1aFlBGnDJOY0BRXJ8-86GpA/edit#gid=0 I ran 10k samples from this list using SSD-1B, then analysed the image composition and colours to sample a spread-out, diverse, representative set of artist prompts from the infinity of latent space.
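The post doesn't name the exact selection algorithm, but a common way to pick a spread-out subset from per-image features (mean colour, composition statistics, and so on) is greedy farthest-point sampling. A sketch, assuming simple mean-RGB feature tuples:

```python
def farthest_point_sample(points, k):
    """Greedily pick k points that are maximally spread out in
    feature space. points: list of equal-length numeric tuples."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    chosen = [points[0]]  # arbitrary starting point
    while len(chosen) < k:
        # take the point whose nearest chosen neighbour is farthest away
        nxt = max(points, key=lambda p: min(dist2(p, c) for c in chosen))
        chosen.append(nxt)
    return chosen

# Hypothetical mean-RGB features for five images:
feats = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (255, 255, 255), (128, 128, 128)]
print(farthest_point_sample(feats, 3))
# picks black, white and mid-grey - the near-duplicate dark points are skipped
```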
Bonus plot showing the spread of scores across each group:
![r/StableDiffusion - Comparing 5 recent SD distillation methods SSD/LCM/Turbo to find the best option for low-VRAM users (images and statistical analysis included). SD-Turbo scores significantly higher on aesthetics, the boost to SD-21 is remarkable](https://preview.redd.it/lavvwgnqvj4c1.png?width=579&format=png&auto=webp&s=49ef58f5eca0fe5277707ca39c91e695d00b0ae3)
SD-Turbo for ideation on people and landscapes; hybrid sampling for furniture, sculpture and architecture; prompt delay for keeping the same composition across multiple styles. The main thing is sticking to a single seed - if I want control I use IPAdapter, img2img or ControlNet. Some seeds are biased towards a split subject, etc., so once I found a reliable seed I've been using it for a year now.
There are a couple more tricks I've not mentioned here, not all of which are available in ComfyUI yet, that help the weaker models resolve better.
One you can try now is to use a slow sampler for the first 15% of steps, then switch to a fast method like LCM for the rest. I've found that for architecture, furniture and sculpture in SSD-1B this gives much better results in only 10 steps (4/14 slow + 6/6 LCM).
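The split arithmetic behind that recipe can be sketched as a small helper. One way to wire this up in ComfyUI (an assumption, not the author's exact graph) is two KSamplerAdvanced nodes: the first runs the slow sampler from start_at_step=0 to the computed end step with leftover noise returned, and the second denoises the remainder with the LCM sampler:

```python
def hybrid_split(slow_schedule_steps, slow_frac, lcm_steps):
    """Run a slow sampler for the first slow_frac of its own schedule,
    then hand the noisy latent to a fast sampler (e.g. LCM) for
    lcm_steps more steps. Returns (slow_end_step, lcm_steps)."""
    slow_end = max(1, round(slow_schedule_steps * slow_frac))
    return slow_end, lcm_steps

# The SSD-1B recipe above: 4 steps of a 14-step slow schedule,
# then 6 LCM steps, for 10 sampled steps in total.
print(hybrid_split(14, 0.29, 6))  # -> (4, 6)
```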
Originally posted on Reddit @thkitchenscientist