SDXL Turbo Debuts: StabilityAI’s Latest Real-Time Text to Image Model

After some teasing by Emad, Stability.ai has unveiled SDXL Turbo. As you can guess from the name, its main focus is speed: it makes it possible to generate images in real time!

Emad, the CEO of Stability.ai, also mentioned that SDXL Turbo and other recent models will require a subscription for commercial use. This is a gradual shift away from something that started as fully free and open source, but I guess it is justified, given the resources needed to work on these models.


So, what is it about? Let’s go through the main points from Stability.ai’s announcement.


Breakthrough Performance with Adversarial Diffusion Distillation

SDXL Turbo owes its speed to a new distillation technique called Adversarial Diffusion Distillation (ADD). This approach lets the model generate high-quality images in a single step, a significant leap from the 50 steps previously needed. The result is real-time image generation at a level of quality that was once thought to be years away. It also seems to be uncensored, like Stable Diffusion 1.5.

For the technically inclined, Stability.ai has published a detailed research paper on SDXL Turbo’s distillation technique. It sheds light on the use of adversarial training and score distillation, which are key to the model’s performance, and is worth reading for researchers and AI enthusiasts who want to understand the mechanics behind the model.

Netflix for generative AI

Stability.ai has made SDXL Turbo’s model weights and code available for download on Hugging Face, under a non-commercial research license. This accessibility allows personal, non-commercial use and experimentation, offering a playground for researchers, AI enthusiasts, and digital artists.
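
For readers who would rather script the model than use a UI, here is a minimal sketch of what single-step generation could look like with the Hugging Face diffusers library. This is an assumption on my part: the announcement does not prescribe diffusers, and the helper names `turbo_call_kwargs` and `generate_image` are made up for illustration. The model id matches the Hugging Face repository linked in the Resources section, and running `generate_image` assumes a CUDA GPU plus the `torch` and `diffusers` packages.

```python
def turbo_call_kwargs(prompt: str, steps: int = 1) -> dict:
    """Build the arguments for a Turbo-style text-to-image call (hypothetical helper)."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return {
        "prompt": prompt,
        # The whole point of the distilled model: one denoising step by default.
        "num_inference_steps": steps,
        # Assumption: classifier-free guidance is switched off for Turbo.
        "guidance_scale": 0.0,
    }


def generate_image(prompt: str):
    """Load the weights from Hugging Face and generate one image (needs a GPU)."""
    # Heavy imports live inside the function so the helper above
    # can be used and tested on a machine without torch installed.
    import torch
    from diffusers import AutoPipelineForText2Image

    pipe = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/sdxl-turbo",
        torch_dtype=torch.float16,
        variant="fp16",
    )
    pipe.to("cuda")
    return pipe(**turbo_call_kwargs(prompt)).images[0]
```

Calling `generate_image("a watercolor painting of a lighthouse").save("turbo.png")` should write a single image to disk. Keep in mind that the weights are under the non-commercial research license mentioned above.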

Alongside the SDXL Turbo release, Emad also shared important information about commercial use in a tweet that was later deleted. He explained that while models like Stable Video Diffusion, SDXL Turbo, and other upcoming “stable series” models will be free for non-commercial personal and academic usage, commercial use will require a Stability membership.

This membership model is designed to be accessible and scalable. For instance, an indie developer might pay around $100 a month, with the fees varying based on the size of the organization but remaining reasonable. Emad compared this approach to services like Amazon Prime or Netflix, but for generative AI models.

Understanding the Limitations of SDXL Turbo

Some limitations to keep in mind about the model: speed comes at a price, and that price is quality. Image generation is limited in several respects, so the model will not be a big advancement for some use cases. Let’s see the main points:

Fixed Resolution Output: One of the primary limitations is the fixed resolution of the generated images. Currently, SDXL Turbo produces images at a 512×512 pixel resolution.

Imperfect Photorealism: Despite its advanced capabilities, SDXL Turbo does not achieve perfect photorealism.

Rendering Text: The model struggles to render legible text, a limitation common among diffusion models.

Rendering Faces and People: Another notable challenge is the proper generation of faces and people.

Lossy Autoencoding: The autoencoding component of SDXL Turbo is lossy, meaning some information gets lost in the process of encoding and decoding images. This aspect can affect the fidelity and detail of the generated images, especially when subtle nuances are critical to the overall composition.
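
To give a rough feel for why the autoencoder is lossy, here is a small back-of-the-envelope sketch. It assumes the configuration typical of the Stable Diffusion family (8x spatial downsampling and 4 latent channels). Those two numbers are my assumption and are not stated in the announcement, and the function names are just for illustration.

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """Shape (channels, height, width) of the latent for a given image size."""
    if height % downsample or width % downsample:
        raise ValueError("image sides must be multiples of the downsample factor")
    return (latent_channels, height // downsample, width // downsample)


def compression_ratio(height, width, image_channels=3,
                      downsample=8, latent_channels=4):
    """How many image values are squeezed into each latent value."""
    c, h, w = latent_shape(height, width, downsample, latent_channels)
    return (image_channels * height * width) / (c * h * w)


print(latent_shape(512, 512))       # (4, 64, 64)
print(compression_ratio(512, 512))  # 48.0
```

Under these assumptions, a 512×512 RGB image is represented by 48 times fewer values in latent space, so it is not surprising that fine details do not always survive the round trip.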

Some Examples

I quickly ran the ComfyUI workflow on my PC with 12GB of VRAM and was able to generate a batch of 8 images in less than 4 seconds! It is extremely fast, definitely an improvement in terms of speed, but at a lower resolution than the standard 1024x1024px of SDXL.

Generating batches of 4 images in a few seconds
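
To put the numbers quoted above in perspective, here is a quick bit of arithmetic. It only uses the figures from my own run (a batch of 8 images in under 4 seconds) and the two resolutions mentioned in this post. The function names are made up for illustration.

```python
def seconds_per_image(batch_size, total_seconds):
    """Average wall-clock time spent on each image of a batch."""
    return total_seconds / batch_size


def pixel_ratio(side_a, side_b):
    """How many times more pixels a square image of side_a has than one of side_b."""
    return (side_a * side_a) / (side_b * side_b)


# 8 images in (at most) 4 seconds: an upper bound on the per-image time.
print(seconds_per_image(8, 4.0))  # 0.5

# Standard SDXL output (1024x1024) versus SDXL Turbo output (512x512).
print(pixel_ratio(1024, 512))     # 4.0
```

So each image took under half a second on my machine, but each one also has a quarter of the pixels of a standard SDXL image.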

If you also want to try it out, you can check this guide to get started with ComfyUI.

Results seem to be slightly better than those of LCMs, another recent type of model that can generate images in near real time, with some loss in quality and detail.

Conclusion

SDXL Turbo, the latest offering from Stability.ai, looks incredibly promising, particularly in terms of speed. While some might find the lower resolution a downgrade, this model is still exciting news for many use cases. It’s definitely worth a test run in ComfyUI to see how it performs: for this, you will have to download the weights from Hugging Face and the ComfyUI workflow. You can also try out the demo on stability.ai. Let’s see what kind of creative possibilities SDXL Turbo unlocks!


Resources

Model: https://huggingface.co/stabilityai/sdxl-turbo/tree/main

ComfyUI workflow: https://comfyanonymous.github.io/ComfyUI_examples/sdturbo/

Announcement: https://stability.ai/news/stability-ai-sdxl-turbo
