
Stable Cascade: A New Model That Can Generate AI Images Better And Faster

Stable Cascade is a new kind of image generation model, built on a design called Würstchen, and it works differently from other Stable Diffusion models. Stable Cascade can generate high-quality images using fewer resources, and it excels at incorporating text within images. Let’s explore how to run it using ComfyUI, which supports it natively!

Generated with Stable Cascade

About Stable Cascade

In simpler terms, imagine you could take a big picture and squish it down into a tiny version while still keeping it clear. That’s what Stable Cascade does, but it squeezes images far more than other models: while Stable Diffusion reduces images to about 1/8th of their original side length, Stable Cascade crunches them down to a tiny 24×24 latent, and the pictures still look good.

And why does size matter here? Well, when the model has to work with smaller versions of images, it can generate new images faster and it’s also cheaper to train.
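To put rough numbers on it, here’s a back-of-the-envelope Python sketch using the sizes quoted above (1/8th per side for Stable Diffusion, a 24×24 latent for Stable Cascade) for a 1024×1024 generation:

# Rough latent-size comparison for a 1024x1024 generation
image_side = 1024

sd_latent_side = image_side // 8  # Stable Diffusion/SDXL: ~8x compression per side
sc_latent_side = 24               # Stable Cascade: a fixed 24x24 latent

print(f"SDXL latent: {sd_latent_side}x{sd_latent_side} = {sd_latent_side ** 2} positions")
print(f"Stable Cascade latent: {sc_latent_side}x{sc_latent_side} = {sc_latent_side ** 2} positions")
print(f"Compression per side: {image_side / sc_latent_side:.1f}x")  # ~42.7x

The 24×24 grid has almost 30 times fewer positions than SDXL’s 128×128 latent, which is exactly where the speed and training savings come from.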

Technical Details

Stable Cascade can generate text in images

Here’s how it works: Stable Cascade has three stages called Stage A, Stage B, and Stage C. Each stage plays a role in creating images. Stages A and B squeeze the images down, and Stage C finishes the job by turning text prompts into pictures.

The Stable Cascade architecture, from StabilityAI on GitHub
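
If you prefer scripting to a UI, the diffusers library exposes this same split as two pipelines: a prior for Stage C and a decoder that bundles Stages B and A. Here is a minimal text-to-image sketch, following the usage shown on the stabilityai/stable-cascade HuggingFace page (check the current diffusers docs for the exact arguments):

import torch
from diffusers import StableCascadePriorPipeline, StableCascadeDecoderPipeline

device = "cuda"
prompt = "a penguin in a swimming pool"

# Stage C: turn the text prompt into a compact image embedding
prior = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16
).to(device)
prior_output = prior(prompt=prompt, height=1024, width=1024, guidance_scale=4.0)

# Stages B + A: decode the embedding back into a full-resolution image
decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade", torch_dtype=torch.float16
).to(device)
image = decoder(
    image_embeddings=prior_output.image_embeddings.to(torch.float16),
    prompt=prompt,
    guidance_scale=0.0,
).images[0]
image.save("penguin.png")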

In the research paper behind Stable Cascade, the authors explain that the key idea lies in the latent space: the compressed space where images are actually generated. Imagine it as a canvas where the program paints its images: the smaller the canvas, the faster the painting, but the less detail it can hold. Previous models like SDXL use a latent space that compresses images about 8 times per side, sacrificing some detail but still producing decent results.

The Würstchen design, however, manages to compress images by about 42 times! This drastically reduces the time and resources needed to train the model and to generate images. Even with such a small canvas, the images still look good thanks to a clever setup that chains multiple stages together.

So, in simpler terms, they’ve figured out how to build a model (or rather, a combination of models) that can quickly and efficiently create detailed images while working in a heavily compressed space.

There are different versions of the model, some with more parameters than others. Generally, the versions with more parameters give better results. You can find all the details and the models at stabilityai/stable-cascade on HuggingFace.

Compatibility

Stable Cascade can be used together with many of the most popular techniques from the Stable Diffusion ecosystem, such as:

  1. Text-to-Image: Generate images from text prompts.
  2. Image Variation: Create variations of given images without prompts (see the sketch after this list).
  3. Image-to-Image: Generate images from starting points obtained by noising other images.
  4. ControlNet: Utilize pre-trained or user-trained ControlNets for tasks like inpainting/outpainting, face identity, canny, and super resolution.
  5. LoRA: Use Low-Rank Adaptation (LoRA) modules for fine-tuning the text-conditional model.
  6. Image Reconstruction: Utilize (Diffusion) Autoencoder to encode and decode images in a highly compressed space for faster training and running of models.
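
As an example of point 2, here is a hedged image-variation sketch. It assumes the prior pipeline accepts an images argument in place of a text prompt, as described in the diffusers documentation for Stable Cascade; the file names are just placeholders:

import torch
from diffusers import StableCascadePriorPipeline, StableCascadeDecoderPipeline
from diffusers.utils import load_image

device = "cuda"

prior = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16
).to(device)
decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade", torch_dtype=torch.float16
).to(device)

# Feed a reference image instead of a prompt; the prior's CLIP image
# encoder turns it into the embedding that seeds the variations.
reference = load_image("penguin.png")  # placeholder file name
prior_output = prior(prompt="", images=[reference], height=1024, width=1024)

variation = decoder(
    image_embeddings=prior_output.image_embeddings.to(torch.float16),
    prompt="",
    guidance_scale=0.0,
).images[0]
variation.save("penguin_variation.png")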

Stable Cascade can be used for research and non-commercial purposes only, unless alternative conditions are arranged by contacting stability.ai. The images generated by the model, however, should be free to use for any purpose.

Run With ComfyUI

ComfyUI now natively supports Stable Cascade, allowing us to streamline our workflow using the ComfyUI-Manager to download the required models. You can find the workflow below and download it from flowt.ai.

Simply search for “cascade” within the manager, and you’ll find the necessary files. There will be four files available for download, so it might take some time to complete. Make sure to select the f16 version, which is smaller in size. You can cross-reference the names of the required models with the workflow you downloaded earlier.

You can also manually download the models from the stabilityai/stable-cascade repository on HuggingFace.

Once all the models are in place (remember to refresh the UI once everything is downloaded), our workflow will be ready to generate images!

If you encounter an error when generating your first image, like unet_dtype() got an unexpected keyword argument 'supported_dtypes', you might need to update the smZNodes custom node via the ComfyUI-Manager.

I kept the default parameters for this workflow, and I got really nice results:

Run Locally [Legacy]

This method runs Stable Cascade from the terminal. It was a temporary workaround while waiting for ComfyUI support, which, as we saw above, is now available; I’m keeping it here for reference. Someone made a one-click installer for Windows that you can try by downloading a zip file from this git repository (check the Installation steps on the page for the link). Be aware that running random batch files is generally not recommended, but it’s easy to inspect the contents and see which commands get executed, so I decided to try it anyway.

To get started, download the zip file “stable-cascade-one-click-installer-main,” then extract it and run “installer.bat.” Windows might show you a warning because it cannot verify the program. Click on “Run Anyway” if you decide to continue, and the terminal will appear and download some files. If you are a bit more tech-savvy, here are the commands:

:: Create a virtual environment
python -m venv venv
:: Activate the virtual environment
call venv\Scripts\activate.bat
:: Install the custom diffusers version from GitHub
pip install git+https://github.com/kashif/diffusers.git@wuerstchen-v3
:: Install the other requirements
pip install -r requirements.txt

It will take a while to download and install all the required libraries. Once the terminal reports that everything was installed without errors, the installation went fine.

Let’s now run generate_images.bat! Again, Windows will ask you to confirm that you want to run the file, and the terminal will open once more. It might remain black for a while: just wait, it’s downloading the actual AI models (more than 14GB!).

Once everything is downloaded, the terminal will ask for a prompt and a few other parameters to start generating images. Let’s see if I can generate images on my 3060 with 12GB of VRAM. If you encounter any error, just close the terminal and run generate_images.bat again.
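
By the way, if VRAM is tight, the diffusers pipelines that the script wraps support standard model offloading via the accelerate package. A minimal sketch, assuming enable_model_cpu_offload() works on these pipelines as it does on other DiffusionPipelines:

import torch
from diffusers import StableCascadePriorPipeline, StableCascadeDecoderPipeline

# Load as usual, but do not move the pipelines to the GPU yourself...
prior = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16
)
decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade", torch_dtype=torch.float16
)

# ...let accelerate shuttle each sub-model to the GPU only while it is
# running, keeping peak VRAM usage well below the full model size.
prior.enable_model_cpu_offload()
decoder.enable_model_cpu_offload()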

The prompt that I used is: a penguin in a swimming pool.

A penguin in a swimming pool

Let’s ask for an image with text:

A mouse holding a sign to complain

It seems to work best with 1024×1024 images: when I changed the resolution, the results looked cropped. Text is also much better defined than with previous models, though still not perfect. Or maybe my prompting skills are too basic.

Overall the tool is a bit clunky to use from the terminal, so I recommend the ComfyUI workflow described above.


Stable Cascade is a promising new model, capable of producing high-quality images, including readable text, while managing resources efficiently. Whether it will replace SD1.5 and SDXL or develop in parallel depends on several factors: transitioning to a different model is not easy, especially given the existing ecosystem of custom models and workflows. However, if Stable Cascade proves significantly better and more cost-effective, it could indeed become the preferred choice, replacing the original Stable Diffusion models until an even better one comes along. Ultimately, the decision will come down to performance, cost-effectiveness, and the community.
