
Stable Diffusion Web UI by AUTOMATIC1111: A Complete Guide

The Stable Diffusion Web UI by AUTOMATIC1111 is one of the most popular UIs for running Stable Diffusion on your PC. It has been available since the early days of Stable Diffusion and has evolved rapidly over time, thanks to the open-source community behind it. This comprehensive tutorial will guide you through the ins and outs of this powerful tool, from the basics all the way to ControlNet, LoRAs, inpainting, and more.


Download and Install stable-diffusion-webui

Firstly, you’ll need to install the UI by going directly to its GitHub repository. You’ll have two options: cloning the repository or downloading a zip file. Check the “Installation and Running” section in the README for detailed instructions, depending on your operating system and preferences.

If you decide to clone the repository, ensure you have Python 3.10.6 and Git installed, as specified on the dependencies page. Then double-click “webui-user.bat”, which you’ll find in the “stable-diffusion-webui” folder.

If you prefer not to worry about Git and dependencies, you can download the 1.0.0 release as a zip from the AUTOMATIC1111 repository, which includes everything you need. Just double-click “run.bat” to launch it. The browser should open automatically; if not, head to http://127.0.0.1:7860/ in any browser and the UI will load. A small tip: if you want to use the UI in dark mode, simply append /?__theme=dark to the end of the URL, like this:
http://127.0.0.1:7860/?__theme=dark
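
If you prefer scripting over clicking, stable-diffusion-webui can also be driven programmatically: adding --api to the COMMANDLINE_ARGS line in webui-user.bat exposes an HTTP API on the same address. Several snippets later in this guide assume that flag is set. As a first, minimal sketch (assuming the default address and that you have Python with the requests package installed), you can check that the UI is reachable:

```python
import requests

BASE_URL = "http://127.0.0.1:7860"  # default address used by stable-diffusion-webui

# The root page should answer with HTTP 200 once webui-user.bat (or run.bat)
# has finished starting up and loading the model.
response = requests.get(BASE_URL, timeout=10)
print("Web UI is up!" if response.ok else f"Unexpected status: {response.status_code}")
```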

[Image: the stable-diffusion-webui interface]

Getting Started: Selecting Your Model


After the UI itself, the most important thing you need to generate images is a base model. The most popular ones are Stable Diffusion and Stable Diffusion XL, but you can also download custom-made models, depending on the subject and style you want in your final output. Place the models in the “\stable-diffusion-webui\models\Stable-diffusion” directory. After that, you can select the model from the dropdown menu at the top. If you add models while the UI is running, don’t forget to click the “refresh” button next to the dropdown to make them appear in the list.

Different models produce different styles and understand different themes. You can find the official Stable Diffusion models on Hugging Face and many custom models on Civitai.
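
If you like to keep an eye on your checkpoints from a script, here is a minimal sketch that lists the model files on disk and, if the built-in API is enabled with --api, the checkpoints the UI currently sees. The path below assumes a default local install; adjust it to your own setup.

```python
from pathlib import Path
import requests

# Adjust this to wherever you installed stable-diffusion-webui.
MODELS_DIR = Path("stable-diffusion-webui/models/Stable-diffusion")

# Checkpoints are usually distributed as .safetensors or .ckpt files.
for model_file in sorted(MODELS_DIR.glob("*.safetensors")) + sorted(MODELS_DIR.glob("*.ckpt")):
    print("on disk:", model_file.name)

# With the webui started using the --api flag, the same list is exposed over HTTP.
for model in requests.get("http://127.0.0.1:7860/sdapi/v1/sd-models", timeout=10).json():
    print("known to the UI:", model["title"])
```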

Installing Extensions

Before delving into the details of image generation, let’s discuss extensions, a feature that has made this UI so popular and versatile. Extensions enhance your Stable Diffusion experience by adding more functionality, making it one of the essential aspects to familiarize yourself with. Extensions provide features that streamline your workflow and boost its power, including ControlNet, LORAs, Upscaler, and more.

Installing extensions is straightforward. Just go to the “Extensions” tab, where you’ll find “Installed” displaying your currently installed extensions, “Available” for browsing and searching among many extensions developed by others, and “Install from URL,” where you can install a specific extension by pasting the GitHub URL.

[Image: installing extensions in stable-diffusion-webui]

Wait for the installation confirmation, and then restart AUTOMATIC1111 for the changes to take effect. You can do this by going to “Installed,” selecting your extension, and clicking on “Apply and restart UI.” Your extension will appear, providing the additional functionality you need, depending on its type.

Text to Image

This is the first tab in the UI, the classic way to generate images using diffusion models. By specifying a set of instructions as text, you can generate one or more images.

Positive and Negative Prompts


Let’s introduce prompts now. There are two types: positive and negative prompts. Positive prompts describe what you want in your image, while negative prompts specify what to avoid. You can use plain English or a list of tags. For example, instead of saying “a photo of a woman,” you can use “photo, woman.” To emphasize certain elements, use parentheses or specific numbers, like “(woman:1.2).” The earlier a word appears in your prompt, the more weight it carries. You can find inspiration in this prompt collection for Stable Diffusion.
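
To make the tag-style syntax and the emphasis weights more concrete, here is a tiny sketch that builds a positive and a negative prompt the way you would type them into the two prompt boxes. The helper function and the example tags are only illustrations, not part of the UI itself.

```python
def emphasize(term: str, weight: float = 1.2) -> str:
    """Wrap a term in the webui's attention syntax, e.g. (woman:1.2)."""
    return f"({term}:{weight})"

positive_prompt = ", ".join([
    "photo",
    emphasize("woman", 1.2),  # weighted more heavily than the surrounding tags
    "natural light",
    "detailed face",
])
negative_prompt = "blurry, lowres, extra fingers, watermark"

print(positive_prompt)  # photo, (woman:1.2), natural light, detailed face
print(negative_prompt)
```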

Parameters

  • Sampling Steps: This determines the detail level in your image. More steps mean a more refined image. A good starting point is 20 steps.
  • Sampling Method: This selects the sampling algorithm used by the diffusion process. Options like Euler a, LMS, or DPM++ 2M Karras converge differently and can give slightly different looks.
  • Hires.fix: Generates a higher-resolution image with more detail, helping to avoid the artifacts or duplicated subjects that appear when generating images larger than the base 512×512.
  • Refiner: This field is used when you want to run Stable Diffusion XL – in addition to the base SDXL model, you will need to download the corresponding refiner.
  • Resolution: Set the width and height of your image. Stick to multiples of 64 for the best results.
  • CFG Scale: This balances creativity and adherence to your prompt. A lower scale means more creativity, while a higher scale results in more literal interpretations. A range of 6-13 usually works well.
  • Batch Count: This refers to how many times the image generation pipeline is executed.
  • Batch Size: This is the number of images produced in each run of the pipeline. The total images generated equal the batch count multiplied by the batch size. Typically, adjusting the batch size is more efficient for faster generations.
  • Seed: The seed value determines the precise outcome of your image. Leave it at -1 for random results. If you wish to recreate the same image, use its seed value and ensure all other parameters remain unchanged. If you alter prompts or parameters while keeping the same seed, you will generate a very similar image with minor differences, which can be useful for maintaining consistency.
  • Seed (extra): You can also opt to create another image with a fixed primary seed and a secondary random seed with variable strength. This is useful for controlling the degree of difference you want in an image while still maintaining the style of the primary seed.
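
Most of the parameters listed above map one-to-one onto the JSON accepted by the txt2img endpoint of the built-in API (enabled with --api; the full, version-specific field list is shown at http://127.0.0.1:7860/docs). A minimal sketch, with values chosen purely as an example:

```python
import base64
import requests

payload = {
    "prompt": "photo, (woman:1.2), natural light, detailed face",
    "negative_prompt": "blurry, lowres, watermark",
    "steps": 20,                  # Sampling Steps
    "sampler_name": "Euler a",    # Sampling Method
    "width": 512, "height": 512,  # Resolution (multiples of 64)
    "cfg_scale": 7,               # CFG Scale
    "seed": -1,                   # -1 means a random seed
    "n_iter": 2,                  # Batch Count (how many times the pipeline runs)
    "batch_size": 2,              # Batch Size (images per run) -> 2 x 2 = 4 images in total
}

result = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=600).json()
for i, img_b64 in enumerate(result["images"]):
    with open(f"txt2img_{i}.png", "wb") as f:
        f.write(base64.b64decode(img_b64))
```

The seed actually used for each image is reported back in the response, so you can note it down if you want to reproduce a result later.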

HiresFix

The “Hires.fix” feature is a crucial tool for addressing the distortions and anomalies that frequently occur in high-resolution image generation, especially when surpassing the standard 512×512 resolution. It proves especially useful when you encounter duplicates or replicas in your high-resolution image, such as an extra head on a figure or a disproportionately elongated torso.

For example, selecting a 2x upscale on a 512×512 base setting results in a highly detailed 1024×1024 image. Generating higher-resolution images requires a substantial amount of VRAM, so choose the image size carefully. The number of steps also affects the generation time of the upscaled image.
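
If you drive the UI through its API, the same Hires.fix options can be set in the txt2img payload. A minimal sketch, assuming the webui was started with --api and that the hr_* field names match your version (they are listed at /docs):

```python
import base64
import requests

payload = {
    "prompt": "photo, (woman:1.2), natural light, detailed face",
    "steps": 20,
    "width": 512, "height": 512,
    # Hires.fix settings:
    "enable_hr": True,          # turn the high-res pass on
    "hr_scale": 2,              # 512x512 base -> 1024x1024 final image
    "hr_upscaler": "Latent",    # upscaler used for the intermediate image
    "denoising_strength": 0.5,  # how much the high-res pass may repaint details
}

result = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=1200).json()
with open("hires_output.png", "wb") as f:
    f.write(base64.b64decode(result["images"][0]))
```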

A solution to the duplicates problem is the Kohya Hires.fix extension, which allows you to generate higher-resolution images without encountering annoying duplicates. When using Kohya Hires.fix, you can select a resolution of 1024×1024 or higher directly, unlike the standard Hires.fix, which first generates a lower-resolution image and then upscales it. Kohya Hires.fix is an extension that you can download from GitHub.

LoRAs

A notable mention goes to LoRAs, which serve a similar function to custom models but are more versatile and occupy significantly less storage space. You can use one or more LoRAs to influence the final output. For example, you can have a LoRA that generates black-and-white drawings, adds more details, applies a specific style, and so on. You can select them in the Lora tab after downloading the files (.safetensors) into the “\stable-diffusion-webui\models\Lora” directory.

Click Refresh and your LoRAs will appear in the UI.

When you select a LoRA, a snippet of text is added to the positive prompt field. This text activates the LoRA. You will also notice a number next to the LoRA’s name, for example <lora:add_detail:1>. This weight is typically adjusted within a range of 0 to 1, with 1 giving the LoRA the highest influence on the output. You can reduce this value to, for example, 0.7 or 0.5. Experiment with different values to understand how they affect your final image.

Furthermore, you can combine multiple LoRAs simultaneously to achieve even more interesting results, balancing how much weight each one carries in the final output via the number next to its name, with 1.0 indicating the highest strength.
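
Because LoRAs are activated purely through text in the prompt, combining several of them is just string composition. A small sketch; the LoRA names below are made up, so substitute the file names you actually placed in models/Lora:

```python
# LoRAs are triggered with <lora:filename:weight> tags inside the prompt.
loras = {
    "add_detail": 0.7,        # hypothetical detail-enhancing LoRA, toned down a bit
    "bw_drawing_style": 0.5,  # hypothetical black-and-white style LoRA
}

base_prompt = "portrait of an old sailor, dramatic lighting"
lora_tags = " ".join(f"<lora:{name}:{weight}>" for name, weight in loras.items())

prompt = f"{base_prompt} {lora_tags}"
print(prompt)
# portrait of an old sailor, dramatic lighting <lora:add_detail:0.7> <lora:bw_drawing_style:0.5>
```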

Image To Image

The parameters in this section are quite similar to those in the text-to-image section, with one significant distinction: here, you can use an existing image as a starting point or reference to generate another image. Depending on your settings, the result will stay closer to or stray further from your input image.

A crucial parameter in this case is the Denoising strength: the higher the value, the more different the output will be from the input image, and vice versa.

For example, by setting the denoising strength to 0.75, the core of the original image remains, but new details can be added that were not present in the original. If you lower this value to 0.25, for instance, you will get a result that closely resembles the original input.

If the new image’s aspect ratio differs from the original, you have several options to adjust it accordingly.

  1. “Just Resize” – This option scales your image to fit the new dimensions. It may stretch or squeeze the image to fit.
  2. “Crop and Resize” – This method resizes the image to fill the new dimensions while keeping the original aspect ratio; anything that doesn’t fit is cut off.
  3. “Resize and Fill” – This fits your image into the new dimensions and fills any extra space with the average color of your image, preserving the original aspect ratio.
  4. “Just Resize (Latent Upscale)” – Similar to “Just Resize,” but the scaling happens in latent space. For a clearer image, use a denoising strength higher than 0.5.
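
The same workflow can be scripted through the img2img endpoint of the built-in API. A minimal sketch, assuming the webui runs with --api and that the resize_mode indices follow the order of the list above (worth double-checking against /docs for your version); input.png is a placeholder file name:

```python
import base64
import requests

# The API expects the input image as a base64 string.
with open("input.png", "rb") as f:
    init_image = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "init_images": [init_image],
    "prompt": "the same scene, golden hour lighting",
    "denoising_strength": 0.75,  # 0.25 stays close to the input, 0.75 allows bigger changes
    "resize_mode": 1,            # here: "Crop and Resize"
    "width": 768, "height": 512,
    "steps": 20,
}

result = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload, timeout=600).json()
with open("img2img_output.png", "wb") as f:
    f.write(base64.b64decode(result["images"][0]))
```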

Sketch

The concept is similar to image-to-image generation, but with the ability to draw on top of the image to modify specific areas or details using a brush. Furthermore, you can also add or create a sketch from scratch without using a base image. Simply load a white or black image background and then start drawing on top of it.

Inpainting

Inpainting allows you to edit specific areas of an image. Load an image, use the mask tool to select the area, and choose how you want the AI to fill it. Options like “latent noise” or “fill” offer different effects.

You can start by downloading the inpainting checkpoint from Hugging Face: https://huggingface.co/runwayml/stable-diffusion-inpainting/tree/main

Then put it in the models/Stable-diffusion folder, like any other model.

In order to get familiar with inpainting, I followed this process which includes some tips and information about parameters to get the best results:

  1. Generate an Initial Image: Start by creating an image. It’s okay if it’s not perfect; the key is to have the right general composition. You can also skip this step if you already have an image that you want to use.
  2. Move to Inpaint Mode: Use the “send to inpaint” button in the GUI to begin editing your image.
  3. Refine the Prompt: Use the original prompt as a base and improve it by focusing on the areas that need correction. Alternatively, use the “interrogate” feature in img2img mode to suggest a prompt that fits the image’s style.
  4. Select the Area for Inpainting: Use the brush tool in inpaint mode to highlight the specific area you wish to modify.
  5. Optional Prompt Adjustment: For better results, either add details to the original prompt or create a new one focusing on the selected region. This step is optional but recommended.
  6. Choose the Right Mode:
    • Original Mode: Ideal for correcting flawed areas without changing the content. Use ‘restore faces’ for facial fixes.
    • Fill Mode: Best for color-based corrections like background tweaks or skin blemishes, as it uses existing colors from the image.
    • Latent Noise Mode: Useful for adding new elements or making significant changes to a selected area.
    • Latent Nothing Mode: Suited for less detailed areas, such as plain backgrounds, although its best use-cases are still being explored.

Optional Settings

  • Mask Blur: Adjust based on image resolution (e.g., 4 for 512×512, 8 for 1024×1024). Increase the value for background or skin corrections.
  • CFG Scale: A slightly higher value like 8 or 8.5 is preferable.
  • Denoising Strength: Higher values allow bigger changes, while lower values stay closer to the original; Latent Noise mode generally needs a fairly high value to fully repaint the masked area.
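
For repeated fixes it can be handy to script inpainting too. It goes through the same img2img endpoint, with a mask image added to the payload. A minimal sketch, assuming --api is enabled and that the inpainting_fill indices follow the order of the modes in the UI (fill, original, latent noise, latent nothing); portrait.png and mask.png are placeholder file names:

```python
import base64
import requests

def b64(path: str) -> str:
    """Read a file and return its contents as a base64 string, as the API expects."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

payload = {
    "init_images": [b64("portrait.png")],
    "mask": b64("mask.png"),   # white = area to repaint, black = keep as-is
    "prompt": "portrait of a man, detailed eyes",
    "mask_blur": 4,            # ~4 for 512x512, ~8 for 1024x1024 (see above)
    "inpainting_fill": 1,      # 0=fill, 1=original, 2=latent noise, 3=latent nothing
    "inpaint_full_res": True,  # work at full resolution on the masked region only
    "denoising_strength": 0.6,
    "cfg_scale": 8,
    "steps": 30,
}

result = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload, timeout=600).json()
with open("inpaint_output.png", "wb") as f:
    f.write(base64.b64decode(result["images"][0]))
```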

Do not forget to iterate as needed: once you’re satisfied with the result for one area, remove the original image, drag the output into the input section, and repeat the process from step 3 for other areas. Continual prompt refinement might be necessary.

Additional Tips:

  • Combining Elements: For unique hybrids, like something that is half dog and half cat, use prompts like “a [dog|cat] in a room,” which alternates between “a dog in a room” and “a cat in a room” at every sampling step.
  • Transforming Elements: If an element like handcuffs isn’t generating well, transform it from a simpler shape, e.g., replace “handcuffs” with “[sunglasses:handcuffs:0.25]” in the prompt, which draws sunglasses for the first 25% of the steps and then switches to handcuffs.

Here I experimented with different images and applications of inpainting. Generally I used the original prompt, with an additional mention of what I want to modify depending on the selected area. I will try to remove an element from an image, replace a part with a completely new subject, and fix some distorted eyes on a face.

Removing an element

Replacing an element

Fixing a detail

What works best for me is experimenting with the parameters as much as possible. It might not be easy to get a great result on the first try, but after some tweaking I believe that what we can do with inpainting is very powerful, especially for fixing generated images, which often contain artifacts or bad anatomy.

ControlNet

ControlNet is a very powerful technique for gaining more control over the composition of your generations. You can find all the details in the main GitHub repository.

In order to use it in stable-diffusion-webui, you can install this extension: https://github.com/Mikubill/sd-webui-controlnet. Then apply and restart the UI from the Installed tab. You will also need to download the ControlNet models and put them in the models/ControlNet folder, which appears once the extension is installed correctly.

Now that everything is set, go to the Text to image tab, and right below seed, you can expand the ControlNet section:

You can use ControlNet to retain poses, facial expressions, or to generate illusions and QR codes (using the QRCodeMonster model). Control Weight is an important parameter, determining how much strength the control image should have in the final output. You can also select different types of ControlNet (do not forget to download all the corresponding models), depending on what you want to achieve.

As an example, I’m using the Canny ControlNet with a portrait picture of a man, and I want to make his clothes more formal.
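
If you want to reproduce this kind of setup from a script, the sd-webui-controlnet extension accepts its settings through the alwayson_scripts field of a txt2img API call. A rough sketch, assuming the extension and a Canny model are installed and the webui runs with --api; the file name and the model string below are placeholders to replace with your own (the extension’s wiki documents the exact argument names for your version):

```python
import base64
import requests

# The control image: here, the portrait whose Canny edges will guide the composition.
with open("man_portrait.png", "rb") as f:
    control_image = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "prompt": "portrait of a man wearing a formal suit and tie, studio photo",
    "steps": 25,
    "width": 512, "height": 512,
    # The ControlNet extension hooks into generation via "alwayson_scripts".
    "alwayson_scripts": {
        "controlnet": {
            "args": [{
                "input_image": control_image,
                "module": "canny",                   # preprocessor
                "model": "control_v11p_sd15_canny",  # must match the model name shown in your UI
                "weight": 1.0,                       # Control Weight
            }]
        }
    },
}

result = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=600).json()
with open("controlnet_output.png", "wb") as f:
    f.write(base64.b64decode(result["images"][0]))
```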

Upscaling

I think this is one of the best functionalities of AUTOMATIC1111: the easiest way to upscale images without adding details or artifacts – it simply increases the resolution while keeping all the details of the original image. Use the Extras tab for upscaling, where you can choose between different upscalers.

You might notice that some upscalers make the image too smooth or plastic-looking, so not all of them will work well; it depends on the type of image you want to upscale and the result you are looking for. You can use “Scale by” to upscale the image by a factor such as 2x or 4x, or use the “Scale to” tab to specify the exact final resolution.

You can also download additional upscalers and put them in the models/ESRGAN folder. For example, I use the 4x-UltraSharp.pth upscaler, which seems very versatile and doesn’t create an excessively smooth effect. After upscaling, you will find the results in the outputs/extras-images folder.

Left is the original image, Right is the upscaled one – 4x with R-ESRGAN 4x+
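
The Extras tab is also exposed through the API, which is convenient for upscaling a whole folder of images in one go. A minimal sketch, assuming --api is enabled and that the upscaler name matches one of the entries in your Extras dropdown; generated.png is a placeholder file name:

```python
import base64
import requests

with open("generated.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "image": image_b64,
    "upscaling_resize": 4,         # same as "Scale by" 4x in the Extras tab
    "upscaler_1": "R-ESRGAN 4x+",  # or e.g. "4x-UltraSharp" once the .pth file is in models/ESRGAN
}

result = requests.post("http://127.0.0.1:7860/sdapi/v1/extra-single-image", json=payload, timeout=600).json()
with open("upscaled.png", "wb") as f:
    f.write(base64.b64decode(result["image"]))
```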

Keep in mind that upscaling is quite resource-intensive, so depending on the amount of VRAM you have, you will be limited in the final resolution of the image.

What is the difference between this and Hires.fix? In simple terms, the upscaling process in the Extras tab does not modify the content of the image being upscaled; it keeps it exactly as it is. Hires.fix, on the other hand, uses a base image which is then upscaled to a higher resolution, and it might add details and artifacts in the process, which can be a good thing in some cases and an undesired outcome in others.

PNG Info

The PNG Info tab can be very useful for retrieving the prompt and parameters that you used to generate an image. By default, all images generated with stable-diffusion-webui contain metadata that can be read back by the UI.

If you want to avoid saving the metadata with your images, just go to Settings and deselect “Save text information about generation parameters as chunks to png files.” Note that this way you will not be able to retrieve the generation information for the images you create, so be ready to save your prompts if you do not want to forget your favorite ones.
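
The same metadata can also be read outside the UI with a few lines of Python, since it is stored as a PNG text chunk. A small sketch using Pillow; the file path is a placeholder:

```python
from PIL import Image

# stable-diffusion-webui stores the generation settings in a PNG text chunk named "parameters".
img = Image.open("outputs/txt2img-images/example.png")
parameters = img.info.get("parameters")

if parameters:
    print(parameters)  # prompt, negative prompt, steps, sampler, CFG scale, seed, size, model hash...
else:
    print("No generation metadata found (it may have been disabled in Settings or stripped elsewhere).")
```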

Checkpoint Merger

The Checkpoint Merger feature lets you combine different models, creating unique, blended styles. You can decide how much importance you want to give to each model that you are going to merge, to retain the features that you prefer.

The output of this process is another .safetensors model, which you can then add to the models/Stable-diffusion folder and use like any other model.

Stable Diffusion XL Turbo

You can run the latest SDXL Turbo model by Stability AI in AUTOMATIC1111: simply download the model and set the parameters as follows:

CFG Scale: 1, otherwise it will produce ugly results

Sampling steps: 1, that’s why it’s so fast!

Then I set the batch size to 6, and almost instantly I get 6 nice generated images.
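
Those settings translate directly into a txt2img API payload if you prefer scripting. A minimal sketch, assuming the SDXL Turbo checkpoint is already selected in the UI and the webui runs with --api:

```python
import base64
import requests

payload = {
    "prompt": "a cozy cabin in a snowy forest, cinematic",
    "steps": 1,       # SDXL Turbo is distilled for single-step sampling, hence the speed
    "cfg_scale": 1,   # higher values quickly degrade the output
    "width": 512, "height": 512,
    "batch_size": 6,  # six images in one (almost instant) run
}

result = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=300).json()
for i, img_b64 in enumerate(result["images"]):
    with open(f"turbo_{i}.png", "wb") as f:
        f.write(base64.b64decode(img_b64))
```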

Resources

stable diffusion webui: https://github.com/AUTOMATIC1111/stable-diffusion-webui

ControlNet: https://huggingface.co/lllyasviel/sd-controlnet-seg/tree/main

Inpainting: https://huggingface.co/runwayml/stable-diffusion-inpainting
