
Creativity with IP-Adapter in ComfyUI: A Tutorial to Make Realistic Variations of Images

What is IP-Adapter?

The IPAdapter models are very powerful for image-to-image conditioning. Given a reference image, you can create variations augmented by text prompts, ControlNets, and masks. Think of them as a one-image LoRA. We can use IP-Adapter in ComfyUI to make image generation faster and more entertaining.

— from the ComfyUI_IPAdapter_plus repository

IP-Adapter stands for Image Prompt Adapter, designed to enhance pre-trained text-to-image diffusion models like Stable Diffusion. With a mere 22M parameters, IP-Adapter achieves remarkable results, often outperforming models that have been fine-tuned for image prompts (such as LoRAs). Its versatility allows for integration with various custom models and supports controllable generation using existing tools.
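Under the hood, IP-Adapter adds a second, "decoupled" cross-attention path for image features next to the existing text cross-attention, and only that new path is trained. A minimal PyTorch-style sketch of the idea (names and shapes are illustrative, not the actual implementation):

```python
import torch.nn.functional as F

def decoupled_cross_attention(q, text_kv, image_kv, scale=1.0):
    """Conceptual sketch of IP-Adapter's decoupled cross-attention.

    q:        queries from the U-Net's latent features
    text_kv:  (key, value) projected from the text embeddings
    image_kv: (key, value) projected from the CLIP image embeddings
    scale:    how strongly the image prompt influences the output
    """
    text_out = F.scaled_dot_product_attention(q, *text_kv)
    image_out = F.scaled_dot_product_attention(q, *image_kv)
    # The two attention results are summed; only the new image
    # key/value projections (~22M parameters) are trained.
    return text_out + scale * image_out
```

The scale factor in this sketch is essentially what the weight parameter exposes later in the workflow.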

The Power of IP-Adapter

IP-Adapter excels in image-to-image conditioning. Given a reference image, it can create variations augmented by text prompts, ControlNets, and masks. Think of it as a one-image LoRA. It’s compatible with both Stable Diffusion 1.5 and Stable Diffusion XL, offering a wide range of creative possibilities.

Applications of IP-Adapter

  • Enhancing Realistic Images: Transform drawings or paintings into realistic images.
  • Merging Subjects with Landscapes: Seamlessly integrate a subject into a different background.
  • Face Swap: Exchange faces in photographs for creative or fun purposes.
  • Inpainting: Fill in missing parts of an image with relevant details.
  • ControlNet Integration: Add extra conditions to your prompts for finer control over the generation.

IP-Adapter in ComfyUI: A Step-by-Step Guide

Before you begin, you’ll need ComfyUI. If this is your first encounter, check out the beginner’s guide to ComfyUI. Once you’re familiar, download the IP-Adapter workflow and load it in ComfyUI.

Initial Setup

Once you’ve got ComfyUI up and running, it’s time to integrate the powerful IP-Adapter for transformative image generation. Don’t worry if you encounter a few hiccups initially; these are part of the setup process and easily resolved.

A ComfyUI workflow showing errors because of missing custom nodes for the IPAdapter.
Custom nodes missing in ComfyUI

After downloading the IP-Adapter workflow, load it into ComfyUI. Initially, you might see some error messages popping up. This is a common occurrence as the UI requires specific custom nodes not present in the standard installation. To fix this, simply install the necessary nodes using the ComfyUI Manager (available in the ComfyUI guide).

Acquiring the Necessary Models

ComfyUI nodes showing an error because of some missing models files.
ComfyUI will complain about missing files

Before you can generate anything, there are some essential models to download (a scripted download sketch follows this list):

  1. Pre-Trained Models: These are available on Hugging Face. Download your chosen models – for this tutorial, we’re using ip-adapter_sd15 and ip-adapter-plus_sd15 – and place them in the ComfyUI/custom_nodes/ComfyUI_IPAdapter_plus/models directory.
  2. Base Model: We’re utilizing a custom model named AbsoluteReality, based on Stable Diffusion 1.5. If you’re interested in using IP-Adapter with SDXL, you’ll need to download the corresponding models. However, our current focus is on SD 1.5.
  3. Image Encoders: Download the SD 1.5 CLIP image encoder. For SDXL, a specific SDXL image encoder is required. Place these encoders in the ComfyUI/models/clip_vision/ directory.
  4. VAE (Variational Autoencoder): Download it from Hugging Face and place it in the ComfyUI/models/vae folder.
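If you prefer to script these downloads, here is a minimal sketch using the huggingface_hub library. The repository and file names are my assumptions (taken from the official h94/IP-Adapter and stabilityai repos); double-check them against the ComfyUI_IPAdapter_plus README before running:

```python
# pip install huggingface_hub
import os
import shutil
from huggingface_hub import hf_hub_download

COMFY = "ComfyUI"  # adjust to your ComfyUI installation path

def fetch(repo_id: str, filename: str, dest_dir: str) -> None:
    """Download a file from the Hugging Face Hub and copy it into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    cached = hf_hub_download(repo_id=repo_id, filename=filename)
    shutil.copy(cached, dest_dir)

# IP-Adapter models (assumed paths inside the h94/IP-Adapter repo)
for name in ("ip-adapter_sd15.safetensors", "ip-adapter-plus_sd15.safetensors"):
    fetch("h94/IP-Adapter", f"models/{name}",
          f"{COMFY}/custom_nodes/ComfyUI_IPAdapter_plus/models")

# SD 1.5 CLIP image encoder (assumed path)
fetch("h94/IP-Adapter", "models/image_encoder/model.safetensors",
      f"{COMFY}/models/clip_vision")

# VAE (assumed repo and file name)
fetch("stabilityai/sd-vae-ft-mse-original",
      "vae-ft-mse-840000-ema-pruned.safetensors", f"{COMFY}/models/vae")
```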

Once all the necessary files are in place, refresh your UI. You should now be able to select all the downloaded models from the corresponding nodes in ComfyUI. With everything set, you’re ready to explore the possibilities of IP-Adapter.

Generating Images

Now that we’ve set everything up, it’s time for the fun part – creating images with IP-Adapter! This process is straightforward and rewarding, especially as you begin to see your ideas come to life.

Start by selecting an input image from the Load Image node. This image will serve as the basis for your creation. Once you’ve made your choice, click on ‘Queue Prompt’, and after a few seconds (on my NVIDIA 3060), a new image, inspired by your input, is generated.
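Incidentally, everything Queue Prompt does can also be driven from code: ComfyUI ships with a small HTTP API. A minimal sketch, assuming a default local server on port 8188 and a workflow exported with "Save (API Format)" (enable the dev mode option in the settings to see that button):

```python
import json
import urllib.request

# Workflow previously exported from ComfyUI in API format.
with open("ipadapter_workflow_api.json") as f:
    workflow = json.load(f)

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())  # returns a prompt_id you can poll for status
```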

A comparison of an input image with a generated image using IPAdapter, making a painting more realistic.

The initial result looks nice, capturing the essence of your input image. However, you might notice some imperfections, like in the background. Let’s see if we can fix them.

One way to enhance the quality is by increasing the number of steps in the generation process. This gives the model more time to refine the details of the image. Additionally, consider reducing the CFG (Classifier Free Guidance) scale. This adjustment can sometimes lead to a more natural-looking image.
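Both settings live on the KSampler node. In a workflow exported in API format, the relevant excerpt would look roughly like this (the node ID and values are hypothetical):

```python
# Hypothetical API-format excerpt: the KSampler node's settings.
ksampler_node = {
    "3": {
        "class_type": "KSampler",
        "inputs": {
            "steps": 30,             # more steps = more time to refine details
            "cfg": 6.0,              # a lower CFG can look more natural
            "seed": 42,
            "sampler_name": "euler",
            "scheduler": "normal",
            "denoise": 1.0,
            # "model", "positive", "negative" and "latent_image" connect
            # to other nodes' outputs, e.g. ["4", 0]
        },
    }
}
```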

While it’s okay to leave the prompt empty, adding a simple text prompt can significantly improve the results. Describe what you’re aiming for in a few words to guide the model. I used "a photo of a beautiful renaissance girl, detailed".

Even so, we basically got a very similar image. If the image quality or the level of customization is still not where you want it to be, let’s focus on the IP-Adapter settings.

Adjusting the Noise Parameter

Check the Apply IPAdapter node, specifically the noise parameter, which was set to 0. By increasing this parameter, you introduce more noise into the process. Instead of a plain black image, a noisier image will be used in conjunction with your main image.
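Conceptually, you can picture it something like this (a rough sketch of the idea, not the node’s actual code):

```python
import torch

def uncond_image(image: torch.Tensor, noise: float = 0.0) -> torch.Tensor:
    """Rough sketch: the "empty" reference the adapter conditions against.

    With noise=0 this is a plain black image; raising the value blends in
    random noise instead, which loosens the adapter's grip on the input.
    """
    if noise == 0:
        return torch.zeros_like(image)
    return noise * torch.rand_like(image)
```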

Modifying the Weight Parameter

The weight parameter is also important to keep in mind; it is normally set to 1.0. A lower value means your original image has less influence on the final result, so lower it when you want the input image to constrain the output less strongly.


Let’s start by increasing the noise parameter to 0.33.

A comparison of an input image with a generated image using IPAdapter, making a painting more realistic, with a higher noise value.

By increasing only the noise parameter, I see a little improvement in the generated images, and they are still very similar to the input image. Here I tried three different values:

noise: 0.6
noise: 0.8
noise: 1.0

Now let’s also lower the weight of the Apply IPAdapter node. We should expect the final image to be less similar to the input image, with more importance given to the text prompt.

weight: 0.80
weight: 0.50
weight: 0.25
A comparison of an input image with a generated image using IPAdapter, making a painting more realistic, with a lower weight value.

When the weight is lower, the resulting image drifts further from the original. This introduces some artifacts and also places greater emphasis on the text prompt, adding elements that were not present originally.

We’re now experimenting with a different adapter – the only difference is the number of tokens it uses. I will use ip-adapter-plus_sd15.safetensors. If you haven’t downloaded it already, do so and put it in the ComfyUI/custom_nodes/ComfyUI_IPAdapter_plus/models folder.

A comparison of an input image with a generated image using IPAdapter-plus.

After another run, the result is definitely more faithful to the original image, and slightly smoother yet more realistic. Let’s play with the weight and noise parameters to see some results.

weight: 0.75, noise: 0.33
weight: 0.50, noise: 0.40
weight: 0.30, noise: 0.40

Generating an Image from Multiple Image Sources

We just explored generating images inspired by the style and elements of a single source image. But what happens when we want to incorporate multiple images? By combining different styles and elements, we can create a single, cohesive image that showcases all these aspects.

First, you’ll need the Batch Image node. Just right-click, navigate to ‘Add Node > Image > Batch Image’. Next, duplicate your Load Image node so you have at least two of these. This step allows you to select and load multiple images. Then, connect each Load Image node to the Batch Image node. Finally, link the Batch Image node to the Apply IPAdapter node.

You should see something like this:

Nodes in ComfyUI showing how to load multiple input images in the workflow.
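In API-format JSON, the wiring might look roughly like this (the node IDs are hypothetical; I believe the built-in node’s class name is ImageBatch):

```python
# Hypothetical excerpt: two LoadImage nodes feeding one ImageBatch node,
# whose output then goes into the Apply IPAdapter node's image input.
batch_wiring = {
    "1": {"class_type": "LoadImage", "inputs": {"image": "portrait.png"}},
    "2": {"class_type": "LoadImage", "inputs": {"image": "landscape.png"}},
    "3": {
        "class_type": "ImageBatch",
        "inputs": {"image1": ["1", 0], "image2": ["2", 0]},
    },
    # node "3" connects to the Apply IPAdapter node's image input
}
```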

Now load your second image and run Queue Prompt again, setting the image weight to 0.90 and the noise to 0.38, for example.

ComfyUI showing an output images with a person and a background merged together.

The results have been impressively quick, and the outcome is quite satisfying, especially considering the additional time typically required to generate such images using only a text prompt. The final image effectively merges the elements and styles of both source images, altering the background without needing specific text instructions.

But why limit ourselves to just two images? Let’s experiment with three.

A ComfyUI workflow showing three input images of superhumans, and an output image that combines the styles and elements of the inputs.
Using three different image sources. Prompt: a super hero

We can see that all the elements of the three images appear in the output image, some more prominently than others.

Adjusting the parameters and experimenting with different seeds can result in a varied balance, so it’s worth exploring these options. However, it’s important to remember that while using multiple input images can be an exciting experiment, it might not always be the best approach, particularly when aiming for a very specific result. Sometimes, less is indeed more!

