
A GLIGEN UI for Precise AI Image Compositions

Generate images with Stable Diffusion and place individual elements exactly where you want them in the picture, thanks to GLIGEN. A user-friendly interface built on top of ComfyUI gives you a new level of creativity and control over AI image generation. Let’s explore how to download and install these tools!

Introduction

GLIGEN, or Grounded-Language-to-Image Generation, is a relatively new approach to creating images from text. Unlike older methods that rely on text alone, GLIGEN lets you add extra details, called grounding inputs, to your description, giving you much more control over the final composition of the image.

Here’s what makes GLIGEN special: it allows you to include extra information along with your text, like:

  • Boxes: You can say where objects should go in the picture.
  • Images: You can even use another picture to influence the style or look.
  • Other inputs: It works with different kinds of details, like keypoints or depth maps, for special effects.

By combining a generic text prompt with these extra details, GLIGEN can produce pictures that match your intent much more closely, putting elements exactly where you want them.

So what do you get with GLIGEN?

  1. More control: GLIGEN lets you decide exactly how your picture will turn out.
  2. Better creativity: combine multiple and heterogeneous elements in a single image.

This is especially useful for complex compositions, where you would otherwise rely on lengthy and intricate text prompts to describe everything in the image. Positioning items using text alone is also difficult, because Stable Diffusion often does not follow complex spatial instructions.

GLIGEN simplifies the process of transforming your words into pictures by allowing you to generate elements in specific locations, using boxes to specify where to place a particular item. While this approach might seem somewhat abstract, let’s delve into an actual example to demonstrate how to use this technique.
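To make the idea of a grounding input a little less abstract, here is a minimal Python sketch of how a phrase tied to a box could be represented. The `GroundedPhrase` class, its field names, and the use of normalized coordinates are assumptions made for illustration; this is not GLIGEN's own API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundedPhrase:
    """A short phrase tied to a rectangular region of the image.

    Hypothetical helper for illustration only. Coordinates are fractions of
    the image size (0.0 to 1.0), with the origin at the top-left corner.
    """
    phrase: str
    x0: float
    y0: float
    x1: float
    y1: float

    def __post_init__(self):
        if not self.phrase.strip():
            raise ValueError("phrase must not be empty")
        inside_x = 0.0 <= self.x0 < self.x1 <= 1.0
        inside_y = 0.0 <= self.y0 < self.y1 <= 1.0
        if not (inside_x and inside_y):
            raise ValueError("box must lie inside the image and have positive area")

    def to_pixels(self, width: int, height: int) -> tuple[int, int, int, int]:
        """Return (left, top, right, bottom) in pixels for a given image size."""
        return (round(self.x0 * width), round(self.y0 * height),
                round(self.x1 * width), round(self.y1 * height))


# The caption sets the overall scene; each grounded phrase says where one
# element should appear.
caption = "a landscape at dusk"
grounding = [
    GroundedPhrase("a lighthouse", 0.60, 0.20, 0.90, 0.80),
    GroundedPhrase("a sailboat", 0.10, 0.55, 0.40, 0.85),
]
pixel_boxes = [g.to_pixels(512, 512) for g in grounding]
```

Keeping the boxes as fractions of the image means the same layout can be reused at different resolutions.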

Gligen UI Guide


GLIGEN has been available since January 2023, but I was unaware of any local UI that could use this technique until a new one, gligen-gui, was shared on the Stable Diffusion subreddit. Intrigued, I decided to give it a try, and I’ll walk you through how you can start experimenting with it.

The fresh and new UI can be found at https://github.com/mut-ex/gligen-gui, and all credits go to mut-ex.

Before diving in, ensure that ComfyUI is up and running on your PC. You can simply keep ComfyUI open with the default workflow; just make sure you have a diffusion model based on Stable Diffusion 1.5. If you need help with ComfyUI, check this tutorial here.
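If you are not sure whether ComfyUI is actually running, a quick check is to see whether anything accepts connections on its port. The sketch below uses only the Python standard library; the helper name `is_port_open` is made up for this example, and 8188 is assumed here because it is the ComfyUI port passed to the launch command later in this guide.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumption: ComfyUI listens on 127.0.0.1:8188, as in this guide.
    if is_port_open("127.0.0.1", 8188):
        print("Something is listening on port 8188; ComfyUI is probably up.")
    else:
        print("Nothing is listening on port 8188; start ComfyUI first.")
```

This only tells you that the port is in use, not that the process behind it is ComfyUI, so treat it as a first sanity check.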

Installation

Let’s get started: download the Gligen model and place it in the following directory: ComfyUI\models\gligen. You can find the model on HuggingFace at this link.
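To confirm the model ended up in the right place, you could list what is inside that folder. This is a small sketch under a few assumptions: `find_gligen_models` is a name invented for this example, the ComfyUI root path is whatever applies on your machine, and the list of file extensions is a guess that you may need to adjust.

```python
from pathlib import Path


def find_gligen_models(comfyui_root: str) -> list[str]:
    """List model files found in <comfyui_root>/models/gligen."""
    model_dir = Path(comfyui_root) / "models" / "gligen"
    if not model_dir.is_dir():
        return []
    # Assumption: these are the extensions a model file is likely to have.
    wanted = {".safetensors", ".ckpt", ".pt", ".bin"}
    return sorted(
        p.name for p in model_dir.iterdir()
        if p.is_file() and p.suffix.lower() in wanted
    )


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point this at your own install folder.
    found = find_gligen_models("ComfyUI")
    print(found if found else "No GLIGEN model found in models/gligen")
```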

Assuming you have Python and Git installed, you can follow the steps outlined in the README to get started. I created a Python virtual environment with conda to keep the tool isolated from other Python libraries:

conda create -n gligen python
conda activate gligen

Then, to install the actual UI, you’ll need to execute these commands in a Terminal:

pip install flask
git clone https://github.com/mut-ex/gligen-gui.git
cd gligen-gui
flask --app "gligen_gui:create_app(8188)" run --port 5000

The final command is the one that actually starts the application. Make sure to keep the double quotes around the --app value; without them, the shell may try to interpret the parentheses itself and give you an error. Your app is ready if you see this line in the terminal: Go to: http://127.0.0.1:5000/port/8188.

Now, open a browser and navigate to the URL mentioned above. The UI should load on the page.

gligen ui local

Composition

Start with an idea in mind for your image. Then, drag your mouse on the canvas in the top-left part of the page to draw bounding boxes. Next, label these boxes by entering prompts in the corresponding text inputs in the table on the right-hand side. It’s advisable to keep these prompts simple at first, so you can get a better understanding of what the tool does.
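For the curious, here is a rough Python sketch of how boxes drawn on a canvas could be turned into ComfyUI nodes. It is not taken from the gligen-gui source. It assumes ComfyUI's built-in `GLIGENTextBoxApply` node with the inputs `conditioning_to`, `clip`, `gligen_textbox_model`, `text`, `x`, `y`, `width` and `height`, and that the coordinates move in steps of 8 pixels; check both against your own install. The helper names and all node IDs are hypothetical.

```python
def snap(value: float, step: int = 8) -> int:
    """Round a pixel value to the nearest multiple of `step`."""
    return int(round(value / step)) * step


def drag_to_box(x0: float, y0: float, x1: float, y1: float, step: int = 8) -> dict:
    """Turn a mouse drag (start and end corner) into x/y/width/height."""
    left, right = sorted((x0, x1))
    top, bottom = sorted((y0, y1))
    return {
        "x": snap(left, step),
        "y": snap(top, step),
        # Never let a box collapse to zero size.
        "width": max(step, snap(right - left, step)),
        "height": max(step, snap(bottom - top, step)),
    }


def chain_textbox_nodes(boxes, conditioning_link, clip_link, gligen_link, first_id=100):
    """Build one node per (text, box) pair, each feeding into the next.

    Links use the [node_id, output_index] form of ComfyUI's API-format
    workflows. Returns the new nodes and the link to the final conditioning.
    """
    nodes = {}
    previous = conditioning_link
    for offset, (text, box) in enumerate(boxes):
        node_id = str(first_id + offset)
        nodes[node_id] = {
            "class_type": "GLIGENTextBoxApply",  # assumed node name
            "inputs": {
                "conditioning_to": previous,
                "clip": clip_link,
                "gligen_textbox_model": gligen_link,
                "text": text,
                **box,
            },
        }
        previous = [node_id, 0]
    return nodes, previous


# Hypothetical node IDs for the positive prompt, the checkpoint's CLIP
# output, and a GLIGEN loader node.
boxes = [
    ("a lighthouse", drag_to_box(13, 250, 200, 37)),
    ("a sailboat", drag_to_box(304, 296, 480, 424)),
]
nodes, final_conditioning = chain_textbox_nodes(boxes, ["6", 0], ["4", 1], ["10", 0])
```

Each box adds one more link to the conditioning chain, which is one way to picture why every extra box changes how the others blend into the final image.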

gligen ui regional prompts

If you wish to provide additional details about your image, you can use the text input named “POSITIVE.” However, it’s advisable to stick to tags related to the desired style and quality of the image for optimal results.

Make sure to select a checkpoint from the dropdown menu. Then, when all your boxes are ready, click on “Queue prompt.” Keep in mind that the image might not be perfect on the first try; you’ll need to experiment to understand how the boxes interact and blend the content. Sometimes, a box could generate an image that doesn’t fit well with the rest of the composition, so feel free to remove and replace your boxes as needed. These are the changes I made after experimenting with the same prompts:
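Behind the scenes, a workflow has to reach ComfyUI somehow. ComfyUI accepts queued workflows as JSON sent to its `/prompt` HTTP endpoint, and the sketch below builds such a request with the standard library without sending it. Whether gligen-gui does exactly this is an assumption on my part, and `build_queue_request` and the placeholder workflow are invented for the example.

```python
import json
import urllib.request


def build_queue_request(workflow: dict, host: str = "127.0.0.1",
                        port: int = 8188) -> urllib.request.Request:
    """Prepare (but do not send) a request that queues a workflow in ComfyUI."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder workflow; a real one is a full graph in ComfyUI's API format.
request = build_queue_request({"1": {"class_type": "ExampleNode", "inputs": {}}})
```

With ComfyUI running, `urllib.request.urlopen(request)` would submit it. For everyday use the “Queue prompt” button is all you need.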

gligen boxes ui

I needed to make the sky and sunset boxes smaller, increase the overlap, and simplify the text prompts until I achieved a result that I think worked quite well.
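When tuning box sizes and overlap like this, it can help to put a number on how much two boxes overlap. The following Python sketch is one possible measure: the share of the smaller box that is covered by the other. The function name, the `(x, y, width, height)` box format, and the example coordinates are assumptions for illustration and are not the values I used in the UI.

```python
def overlap_fraction(a, b) -> float:
    """Share of the smaller box's area covered by the other box.

    Boxes are (x, y, width, height) tuples in pixels.
    """
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = min(ax + aw, bx + bw) - max(ax, bx)
    inter_h = min(ay + ah, by + bh) - max(ay, by)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # boxes do not touch
    return (inter_w * inter_h) / min(aw * ah, bw * bh)


# Made-up example: a wide sky band and a sunset box that dips into it.
sky = (0, 0, 512, 192)
sunset = (128, 128, 256, 128)
print(overlap_fraction(sky, sunset))  # 0.5
```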

It’s better to name a specific, concrete item in each box. By replacing “a storm” with “thunder,” I got exactly what I wanted at that location. I then moved the concept of the storm into the general POSITIVE prompt, and I think it worked better that way. Sometimes even leaving that field empty gives good results, so it is not required, but it can help with consistency.

This UI is still new, but I already find it very powerful. It’s an easy way to experiment with GLIGEN. You can set many of the parameters commonly used with Stable Diffusion, and it’s even compatible with LoRAs.

The base model you use can also significantly affect the quality of your output. Keep an eye on the Git repository, as there might be frequent changes and possibly new features added over time.

Conclusion

GLIGEN is a technique that has been known for some time, enabling you to compose AI images with meticulous control by specifying what you want to create in a particular area. It’s not just about control through text prompts; it also involves spatial information. The new gligen-gui makes this technique much easier to use and more accessible to all of us.

