
AI Text Animation with QRCode Monster and AnimateDiff


In this article, I will show you how to use Stable Diffusion to create AI text animations with captivating effects and smooth transitions between words. Creating the input animation does not require any AI tools; you can use a site like Canva, or Python for a more technical approach. Afterward, we will use a ComfyUI workflow to add colors and dynamism to our input text.


The real magic comes from combining three elements: prompt traveling, AnimateDiff, and ControlNet (using QRCode Monster).

Prompt traveling is a simple technique used to have control over an animation: it’s just a list of text prompts, where each prompt will be activated sequentially and is associated with certain frames of the final animation.
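To make the idea concrete, here is a minimal sketch of prompt traveling as a plain lookup table. The prompts, the frame numbers, and the `active_prompt` helper are all made up for illustration; real schedulers may also blend neighboring prompts rather than switch abruptly.

```python
from bisect import bisect_right

# Hypothetical schedule: start frame -> prompt (values are illustrative only)
PROMPT_SCHEDULE = {
    0: "colorful coral reef, vivid fish",
    16: "lava flowing over dark rocks",
    32: "snowy mountain peaks at sunrise",
}


def active_prompt(schedule: dict[int, str], frame: int) -> str:
    """Return the prompt whose start frame is the latest one <= frame."""
    starts = sorted(schedule)
    if frame < starts[0]:
        raise ValueError(f"no prompt scheduled at or before frame {frame}")
    index = bisect_right(starts, frame) - 1
    return schedule[starts[index]]


if __name__ == "__main__":
    for frame in (0, 15, 16, 40):
        print(frame, "->", active_prompt(PROMPT_SCHEDULE, frame))
```

Each prompt stays active from its start frame until the next one begins, so frames 0 to 15 use the first prompt and frame 16 switches to the second.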

AnimateDiff is a powerful model that can turn text into simple animations and can be combined with other SD tools and techniques to give more control over the generated video.

ControlNet, in this case, is used with QRCode Monster, a model originally made to create QR codes using Stable Diffusion, resulting in visually stunning and often scannable QR codes. This model also works with other input images, not only QR codes, and it will blend text and logos into the composition, creating a sort of illusion.

By combining these three elements, we can convert a simple black and white animation into a creative AI text animation using any model and prompt available (SD 1.5 in my case).

If you wish to learn more about these techniques, particularly AnimateDiff, have a look at this tutorial:

Input text

This part might be a bit annoying: unfortunately, the workflow needs an existing text animation as input. However, there are many ways to create simple text animations. Think of simple tools like PowerPoint or Canva.

Simple animation from Canva

If you are comfortable with Python, you can use a library named manim, which is used to generate mathematically accurate animations. It can also be used to create text animations from scratch. You will need to install ffmpeg in order to use this library.

I tried both approaches: one using an input animation made with Canva and another using manim. If you want to try the latter, here is a simple class that you can use to get started, saved in a file named videotext.py:

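from manim import *


class ZoomInText(Scene):
    def construct(self):
        # Large text with a thick stroke, so the letters form a clear shape
        text = Text("HELLO", stroke_width=7, font="Arial", font_size=180)

        # Draw the text over 4 seconds, then erase it over 2 seconds
        self.play(Create(text, lag_ratio=0.1), run_time=4)
        self.play(Uncreate(text, lag_ratio=0.1), run_time=2)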

Then you can run this command in the terminal, from the folder where you created your Python script:

manim -pql videotext.py ZoomInText

And a video will be created with the text that you gave as input in the class. This is a very basic animation, but this library is powerful, so if you decide to dig into more advanced transitions, it might give even better results.
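If you render several scenes or quality levels, it may help to assemble the command in Python. The sketch below is only an illustration: `build_manim_command` and `ffmpeg_available` are hypothetical helpers of mine, not part of manim. It relies on manim's `-p` (preview) flag and `-q` quality option with the values `l`, `m`, and `h`.

```python
import shutil

# Quality values accepted by manim's -q option (l = low, m = medium, h = high)
QUALITY_FLAGS = {"low": "l", "medium": "m", "high": "h"}


def build_manim_command(script: str, scene: str, quality: str = "low",
                        preview: bool = True) -> list[str]:
    """Build the argument list for rendering one scene with the manim CLI."""
    if quality not in QUALITY_FLAGS:
        raise ValueError(f"unknown quality: {quality!r}")
    # Combine the short flags into one token, for example "-pql"
    flags = "-" + ("p" if preview else "") + "q" + QUALITY_FLAGS[quality]
    return ["manim", flags, script, scene]


def ffmpeg_available() -> bool:
    """Check whether an ffmpeg executable is on the PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    if not ffmpeg_available():
        print("ffmpeg not found on PATH; install it before rendering")
    print(" ".join(build_manim_command("videotext.py", "ZoomInText")))
```

The returned list can be passed directly to `subprocess.run` once you are happy with the printed command.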

Using manim

Maybe this library can be converted into a custom node for ComfyUI so that everything can be done in one single place. Anyway, once you are happy with your input animation, it’s time to use Stable Diffusion to make it more interesting.


You can find the workflow that I used on my OpenArt profile, along with some example assets that you can use if you don’t want to create the black and white animation yourself.

ai text animation workflow comfyui

I think it looks a bit messy with all these spaghetti floating around, but I tried to add groups to clearly separate the interesting parts. Let’s examine the most important nodes in detail and the corresponding files to download:

Load a checkpoint of your preference. I’m using a fine-tuned SD 1.5 model, which is good for creating colorful and dynamic images, but you might get good results with realistic models as well. Choose a LoRA as well, if you want. I used this AnimateDiff v3 adapter LoRA, which I think helps to get a better animation.

Here, you need to choose the input animation. Note that you can adjust the number of frames that you want to process, as well as the width and height of the final animation. Try to use the same aspect ratio as the input; otherwise, the result will end up cropped. Finally, the input frames are upscaled and ready to be used by the ControlNets.
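A quick way to pick a matching height for a chosen width is sketched below. The `fit_height` helper is hypothetical, and rounding to a multiple of 8 is my assumption, based on the usual size constraint for SD 1.5 latents.

```python
def fit_height(input_width: int, input_height: int, target_width: int,
               multiple: int = 8) -> int:
    """Height that keeps the input aspect ratio, rounded to a multiple."""
    if min(input_width, input_height, target_width) <= 0:
        raise ValueError("all dimensions must be positive")
    exact = target_width * input_height / input_width
    # Snap to the nearest multiple (assumed requirement, see the note above)
    return max(multiple, round(exact / multiple) * multiple)


if __name__ == "__main__":
    # A 1920x1080 input rendered 512 pixels wide
    print(512, fit_height(1920, 1080, 512))  # prints "512 288"
```

When the exact height is not a multiple of 8, the rounded value changes the ratio slightly, so a few pixels may still be cropped.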

This group contains all the nodes needed to use the two versions of QRCode Monster. I’m using both the original model (control_v1p_sd15_qrcode_monster) and the updated v2 version. They generally do the same thing, but they behave differently in terms of the strength of the text and the level of blending with the background. You can also bypass one of the two. Importantly, the strength parameter will affect your output, determining how visible the text will be.

Prompt traveling and AnimateDiff are used here. The Batch Prompt Schedule node is where you put your list of prompts and the corresponding frame numbers to guide the content of the animation. Regarding AnimateDiff, I’m using the latest V3 version, but you can also experiment with TemporalDiff or older V2 models. You can find the weights again on HuggingFace.
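If you have many prompts, you can generate the schedule text instead of typing it by hand. The sketch below spreads prompts evenly over the clip and prints them as `"frame" :"prompt"` lines. That layout is my assumption about what the node expects, so compare it with the node's default text before pasting. Both helper functions and the example prompts are hypothetical.

```python
def evenly_spaced_schedule(prompts: list[str], total_frames: int) -> dict[int, str]:
    """Assign each prompt a start frame, spreading them evenly over the clip."""
    if not prompts or total_frames < len(prompts):
        raise ValueError("need at least one frame per prompt")
    step = total_frames / len(prompts)
    return {int(i * step): prompt for i, prompt in enumerate(prompts)}


def format_schedule(schedule: dict[int, str]) -> str:
    """Render the schedule as '"frame" :"prompt"' lines separated by commas."""
    lines = [f'"{frame}" :"{schedule[frame]}"' for frame in sorted(schedule)]
    return ",\n".join(lines)


if __name__ == "__main__":
    # Illustrative prompts only; replace them with your own
    prompts = ["coral reef", "flowing lava", "snowy mountains"]
    print(format_schedule(evenly_spaced_schedule(prompts, total_frames=48)))
```

With three prompts and 48 frames, the prompts start at frames 0, 16, and 32.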

The last part includes a simple KSampler and an upscaler. Since our input size was set rather low to speed up the generation, it’s worth doing a final upscale once we are happy with our animation, although upscaling is optional. Note that the Split Image Batch is set to 4, meaning that some frames will be skipped. If you wish to keep all the frames in the upscaled text animation as well, just set it to 0.

When everything is ready and all the models are put in place, we are ready to generate our animation.

Using the input from Manim (Python) I got these results:

controlnet strength around 0.40

controlnet strength around 0.90

Using the input from a Canva animation, but keeping the same parameters and prompts, I get a very similar result:

One last example shows an animation using a realistic base model such as epicphotogasm. I needed to change the ControlNets’ strength and the prompts in order to get something interesting. However, this model seems to require a higher number of steps, and possibly a higher resolution, to obtain a good quality animation.


Creating AI text animations can be a nice application of the latest SD-related tools in ComfyUI. You might not get optimal results on the first try, depending on the type of input animation that you use and, most importantly, the base model. Sometimes the final video is too static or the text is too visible. If that’s the case, try a different model and lower the strength of ControlNet to experiment with different results.

Questions? Just leave a comment!
