Installing PixArt Alpha Locally: An Alternative to Stable Diffusion

PixArt Alpha (PIXART-α) is a transformer-based text-to-image model that can be run locally on your PC, similar to Stable Diffusion. How does it compare in terms of quality and speed? In this tutorial, I will demonstrate how to run some examples using ComfyUI.

Overview of PixArt Alpha

PixArt Alpha is a highly efficient text-to-image model capable of generating high-quality images, comparable to advanced models like SDXL or Midjourney. What sets PixArt Alpha apart is that it achieves this quality with significantly lower training costs, making it more accessible for researchers and reducing environmental impact. For instance, according to its authors, it requires only about 10.8% of the training time of Stable Diffusion v1.5, resulting in substantial cost savings (approximately $300,000) and a 90% reduction in CO2 emissions.
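
As a quick sanity check, the arithmetic behind these figures can be reproduced in a few lines. The GPU-day and dollar values below are the approximate numbers quoted in the PixArt-α paper's abstract (about 675 vs. 6,250 A100 GPU days, and about $26,000 vs. $320,000); I did not measure them myself.

```python
# Approximate figures as quoted in the PixArt-α paper abstract (not measured here).
PIXART_GPU_DAYS = 675
SD15_GPU_DAYS = 6250
PIXART_COST_USD = 26_000
SD15_COST_USD = 320_000


def training_time_ratio(model_days: float, baseline_days: float) -> float:
    """Fraction of the baseline's training time that the model needs."""
    return model_days / baseline_days


ratio = training_time_ratio(PIXART_GPU_DAYS, SD15_GPU_DAYS)
savings = SD15_COST_USD - PIXART_COST_USD

print(f"Training time vs. SD v1.5: {ratio:.1%}")
print(f"Estimated savings: ${savings:,}")
```

The ratio comes out at 10.8%, and the difference of $294,000 is where the "approximately $300,000" figure comes from.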

Keep in mind that while PixArt Alpha is more efficient in training, this doesn’t necessarily translate to faster image generation. I’ve observed that on my NVIDIA 3060 with 12GB of VRAM, it consumes more resources to load the necessary encoders than a standard Stable Diffusion setup.

Differences with Stable Diffusion

The main difference between the two models lies in the architecture. Both are diffusion models, but Stable Diffusion denoises with a U-Net and encodes prompts with CLIP, whereas PixArt Alpha uses a transformer as its denoising backbone and encodes prompts with T5. Both of them, however, are used following the same principle: you type a prompt, and the model generates an image following your instructions.

Text encoders are natural language processing (NLP) models that convert text into numeric representations. This enables machines to understand the meaning of text and use it in various tasks, such as machine translation, text summarization, and question answering. T5 and CLIP are two different types of text encoders.

T5 is a transformer-based model trained on a large dataset of text to learn the relationships between words; its encoder is the part used here. This is the one that we will be using in this tutorial with ComfyUI.

CLIP, on the other hand, is a contrastive encoder trained on a large dataset of paired images and captions to learn how to match text to images. T5 is considered a more general-purpose encoder than CLIP and excels at tasks that demand understanding the meaning of text. However, CLIP is better suited for tasks that involve matching text to images.
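
To make the "matching text to images" idea more concrete, here is a toy sketch of how a contrastive model such as CLIP can pick the best caption for an image: both are mapped to vectors, and the caption whose vector has the highest cosine similarity to the image vector wins. The three-dimensional vectors and the captions are made up for illustration; real encoders produce much larger vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_caption(image_vec, caption_vecs):
    """Return the caption whose embedding is closest to the image embedding."""
    return max(caption_vecs, key=lambda c: cosine_similarity(image_vec, caption_vecs[c]))


# Hypothetical embeddings, invented for this example only.
image_embedding = [0.9, 0.1, 0.2]
caption_embeddings = {
    "a volcano during an eruption": [0.8, 0.2, 0.1],
    "a cat sleeping on a sofa": [0.1, 0.9, 0.3],
}

print(best_caption(image_embedding, caption_embeddings))
```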

Model info

  • Uses the T5 text encoder instead of CLIP
  • Available in 512 and 1024 versions
  • Same latent space as SD1.5 (we can use the same SD1.5 VAE)

Well, these are a lot of interesting words, but let’s see how good it is at generating AI images. In this article, I will show you how I downloaded and tried out PixArt Alpha. If you want, you can follow along. I used ComfyUI for this, which is very popular in the Stable Diffusion community. Maybe you are already familiar with it. If not, have a look at this beginner’s tutorial for ComfyUI.

Install PixArt Alpha

I will follow the workflows provided in the repository of a custom node named ExtraModels, which can be used to run different models, including ones that use a transformer-based text encoder. We will explore how to run the base models and the LCM variant, with both text-to-image and image-to-image workflows.

XL 1024

The most basic workflow to try out text-to-image can be downloaded here. Load the workflow and use the ComfyUI-Manager to install the missing custom nodes from the ExtraModels repository mentioned above. Then, ensure that every file and model is in the proper path before starting to generate images.

pixart-alpha basic workflow

Load this workflow first, then we will need to download some additional models.

You can find everything that you need in this Hugging Face repository. In particular, you need to download one of the .pth files (either PixArt-XL-2-512x512.pth or PixArt-XL-2-1024-MS.pth) and place it in the \ComfyUI\models\checkpoints folder, as you would with any other Stable Diffusion model.

Regarding the Transformer Loader, which is the main difference with Stable Diffusion, download these files from the t5-v1_1-xxl folder of the Hugging Face repo mentioned before:

  • config.json
  • pytorch_model-00001-of-00002.bin
  • pytorch_model-00002-of-00002.bin
  • pytorch_model.bin.index.json

and put these four files in the following folder: \ComfyUI\models\t5 (in my case, C:\Users\ema\Desktop\AIGUILDHUB\ComfyUI_windows_portable\ComfyUI\models\t5). This folder should be available after installing the ExtraModels node.
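
Since a misplaced file is the most common cause of loader errors, a small script can check the layout before you start ComfyUI. This is only a sketch: the file names are the ones listed in this tutorial, while the `ComfyUI` root path and the helper names `missing_t5_files` and `find_pixart_checkpoints` are my own assumptions, so adjust them to your install.

```python
from pathlib import Path

# T5 file names as listed in this tutorial.
T5_FILES = [
    "config.json",
    "pytorch_model-00001-of-00002.bin",
    "pytorch_model-00002-of-00002.bin",
    "pytorch_model.bin.index.json",
]


def missing_t5_files(comfyui_root):
    """Return the T5 files that are not present under <root>/models/t5."""
    t5_dir = Path(comfyui_root) / "models" / "t5"
    return [name for name in T5_FILES if not (t5_dir / name).is_file()]


def find_pixart_checkpoints(comfyui_root):
    """Return the PixArt .pth files found under <root>/models/checkpoints."""
    ckpt_dir = Path(comfyui_root) / "models" / "checkpoints"
    if not ckpt_dir.is_dir():
        return []
    return sorted(p.name for p in ckpt_dir.glob("PixArt*.pth"))


# Assumption: ComfyUI lives in ./ComfyUI; change this to your own location.
root = Path("ComfyUI")
print("Missing T5 files:", missing_t5_files(root) or "none")
print("PixArt checkpoints:", find_pixart_checkpoints(root) or "none found")
```

If the first line reports any missing file, the T5 loader node will most likely fail with a "does not appear to have a file named ..." error.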

You will also need to download the VAE, which is the same used for Stable Diffusion workflows.

When everything is ready, simply hit “Queue prompt” to start the generation.

If you encounter an error, such as being prompted to install a missing package, it might help to close and restart ComfyUI (close and reopen the terminal); this could resolve the issue.

Prompt: A volcano during an eruption

I noticed that on my 3060 with 12GB of VRAM, generation speed is almost on par with SDXL, maybe a bit slower. Loading the models and running the T5 Text Encoder node, however, takes much more time. Anyway, the results are stunning; even with a simple prompt, I got a really good image.


The 512 model is a lighter version of PixArt Alpha, which generates images with slightly lower quality than the 1024 model we saw earlier. To try it, you can use the same workflow. Just make sure to change the ratio to 1.00 and update the model fields to PixArt_XL_2.

Let’s compare the same prompt with the two versions:

Note that I just used a simple prompt to generate these examples; I didn’t really try to optimize the output. Anyway, quality seems a bit better with the 1024 model.

Image to Image

There is also the possibility to use a workflow for image-to-image: download PixArt-image-to-image-workflow.json to try it out.

pixart-alpha image to image workflow

Just check that each node has loaded the proper model files by selecting the right name; nothing else should change from your previous setup. Then choose an image to upload as a reference and specify what you want in the prompt.

In this example, I asked for a similar image but in a pixel art style.

PixArt Alpha LCM

LCM stands for Latent Consistency Model, and its power lies in its ability to generate high-quality images using a reduced number of steps, saving time for each generation. You can download the PixArt LCM model here; the file is named diffusion_pytorch_model.safetensors. I suggest you give it a meaningful name, like pixart-lcm.safetensors, to keep track of it.

The workflow is very simple; you will just need to add a ModelSamplingDiscrete node. I made the modifications and uploaded the workflow on OpenArt if you want to download it directly. Note that you will need to set the ratio to 1.0. Moreover, not all dimensions are supported for LCM.

pixart-alpha LCM workflow

A low number of steps, just 5, is enough and makes generation much faster (except for loading the models, which still takes some time). Set a very low CFG value as well, around 1.1.
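
For intuition on what a CFG value close to 1 means: in the usual formulation of classifier-free guidance, the sampler blends the model's unconditioned and prompt-conditioned noise predictions, and the CFG scale controls how far the result is pushed toward the prompt. At a scale of exactly 1.0, the result is simply the conditioned prediction. The sketch below uses short lists of made-up numbers instead of real latents; it illustrates the general formula, not the exact code path of ComfyUI or the ExtraModels node.

```python
def apply_cfg(uncond, cond, scale):
    """Blend two noise predictions: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Made-up noise predictions for illustration only.
uncond_pred = [0.0, 1.0]
cond_pred = [1.0, 3.0]

for scale in (1.0, 1.1, 7.0):
    print(scale, apply_cfg(uncond_pred, cond_pred, scale))
```

With a scale of 1.1 the output stays very close to the conditioned prediction, while a typical Stable Diffusion value such as 7.0 exaggerates the difference much more strongly.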


Can’t run PixArt Alpha locally? Try the demo directly on Hugging Face. It might not always be very fast, but you won’t need to set up anything.

PixArt Alpha is definitely a valid alternative to Stable Diffusion for generating images locally, even if the text encoder that it uses might be a bit heavy for some GPUs. The strength of PixArt Alpha, however, is in the training process, which seems to give better results with reduced training time.

Overall, it’s always beneficial to have different possibilities for generating images, especially in the realm of open source. Adding more (good) competition to closed-source alternatives like Midjourney or DALL-E is a positive development.

Thanks to the creator of the custom nodes that gave us the possibility to experiment with PixArt Alpha.

5 thoughts on “Installing PixArt Alpha Locally: An Alternative to Stable Diffusion”

  1. Hi – Looking forward to this – getting this error in each workflow on the Ksampler node. (MacOS)
    TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn’t support float64. Please use float32 instead.

    1. Hello, did you try to change the device type in the T5v1.1 Loader node? For example to ‘auto’ or ‘cpu’. I will try to test on MacBook tomorrow

  2. I got this error running your PixArtV3.json
    Error occurred when executing T5v11Loader:

    D:\Stable_Diffusion\ComfyUI\models\t5 does not appear to have a file named config.json. Checkout ‘https://huggingface.co/D:\Stable_Diffusion\ComfyUI\models\t5/main’ for available files.

    File “D:\Stable_Diffusion\ComfyUI\execution.py”, line 155, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
    File “D:\Stable_Diffusion\ComfyUI\execution.py”, line 85, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
    File “D:\Stable_Diffusion\ComfyUI\execution.py”, line 78, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
    File “D:\Stable_Diffusion\ComfyUI\custom_nodes\ComfyUI_ExtraModels\T5\nodes.py”, line 57, in load_model
    return (load_t5(
    File “D:\Stable_Diffusion\ComfyUI\custom_nodes\ComfyUI_ExtraModels\T5\loader.py”, line 107, in load_t5
    return EXM_T5v11(**model_args)
    File “D:\Stable_Diffusion\ComfyUI\custom_nodes\ComfyUI_ExtraModels\T5\loader.py”, line 44, in __init__
    self.cond_stage_model = T5v11Model(
    File “D:\Stable_Diffusion\ComfyUI\custom_nodes\ComfyUI_ExtraModels\T5\t5v11.py”, line 39, in __init__
    self.transformer = T5EncoderModel.from_pretrained(textmodel_path, **model_args)
    File “D:\Stable_Diffusion\python_embeded\Lib\site-packages\transformers\modeling_utils.py”, line 2942, in from_pretrained
    config, model_kwargs = cls.config_class.from_pretrained(
    File “D:\Stable_Diffusion\python_embeded\Lib\site-packages\transformers\configuration_utils.py”, line 615, in from_pretrained
    config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
    File “D:\Stable_Diffusion\python_embeded\Lib\site-packages\transformers\configuration_utils.py”, line 644, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
    File “D:\Stable_Diffusion\python_embeded\Lib\site-packages\transformers\configuration_utils.py”, line 699, in _get_config_dict
    resolved_config_file = cached_file(
    File “D:\Stable_Diffusion\python_embeded\Lib\site-packages\transformers\utils\hub.py”, line 360, in cached_file
    raise EnvironmentError(


    1. Hi, did you check that the config.json and all the other files are in the models/t5/ folder? Maybe also try the PixArt-text-to-image-workflow.json from Hugging Face (PixArt-alpha/PixArt-alpha)
