
Stable Video Diffusion Revealed: 2 Impressive Image to Video Models (With a Catch)

Stability.ai has just made waves in the world of generative AI by introducing Stable Video Diffusion, its latest image-to-video model, built upon the successful Stable Diffusion image model.

Research Preview and Accessibility

Stable Video Diffusion is now available for researchers and enthusiasts to explore in a research preview. They’ve generously made the code for Stable Video Diffusion accessible on their GitHub repository, and you can easily find the required weights to run the model locally on their Hugging Face page.

Still of example videos as shown by Stability.ai

Versatile Image to Video

One of the standout features of Stable Video Diffusion is its adaptability to various downstream applications. It’s a versatile tool that shines in tasks like generating multi-views from a single image, with the option to fine-tune on multi-view datasets. Stability.ai is actively working on expanding its capabilities to cater to a wide array of applications.

Practical Applications and Accessibility

Here’s where it gets exciting: Stability.ai is gearing up to launch a waitlist for an upcoming web experience showcasing Stable Video Diffusion’s practical applications: a text-to-video interface. This tool will offer a sneak peek into how the model can be applied across industries such as advertising, education, entertainment, and more.

The Current Landscape of Text-to-Video Solutions

Which solutions do we have now? There is AnimateDiff, which, when used in conjunction with Stable Diffusion, can generate AI-powered videos. However, one of the significant challenges with this approach has been maintaining consistency between frames and mitigating strong artifacts. ControlNet offers a possible solution, addressing some of the challenges related to frame consistency. Meanwhile, Runway’s text-to-video tool delivers impressive results, but it’s not open source. These existing solutions have contributed to the progress in this field, and the arrival of Stable Video Diffusion presents an exciting addition to the toolbox of AI Creators.

An image animated using Stable Video Diffusion

Impressive Performance

Stable Video Diffusion debuts in two forms, generating 14 and 25 frames respectively at customizable frame rates ranging from a leisurely 3 frames per second to a brisk 30. In preliminary evaluations, these models have shown remarkable performance, surpassing existing closed models in user preference studies. Note that this is an image-to-video model: it was not directly trained to take text as input.
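With only 14 or 25 frames per generation, the frame rate you pick determines how long the clip feels. A quick back-of-the-envelope calculation (the variant nicknames below are informal labels for the two released forms):

```python
# Clip duration in seconds = frame count / frames per second.
# The two released variants produce 14 and 25 frames per generation.
FRAME_COUNTS = {"svd (14 frames)": 14, "svd-xt (25 frames)": 25}

def clip_duration(frames: int, fps: int) -> float:
    """Length in seconds of a generated clip."""
    return frames / fps

for name, n in FRAME_COUNTS.items():
    print(f"{name}: {clip_duration(n, 3):.2f}s at 3 fps, "
          f"{clip_duration(n, 30):.2f}s at 30 fps")
# The 25-frame variant thus yields anywhere from under a second
# of smooth motion up to an ~8-second slow-motion-style clip.
```

So the same output spans less than a second at 30 fps or several seconds at 3 fps, which is worth keeping in mind when choosing a frame rate for a given use case.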

However, please note that running this model locally may require a GPU with 40GB of VRAM, which could pose a challenge for many standard PCs. If you’re the technical type, you’ll also find a detailed research paper providing insights into the model’s workings.

Just a few days after the models were released, several optimizations were made, and it is now possible to run Stable Video Diffusion using ComfyUI with as little as 12GB of VRAM!
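If you prefer scripting over ComfyUI, the Hugging Face diffusers library also ships a `StableVideoDiffusionPipeline`. The sketch below is one possible way to run the model with reduced memory pressure, assuming a CUDA GPU and `diffusers`, `transformers`, and `accelerate` installed; the input image path is a placeholder, and exact VRAM savings will vary by setup.

```python
MODEL_ID = "stabilityai/stable-video-diffusion-img2vid-xt"  # 25-frame variant

def generate_video(image_path: str, out_path: str = "clip.mp4") -> None:
    """Animate a single image with Stable Video Diffusion (sketch).

    Requires a CUDA GPU plus: pip install diffusers transformers accelerate
    """
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import export_to_video, load_image

    # fp16 weights halve memory versus fp32.
    pipe = StableVideoDiffusionPipeline.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, variant="fp16"
    )
    # Keep only the active sub-model on the GPU, trading speed for VRAM.
    pipe.enable_model_cpu_offload()

    image = load_image(image_path).resize((1024, 576))
    # A smaller decode_chunk_size further lowers peak memory during decoding.
    frames = pipe(image, decode_chunk_size=8).frames[0]
    export_to_video(frames, out_path, fps=7)

# Usage (placeholder input image):
# generate_video("input.png")
```

CPU offloading and chunked VAE decoding are the same kinds of tricks the community optimizations rely on: they trade generation speed for a much smaller VRAM footprint.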

Exclusively for Research (for Now)

It’s worth noting that, for the time being, Stable Video Diffusion is primarily intended for research purposes. The folks at Stability.ai are keen to receive user feedback on safety and quality to fine-tune the model for broader applications in the future.

An Expansive AI Portfolio

Stable Video Diffusion joins Stability.ai’s ever-growing family of open-source models, encompassing diverse modalities like image, language, audio, 3D, and code. While these models are available for everyone to use, Stability.ai recently specified that using them for professional purposes requires a subscription, depending on the size of your application.

Anyway, will OpenAI be next, with a closed-source image-to-video feature integrated into ChatGPT? In the meantime, let’s enjoy this new model and the amazing creations it makes possible.


Announcement: https://stability.ai/news/stable-video-diffusion-open-ai-video-model

Git repository: https://github.com/Stability-AI/generative-models

Model weights: https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt
