OpenAI to Launch New Gen-AI Video Generation Model, sCM

Sam Altman’s OpenAI unveiled yet another AI model to enhance its AI video offerings earlier this week, on October 23.

Called the continuous-time consistency model (sCM), the new model is an improved version of earlier consistency models that generate realistic-looking AI media.

The new model aims to significantly advance the generation of realistic images, 3D models, audio, and video.

OpenAI claims sCM can generate samples about 50 times faster than the diffusion models currently in use.

sCM was designed to accelerate the sampling process of diffusion models.

Diffusion models are a type of generative AI that can generate data by gradually transforming noise into the desired output through a series of denoising steps.

Diffusion models need many sequential denoising steps to produce a single sample, so generation is often slow, and cutting the number of steps to speed things up tends to make the output less realistic.
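To make the step-by-step process concrete, here is a minimal Python sketch of diffusion sampling on a toy problem. It assumes a one-dimensional Gaussian "dataset" (the hypothetical constant `SIGMA_DATA`), noise levels running from 80 down to 0.002 as in common EDM-style setups, and an exact score function standing in for a trained network; none of these names or values come from OpenAI. Each loop iteration costs one model evaluation, which is why many steps make sampling slow.

```python
import numpy as np

SIGMA_DATA = 2.0              # hypothetical toy data: 1-D Gaussian with this std
T_MAX, T_MIN = 80.0, 0.002    # assumed noise range (EDM-style values)

def score(x, t):
    # Stand-in for a trained network: the exact score of N(0, SIGMA_DATA^2 + t^2),
    # i.e. the data distribution after adding Gaussian noise with std t.
    return -x / (SIGMA_DATA**2 + t**2)

def diffusion_sample(n, num_steps, seed=0):
    """Euler solver for the probability-flow ODE dx/dt = -t * score(x, t)."""
    rng = np.random.default_rng(seed)
    ts = np.geomspace(T_MAX, T_MIN, num_steps + 1)  # decreasing noise levels
    x = T_MAX * rng.normal(size=n)                   # start from (almost) pure noise
    calls = 0
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        slope = -t_cur * score(x, t_cur)             # one "network" evaluation
        x = x + (t_next - t_cur) * slope             # small denoising step
        calls += 1
    return x, calls

many, n_many = diffusion_sample(50_000, num_steps=100)
few, n_few = diffusion_sample(50_000, num_steps=2)
print(n_many, many.std())  # 100 evaluations, std close to SIGMA_DATA
print(n_few, few.std())    # 2 evaluations, std far below SIGMA_DATA
```

With 100 steps the samples recover roughly the right spread; with only 2 steps the same solver collapses them toward zero, a toy version of the quality loss that comes from cutting steps.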

sCM could help here because it offers a faster alternative: it converts noise directly into noise-free samples in far fewer steps.

What is OpenAI sCM?

OpenAI sCM is a new consistency model from the ChatGPT maker for generating AI media, including video, and it offers a faster alternative to conventional diffusion models.

The AI giant claims to have scaled the training of continuous-time consistency models to an “unprecedented 1.5 billion parameters on ImageNet at 512×512 resolution.”

sCM is a large, complex model trained on ImageNet, a very large image dataset, which allows it to generate highly realistic and detailed images.

The ChatGPT maker’s new consistency model can produce images whose quality is comparable to that of diffusion models while still generating media much faster.

The firm says that sCMs can generate samples with quality comparable to diffusion models using only two sampling steps, resulting in a ~50x wall-clock speedup.

Giving an example, the company states, “Our largest model, with 1.5 billion parameters, generates a single sample in just 0.11 seconds on a single A100 GPU without any inference optimisation.”

“Additional acceleration is easily achievable through customised system optimisation, opening up possibilities for real-time generation in various domains such as image, audio, and video,” added OpenAI.

OpenAI sCM is a significant advancement in generative AI, providing improved quality, speed, and stability compared to older models.

sCM Pitted Against Other Gen-AI Models

OpenAI tested sCM against other gen-AI models and compared the images they generated. It found that sCM produced images comparable in quality to those of the best existing models while using far less computing power.

“For rigorous evaluation, we benchmarked sCM against other state-of-the-art generative models by comparing both sample quality, using the standard Fréchet Inception Distance (FID) scores (where lower is better), and effective sampling compute, which estimates the total compute cost for generating each sample,” stated OpenAI.
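FID compares the mean and covariance of image features from real and generated samples: FID = ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^½). The minimal sketch below computes it with NumPy only. In practice the features come from an Inception-v3 network; here the feature arrays are random placeholders, and the helper names are hypothetical. It evaluates Tr((Σ_r Σ_g)^½) through the symmetric matrix Σ_r^½ Σ_g Σ_r^½, which has the same eigenvalues, so only a symmetric eigendecomposition is needed.

```python
import numpy as np

def _sqrtm_psd(a):
    # Matrix square root of a symmetric positive semi-definite matrix via eigh.
    vals, vecs = np.linalg.eigh(a)
    vals = np.clip(vals, 0.0, None)            # clamp tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.T

def fid_from_features(feats_real, feats_gen):
    """Fréchet distance between Gaussians fitted to two (n_samples, dim) feature arrays."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Tr((cov_r cov_g)^(1/2)) via the symmetric form cov_r^(1/2) cov_g cov_r^(1/2).
    root_r = _sqrtm_psd(cov_r)
    middle = root_r @ cov_g @ root_r
    middle = (middle + middle.T) / 2.0         # enforce symmetry before eigh
    tr_covmean = np.trace(_sqrtm_psd(middle))
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_covmean)

rng = np.random.default_rng(0)
real = rng.normal(size=(5000, 8))              # placeholder "Inception features"
print(fid_from_features(real, real))           # ~0: identical feature sets
print(fid_from_features(real, real + 1.0))     # ~8: each of 8 means shifted by 1
```

Lower is better: identical feature sets score zero, and any shift in the mean or change in covariance raises the score.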

In a graph, the American AI organisation showed that sCM in 2 steps generated image samples with quality comparable to the best previous methods while using less than 10% of the effective sampling compute.

Sample quality (FID) versus effective sampling compute for OpenAI sCM and other models. Credit: OpenAI

How does OpenAI sCM work?

OpenAI sCM generates high-quality images similar to those of diffusion models but skips the gradual transformation. Instead of denoising step by step, sCM converts noise directly into noise-free samples in a single step.

This is also why sCMs need far less computational power and generate media much faster than traditional diffusion models.
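The one-step jump can be illustrated with a toy problem. For one-dimensional Gaussian data under the noising rule x_t = x_0 + t·noise, the quantity x_t / sqrt(SIGMA_DATA² + t²) stays constant along each probability-flow trajectory, so the exact consistency function has a closed form. The sketch below uses it in place of a trained network and follows the multistep sampling procedure from the original consistency-models work: one model call maps noise to a sample, and each optional extra step re-noises and maps back again. OpenAI's sCM uses a different continuous-time parameterization; the names, the intermediate noise level 0.8, and the noise range here are assumptions for illustration.

```python
import numpy as np

SIGMA_DATA = 2.0        # hypothetical toy data: 1-D Gaussian with this std
EPS, T = 0.002, 80.0    # assumed smallest and largest noise levels (EDM-style)

def consistency_fn(x, t):
    # Exact consistency function for this toy: maps a point at noise level t
    # along its ODE trajectory back to level EPS in one call (f(x, EPS) == x).
    return x * np.sqrt(SIGMA_DATA**2 + EPS**2) / np.sqrt(SIGMA_DATA**2 + t**2)

def sample(n, steps=(T,), seed=0):
    """Multistep consistency sampling: one model call per entry in `steps`."""
    rng = np.random.default_rng(seed)
    x = consistency_fn(steps[0] * rng.normal(size=n), steps[0])  # noise -> sample
    for t in steps[1:]:
        # Optional refinement: re-noise to a lower level t, then jump back again.
        x_t = x + np.sqrt(t**2 - EPS**2) * rng.normal(size=n)
        x = consistency_fn(x_t, t)
    return x

one_step = sample(100_000)
two_step = sample(100_000, steps=(T, 0.8))
print(one_step.std(), two_step.std())  # both close to SIGMA_DATA = 2.0
```

Unlike the diffusion loop, a single call already lands on the data distribution; the second step is optional refinement, which mirrors the 1- and 2-step sampling the article describes.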

OpenAI trained sCM using consistency-model techniques such as consistency training or consistency distillation. These techniques let consistency models generate high-quality samples in significantly fewer steps, which makes them appealing for practical applications that require fast generation.

Additionally, the firm noted that sCMs distil knowledge from pre-trained diffusion models, and a key finding is that sCMs improve proportionally with the teacher diffusion model as both scale up.

In other words, as both sCM and the pre-trained diffusion model grow larger and more complex, the benefit of training sCM from the pre-trained model also grows. As a result, sCM can leverage the knowledge of existing models to produce even better results.
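Consistency distillation can be sketched on a toy setup. In the discrete-time formulation from the original consistency-models work, a clean sample is noised to level t_{n+1}, the teacher takes one ODE-solver step back to t_n, and the student is penalized when its outputs at the two points disagree. The Python below estimates that loss with a squared-L2 distance, uniform weighting, and no separate target network; an exact Gaussian score stands in for the pre-trained diffusion teacher, and every name in it is hypothetical. sCM trains in the continuous-time limit with a different objective, which this sketch does not reproduce.

```python
import numpy as np

SIGMA_DATA = 2.0  # hypothetical toy data: 1-D Gaussian with this std
EPS = 0.002       # smallest noise level, where the student must be the identity

def teacher_score(x, t):
    # Exact score of N(0, SIGMA_DATA^2 + t^2); stands in for a pre-trained diffusion model.
    return -x / (SIGMA_DATA**2 + t**2)

def euler_step(x, t_from, t_to):
    # One Euler step of the probability-flow ODE dx/dt = -t * score(x, t).
    return x + (t_to - t_from) * (-t_from * teacher_score(x, t_from))

def exact_consistency_fn(x, t):
    # A "perfect" student: x_t / sqrt(SIGMA_DATA^2 + t^2) is constant along ODE paths.
    return x * np.sqrt(SIGMA_DATA**2 + EPS**2) / np.sqrt(SIGMA_DATA**2 + t**2)

def naive_fn(x, t):
    # A poor student that ignores the noise level and returns its input unchanged.
    return x

def cd_loss(f, t_n, t_next, n=50_000, seed=0):
    """Monte Carlo estimate of a consistency distillation loss (squared L2)."""
    rng = np.random.default_rng(seed)
    x0 = rng.normal(0.0, SIGMA_DATA, n)          # clean samples
    x_next = x0 + t_next * rng.normal(size=n)    # noised to level t_next
    x_hat = euler_step(x_next, t_next, t_n)      # teacher steps back to t_n
    return float(np.mean((f(x_next, t_next) - f(x_hat, t_n)) ** 2))

print(cd_loss(exact_consistency_fn, 1.0, 1.1))  # tiny: only Euler error remains
print(cd_loss(naive_fn, 1.0, 1.1))              # much larger: outputs disagree
```

A student that already follows the teacher's ODE is left with only the small error of the teacher's Euler step, while one that ignores the noise level is penalized far more; training pushes the network toward the first case, so a more accurate teacher yields a better training target.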

Difference between OpenAI sCM and diffusion models. Credit: OpenAI

OpenAI notes that the relative difference in sample quality, measured by the ratio of FID scores, remains consistent across several orders of magnitude in model sizes, causing the absolute difference in sample quality to diminish at scale.
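The arithmetic behind this is simple. Writing ρ for the FID ratio between sCM and its teacher (a symbol introduced here, assumed roughly constant across model sizes as OpenAI reports), the absolute gap is proportional to the teacher's own FID:

```latex
\mathrm{FID}_{\mathrm{sCM}} \approx \rho \, \mathrm{FID}_{\mathrm{teacher}}
\quad\Longrightarrow\quad
\mathrm{FID}_{\mathrm{sCM}} - \mathrm{FID}_{\mathrm{teacher}} \approx (\rho - 1)\, \mathrm{FID}_{\mathrm{teacher}}
```

For purely hypothetical values, if ρ were 1.1, a teacher FID of 3.0 would leave a gap of 0.3, while a teacher FID of 2.0 would leave a gap of 0.2. These numbers are illustrative only, not OpenAI's results.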

The organisation found that using more sampling steps can make sCM's images even more realistic. Even with only 2 steps, however, sCM creates images almost as good as those made by a different type of model that takes hundreds of steps. This shows that sCM is very efficient and can produce high-quality results with minimal compute.

When will OpenAI sCM be available?

OpenAI has not yet released the model publicly, as it plans to keep improving sCM, including its inference speed and sample quality.

“We believe these advancements will unlock new possibilities for real-time, high-quality generative AI across a wide range of domains,” the AI firm stated.