Stability AI just announced the release of its state-of-the-art real-time text-to-image generator called SDXL Turbo.
When you use AI text-to-image generators, there are usually at least a few seconds of waiting time between prompt and picture. With SDXL Turbo, the image is generated in milliseconds.
What makes this even more impressive is that as you edit your prompt, the image changes in real time, as fast as you can type.
Stability AI’s demo video gives you a good feel for how groundbreaking this is.
Adversarial Diffusion Distillation
The secret sauce behind this is a new distillation technique called Adversarial Diffusion Distillation (ADD).
Standard diffusion models (DMs) are behind most AI image generators and deliver high-quality images. A DM starts from random noise and removes it gradually, step by step, until what remains is an image that matches the prompt.
Inference with a DM is slow because it is iterative: generating one image typically takes dozens of steps, each a full pass through a large neural network, and that adds up to a lot of computing power.
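To make that cost concrete, here is a toy sketch of deterministic DDIM-style sampling on a tiny vector. It is not Stability AI's code: `oracle_denoiser` is a hypothetical stand-in that simply returns a known target, and the linear noise schedule is an assumption chosen for simplicity. What it does show is the structure of the loop: every sampling step requires one full call to the network, so 50 steps means 50 network evaluations per image.

```python
import math
import random

# Hypothetical "clean image": a 4-pixel vector standing in for real image data.
TARGET = [0.2, -0.5, 0.9, 0.0]

def oracle_denoiser(x_t, alpha_bar_t):
    """Stand-in for a trained denoising network (a large UNet in practice).

    A real network predicts the clean image x0 from the noisy input x_t;
    this toy oracle ignores its inputs and simply returns TARGET.
    """
    return list(TARGET)

def sample(denoiser, num_steps, seed=0):
    """Deterministic DDIM-style sampling (eta = 0); returns (image, network_calls)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in TARGET]  # start from pure Gaussian noise
    # Toy schedule (an assumption): alpha_bar rises linearly from 0 (all noise) to 1 (clean).
    levels = [k / num_steps for k in range(num_steps + 1)]
    calls = 0
    for k in range(num_steps):
        ab_t, ab_next = levels[k], levels[k + 1]
        x0_hat = denoiser(x, ab_t)  # one full network evaluation per step
        calls += 1
        # Noise implied by the current input and the predicted clean image.
        eps_hat = [(xi - math.sqrt(ab_t) * x0i) / math.sqrt(1.0 - ab_t)
                   for xi, x0i in zip(x, x0_hat)]
        # Move to the next, less noisy level of the schedule.
        x = [math.sqrt(ab_next) * x0i + math.sqrt(1.0 - ab_next) * ei
             for x0i, ei in zip(x0_hat, eps_hat)]
    return x, calls

image, calls = sample(oracle_denoiser, num_steps=50)
print(calls)  # 50 network evaluations for one image
```

With this perfect oracle, even `num_steps=1` lands exactly on the target. A real network's one-shot guess from pure noise is a blurry average of plausible images, which is why standard DMs need many steps, and why a distillation method is needed to make very few steps good enough.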
The other approach to image generation is the Generative Adversarial Network (GAN). A GAN pits two neural networks against each other in an adversarial game: a generator that creates images and a discriminator that tries to tell them apart from real ones. Once trained, a GAN produces an image in a single forward pass, so it’s really fast.
The problem with GANs is that they don’t scale as well as DMs, and they can suffer from mode collapse. This happens when the generator gets stuck in a local optimum, settling on a few outputs that fool the discriminator, and so produces only a limited variety of images.
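Mode collapse is easy to see on a toy benchmark often used in GAN research: a ring of eight Gaussian clusters. The sketch below counts how many cluster centres have at least one generated sample nearby. The function `modes_covered`, the 0.5 radius, and both generators are hypothetical illustrations, not taken from any real model. Each generator turns one latent vector into one sample in a single call, mirroring the one-pass speed of a GAN; the healthy one covers all eight clusters, while the collapsed one keeps landing near the same cluster.

```python
import math
import random

# Eight cluster centres on a circle of radius 2: a common toy target in GAN research.
CENTERS = [(2 * math.cos(2 * math.pi * k / 8), 2 * math.sin(2 * math.pi * k / 8))
           for k in range(8)]

def modes_covered(samples, centers=CENTERS, radius=0.5):
    """Count how many centres have at least one sample within `radius` of them."""
    covered = set()
    for x, y in samples:
        for i, (cx, cy) in enumerate(centers):
            if math.hypot(x - cx, y - cy) < radius:
                covered.add(i)
    return len(covered)

# Hypothetical generators: each maps one latent vector z to one sample in a single pass.
def healthy_generator(z):
    k = int(z[0] * 8) % 8  # the latent decides which cluster to draw from
    cx, cy = CENTERS[k]
    return (cx + 0.05 * z[1], cy + 0.05 * z[2])

def collapsed_generator(z):
    cx, cy = CENTERS[0]  # ignores z[0]: every sample lands near the same cluster
    return (cx + 0.05 * z[1], cy + 0.05 * z[2])

rng = random.Random(0)
latents = [(rng.random(), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]
print(modes_covered([healthy_generator(z) for z in latents]))    # 8
print(modes_covered([collapsed_generator(z) for z in latents]))  # 1
```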
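ADD is a hybrid of the two approaches. It distills a pretrained diffusion model (the teacher) into a student model that can generate an image in one to four steps, training the student with an adversarial loss from a discriminator plus a distillation loss that pulls its outputs toward the teacher’s. The result combines the scalability and quality of a DM with the speed of a GAN.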
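The ADD training objective can be sketched on plain lists of numbers. In this minimal sketch, the student's output is re-noised to a teacher timestep, the frozen teacher denoises it, and the student's loss is an adversarial term plus a weighted distillation term. Several details are assumptions for illustration rather than Stability AI's implementation: the hinge form of the adversarial loss, the squared L2 distance, the weighting `c_t`, the default `lam=2.5`, and the `teacher` lambda. In real training these would be tensors with gradients, and regularization terms such as a gradient penalty are omitted.

```python
import random

def renoise(x, alpha_t, sigma_t, rng):
    """Forward-diffuse the student's output to teacher timestep t: alpha_t * x + sigma_t * eps."""
    return [alpha_t * xi + sigma_t * rng.gauss(0.0, 1.0) for xi in x]

def generator_loss(student_out, teacher_out, disc_logits_fake, c_t=1.0, lam=2.5):
    """Student loss = adversarial term + lam * distillation term (lam=2.5 is an assumed default)."""
    # Adversarial term (hinge-GAN generator loss): raise the discriminator's scores on fakes.
    adv = -sum(disc_logits_fake) / len(disc_logits_fake)
    # Distillation term: squared L2 distance to the teacher's denoised prediction,
    # which is treated as a fixed target (no gradient would flow into the teacher).
    distill = c_t * sum((s - t) ** 2 for s, t in zip(student_out, teacher_out))
    return adv + lam * distill

def discriminator_loss(disc_logits_real, disc_logits_fake):
    """Hinge loss: push real scores to >= 1 and fake scores to <= -1."""
    real = sum(max(0.0, 1.0 - d) for d in disc_logits_real) / len(disc_logits_real)
    fake = sum(max(0.0, 1.0 + d) for d in disc_logits_fake) / len(disc_logits_fake)
    return real + fake

# Usage with hypothetical values:
rng = random.Random(0)
student_out = [0.1, 0.4, -0.2]                # one-step student "image"
noisy = renoise(student_out, alpha_t=0.8, sigma_t=0.6, rng=rng)
teacher = lambda x_t: [0.9 * v for v in x_t]  # hypothetical stand-in for the frozen teacher DM
loss = generator_loss(student_out, teacher(noisy), disc_logits_fake=[0.3, -0.1])
print(loss)
```

Splitting the objective this way shows why ADD sits between the two families: the adversarial term pushes single-step outputs to look real (the GAN side), while the distillation term keeps them anchored to what the diffusion teacher would produce (the DM side).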
And it’s really quick. On an Nvidia A100, SDXL Turbo generates a 512×512 image in 207 ms, including prompt encoding, a single denoising step, and decoding.
The model weights and code are available on Hugging Face for non-commercial use. If you want to try the beta demo, you can check it out on Clipdrop. The demo gives you a sense of the speed, but its image quality is reduced.
SDXL Turbo can also be set to use 2 or 4 steps, trading a little speed for even better image quality.
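For readers who want to try the released weights, the sketch below follows the usage pattern shown on the Hugging Face model card for `stabilityai/sdxl-turbo`, using the `diffusers` library. It assumes `torch`, `diffusers`, and a CUDA GPU are available, and the wrapper names `load_pipeline` and `generate` are our own. Guidance is switched off with `guidance_scale=0.0` because the model was trained without classifier-free guidance, and `num_inference_steps` selects the step count. The imports sit inside `load_pipeline` so the helpers can be read or tested without downloading the model.

```python
MODEL_ID = "stabilityai/sdxl-turbo"

def load_pipeline(device="cuda"):
    """Download SDXL Turbo in half precision and move it to the GPU.

    Assumes the `torch` and `diffusers` packages are installed.
    """
    import torch
    from diffusers import AutoPipelineForText2Image

    pipe = AutoPipelineForText2Image.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, variant="fp16"
    )
    return pipe.to(device)

def generate(pipe, prompt, steps=1):
    """Generate one image; this hypothetical helper only allows 1 to 4 steps."""
    if not 1 <= steps <= 4:
        raise ValueError("this sketch only allows 1 to 4 denoising steps")
    # guidance_scale=0.0 disables classifier-free guidance, which the model was trained without.
    result = pipe(prompt=prompt, num_inference_steps=steps, guidance_scale=0.0)
    return result.images[0]
```

Typical use would be `pipe = load_pipeline()` followed by `generate(pipe, "a photo of a red fox", steps=4).save("fox.png")`; moving from `steps=1` to `steps=4` is the quality-for-speed trade-off described above.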
Your first reaction to this may be relief at not having to wait so long for your image to be generated, and that is a big plus.
But the benefits go well beyond shorter waits. Because SDXL Turbo can generate an image in a single step instead of dozens, each image needs only a fraction of the computing power, which frees up a lot of capacity.
Real-time generation also means you could eventually create animations or dynamic visuals that keep pace with a story’s text as it unfolds.
It’s been barely four months since Stability AI released its improved diffusion model, SDXL 1.0, which was already really good.
In blind tests, users preferred the images SDXL Turbo generated in 4 steps to those SDXL generated in 50 steps. Cutting the step count from 50 to 4, a roughly 12x reduction, in just four months is amazing.
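The “12x” figure comes from comparing step counts: 50 denoising steps for SDXL versus 4 for SDXL Turbo is a 12.5x reduction in network evaluations, and 50x in single-step mode. The end-to-end speedup is smaller, because every image also pays fixed costs such as prompt encoding and decoding. The small calculator below makes that explicit; it assumes every step costs the same for both models, and the timings in the example are hypothetical placeholders, not measurements.

```python
def step_reduction(teacher_steps, student_steps):
    """Ratio of denoising steps (network evaluations) between two samplers."""
    return teacher_steps / student_steps

def end_to_end_speedup(teacher_steps, student_steps, step_ms, overhead_ms):
    """Speedup once fixed per-image costs are included.

    Assumes each denoising step costs `step_ms` for both models and that
    `overhead_ms` (e.g. prompt encoding and decoding) is paid once per image.
    """
    teacher_ms = teacher_steps * step_ms + overhead_ms
    student_ms = student_steps * step_ms + overhead_ms
    return teacher_ms / student_ms

print(step_reduction(50, 4))  # 12.5
print(step_reduction(50, 1))  # 50.0
# Hypothetical timings, for illustration only:
print(round(end_to_end_speedup(50, 4, step_ms=10.0, overhead_ms=20.0), 2))  # 8.67
```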
It makes you wonder just how good AI image generators will be a year from now.