Chinese tech company Kuaishou Technology released a text-to-video (T2V) generator called Kling that could rival OpenAI’s Sora.
In February, OpenAI wowed us with Sora demo videos that had us all frantically looking for the "sign up" button. Four months later, we're still waiting for Sora's release, with no word on when that might happen.
Beijing-based Kuaishou develops content-sharing platforms that, it says, make "content production, distribution, and consumption fast and easy." The company's short video platform, also called Kuaishou, is second only to TikTok in terms of average daily active users.
Producing content for its platforms would be a lot easier if the company didn't have to rely on human-generated content. This may be part of the motivation behind the development of its T2V tool.
Kling turns text prompts into temporally and spatially coherent videos that look great. Kuaishou says Kling can generate videos of up to 2 minutes at 1080p resolution and 30 frames per second.
That’s a minute longer than what OpenAI says Sora can produce. The other big differentiator is that Kling has been released to the public while Sora is still under wraps. If you’re in China, or have a Chinese mobile number and a VPN, you can apply to try the app now.
Like Sora, Kling uses a diffusion transformer architecture. It also has powerful 3D face and body reconstruction technology that can use a full-body image as a prompt to generate a video with smooth limb movements.
If you remember the slightly terrifying video of Will Smith eating spaghetti from the early days of AI-generated video, then you'll appreciate how amazing this video generated by Kling is.
Sora by OpenAI is insane.
But KWAI just dropped a Sora-like model called KLING, and people are going crazy over it.
Here are 10 wild examples you don’t want to miss:
1. A Chinese man sits at a table and eats noodles with chopsticks pic.twitter.com/MIV5IP3fyQ
— Angry Tom (@AngryTomtweets) June 6, 2024
Most of the impressive demo videos that involve a lot of movement are short clips. The longer videos are more scenic, with fewer dynamic elements, which might hint at some of the tool's limitations.
This clip of changing scenery viewed from a train window is pretty impressive.
2. Traveling by train, viewing all sorts of landscapes through the window pic.twitter.com/WqF9rlJxbh
— Angry Tom (@AngryTomtweets) June 6, 2024
AI video generators have historically struggled with details like fingers, teeth, and natural mouth movements. Here's an impressive clip that shows Kling getting these right in a very natural-looking way.
3. A Chinese boy wearing glasses enjoys a delicious cheeseburger with his eyes closed in a fast food restaurant pic.twitter.com/ZOCy0n3gTa
— Angry Tom (@AngryTomtweets) June 6, 2024
The beta release of Kling is in some ways a commentary on the East vs. West approach to AI. While the West debates AI safety, privacy, and the dangers of disinformation, China is surging ahead with development, and it is doing so despite US sanctions intended to slow it down.
While OpenAI tries to work out how to make Sora “safe” or politically correct, we might need to look to China to give us a decent T2V tool in the absence of one made in the USA.