Yuyang & Alex, from Aofei Temple
Quantum Bit | Official account QbitAI
A painter holds a brush, dabbing dots onto the canvas and forming the distinctive strokes of a hand-painted work.
Which documentary do you think this is from?
No, no, no! Every frame in this video was generated by AI.
You just tell it "a close-up of a brush on the canvas", and it can produce the picture directly.
It can not only conjure a paintbrush out of thin air; making a horse drink water, the very thing the proverb says you cannot force, is not out of the question either.
Given just a sentence like "a horse drinking water", the AI throws out this picture:
Well, well. At this rate, shooting videos in the future may really come down to nothing more than a few words...
That's right: just as text-to-image AI painting is booming, researchers at Meta AI have given AI generation another super evolution.
This time, you really can "make a video just by talking":
The AI is called Make-A-Video, and it upgrades the static generation of DALL·E and Stable Diffusion directly into motion.
Give it a few words or a few lines of text, and it can generate video footage that exists nowhere in the world, in a remarkably diverse range of styles.
Not only can it handle a documentary style; a full-on sci-fi look is no problem either.
Mix the two styles, and a robot dancing in Times Square doesn't look out of place at all.
A light, artsy animation style? Make-A-Video seems to have that covered too.
After this wave of demos, many netizens were stunned; some comments boiled down to just three letters:
Even big-name researcher LeCun remarked meaningfully: what is to come will always come.
After all, many industry insiders felt that one-sentence video generation was bound to arrive sooner or later. Even so, Meta's move came surprisingly fast:
"It's 9 months faster than I expected."
Some even said: I can't keep up with the speed at which AI is evolving...
A super-evolved text-to-image generation model
You may think Make-A-Video is a video version of DALL·E.
In fact, that's pretty much what it is (tongue firmly in cheek).
We said earlier that Make-A-Video is a super evolution of text-to-image (T2I) models. That's because the first step of the AI's workflow is, in fact, generating an image from text.
From a data standpoint, static image generation models such as DALL·E are trained on paired text-image data.
Although Make-A-Video ultimately produces video, it is not trained on paired text-video data; it still relies on text-image pairs to teach the AI how to render a scene from a description.
Video data is involved too, of course, but mainly as standalone, unlabeled clips that teach the AI how things move in the real world.
In terms of model architecture, Make-A-Video consists mainly of three parts:
- a text-to-image generation model P
- spatiotemporal convolution and attention layers
- a frame interpolation network for raising the frame rate, plus two super-resolution networks for improving image quality
The whole model works like this:
First, an image embedding is generated from the input text.
Then the decoder Dt generates 16 frames of 64×64 RGB images.
The frame interpolation network ↑F interpolates this preliminary result up to the target frame rate.
Next, the first super-resolution network raises the resolution to 256×256, and the second super-resolution network pushes the image quality further, up to 768×768.
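To make the cascade concrete, here is a minimal Python sketch of those five stages. The function names (prior_P, decoder_Dt, interpolate_F, super_resolve) and the 76-frame target are illustrative placeholders inferred from the description above, not Meta's actual code or API.

```python
# Illustrative sketch of the Make-A-Video cascade described above.
# All component names and the 76-frame target are assumptions, not Meta's code.
import numpy as np

def prior_P(text: str) -> np.ndarray:
    """Map input text to an image embedding (stand-in: a 768-d vector)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(768)

def decoder_Dt(embedding: np.ndarray, frames: int = 16, size: int = 64) -> np.ndarray:
    """Decode the embedding into a low-res clip: 16 frames of 64x64 RGB."""
    return np.zeros((frames, size, size, 3), dtype=np.float32)

def interpolate_F(clip: np.ndarray, target_frames: int = 76) -> np.ndarray:
    """Frame interpolation (↑F): raise the frame count to the target rate."""
    idx = np.linspace(0, len(clip) - 1, target_frames).round().astype(int)
    return clip[idx]  # crude nearest-frame duplication as a placeholder

def super_resolve(clip: np.ndarray, size: int) -> np.ndarray:
    """Super-resolution stage: upscale every frame to size x size (placeholder)."""
    return np.zeros((clip.shape[0], size, size, 3), dtype=np.float32)

def make_a_video(text: str) -> np.ndarray:
    emb = prior_P(text)              # text -> image embedding
    clip = decoder_Dt(emb)           # 16 frames of 64x64 RGB
    clip = interpolate_F(clip)       # interpolate up to the target frame rate
    clip = super_resolve(clip, 256)  # first SR network: 256x256
    clip = super_resolve(clip, 768)  # second SR network: 768x768
    return clip

print(make_a_video("a horse drinking water").shape)  # (76, 768, 768, 3)
```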
Built on this pipeline, Make-A-Video can do more than just generate videos from text. It also has the following capabilities:
Turning a static image into a video:
Generating a video from just a first frame and a last frame:
Generating new videos based on an original video:
Setting a new SOTA for text-to-video generation
In fact, Meta's Make-A-Video is not the first attempt at text-to-video (T2V) generation.
For example, Tsinghua University and BAAI (Zhiyuan) have launched their self-developed "one-sentence video generation" AI, CogVideo, which is currently the only open-source T2V model.
Earlier, GODIVA and Microsoft's NÜWA also managed to generate videos from text descriptions.
This time, however, Make-A-Video markedly improves generation quality.
Experiments on the MSR-VTT dataset show that Make-A-Video sets a new SOTA on both FID (13.17) and CLIPSIM (0.3049).
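CLIPSIM, the second metric, essentially measures how well the generated frames match the prompt: the average CLIP similarity between each frame and the text. Below is a minimal sketch of that idea using the open-source openai/CLIP package; the function name clipsim and the frame-averaging details are assumptions for illustration, not Meta's evaluation code.

```python
# Rough sketch of a CLIPSIM-style score: average cosine similarity between
# CLIP embeddings of each generated frame and the prompt text.
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def clipsim(frames: list, prompt: str) -> float:
    """Average frame-to-prompt CLIP similarity over a list of PIL images."""
    with torch.no_grad():
        text_feat = model.encode_text(clip.tokenize([prompt]).to(device))
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
        imgs = torch.stack([preprocess(f) for f in frames]).to(device)
        img_feat = model.encode_image(imgs)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        return (img_feat @ text_feat.T).mean().item()

# Usage (hypothetical): score = clipsim(list_of_pil_frames, "a horse drinking water")
```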
In addition, the Meta AI team used Imagen's DrawBench prompts for human subjective evaluation.
They invited testers to try Make-A-Video for themselves and judge how well the videos corresponded to the text.
The results show that Make-A-Video outperforms the other two methods in both quality and faithfulness.
One More Thing
Interestingly, with the release of Meta's new AI, a race among T2V models also seems to have kicked off.
Stability AI, the company behind Stable Diffusion, couldn't sit still. Founder and CEO Emad said:
We will release a model better than Make-A-Video, one that everyone can use!
And just a few days ago, a related paper, Phenaki, appeared on the ICLR submission site. Its generated results look like this:
Oh, and although Make-A-Video has not been released publicly yet, Meta AI says it is preparing a demo so everyone can try it out. Interested friends can keep an eye out for it~
Paper address:
https://makeavideo.studio/Make-A-Video.pdf
Reference link:
[1]https://ai.facebook.com/blog/generative-ai-text-to-video/
[2]https://twitter.com/boztank/status/1575541759009964032
[3]https://twitter.com/ylecun/status/1575497338252304384
[4]https://www.theverge.com/2022/9/29/23378210/meta-text-to-video-ai-generation-make-a-video-model-dall-e
[5]https://phenaki.video
— End —
Quantum Bit (QbitAI) · Signed account on Toutiao
Follow us to stay on top of cutting-edge technology