
2025/04/11 23:53:36 · technology

Yuyang and Alex, from Aofeisi

QbitAI | Official Account QbitAI


A painter holds a brush and dabs dots onto the canvas, forming the distinctive brushstrokes of a hand-painted work.

Which documentary do you think this is?

No, no, no! Every frame in the video is generated by AI.

Tell it to "bring the brush down on the canvas in close-up", and it creates the picture directly.

Not only can it conjure a brush out of nothing; even "you can't make a horse drink" no longer applies.

Give it a sentence like "a horse drinking water", and the AI throws out this picture:


Good grief. It seems whether you can "shoot" a video in the future will depend entirely on how well you can talk...

Indeed, just as text-to-image AI painting is booming, researchers at Meta AI have given AI generation another super evolution.

This time, you really can "make videos with your mouth":

The AI is called Make-A-Video, and it upgrades generation straight from the static images of DALL·E and Stable Diffusion to dynamic video.

Give it a few words or a few lines of text, and it generates video footage of scenes that don't exist in this world, in a remarkably diverse range of styles.

It can pull off a documentary style, and a full sci-fi effect is no problem either.

And when the two styles are mixed, even a robot dancing in Times Square doesn't look out of place.


The fresh, artsy animation style? Make-A-Video seems to have that covered too.

After this wave of demos, many netizens were stunned, and some comments were reduced to just three letters:


And Meta's AI chief LeCun remarked meaningfully: what is bound to come will always come.


After all, one-sentence video generation is something many industry insiders saw coming. Even so, Meta's move is indeed a bit quick:

"Nine months faster than I imagined."


Some even said: I can't keep up with the speed of AI's evolution...


A super-evolved text-to-image generation model

You may think Make-A-Video is a video version of DALL·E.

In fact, that's about right (doge).

As mentioned earlier, Make-A-Video is a super evolution of a text-to-image (T2I) generation model, because the first step of the AI's work is in fact generating an image from the text.

From a data perspective, static image generation models such as DALL·E are trained on paired text-image data.

Although Make-A-Video ultimately generates video, it is not trained on paired text-video data; it still relies on text-image data to teach the AI to reproduce a picture from text.

Video data is involved as well, but mainly as standalone, unlabeled video clips that teach the AI how the real world moves.

In terms of model architecture, Make-A-Video consists of three main parts:

  • the text-to-image generation model P
  • spatiotemporal convolution and attention layers
  • a frame interpolation network to raise the frame rate, plus two super-resolution networks to improve image quality

The overall workflow of the model goes like this:

First, an image embedding is generated from the input text.

Then, the decoder Dt generates 16 frames of 64×64 RGB images.

The interpolation network ↑F interpolates the preliminary result up to the target frame rate.

Next, the first super-resolution network raises the resolution to 256×256, and the second super-resolution network optimizes further, improving the picture quality to 768×768.
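The staged pipeline above can be sketched purely in terms of tensor shapes. Below is a minimal NumPy sketch where every "network" (the embedding prior, the decoder Dt, the interpolation network ↑F, and the super-resolution stages) is a hypothetical stand-in that only reproduces the shapes, not Meta's actual model:

```python
import numpy as np

def text_to_embedding(prompt: str, dim: int = 512) -> np.ndarray:
    # Stand-in for the prior: map text to an image embedding vector.
    rng = np.random.default_rng(len(prompt))
    return rng.standard_normal(dim)

def decode_frames(embedding: np.ndarray) -> np.ndarray:
    # Stand-in for decoder Dt: 16 frames of 64x64 RGB.
    rng = np.random.default_rng(0)
    return rng.random((16, 64, 64, 3))

def interpolate(frames: np.ndarray, factor: int = 4) -> np.ndarray:
    # Stand-in for the interpolation network: repeat frames to raise
    # the frame rate (the real network synthesizes in-between frames).
    return np.repeat(frames, factor, axis=0)

def upscale(frames: np.ndarray, size: int) -> np.ndarray:
    # Stand-in for a super-resolution network: nearest-neighbour resize.
    f = size // frames.shape[1]
    return frames.repeat(f, axis=1).repeat(f, axis=2)

emb = text_to_embedding("a horse drinking water")
video = decode_frames(emb)    # (16, 64, 64, 3)
video = interpolate(video)    # (64, 64, 64, 3)
video = upscale(video, 256)   # (64, 256, 256, 3)
video = upscale(video, 768)   # (64, 768, 768, 3)
print(video.shape)
```

The point is only how each stage transforms the frame count and resolution; every function name here is invented for illustration.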

Thanks to this design, Make-A-Video can do more than generate video from text; it also has the following capabilities.

Turning a static image into a video:


Generating a video from a first and a last frame:
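To see what "in-betweening" two frames means at its most basic, here is a naive baseline: a pixel-space cross-fade between the first and last images. Make-A-Video synthesizes actual motion generatively rather than blending pixels, so this sketch (all names hypothetical) is only a point of comparison:

```python
import numpy as np

def linear_inbetween(first: np.ndarray, last: np.ndarray,
                     n_frames: int) -> np.ndarray:
    """Cross-fade from `first` to `last` over n_frames frames.
    A trivial baseline for first/last-frame video generation."""
    ts = np.linspace(0.0, 1.0, n_frames)
    return np.stack([(1 - t) * first + t * last for t in ts])

first = np.zeros((64, 64, 3))   # a black frame
last = np.ones((64, 64, 3))     # a white frame
clip = linear_inbetween(first, last, 16)
print(clip.shape)  # (16, 64, 64, 3)
```

A generative model's advantage over this baseline is that objects can move and deform plausibly instead of merely fading.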


Generating a new video variant from an original video:


A new SOTA for text-to-video generation

In fact, Meta's Make-A-Video is not the first attempt at text-to-video (T2V) generation.

For example, Tsinghua University and BAAI launched their self-developed "one-sentence video generation" AI, CogVideo, which is currently the only open-source T2V model.

Earlier still, GODIVA and Microsoft's "NÜWA" also achieved video generation from text descriptions.

This time, however, Make-A-Video clearly raises the bar on generation quality.

Experimental results on the MSR-VTT dataset show that Make-A-Video sets a clear new SOTA on both FID (13.17) and CLIPSIM (0.3049).
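The CLIPSIM metric reported above is the average CLIP similarity between the text prompt and each generated frame. A minimal sketch of the computation, using dummy embedding vectors in place of a real CLIP model (the function name and data here are illustrative, not the evaluation code):

```python
import numpy as np

def clipsim(text_emb: np.ndarray, frame_embs: np.ndarray) -> float:
    """Average cosine similarity between one text embedding and the
    embedding of each video frame (the idea behind CLIPSIM).
    In a real evaluation both embeddings come from a CLIP model."""
    t = text_emb / np.linalg.norm(text_emb)
    f = frame_embs / np.linalg.norm(frame_embs, axis=1, keepdims=True)
    return float(np.mean(f @ t))

rng = np.random.default_rng(42)
text = rng.standard_normal(512)
# Fake "frame embeddings" that lie close to the text embedding:
frames = text + 0.1 * rng.standard_normal((16, 512))
print(clipsim(text, frames))
```

Higher is better: frames that depict the prompt well get embeddings closer to the text embedding, pushing the average cosine similarity up.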


In addition, the Meta AI team used DrawBench prompts from Imagen for human subjective evaluation.

They invited testers to try Make-A-Video for themselves and to rate subjectively how well the videos matched the text.

The results show that Make-A-Video outperforms the other two methods in both quality and faithfulness.


One More Thing

Interestingly, by releasing its new AI, Meta also seems to have kicked off a T2V model race.

Stability AI, the company behind Stable Diffusion, can't sit still. Founder and CEO Emad said:

We will release a better model than Make-A-Video, the kind that everyone can use!


And just a few days ago, a related paper, Phenaki, appeared on the ICLR submission site. Its generated results look like this:


That's right: although Make-A-Video has not been officially released yet, Meta AI has stated that it is preparing a demo for everyone to try hands-on. Interested friends can keep an eye out~

Paper address:
https://makeavideo.studio/Make-A-Video.pdf
Reference link:
[1]https://ai.facebook.com/blog/generative-ai-text-to-video/
[2]https://twitter.com/boztank/status/1575541759009964032
[3]https://twitter.com/ylecun/status/1575497338252304384
[4]https://www.theverge.com/2022/9/29/23378210/meta-text-to-video-ai-generation-make-a-video-model-dall-e
[5]https://phenaki.video

— End —

QbitAI · Toutiao signed author

Follow us to stay on top of cutting-edge technology.
