Pine from Aofeisi
QbitAI | Official account QbitAI
A model trained only on 2D data can also generate 3D images: just type in a simple text prompt and out comes a 3D model. What tricks does this "AI painter" have?
Let's look at the results directly.
The 3D model it generates has density and color, and can be rendered under different lighting conditions.
Not only that, it can even combine multiple generated 3D models into a single scene.
More importantly, the generated 3D model can be exported as a mesh and further processed in modeling software.
This is essentially a high-end version of NeRF, and this AI painter, named DreamFusion, is the latest work from Google Research.
Does the name DreamFusion sound a little familiar?
That's right, DreamFields! Not long ago, a Chinese developer open-sourced an AI painting program based on that model.
This time, DreamFusion evolved from DreamFields.
So what changed from DreamFields to DreamFusion to produce such a huge leap?
The diffusion model is the key
In a word, the biggest difference between DreamFusion and DreamFields lies in how the loss is calculated.
In the latest DreamFusion, CLIP is replaced by a new way of computing the loss: it is calculated through the text-to-image Imagen diffusion model.
Everyone should be familiar with diffusion models by now. DreamFusion is driven by a diffusion model trained on billions of image-text pairs, which makes it essentially a NeRF optimized by a diffusion model. It is hard for it not to be powerful.
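To make "a NeRF optimized by a diffusion model" concrete, here is a minimal toy sketch of the outer loop. Every function in it (init_scene, sample_camera, render, diffusion_feedback) is a hypothetical stand-in, not DreamFusion's real NeRF or Imagen code; how the diffusion model's feedback is actually obtained is explained below.

```python
# Toy sketch of the outer optimization loop; all helpers are hypothetical stand-ins.
import numpy as np

rng = np.random.default_rng(0)

def init_scene():
    # Stand-in for randomly initialized NeRF parameters.
    return rng.normal(size=(32, 32, 3)) * 0.01

def sample_camera():
    # Stand-in for sampling a random viewpoint (here just an azimuth angle).
    return rng.uniform(0.0, 2 * np.pi)

def render(theta, camera):
    # Stand-in differentiable renderer; a real NeRF would ray-march the scene
    # from the given camera. The toy version ignores the camera.
    return np.tanh(theta)

def diffusion_feedback(image, prompt):
    # Stand-in for the guidance signal from the frozen text-to-image diffusion
    # model; DreamFusion obtains this via score distillation sampling (see below).
    return image - 0.5  # toy gradient nudging pixels toward 0.5

def optimize_text_to_3d(prompt, steps=100, lr=0.1):
    theta = init_scene()
    for _ in range(steps):
        camera = sample_camera()
        image = render(theta, camera)                         # 2D view of the 3D scene
        grad_image = diffusion_feedback(image, prompt)        # 2D supervision only
        grad_theta = grad_image * (1 - np.tanh(theta) ** 2)   # chain rule through the toy renderer
        theta -= lr * grad_theta                              # only the 3D parameters are updated
    return theta

scene = optimize_text_to_3d("a DSLR photo of a squirrel")
print(scene.shape)
```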
However, using diffusion models directly for 3D synthesis would require large-scale labeled 3D datasets and efficient architectures for denoising 3D data, and neither exists at the moment, so another way had to be found.
So in this work, the researchers cleverly sidestepped these limitations and used a pre-trained 2D text-to-image diffusion model to perform text-to-3D synthesis.
Specifically, the Imagen diffusion model is used to compute a loss while the 3D image is being generated, and that loss drives the optimization of the 3D model. So how exactly is the loss calculated?
There is one very critical link here: the researchers introduce a new sampling approach, score distillation sampling (SDS), which samples in parameter space rather than pixel space.
Thanks to this constraint on the parameters, the method can keep the quality of the generated image well under control (right of the figure below).
Here, score distillation sampling is used to express the loss of the generation process; by continually optimizing and minimizing this loss, a good-quality 3D model comes out.
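Written in simplified notation (with $x = g(\theta)$ the image rendered from the 3D parameters $\theta$, $\epsilon$ the injected noise, $\hat{\epsilon}_\phi$ the frozen diffusion model's noise prediction, $y$ the text prompt, and $w(t)$ a timestep weighting), the score distillation gradient is roughly:

$$\nabla_\theta \mathcal{L}_{\mathrm{SDS}} = \mathbb{E}_{t,\epsilon}\left[ w(t)\,\big(\hat{\epsilon}_\phi(x_t;\, y, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \right]$$

Minimizing this loss pushes the 3D parameters so that every rendered view looks like a plausible sample from the text-conditioned diffusion model.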
It is worth mentioning that, during image generation, the optimized parameters in DreamFusion effectively serve as a training sample for the diffusion model; parameters trained this way carry multi-scale characteristics, which benefits subsequent image generation.
In addition, the diffusion model brings another important benefit: no backpropagation through the diffusion model is required, because it directly predicts the direction of the update.
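As a rough illustration of that update direction, here is a minimal runnable sketch of one SDS step, again with toy stand-ins (render, predict_noise) in place of the real NeRF and Imagen; note that the noise residual is used directly as the gradient on the image, with nothing backpropagated through the diffusion network.

```python
# Toy sketch of one score distillation sampling (SDS) step; predict_noise and
# render are hypothetical stand-ins, not Imagen or DreamFusion's real NeRF.
import numpy as np

rng = np.random.default_rng(0)

def render(theta):
    # Stand-in differentiable renderer for the current 3D parameters.
    return np.tanh(theta)

def predict_noise(x_noisy, t, prompt):
    # Stand-in for the frozen diffusion model's noise prediction eps_hat(x_t; y, t).
    return 0.1 * x_noisy  # toy output; the real model is a large frozen U-Net

def sds_step(theta, prompt, lr=0.01):
    x = render(theta)                               # rendered 2D view
    t = rng.uniform(0.02, 0.98)                     # random diffusion timestep
    eps = rng.normal(size=x.shape)                  # injected Gaussian noise
    alpha, sigma = np.cos(t * np.pi / 2), np.sin(t * np.pi / 2)  # toy noise schedule
    x_noisy = alpha * x + sigma * eps               # noised rendering
    eps_hat = predict_noise(x_noisy, t, prompt)
    w = sigma ** 2                                  # timestep weighting w(t)
    # Key point: the residual (eps_hat - eps) is used directly as the gradient
    # on the image; nothing is backpropagated through predict_noise.
    grad_image = w * (eps_hat - eps)
    grad_theta = grad_image * (1 - np.tanh(theta) ** 2)  # chain rule through the toy renderer only
    return theta - lr * grad_theta

theta = rng.normal(size=(32, 32, 3)) * 0.01
theta = sds_step(theta, "a DSLR photo of a squirrel wearing a hoodie")
print(theta.shape)
```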
Netizens' reactions
This research result really shocked netizens: Meta had just released text-to-video, and here Google released a text-to-3D model.
(or rather, a 2D diffusion model that outputs 3D images)
Someone even asked: when will the next version, with high-resolution 3D results, come out? Two years?
Another commenter joked directly below:
Two weeks?
Of course, this AI achievement inevitably raises the perennial question: will it replace humans?
Still, most people remain quite optimistic:
As a 3D modeler/designer, the potential for (AI) to assist with model design in the future is incredible.
(A small Easter egg) Some netizens also dug up interesting failure cases of DreamFusion:
For example, this generated squirrel has an eye behind its hoodie (which is also a bit scary).
About the team
The research team is from Google Research: Ben Poole, Jon Barron, and Ben Mildenhall, plus a doctoral student from the University of California, Berkeley.
Google Research is the department within Google that conducts all kinds of cutting-edge technology research; they also maintain their own open-source projects, which are publicly available on GitHub.
Their slogan: our team aspires to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Among them, Ben Poole holds a PhD in neuroscience from Stanford University and is a researcher at Google Brain. His current research focuses on using generative models to improve algorithms for unsupervised and semi-supervised learning.
Reference links:
[1]https://dreamfusion3d.github.io/index.html
[2]https://twitter.com/poolio/status/1575618598805983234
— End —
QbitAI · Signed author on Toutiao
Follow us for the latest developments in cutting-edge technology