


Recently, Google introduced Parti (Pathways Autoregressive Text-to-Image model), an autoregressive text-to-image generation model that can produce high-fidelity, photorealistic images and supports synthesis involving complex compositions and rich world knowledge.

For example, text descriptions such as "a raccoon wearing formal clothes, holding a cane and a garbage bag" and "a tiger wearing a train conductor's hat and holding a skateboard with a yin-yang symbol" yield images like the following.


(Source: Google)

In addition to lifelike detail, Parti is versed in a variety of styles and can generate paintings in the manner of Van Gogh, abstract Cubism, Egyptian tomb hieroglyphs, illustration, statuary, woodcut, children's crayon drawings, Chinese ink painting, and more, based on text descriptions.

On June 22, the related research paper, "Scaling Autoregressive Models for Content-Rich Text-to-Image Generation", was posted to arXiv.

The researchers stated in Google's official blog post: "With Parti, generating an image is treated as a sequence-to-sequence modeling problem, similar to machine translation, so it can benefit from advances in large language models, especially the capabilities unlocked by scaling data and model sizes. In this case, the target output is a sequence of image tokens instead of text tokens in another language. Parti uses the image tokenizer ViT-VQGAN to encode images as sequences of discrete tokens, taking advantage of its ability to reconstruct high-quality, visually diverse images."
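The two-stage design described above can be sketched in miniature. Everything below is a toy illustration, not Google's implementation: the codebook, function names, and the stand-in "model" are all hypothetical. Stage 1 is a VQ-style tokenizer that maps image patch features to discrete token ids by nearest-neighbor lookup in a codebook; stage 2 is an autoregressive model that predicts the image-token sequence conditioned on text tokens, exactly as in seq2seq machine translation.

```python
import math

# A tiny stand-in codebook of four "visual words" (real ViT-VQGAN
# codebooks are learned and contain thousands of entries).
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

def tokenize(patches):
    """Encode each patch feature vector as the id of its nearest codeword."""
    def nearest(p):
        return min(range(len(CODEBOOK)),
                   key=lambda i: math.dist(p, CODEBOOK[i]))
    return [nearest(p) for p in patches]

def detokenize(token_ids):
    """Decode token ids back to codeword vectors (the reconstruction step)."""
    return [CODEBOOK[t] for t in token_ids]

def generate(text_tokens, n_image_tokens, model):
    """Autoregressively sample image tokens conditioned on the text prompt."""
    out = []
    for _ in range(n_image_tokens):
        out.append(model(text_tokens, out))  # next-token prediction
    return out

# A dummy "model" that ignores the text and emits a fixed cyclic pattern:
dummy_model = lambda text, prefix: len(prefix) % len(CODEBOOK)

tokens = generate(["a", "green", "sign"], 4, dummy_model)
print(tokens)              # [0, 1, 2, 3]
print(detokenize(tokens))  # the reconstructed patch vectors
```

The key point the sketch captures is that once images are discrete token sequences, the generation loop is indistinguishable from text generation, which is why scaling lessons from large language models transfer.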

It is worth mentioning that Imagen, another text-to-image generation model Google introduced about a month earlier, also performed very well on research benchmarks. Parti is an autoregressive model and Imagen is a diffusion model; the two are different but complementary and represent distinct exploration directions at Google.

Additionally, the researchers explore and highlight the limitations of the Parti model, pointing out key focus areas for further improvement.


(Source: Google)

They also trained four versions of Parti with 350 million, 750 million, 3 billion, and 20 billion parameters and compared them in detail. The larger models showed substantial improvements in capability and output image quality. Comparing the 3-billion- and 20-billion-parameter versions, the latter proved better at abstract prompts.

The following shows how the four models render the prompt "a green sign that says 'Very Deep Learning' at the edge of the Grand Canyon, with puffy white clouds in the sky."


(Source: Google)

Handling long and complex prompts requires Parti to accurately reflect world knowledge, adhere to specific image formats and styles, and compose numerous participants and objects with fine-grained details and interactions into a high-quality output image. However, the model still has limitations that lead to some failure cases.

For example, given the prompt "a portrait statue of Anubis wearing a yellow T-shirt with a space shuttle drawn on it, with a white brick wall in the background", the output image places the shuttle on the wall instead of the T-shirt, and the colors bleed somewhat.


Figure | Failure example (Source: Google)

It is worth mentioning that the researchers also introduced a new evaluation benchmark, PartiPrompts (P2 for short), which measures model capabilities across a range of categories and challenge dimensions.


Figure | PartiPrompts Benchmark (Source: arXiv)
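A benchmark like PartiPrompts labels each prompt along two axes, a content category and a challenge aspect, so results can be sliced either way. The sketch below is a hypothetical illustration of that structure; the label values echo the paper's taxonomy, but the prompt entries and the helper function are invented here.

```python
from collections import defaultdict

# Invented sample entries in a PartiPrompts-like shape:
# each prompt carries a category and a challenge label.
prompts = [
    {"prompt": "a green sign that says Very Deep Learning",
     "category": "Artifacts", "challenge": "Writing & Symbols"},
    {"prompt": "a portrait statue of Anubis in a yellow T-shirt",
     "category": "Artifacts", "challenge": "Fine-grained Detail"},
    {"prompt": "a raccoon wearing formal clothes",
     "category": "Animals", "challenge": "Imagination"},
]

def by_axis(items, axis):
    """Group benchmark prompts along one labeling axis."""
    groups = defaultdict(list)
    for item in items:
        groups[item[axis]].append(item["prompt"])
    return dict(groups)

print(by_axis(prompts, "category"))   # 2 "Artifacts" prompts, 1 "Animals"
print(by_axis(prompts, "challenge"))  # one prompt per challenge label
</mark>```

Slicing by category answers "what subjects can the model draw?", while slicing by challenge answers "which skills (text rendering, fine detail, imagination) does it have?", which is the point of a two-axis benchmark.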

The researchers noted that generating images from text is fascinating: it lets us create scenes that have never been seen, or that do not exist at all. But while this brings many benefits, it also carries risks, with potential impacts on bias and safety, visual communication, disinformation, and creativity and art.

Additionally, some potential risks relate to how the model itself is developed, especially its training data. Models like Parti are typically trained on noisy image-text datasets, which are known to contain biases against people of different backgrounds, leading such models to produce stereotypical representations. Applying the model to visual communication (for example, helping people with low literacy produce images) raises additional risks and concerns.

Text-to-image models create many new possibilities for people. They essentially act as a brush for creating unique and beautiful images and can help improve human creativity and productivity. But the range of a model's output depends on its training data, which can be biased toward Western imagery and prevent the model from expressing entirely new artistic styles.

For the above reasons, the researchers will not release the Parti model's code or data for public use until further safeguards are in place, and they add a "Parti" watermark to all generated images.

Next, the research team will focus on further study of model-bias measurement and mitigation strategies, such as prompt filtering, output filtering, and model recalibration.
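Of the strategies listed, prompt filtering is the simplest to picture. The sketch below is a deliberately naive, hypothetical illustration: a real deployment would use trained safety classifiers rather than a keyword list, and the blocklist terms here are placeholders.

```python
# Placeholder blocklist; real systems use learned classifiers, not keywords.
BLOCKED_TERMS = {"violence", "gore"}

def filter_prompt(prompt: str) -> bool:
    """Return True if the prompt passes the filter, False if it is blocked."""
    words = set(prompt.lower().split())
    return words.isdisjoint(BLOCKED_TERMS)

print(filter_prompt("a raccoon wearing formal clothes"))  # True
print(filter_prompt("a scene of gore"))                   # False
```

Output filtering works symmetrically on the generated image rather than the text, and model recalibration changes the model itself instead of gating its inputs or outputs.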

They also believe there is promise in using text-to-image generation models to understand bias in large image-text datasets at scale, by explicitly probing for a known set of bias types and potentially surfacing other hidden forms of bias. In addition, the researchers plan to work with artists to adapt the capabilities of high-performance text-to-image models to their work.

Finally, compared with DALL·E 2, released by OpenAI not long before, and Google's own Imagen (both diffusion models), the researchers note that Parti demonstrates that autoregressive models can be powerful and broadly applicable.

-End-


Reference:

https://parti.research.google/

https://arxiv.org/abs/2206.10789

