


Recently, Google introduced Parti (Pathways Autoregressive Text-to-Image model), an autoregressive text-to-image generation model that can produce high-fidelity, photorealistic images and supports synthesis involving complex compositions and rich world knowledge.

For example, text descriptions such as "a raccoon wearing formal clothes, holding a cane and a garbage bag" and "a tiger wearing a train conductor's hat and holding a skateboard with a yin-yang symbol" yield images like the following.


(Source: Google)

In addition to lifelike detail, Parti is versed in a variety of styles and can generate paintings in the manner of Van Gogh, abstract Cubism, Egyptian tomb hieroglyphs, illustration, statuary, woodcut, children's crayon drawings, Chinese ink painting, and more, based on text descriptions.

On June 22, the related research paper, "Scaling Autoregressive Models for Content-Rich Text-to-Image Generation", was posted to arXiv.

The researchers stated in Google's official blog post: "With Parti, generating an image is treated as a sequence-to-sequence modeling problem, similar to machine translation, so it can benefit from advances in large language models, especially the capabilities unlocked by scaling data and model sizes. In this case, the target output is a sequence of image tokens instead of text tokens in another language. Parti uses the image tokenizer ViT-VQGAN to encode images as sequences of discrete tokens, taking advantage of its ability to reconstruct high-quality, visually diverse images."
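The two-stage design described above can be sketched in miniature. Everything below is a toy illustration, not Google's implementation: the codebook, function names, and the stand-in "model" are all hypothetical. Stage 1 is a VQ-style tokenizer that maps image patch features to discrete token ids by nearest-neighbor lookup in a codebook; stage 2 is an autoregressive model that predicts the image-token sequence conditioned on text tokens, exactly as in seq2seq machine translation.

```python
import math

# A tiny stand-in codebook of four "visual words" (real ViT-VQGAN
# codebooks are learned and contain thousands of entries).
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

def tokenize(patches):
    """Encode each patch feature vector as the id of its nearest codeword."""
    def nearest(p):
        return min(range(len(CODEBOOK)),
                   key=lambda i: math.dist(p, CODEBOOK[i]))
    return [nearest(p) for p in patches]

def detokenize(token_ids):
    """Decode token ids back to codeword vectors (the reconstruction step)."""
    return [CODEBOOK[t] for t in token_ids]

def generate(text_tokens, n_image_tokens, model):
    """Autoregressively sample image tokens conditioned on the text prompt."""
    out = []
    for _ in range(n_image_tokens):
        out.append(model(text_tokens, out))  # next-token prediction
    return out

# A dummy "model" that ignores the text and emits a fixed cyclic pattern:
dummy_model = lambda text, prefix: len(prefix) % len(CODEBOOK)

tokens = generate(["a", "green", "sign"], 4, dummy_model)
print(tokens)              # [0, 1, 2, 3]
print(detokenize(tokens))  # the reconstructed patch vectors
```

The key point the sketch captures is that once images are discrete token sequences, the generation loop is indistinguishable from text generation, which is why scaling lessons from large language models transfer.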

It is worth mentioning that Imagen, another text-to-image generation model Google introduced about a month earlier, also performed very well on research benchmarks. Parti is an autoregressive model and Imagen is a diffusion model; the two are different but complementary and represent distinct exploration directions at Google.

Additionally, the researchers explore and highlight the limitations of the Parti model, pointing out key focus areas for further improvement.


(Source: Google)

They also trained four versions of Parti with 350 million, 750 million, 3 billion, and 20 billion parameters and compared them in detail. The larger models showed substantial improvements in capability and output image quality. Comparing the 3-billion- and 20-billion-parameter versions, the latter proved better at abstract prompts.

The following shows how the four models render the prompt "a green sign that says 'Very Deep Learning' at the edge of the Grand Canyon, with puffy white clouds in the sky."


(Source: Google)

Handling long and complex prompts requires Parti to accurately reflect world knowledge, adhere to specific image formats and styles, and compose numerous participants and objects with fine-grained details and interactions into a high-quality output image. However, the model still has limitations that lead to some failure cases.

For example, given the prompt "a portrait statue of Anubis wearing a yellow T-shirt with a space shuttle drawn on it, with a white brick wall in the background", the output image places the shuttle on the wall instead of the T-shirt, and the colors bleed somewhat.


Figure | Failure example (Source: Google)

It is worth mentioning that the researchers also introduced a new evaluation benchmark, PartiPrompts (P2 for short), which measures model capabilities across a range of categories and challenge dimensions.


Figure | PartiPrompts Benchmark (Source: arXiv)
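A benchmark like PartiPrompts labels each prompt along two axes, a content category and a challenge aspect, so results can be sliced either way. The sketch below is a hypothetical illustration of that structure; the label values echo the paper's taxonomy, but the prompt entries and the helper function are invented here.

```python
from collections import defaultdict

# Invented sample entries in a PartiPrompts-like shape:
# each prompt carries a category and a challenge label.
prompts = [
    {"prompt": "a green sign that says Very Deep Learning",
     "category": "Artifacts", "challenge": "Writing & Symbols"},
    {"prompt": "a portrait statue of Anubis in a yellow T-shirt",
     "category": "Artifacts", "challenge": "Fine-grained Detail"},
    {"prompt": "a raccoon wearing formal clothes",
     "category": "Animals", "challenge": "Imagination"},
]

def by_axis(items, axis):
    """Group benchmark prompts along one labeling axis."""
    groups = defaultdict(list)
    for item in items:
        groups[item[axis]].append(item["prompt"])
    return dict(groups)

print(by_axis(prompts, "category"))   # 2 "Artifacts" prompts, 1 "Animals"
print(by_axis(prompts, "challenge"))  # one prompt per challenge label
</mark>```

Slicing by category answers "what subjects can the model draw?", while slicing by challenge answers "which skills (text rendering, fine detail, imagination) does it have?", which is the point of a two-axis benchmark.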

The researchers noted that generating images from text is fascinating: it lets us create scenes that have never been seen, or that do not exist at all. But while this brings many benefits, it also carries risks, with potential impacts on bias and safety, visual communication, disinformation, and creativity and art.

Additionally, some potential risks relate to how the model itself is developed, especially its training data. Models like Parti are typically trained on noisy image-text datasets, which are known to contain biases against people of different backgrounds, leading such models to produce stereotypical representations. Applying the model to visual communication (for example, helping people with low literacy produce images) raises additional risks and concerns.

Text-to-image models create many new possibilities for people. They essentially act as a brush for creating unique and beautiful images and can help improve human creativity and productivity. But the range of a model's output depends on its training data, which can be biased toward Western imagery and prevent the model from expressing entirely new artistic styles.

For the above reasons, the researchers will not release the Parti model's code or data for public use until further safeguards are in place, and they add a "Parti" watermark to all generated images.

Next, the research team will focus on further study of model-bias measurement and mitigation strategies, such as prompt filtering, output filtering, and model recalibration.
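Of the strategies listed, prompt filtering is the simplest to picture. The sketch below is a deliberately naive, hypothetical illustration: a real deployment would use trained safety classifiers rather than a keyword list, and the blocklist terms here are placeholders.

```python
# Placeholder blocklist; real systems use learned classifiers, not keywords.
BLOCKED_TERMS = {"violence", "gore"}

def filter_prompt(prompt: str) -> bool:
    """Return True if the prompt passes the filter, False if it is blocked."""
    words = set(prompt.lower().split())
    return words.isdisjoint(BLOCKED_TERMS)

print(filter_prompt("a raccoon wearing formal clothes"))  # True
print(filter_prompt("a scene of gore"))                   # False
```

Output filtering works symmetrically on the generated image rather than the text, and model recalibration changes the model itself instead of gating its inputs or outputs.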

They also believe there is promise in using text-to-image generation models to understand bias in large image-text datasets at scale, by explicitly probing for a known set of bias types and potentially surfacing other hidden forms of bias. In addition, the researchers plan to work with artists to adapt the capabilities of high-performance text-to-image models to their work.

Finally, compared with DALL·E 2, released by OpenAI not long before, and Google's own Imagen (both diffusion models), the researchers note that Parti demonstrates that autoregressive models can be powerful and broadly applicable.

-End-


Reference:

https://parti.research.google/

https://arxiv.org/abs/2206.10789

