DALL-E 2, famous worldwide for its superb image generation, has been called into question: the polysemous word "bat" trips it up in prompts such as "a bat is flying over a baseball stadium".

2025/06/12 14:01:36

Mingmin, reporting from Aofeisi

QbitAI | Official account: QbitAI

DALL-E 2, which became popular worldwide for its superb image generation, is now being questioned.

For example, the polysemous word "bat" stumped it.

Take the prompt "a bat is flying over a baseball stadium", where "bat" can mean either the animal or a baseball bat.

In the pictures it drew, both the animal and the baseball bat are flying in the sky.


And this is not an accidental mistake. If you enter "a person is hearing a bat", it again draws both the animal and the baseball bat.


In another case, enter "a fish and a gold ingot".

Sure enough, it casts both things in gold, turning the fish into a literal "gold fish".


These mistakes should not be underestimated, because they mean that DALL-E 2 breaks a basic mapping between symbols and entities in language when it generates images from text.

That is, each word corresponds to one entity.

Take "bat" as an example. Drawing either the animal or the baseball bat counts as DALL-E 2 understanding correctly, but drawing both is a problem.

It is like a single-choice question: answering A or B is correct, but writing both breaks the rules.

What's more, it sometimes mixes up the modifiers of different objects, like "applying the solution to the previous question to the next one."

Scholars from Bar-Ilan University and the Allen Institute for Artificial Intelligence discovered this problem and wrote a paper analyzing it in detail.

Interestingly, researcher Yoav Goldberg also mentioned that this behavior is uncommon in DALL-E mini and Stable Diffusion.

He speculates that this may be due to the so-called inverse scaling phenomenon.

Simply put, this means "the larger the model, the worse the performance."

What exactly does the paper say?

After discovering the problem, the scholars ran repeated experiments and divided it into three cases:

  • First, a word is interpreted as two different things
  • Second, a word is interpreted as a modifier of two different things
  • Third, a word is interpreted as one thing and, at the same time, as a modifier of another thing
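
The three cases can be written down as a small data structure. The sketch below is only an illustration: the enum and field names are hypothetical and do not come from the paper, and the assignment of each prompt to a case is our reading of the examples in this article.

```python
from dataclasses import dataclass
from enum import Enum


class FailureCase(Enum):
    # Hypothetical labels for the three cases described in the article.
    TWO_ENTITIES = 1          # one word drawn as two different things
    SHARED_MODIFIER = 2       # one word modifies two different things
    ENTITY_AND_MODIFIER = 3   # one word drawn as a thing and also modifies another


@dataclass(frozen=True)
class Stimulus:
    prompt: str
    ambiguous_word: str
    case: FailureCase


# Example prompts taken from the article; the case assignment is an assumption.
STIMULI = [
    Stimulus("a bat is flying over a baseball stadium", "bat", FailureCase.TWO_ENTITIES),
    Stimulus("a fish and a gold ingot", "gold", FailureCase.SHARED_MODIFIER),
    Stimulus("a zebra and a street", "zebra", FailureCase.ENTITY_AND_MODIFIER),
]


def prompts_for(case: FailureCase) -> list[str]:
    """Return the example prompts recorded for one case."""
    return [s.prompt for s in STIMULI if s.case is case]
```

Grouping prompts by case this way makes it easy to generate images per case and tally the mistakes separately.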

The first two cases were shown at the beginning.

As an example of the third case, if you enter "a zebra and a street", the output always contains a zebra crossing.

Here, DALL-E 2 interpreted "zebra" in two ways at once.


After repeated experiments on these cases, the authors calculated that DALL-E 2's error rate exceeds 80% in all three.

The second case has the highest error rate, reaching 97.2%.
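
An error rate of this kind is simply the share of generated images judged to show the mistake. The sketch below assumes a hypothetical list of human judgments, one per image. The counts are made up for illustration and are not the paper's data.

```python
def error_rate(judgments: list[bool]) -> float:
    """Fraction of images judged to show the duplicated interpretation."""
    if not judgments:
        raise ValueError("need at least one judged image")
    return sum(judgments) / len(judgments)


# Made-up judgments for one prompt: True means the image shows the mistake.
toy_judgments = [True, True, True, True, False]
print(f"{error_rate(toy_judgments):.1%}")  # prints "80.0%"
```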

In the third case, the mistake can be avoided by adding a different modifier to the other noun.

For example, entering "a zebra and a gravel road" produces no zebra crossing on the road surface.


These duplicated interpretations are uncommon when using DALL-E mini and Stable Diffusion.

The authors explain that future work could study the model's text encoder to trace these problems, and could examine whether they are related to model size and architecture.

Yoav Goldberg, one of the paper's authors, is a professor at Bar-Ilan University and research director of the Israeli branch of the Allen Institute for Artificial Intelligence.

Before that, he was a postdoctoral researcher at Google Research in New York. His research interests are NLP and machine learning, especially syntactic parsing.


DALL-E 2 was also found earlier to invent its own language

In fact, just a few months ago, a computer science doctoral student discovered that feeding DALL-E 2 certain strange words also produces images of a consistent type.

And these words come from images that DALL-E 2 itself generated.

For example, after entering "Two farmers talking about vegetables, with subtitles", some "garbled" words appear in the image given by DALL-E 2.


If the new word "Vicootes" from the image is fed back to the model as a prompt, a surprising batch of images appears:


There are radishes, pumpkins, and small persimmons... Could "Vicootes" mean vegetables?

If you feed the string "Apoploe vesrreaitiis" from the speech bubble above to DALL-E 2, a batch of bird pictures appears:

"Could it be that this word means 'bird', so the farmers seem to be talking about the birds that affect their vegetables?"

When the doctoral student posted his discovery online, it immediately sparked heated discussion.

Some people tried to analyze how DALL-E 2 encrypts the language, and some people thought it was just noise.

But overall, when it comes to language understanding, DALL-E 2 can always come up with something unexpected.

What do you think is the reason behind this?

Paper address:
https://arxiv.org/pdf/2210.10606.pdf

Reference link:
https://twitter.com/yoavgo/status/1583088957226881025

— End —

