Mingmin, reporting from Aofeisi
QbitAI | Official account: QbitAI
DALL-E 2, famous worldwide for its superb image generation, has been called into question.
Take the polysemous word "bat", which tripped DALL-E 2 up.
Given the prompt "a bat is flying over a baseball stadium" (a bat, meaning either the animal or a baseball bat, flies over a baseball field),
the image it drew showed both bats and baseball bats flying in the sky.
And this is no one-off mistake. Enter "a person is hearing a bat", and both the animal and the baseball bat get drawn.
Another case: enter "a fish and a gold ingot".
Well then: DALL-E 2 simply casts both things in gold, turning the fish into a literal gold fish.
These mistakes should not be underestimated, because they reveal that DALL-E 2 has a fundamental problem in how it maps symbols in the language (words) to entities when generating images from text.
Ideally, each word corresponds to one entity.
Take "bat" as an example. Drawing either the animal or the baseball bat counts as DALL-E 2 understanding the word correctly, but drawing both is a problem.
It's like a single-choice question: filling in A or B is acceptable, but writing both breaks the rules.
What's more, DALL-E 2 sometimes mixes up the modifiers of different objects, as if "applying the solution to the previous question to the next one."
The problem was discovered by researchers from Bar-Ilan University and the Allen Institute for Artificial Intelligence, who wrote a paper analyzing it specifically.
Interestingly, researcher Yoav Goldberg also noted that this problem is uncommon in DALL-E mini and Stable Diffusion.
He suspects this may be due to the so-called inverse scaling phenomenon,
which, put simply, means "the larger the model, the worse the performance."
What exactly does the paper say?
After discovering the problem, the researchers ran repeated experiments and sorted it into three situations:
- First, a word is interpreted as two different things
- Second, a word modifies two different things
- Third, a word is depicted as one thing while also being understood as a modifier of another thing
The first two situations were covered at the beginning (the "bat" and the gold fish).
As for the third, if you enter "a zebra and a street", the output always contains a zebra crossing.
Here, DALL-E 2 interprets "zebra" twice at the same time: once as the animal and once as the stripes on the road.
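To make the "one word, one role" constraint and the three situations concrete, here is a toy Python sketch. It is not the paper's method: the `WordRendering` annotation format and the `classify` function are hypothetical, and the sketch assumes someone has already recorded, for each prompt word, which distinct things it was drawn as and which other entities it changed the properties of. "Gold" in "a gold ingot" is treated as a modifier here, which is itself an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WordRendering:
    """Hypothetical annotation of how one prompt word shows up in a generated image."""
    word: str
    # Distinct things the word was drawn as, e.g. {"animal", "baseball bat"} for "bat".
    senses_drawn: set[str] = field(default_factory=set)
    # Entities whose properties the word changed, e.g. {"street"} for "zebra".
    modifies: set[str] = field(default_factory=set)


def classify(r: WordRendering) -> str:
    """Check the 'one word, one role' constraint and label any violation."""
    n_drawn = len(r.senses_drawn)
    n_mod = len(r.modifies)
    if n_drawn + n_mod <= 1:
        return "ok"
    if n_drawn >= 2:
        return "situation 1: drawn as two different things"
    if n_mod >= 2:
        return "situation 2: modifies two different things"
    # Only remaining possibility: drawn as exactly one thing AND modifies exactly one other.
    return "situation 3: drawn as one thing and modifies another"


# Prompts from the article; the annotations are illustrative, not measured data.
examples = [
    WordRendering("bat", senses_drawn={"animal", "baseball bat"}),
    WordRendering("gold", modifies={"ingot", "fish"}),  # intended: ingot; leaked: fish
    WordRendering("zebra", senses_drawn={"zebra"}, modifies={"street"}),
    WordRendering("zebra", senses_drawn={"zebra"}),  # "a zebra and a gravel road"
]
for r in examples:
    print(f"{r.word}: {classify(r)}")
```

The checks run in order, so a word that is both drawn twice and used as a modifier is reported under the first situation; a finer scheme could report every violated rule instead.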
After repeated experiments on these cases, the authors calculated that DALL-E 2's error rate exceeds 80% in all three situations.
The second situation has the highest error rate, reaching 97.2%.
In the third situation, the mistake can be avoided by adding a new modifier to the other noun.
For example, entering "a zebra and a gravel road" produces no zebra crossing on the road surface.
These double interpretations are uncommon in DALL-E mini and Stable Diffusion.
The authors suggest that future work could trace these problems by studying the model's text encoder, and could examine whether they are related to model size and architecture.
Yoav Goldberg, one of the authors, is a distinguished professor at Bar-Ilan University and research director of the Israel branch of the Allen Institute for Artificial Intelligence. Before that, he was a postdoctoral researcher at Google Research in New York. His research interests are NLP and machine learning, especially syntactic parsing.
DALL-E 2 was also found to have invented its own language
Just a few months ago, a computer science PhD student discovered that feeding DALL-E 2 certain strange words also consistently produced images of one particular kind.
These words came from images DALL-E 2 itself had generated.
For example, after entering "Two farmers talking about vegetables, with subtitles", some "garbled" words appeared in the image DALL-E 2 produced.
Feeding the new word "Vicootes" from that image back to the model as a prompt unexpectedly produced a batch of images:
radishes, pumpkins, and small persimmons... Could "Vicootes" mean vegetables?
And feeding the string "Apoploe vesrreaitiis" from the speech bubble to DALL-E 2 produced a batch of bird pictures:
"Can we say that this word means 'birds', so the farmers seem to be talking about birds that are affecting their vegetables?"
At the time, the PhD student's post about this discovery immediately sparked heated discussion online.
Some people tried to analyze how DALL-E 2 "encrypts" language, while others thought it was just noise.
Either way, when it comes to language understanding, DALL-E 2 can always come up with something unexpected.
What do you think is the reason behind this?
Paper address:
https://arxiv.org/pdf/2210.10606.pdf
Reference link:
https://twitter.com/yoavgo/status/1583088957226881025