New tests show that AI still lacks common sense

Despite advances in natural language processing, the most advanced systems still produce sentences like "two dogs throwing frisbees at each other". Source: Adriana Sanchez.

Natural Language Processing (NLP) has made great strides recently, but how much does AI actually understand of what it reads? Less than we thought, according to researchers from the Department of Computer Science at the University of Southern California. In a recent paper, Associate Professor Ren Xiang and PhD student Lin Yuchen found that, despite the progress made, artificial intelligence still lacks the common sense needed to produce plausible sentences.

"The current machine text generation model can write an article that can convince many people, but they are basically imitating what they see in the training phase," Lin said, "Our goal in this article is to study the current state of the art Whether advanced text generation models can write sentences to describe the natural scenes in our daily lives."

Understanding scenes in daily life

Specifically, Ren and Lin tested the models' reasoning ability and showed that there is a large gap between current text-generation models and human performance. Given a set of common nouns and verbs, state-of-the-art NLP models were tasked with creating credible sentences that describe an everyday scenario. Although the models generated grammatically correct sentences, these were often logically incoherent.

For example, here is a sentence that a state-of-the-art model generated from the words "dog, frisbee, throw, catch":

"Two dogs throwing frisbees at each other. The

test is based on the assumption that if you have a deeper understanding of common sense concepts, then Inability to generate coherent thoughts (in this case: "a person throws a frisbee, the dog catches it"). In other words, common sense is not just a correct understanding of language, it means you don’t have to explain everything in a conversation This is the fundamental challenge for the goal of developing general artificial intelligence, but in addition to the academic world, it is also relevant to consumers. If

does not understand language, chat robots and voice assistants built on these state-of-the-art natural language models are vulnerable The impact of failure. It is also vital that the robot becomes more effective in the human environment. After all, if you ask the robot for hot milk, you expect it to know that you want a glass of milk, not the entire carton.

"We also show that if a generation model performs better on our test, it can also benefit other applications that require common-sense reasoning, such as robot learning," Lin said. "Robots need to understand the natural scenes in our daily lives before they take reasonable actions to interact with people."

Common sense tests

Common-sense reasoning, the ability to use basic knowledge of the world to make inferences (for example, that dogs can't throw frisbees at each other), has resisted the efforts of artificial intelligence researchers for decades. The most advanced deep learning models can now reach an accuracy of about 90%, so NLP would seem to be close to its goal.

But Ren, a natural language processing expert, and his student Lin needed more convincing that this statistic is accurate. In their paper, published on November 16 at the Empirical Methods in Natural Language Processing (EMNLP) conference, they question the validity of the benchmark and therefore the actual level of progress in the field.

Examples of sentences generated by the most advanced text-generation models. Source: from the paper "CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning".

"Humans acquire the ability to write sentences by learning to understand and use common concepts they recognize in their surroundings," Lin said.

"Acquiring this ability is considered an important milestone in human development. However, we wanted to test whether machines can really acquire this kind of common-sense reasoning ability."

To evaluate different machine models, the pair developed a constrained text-generation task called CommonGen, which can be used as a benchmark for testing the generative common sense of machines. The researchers presented a dataset of 35,141 concepts associated with 77,449 sentences. They found that even the best-performing model reached an accuracy of only 31.6%, while human accuracy was 63.5%.
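
The shape of the task can be illustrated with a short sketch. The Python below is not the benchmark's scoring method; it is a hypothetical helper (the name `missing_concepts` and the crude prefix matching are assumptions made for illustration) that only checks whether a generated sentence mentions every concept it was given.

```python
import re


def missing_concepts(sentence, concepts, stem_len=4):
    """Return the concepts that have no matching word in the sentence.

    Assumption for illustration only: a concept counts as mentioned when
    some word shares its first `stem_len` characters, so "throw" matches
    "throwing". This is a crude stand-in for real lemmatization.
    """
    words = re.findall(r"[a-z]+", sentence.lower())
    missing = []
    for concept in concepts:
        stem = concept.lower()[:stem_len]
        if not any(word.startswith(stem) for word in words):
            missing.append(concept)
    return missing


concepts = ["dog", "frisbee", "throw", "catch"]
machine = "Two dogs throwing frisbees at each other."
human = "A person throws a frisbee and the dog catches it."

print(missing_concepts(machine, concepts))
print(missing_concepts(human, concepts))
```

A check like this can only tell whether the given words were used. It cannot tell that dogs do not throw frisbees, and that plausibility judgment is the part the researchers say the models get wrong.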

"We were surprised to find that these models cannot recall the simple common-sense knowledge that 'a human throwing a frisbee' should be more reasonable than a dog throwing one," Lin said. "We found that even the strongest model, called T5, can still make silly mistakes after training on a large dataset."

The researchers said that previous tests do not seem to have sufficiently challenged the models' common-sense abilities; instead, the models imitated what they had seen during the training phase.

"Previous research mainly focused on discriminatory common sense," Ren said. The machines they tested had a variety of selection problems, among which the search space of the machine was small, usually four to five candidates.

For example, a typical setting for a discriminative test is a multiple-choice question-answering task, such as: "Where do adults use glue sticks? A: Classroom B: Office C: Desk drawer." The answer here, of course, is "B: Office", and even a computer can find it without much trouble. In contrast, a generative setting is more open-ended, as in the CommonGen task, where a model is asked to generate a natural sentence from the given concepts. Ren explained: "With extensive model training, it is very easy to perform well on those tasks. Unlike those discriminative common-sense reasoning tasks, our proposed test focuses on the generative side of machine common sense."
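
To give a rough sense of the difference in search space, here is a back-of-the-envelope sketch in Python. The vocabulary size and sentence length used below are illustrative assumptions, not figures from the paper, and both function names are hypothetical.

```python
from fractions import Fraction


def chance_accuracy(num_candidates):
    """Accuracy of uniform random guessing on a multiple-choice question."""
    return Fraction(1, num_candidates)


def num_possible_sentences(vocab_size, length):
    """Number of distinct word sequences of exactly `length` words."""
    return vocab_size ** length


# Discriminative setting: four or five candidates per question.
print(chance_accuracy(4), chance_accuracy(5))

# Generative setting: assumed 1,000-word vocabulary, 10-word sentences.
print(num_possible_sentences(1000, 10))
```

Under these assumptions, random guessing is already right a quarter or a fifth of the time on the multiple-choice questions, while a generative model has to pick one sensible sentence out of an astronomically large space.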

Ren and Lin hope that this dataset will serve as a new benchmark to facilitate future research on introducing common sense into natural language generation. In fact, they even maintain a leaderboard showing the scores obtained by various popular models, to help other researchers judge how suitable those models are for future projects.

"Robots need to understand the natural scenes in our daily lives, and then take reasonable actions to interact with people," Lin said.

"By introducing common sense and other domain-specific knowledge to machines, I believe that one day we can see artificial intelligence agents, like Samantha in the movie 'Her', that respond naturally and interact with our lives."