Language AI has a human-like capacity for self-examination


Wanbo from Aofeisi

Qubit | Official account QbitAI

Language AI now has a human-like capacity for self-examination.

Recently, an academic team from the University of California, Berkeley and Johns Hopkins University showed that a language AI model can not only judge whether its own answers are correct, but, after training, can also predict the probability that it knows the answer to a question.


As soon as the results were released, they sparked heated discussion. Some people's first reaction was panic:


Some people also believe that this result has positive significance for neural network research:


Language AI has the ability to self-examine

The research team argues that for a language AI model to evaluate itself, one premise must hold: when the model answers a question, its answers must be calibrated.

Calibration here means that the probability the model assigns to an answer being correct is consistent with the frequency at which that answer actually turns out to be correct.

Only then can the language AI use this calibration to evaluate whether the answers it outputs are correct.
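To make the notion concrete, here is a minimal sketch (not the paper's code) of how calibration is typically measured: predictions are grouped into confidence bins, and within each bin the average confidence is compared with the actual accuracy.

```python
# A minimal calibration check: bin predictions by confidence and compare each
# bin's average confidence against its actual accuracy (expected calibration error).
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """confidences: model-predicted probabilities; correct: 0/1 ground truth."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return ece

# Hypothetical numbers, for illustration only:
print(expected_calibration_error([0.9, 0.8, 0.6, 0.3], [1, 1, 0, 0]))
```

A well-calibrated model drives this value toward zero: answers given 80% confidence are right about 80% of the time.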

So the first question is: can language AI calibrate its own answers?

To examine this, the research team prepared five multiple-choice questions for the AI:


The answer options are given in the form of A, B, and C.

If the AI model answers correctly more often than chance, that shows its answers are calibrated.

The test found that the language AI's accuracy significantly exceeded the chance level of picking any single option.

In other words, the language AI model can calibrate its own answers very well.
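As an illustration, here is a minimal sketch of such an evaluation loop (`score_option` is a hypothetical stand-in for querying a language model, not the team's code):

```python
# Sketch of the A/B/C multiple-choice setup described above.
import random

def score_option(question: str, letter: str) -> float:
    # Stand-in for the model's log-probability of answering `letter`;
    # a real evaluation would query a language model here.
    random.seed(hash((question, letter)) % 2**32)
    return random.random()

def pick_answer(question: str, letters=("A", "B", "C")) -> str:
    # Choose the option letter the model considers most likely.
    return max(letters, key=lambda letter: score_option(question, letter))

# With three options, chance accuracy is 1/3; if accuracy over a labeled
# question set clearly exceeds that, the model's choices carry real signal.
print(pick_answer("Which planet is known as the Red Planet?"))
```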


However, the research team found that this calibration ability holds only when the answer options are clear-cut.

Adding an indeterminate option such as "none of the above" damages the language AI's calibration.


In other words, on multiple-choice questions in a suitable format, the language AI model can calibrate its answers very well.

With this premise established, the next question was whether the language AI model can determine whether its own answers are correct.

In this round of testing, to keep the AI model's predictions close to its own effective decision boundary, the research team reused the questions from the previous round, together with the model's own sampled answers.

The model was then asked to judge whether each of its answers was true or false, and the team analyzed whether those "true"/"false" judgments were effectively calibrated.

An example of the question format is as follows:
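A minimal sketch of what such a self-evaluation prompt can look like (the exact wording is an assumption patterned on the format the paper describes):

```python
def self_eval_prompt(question: str, proposed_answer: str) -> str:
    # Build the True/False self-evaluation prompt; the model's relative
    # probability of "(A)" vs. "(B)" at the end serves as its P(True).
    return (
        f"Question: {question}\n"
        f"Proposed Answer: {proposed_answer}\n"
        "Is the proposed answer:\n"
        " (A) True\n"
        " (B) False\n"
        "The proposed answer is:"
    )

print(self_eval_prompt("What is the capital of France?", "Paris"))
```

The calibration of that P(True) signal is what the team measured in this round.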


Across 20 true/false tests, the research team found that the model's evaluations of its own answers as "true" or "false" were significantly calibrated.


That is, if the AI model is asked a set of questions within some domain and then evaluates whether its answers to them are true, its confidence is reasonable and calibrated.

This also shows that a language AI model can indeed judge whether its own claims about a question are correct.

Finally, the research team posed a harder question: can an AI model be trained to predict whether it knows the answer to any given question?

At this stage, the research team introduced a quantity P(IK), the probability that "I know" the answer, and chose between two training methods:

  • Value Head (value-oriented): train P(IK) as an additional value head added to the model, a logit independent of the language-modeling logits. The advantage of this approach is that the team can easily probe P(IK) at arbitrary token positions (see the sketch after this list).
  • Natural Language (natural language): the simpler method, which asks the model to answer in words "What is the probability that you know this answer?" and to output a percentage.
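A minimal sketch of the value-head idea, assuming a generic transformer backbone (the class and shapes below are illustrative, not the paper's implementation):

```python
# One extra sigmoid logit on top of transformer hidden states, trained
# separately from the language-modeling (vocabulary) logits.
import torch
import torch.nn as nn

class PIKHead(nn.Module):
    """A small value head mapping hidden states to P(IK)."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size); P(IK) can be read
        # off at any token position, e.g. the end of the question.
        return torch.sigmoid(self.head(hidden_states)).squeeze(-1)

# Training pairs each question with a 0/1 label ("did the model get it right?")
# and fits the head with binary cross-entropy:
head = PIKHead(hidden_size=64)
hidden = torch.randn(2, 5, 64)        # stand-in for real transformer states
labels = torch.tensor([1.0, 0.0])     # hypothetical know / don't-know labels
loss = nn.functional.binary_cross_entropy(head(hidden)[:, -1], labels)
loss.backward()
```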


Early in training, the research team favored the natural-language method, but the results were not significant, so it switched to the value-head approach. The team stated, however, that the final training of AI models should eventually return to the natural-language method.

After training, the research team found that the language AI model can predict P(IK) very well, and that this predictive ability partially generalizes across different types of questions.

However, the team also found that on certain types of questions, such as arithmetic, the model runs into difficulties with out-of-distribution (OOD) calibration.

Regarding this result, the research team said the future direction is to extend these findings to self-learning and factual reasoning that does not imitate human text.

About the authors


Jared Kaplan, the paper's corresponding author, is a theoretical physicist and machine-learning expert, currently an assistant professor at Johns Hopkins University. His main research is in machine learning, including scaling laws for neural models and the GPT-3 language model.


Co-corresponding author Saurav Kadavath is a researcher at Anthropic and is pursuing a master's degree at the University of California, Berkeley. His main research areas include machine learning and large-scale language models.

Reference link:

https://arxiv.org/abs/2207.05221

—End—

Qubit QbitAI · Signed author on Toutiao

Follow us and learn about cutting-edge technology trends
