Language AI has a human-like capacity for self-examination


Wanbo from Aofeisi

Qubit | Official account QbitAI

Language AI now has a human-like capacity for self-examination.

Recently, an academic team from the University of California, Berkeley and Johns Hopkins University showed that a language AI model can not only judge whether its own answers are correct, but, after training, can also predict the probability that it knows the answer to a question.


As soon as the results were released, they sparked heated discussion. Some people's first reaction was panic:


Some people also believe that this result has positive significance for neural network research:


Language AI has the ability to self-examine

The research team argues that for a language AI model to evaluate itself, one premise must hold: when the model answers a question, its answers must be calibrated.

Calibration here means that the probability the model assigns to an answer being correct is consistent with the frequency at which that answer actually turns out to be correct.

Only then can the language AI use this calibration to evaluate whether the answers it outputs are correct.
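To make the notion concrete, here is a minimal sketch (not the paper's code) of how calibration is typically measured: predictions are grouped into confidence bins, and within each bin the average confidence is compared with the actual accuracy.

```python
# A minimal calibration check: bin predictions by confidence and compare each
# bin's average confidence against its actual accuracy (expected calibration error).
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """confidences: model-predicted probabilities; correct: 0/1 ground truth."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return ece

# Hypothetical numbers, for illustration only:
print(expected_calibration_error([0.9, 0.8, 0.6, 0.3], [1, 1, 0, 0]))
```

A well-calibrated model drives this value toward zero: answers given 80% confidence are right about 80% of the time.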

So the first question is: can language AI calibrate its own answers?

To examine this, the research team prepared five multiple-choice questions for the AI:


The answer options are given in the form of A, B, and C.

If the AI model answers correctly more often than chance, that shows its answers are calibrated.

The test found that the language AI's accuracy significantly exceeded the chance level of picking any single option.

In other words, the language AI model can calibrate its own answers very well.
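As an illustration, here is a minimal sketch of such an evaluation loop (`score_option` is a hypothetical stand-in for querying a language model, not the team's code):

```python
# Sketch of the A/B/C multiple-choice setup described above.
import random

def score_option(question: str, letter: str) -> float:
    # Stand-in for the model's log-probability of answering `letter`;
    # a real evaluation would query a language model here.
    random.seed(hash((question, letter)) % 2**32)
    return random.random()

def pick_answer(question: str, letters=("A", "B", "C")) -> str:
    # Choose the option letter the model considers most likely.
    return max(letters, key=lambda letter: score_option(question, letter))

# With three options, chance accuracy is 1/3; if accuracy over a labeled
# question set clearly exceeds that, the model's choices carry real signal.
print(pick_answer("Which planet is known as the Red Planet?"))
```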


However, the research team found that this calibration ability holds only when the answer options are clear-cut.

Adding an indeterminate option such as "none of the above" damages the language AI's calibration.


In other words, on multiple-choice questions in a suitable format, the language AI model can calibrate its answers very well.

With this premise established, the next question was whether the language AI model can determine whether its own answers are correct.

In this round of testing, to keep the AI model's predictions close to its own effective decision boundary, the research team reused the questions from the previous round, together with the model's own sampled answers.

The model was then asked to judge whether each of its answers was true or false, and the team analyzed whether those "true"/"false" judgments were effectively calibrated.

An example of the question format is as follows:
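A minimal sketch of what such a self-evaluation prompt can look like (the exact wording is an assumption patterned on the format the paper describes):

```python
def self_eval_prompt(question: str, proposed_answer: str) -> str:
    # Build the True/False self-evaluation prompt; the model's relative
    # probability of "(A)" vs. "(B)" at the end serves as its P(True).
    return (
        f"Question: {question}\n"
        f"Proposed Answer: {proposed_answer}\n"
        "Is the proposed answer:\n"
        " (A) True\n"
        " (B) False\n"
        "The proposed answer is:"
    )

print(self_eval_prompt("What is the capital of France?", "Paris"))
```

The calibration of that P(True) signal is what the team measured in this round.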


Across 20 true/false tests, the research team found that the model's evaluations of its own answers as "true" or "false" were significantly calibrated.


That is, if the AI model is asked a set of questions within some domain and then evaluates whether its answers to them are true, its confidence is reasonable and calibrated.

This also shows that a language AI model can indeed judge whether its own claims about a question are correct.

Finally, the research team posed a harder question: can an AI model be trained to predict whether it knows the answer to any given question?

At this stage, the research team introduced a quantity P(IK), the probability that "I know" the answer, and chose between two training methods:

  • Value Head (value-oriented): train P(IK) as an additional value head added to the model, a logit independent of the language-modeling logits. The advantage of this approach is that the team can easily probe P(IK) at arbitrary token positions (see the sketch after this list).
  • Natural Language (natural language): the simpler method, which asks the model to answer in words "What is the probability that you know this answer?" and to output a percentage.
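A minimal sketch of the value-head idea, assuming a generic transformer backbone (the class and shapes below are illustrative, not the paper's implementation):

```python
# One extra sigmoid logit on top of transformer hidden states, trained
# separately from the language-modeling (vocabulary) logits.
import torch
import torch.nn as nn

class PIKHead(nn.Module):
    """A small value head mapping hidden states to P(IK)."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size); P(IK) can be read
        # off at any token position, e.g. the end of the question.
        return torch.sigmoid(self.head(hidden_states)).squeeze(-1)

# Training pairs each question with a 0/1 label ("did the model get it right?")
# and fits the head with binary cross-entropy:
head = PIKHead(hidden_size=64)
hidden = torch.randn(2, 5, 64)        # stand-in for real transformer states
labels = torch.tensor([1.0, 0.0])     # hypothetical know / don't-know labels
loss = nn.functional.binary_cross_entropy(head(hidden)[:, -1], labels)
loss.backward()
```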


Early in training, the research team favored the natural-language method, but the results were not significant, so it switched to the value-head approach. The team stated, however, that the final training of AI models should eventually return to the natural-language method.

After training, the research team found that the language AI model can predict P(IK) very well, and that this predictive ability partially generalizes across different types of questions.

However, the team also found that on certain types of questions, such as arithmetic, the model runs into difficulties with out-of-distribution (OOD) calibration.

Regarding this result, the research team said the future direction is to extend these findings to self-learning and factual reasoning that does not imitate human text.

About the authors


Jared Kaplan, the paper's corresponding author, is a theoretical physicist and machine-learning expert, currently an assistant professor at Johns Hopkins University. His main research is in machine learning, including scaling laws for neural models and the GPT-3 language model.


Co-corresponding author Saurav Kadavath is a researcher at Anthropic and is pursuing a master's degree at the University of California, Berkeley. His main research areas include machine learning and large-scale language models.

Reference link:

https://arxiv.org/abs/2207.05221

—End—

Qubit QbitAI · Signed author on Toutiao

Follow us and learn about cutting-edge technology trends
