
Hello,

this is Yinao Cloud Research Circle, and I am Dahu~

Today we continue our statistical self-check series on the "Journal of Psychology". Readers who want to catch up on earlier posts are welcome to click the links below:

Doubao | Answering the editor/reviewer's questions: effect and confidence level

Useful stuff | Answering the editor/reviewer's questions: sample size

Not long ago, this editor came across an interesting story:

In 2009, Craig M. Bennett, a neuroscientist at the University of California, presented a report at an international academic conference on the "neural activity" of a dead Atlantic salmon in response to humans. In the study, Bennett and his team showed photographs of people to a dead fish while scanning its head with a functional magnetic resonance imaging (fMRI) scanner. They found that the dead fish could apparently judge, "correctly", the emotions of the people in the photos.

Funny as it is, this study reveals a profound truth about how readily humans believe: we try to find connections between all things in the world, even when the only connections we find are spurious ones.

The research team wanted to satirize a certain kind of careless study. When brain scientists analyze a scan, they divide the brain into thousands of tiny regions (voxels). Even in a dead fish, each of these tiny regions contains some random noise, and among thousands of noise signals, a few are likely to happen to match the emotional changes of the person in the photo. To put it bluntly, it is like watching clouds in the sky: A says one looks like a dog, B says it looks like a cat.
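To see how easily pure noise produces such "findings", here is a minimal simulation sketch in Python. All the sizes (10,000 noise-only voxels, 30 scans, uncorrected two-sided tests) are invented for illustration:

```python
# Simulate the "dead salmon" problem: correlate thousands of noise-only
# voxels with a random "emotion" signal and count the false positives.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_voxels, n_scans = 10_000, 30                 # assumed sizes, illustration only

signal = rng.normal(size=n_scans)              # "emotional content" of the photos
noise = rng.normal(size=(n_voxels, n_scans))   # pure noise in every voxel

# Two-sided P value of the correlation between each noise voxel and the signal.
p_values = np.array([stats.pearsonr(v, signal)[1] for v in noise])

print((p_values < 0.05).sum())                 # ~500 voxels "respond" by chance
```

At the 0.05 level, roughly 5% of the noise-only voxels, about 500 of them, pass the test purely by chance. This is exactly the multiple-comparisons problem the salmon study dramatizes.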

How, then, can this problem be avoided? The answer can be found in today's topics: the significance test, the hypothesis test, the null hypothesis significance test, and the Bayes factor.

1

Null hypothesis significance testing

Pitfall 1: the ambiguity between "accept" and "reject"

The significance test was proposed by Ronald Fisher in 1925. In a significance test, the P value measures how consistent the observed data are with the null hypothesis: the smaller the P value, the less consistent the data are with the null hypothesis, and the stronger the case for rejecting it.

However, Fisher's framework says nothing about an alternative hypothesis, nor does it ever "accept" any hypothesis. For example, a result that fails to reject the null hypothesis only means there is no evidence that the null hypothesis is wrong; it does not establish that the null hypothesis is correct.
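A small illustration of this point, with invented numbers: below, the population mean truly differs from 0, yet a small sample will usually fail to reject H0: μ = 0. The non-rejection clearly does not make H0 true.

```python
# Failing to reject H0 is not evidence that H0 is true. Here the population
# mean really is 0.3, not 0, but with only n = 10 observations the test
# usually cannot tell. (All numbers are made up.)
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.normal(loc=0.3, scale=1.0, size=10)   # true mean is 0.3

t_stat, p_value = stats.ttest_1samp(sample, popmean=0.0)  # H0: mu = 0
print(f"t = {t_stat:.2f}, P = {p_value:.3f}")      # P is likely > 0.05 here
```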

Pitfall 2: a one-size-fits-all cutoff that wrongs the innocent

Building on this concept, Jerzy Neyman and Egon Pearson proposed the "hypothesis test" (also called the N-P hypothesis test) and, with it, a critical threshold for rejecting the null hypothesis, which they named the significance level, usually denoted by α.

Neyman held that the null hypothesis should only be considered alongside a reasonable alternative hypothesis. The two hypotheses are treated asymmetrically, and the hypothesis the researcher wants to reject is taken as the null. A few years later, Neyman introduced confidence levels and confidence intervals; the confidence level corresponds to the probability of not rejecting the null hypothesis when the null hypothesis is in fact true.

In short, the N-P hypothesis test is carried out under the constraint of controlling the Type I error, so setting the significance level amounts to setting the probability of committing a Type I error. With the Type I error rate controlled, the probability of a Type II error is then made as small as possible, that is, the statistical power is made as large as possible.

There is a contradiction between the two approaches: under Fisher's significance testing, two P values on either side of 0.05 (say 0.049 and 0.051) convey almost the same evidence and lead to almost the same conclusion, whereas under N-P hypothesis testing with α = 0.05 they lead to opposite conclusions.
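A toy sketch of this contradiction, using two hypothetical P values on either side of 0.05:

```python
# Two practically identical P values lead to opposite N-P decisions
# at alpha = 0.05, while Fisher would read them as similar evidence.
alpha = 0.05
for p in (0.049, 0.051):
    decision = "reject H0" if p < alpha else "do not reject H0"
    print(f"P = {p}: nearly the same evidence for Fisher; N-P says '{decision}'")
```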

Solution: combine the two, drawing on the strengths of each and avoiding their weaknesses

Many researchers have worked in concert on this problem. Through their efforts, the pattern of null hypothesis significance testing (NHST) gradually took shape. It is a hybrid model:

Step 1: based on the requirements of the actual problem, state the null hypothesis H0 and the alternative hypothesis H1. For example, let a1, a2, a3, ..., an be a sample drawn from a normal population N(μ, σ²), where μ is the (unknown) population mean and μ0 is its hypothesized value. Then the null hypothesis is H0: μ = μ0 and the alternative hypothesis is H1: μ ≠ μ0 (two-tailed).

Step 2: select the appropriate test statistic based on the population distribution and on whether the variance is known. When the population variance σ² is known, use the Z statistic, Z = (x̄ − μ0) / (σ/√n); when σ² is unknown, use the t statistic, t = (x̄ − μ0) / (s/√n), where s is the sample standard deviation.

Step 3: given the significance level α, determine the corresponding critical value. The significance level α is the probability of rejecting the null hypothesis when H0 is actually true, that is, the risk taken in rejecting the null hypothesis. When the null hypothesis is true, the probability that the test statistic falls in the rejection region is only α, and the probability that it falls in the acceptance region is 1 − α.

Step 4: following the hypothesis-testing rules, compute the actual value of the test statistic from the sample data and compare it with the critical value obtained from the table. Depending on whether the value falls in the acceptance region or the rejection region, conclude whether the null hypothesis H0 is rejected.

To reflect the risk of the judgment more precisely, the P value may also be used in Step 4 as the basis for deciding whether to reject the null hypothesis.

The basic idea of this pattern is: specify the significance level and the statistical power in advance, then calculate the P value; if the P value is smaller than the pre-specified significance level, reject the null hypothesis.
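Putting the four steps together, here is a worked sketch for the Step 1 example (one-sample, two-tailed, population variance known, so a Z test; every number below is invented for illustration):

```python
# A worked sketch of the four NHST steps:
# H0: mu = mu0 vs H1: mu != mu0, population variance known -> Z test.
import numpy as np
from scipy import stats

mu0, sigma, alpha = 100.0, 15.0, 0.05             # hypothesized mean, known SD
sample = np.array([108, 112, 99, 105, 117, 103, 110, 96, 107, 111], dtype=float)

n = len(sample)
z = (sample.mean() - mu0) / (sigma / np.sqrt(n))  # Step 2: the Z statistic
z_crit = stats.norm.ppf(1 - alpha / 2)            # Step 3: two-tailed critical value
p_value = 2 * (1 - stats.norm.cdf(abs(z)))        # Step 4 (NHST variant): P value

print(f"Z = {z:.2f}, critical value = {z_crit:.2f}, P = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```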

Since then, a standardized sequence of testing steps has taken shape: state the null and alternative hypotheses, select a test statistic, choose a significance level, determine the rejection region or compute the P value, and make a statistical judgment. The NHST pattern and the P value have likewise gradually become the common hypothesis-testing standard of many professional journals.

2

The Bayes factor test

Pitfall 3: over-reliance on P values, and publication bias

Although NHST is currently the most commonly used method of statistical inference in social-science research, researchers usually hope to obtain P < 0.05 in order to support their theory, and this can lead to publication bias: papers whose results show P < 0.05 usually get published, while papers that cannot reject the null hypothesis (P ≥ 0.05) usually do not.

Solution: the Bayes factor test comes to the rescue

As a result, readers may only ever see studies that obtained significant results, and this screening mechanism distorts readers' understanding of the research question. The core of the publication-bias problem lies in the pre-specified significance level. For this reason, researchers have suggested an alternative to NHST: the Bayes factor test.

The Bayes factor reflects the degree to which the sample information supports the null hypothesis. We will not go into the underlying theory of the Bayes factor today; instead we focus on how to judge hypotheses from the computed value of the Bayes factor.

The Bayes factor is the indicator of Bayesian hypothesis testing, and it too requires a null hypothesis H0 and an alternative hypothesis H1. The Bayes factor BF01 quantifies how much more likely the observed data are under hypothesis H0 than under hypothesis H1; in other words, BF01 measures the extent to which the data support H0 relative to H1. For example, BF01 = 5 means the data support H0 five times as strongly as H1.
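As a quick worked example of what such a number means: if the two hypotheses start out equally plausible (prior odds of 1:1, an assumption made purely for illustration), Bayes' rule turns BF01 = 5 into a posterior probability of about 0.83 for H0:

```python
# Posterior odds = Bayes factor * prior odds.
bf01 = 5.0                       # data are 5x more likely under H0 than H1
prior_odds = 1.0                 # assumed 1:1 prior, for illustration only
posterior_odds = bf01 * prior_odds
p_h0 = posterior_odds / (1 + posterior_odds)
print(f"P(H0 | data) = {p_h0:.2f}")   # ~0.83 under these assumptions
```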

So how large, or how small, must the Bayes factor be before we accept or reject the null hypothesis? Unlike NHST, the Bayes factor is a continuous quantity: in general it does not produce a binary (reject / do-not-reject) judgment, but instead quantifies the degree to which a hypothesis is supported by the data.

If the Bayes factor is near 1, the data favor neither the null nor the alternative hypothesis; the Bayes factor cannot decide, and more data are probably needed to tell which hypothesis is correct. In fact, any threshold here is artificial, even subjective.

Based on the actual size of the Bayes factor, researchers can state inferential conclusions such as "the data support H0 x times as strongly as H1". By not forcing binary judgments and abandoning thresholds, the Bayes factor test can, to a certain extent, avoid the reproducibility problems of social-science research.

When the null and alternative hypotheses are evaluated with Bayes factors, the two hypotheses have equal status; unlike the traditional approach, there is no need to first assume that the null hypothesis is true. Under the Bayes factor framework, H0 and H1 are simply two hypotheses the researcher cares about.

By combining the observed data with prior information, the Bayes factor yields the relative evidence the data provide for the two hypotheses. This means the Bayes factor can conclude that the alternative hypothesis beats the null hypothesis, and equally that the null hypothesis beats the alternative.

Placing the null and alternative hypotheses on an equal footing means that valid conclusions can be drawn even when the data support the null hypothesis, so articles with "non-significant" results also have a chance of being published, which in turn contributes to the reproducibility of research.

To sum it up in one sentence: the P value is the probability of obtaining the current observation, or a more extreme one, given that the null hypothesis is true; the Bayes factor tells us which model is relatively more plausible given the data at hand.

Currently, much software can compute Bayes factors, including several R packages and JASP. These tools set the parameters automatically: researchers only need to input the data, specify the model used for the analysis (such as a t test or an ANOVA model) and the research hypotheses of interest, and they obtain the Bayes factor for those hypotheses.
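As a rough sketch of that workflow in code: the example below uses the crude BIC approximation of the Bayes factor for a one-sample t test. This is not the default algorithm of JASP or the R packages, which use more refined prior settings; it only illustrates the idea of data in, model and hypotheses specified, BF01 out.

```python
# BIC-approximate Bayes factor for a one-sample t test (a crude stand-in
# for the more refined default priors used by JASP or R's BayesFactor).
import numpy as np
from scipy import stats

def bf01_bic(sample, mu0=0.0):
    """BIC-approximate BF01 for H0: mu = mu0 vs H1: mu != mu0."""
    n = len(sample)
    t, _ = stats.ttest_1samp(sample, popmean=mu0)
    return np.sqrt(n) * (1 + t**2 / (n - 1)) ** (-n / 2)

rng = np.random.default_rng(1)
data = rng.normal(loc=0.0, scale=1.0, size=40)      # invented data, true mean 0
_, p = stats.ttest_1samp(data, popmean=0.0)
print(f"P = {p:.3f}, BF01 = {bf01_bic(data):.2f}")  # BF01 > 1 would favor H0
```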

Although we do not recommend using Bayes factor thresholds for hypothesis testing, researchers may still wish to draw a clearer-cut conclusion from a data analysis.

At the same time, although we believe that testing hypotheses with Bayes factors can, to a certain extent, avoid publication bias and non-replicability, different parameter settings and different software choices can still lead to different analysis results.

To further overcome these problems, preregistration is a format currently recommended by researchers. Once a preregistered report is accepted, the researcher carries out the data collection and analysis and reports the conclusions, and the article will be published no matter how large or small the resulting Bayes factor turns out to be.

At present, the Center for Open Science provides preregistration services and many preregistration templates, and many important journals encourage preregistered research, such as Psychological Science and the Journal of Psychology.

Today's sharing ends here. Follow us to learn more about psychological statistics~

References

[1] Fisher, R. A. (1992). Statistical methods for research workers. In Breakthroughs in statistics (pp. 66-70). Springer, New York, NY.

[2] Neyman, J. (1937). Outline of a theory of statistical estimation based on the classical theory of probability. Philosophical Transactions of the Royal Society of London, 231(767), 333-380.

[3] Cheng Kaiming & Li Si'e. (2019). P value in scientific research: misunderstanding, manipulation and improvement. Quantitative Economics and Technological Economics Research (07), 117-136. doi:10.13653/j.cnki.jqte.2019.07.007.

[4] Wang Chenxia. (2021). Replication research in quantitative research and the Bayes factor analysis method (Master's thesis, Harbin Institute of Technology). https://kns.cnki.net/KCMS/detail/detail.aspx?dbname=CMFDTEMP&filename=1021901094.nh

[5] Hu Chuanpeng, Kong Xiangzhen, Eric-Jan Wagenmakers, Alexander Ly & Peng Kaiping. (2018). Bayesian factor and its implementation in JASP. Advances in Psychological Science (06), 951-965.

Author | Dahu Classmate

Typesetting | Uka

Proofreading | Sister Miaojun Kunkun