Having enjoyed a long career in designing health research studies, and having closely observed colleagues and students put so much effort into accurately applying statistical terms, I see the need to clarify some concepts, especially for early career researchers (ECRs).
Let's get started with the basic and fun aspects of this topic.
I will begin with a scenario in a court of law, which has three key components: the jury, whose role is to study the evidence brought before them and decide whether the defendant is guilty; the prosecutor, whose role is to find and present any evidence proving that the crime was committed; and the judge, who oversees the proceedings of the court and, based on the evidence presented, decides whether the defendant goes to jail. One word remains constant: “evidence”, which according to the Oxford dictionary is “the available body of facts or information indicating whether a belief or proposition is true or valid” (Evidence, 2021). Keep the words “true” and “valid” in mind as we progress.
The idea behind 'statistical significance' goes back as far as the 1700s, to the work of John Arbuthnot and Pierre-Simon Laplace, who computed the p-value (see below) for the human sex ratio at birth. In their work, they first formulated a null hypothesis stating that no difference exists in the number or proportion of male and female births (Brian & Jaisson, 2007). Three terms stand out here: null hypothesis, no difference, and number/proportion.
Stay with me.
Let's take the null hypothesis, or 'no difference', first (the two phrases, by the way, mean the same thing); the numbers and proportions will come later.
Every research idea is inspired by a problem. For example, the thought that poor knowledge of COVID-19 protective behaviours could lead to a high prevalence of COVID-19 infection. Now the research question: “Is poor knowledge of COVID-19 protective behaviours increasing the spread of COVID-19?” Based on the research question, we propose a hypothesis. The first is the null hypothesis, under which you assume nothing is happening: no relationship exists. It is often represented with the symbol H0 and is stated as “There is no relationship or difference between poor knowledge of COVID-19 protective behaviours and the prevalence/spread of COVID-19.” In other words, the null hypothesis assumes poor knowledge of COVID-19 does not contribute to the prevalence/spread of COVID-19. The second is the alternate hypothesis (H1), which states: “There is a relationship or difference between poor knowledge of COVID-19 protective behaviours and the prevalence/spread of COVID-19.” (Figure 1)
Figure 1: Forms of hypothesis testing
Rejecting (or failing to reject) the null hypothesis is based on a piece of evidence which statisticians call a p-value (or probability value). The p-value and its conventional cut-off (5%, i.e., 1 in 20) are attributed to Ronald Fisher (Fisher, 1992); the theory of hypothesis testing to Jerzy Neyman and Egon Pearson (Neyman & Pearson, 1933). A p-value ≤ 0.05 is taken as strong evidence against the null hypothesis: it means that, if the null (no difference) hypothesis were true, results at least as extreme as those observed would occur less than 5% of the time. In that case we reject the null hypothesis in favour of the alternate hypothesis. (Note a common misreading: the p-value is not the probability that the null hypothesis is correct.)
Practical demonstration of this concept
In Table 1, the outcome/dependent variable is in the columns: COVID-19 test result; the explanatory/independent variable is in the rows: knowledge of COVID-19 protective behaviours. A lower prevalence of COVID-19 was observed among those with good knowledge than among those with poor knowledge (6.59% vs. 19.67%). While these figures show a 13.08 percentage-point higher prevalence with poor knowledge, the difference alone is not enough to establish the benefit of good knowledge. We need evidence (as in the court of law), and the p-value cut-off gives us that. Since our p-value is lower than the cut-off (0.05), we can confidently reject our null hypothesis of no relationship/difference in favour of the alternate hypothesis (H1) and say that good knowledge of COVID-19 protective practices is beneficial.
Let's see an alternate scenario in Table 2. We observe almost the same pattern as in Table 1, but the p-value is above our cut-off point of 0.05, so we fail to reject (i.e., we retain) our null hypothesis: no relationship/difference has been demonstrated. I understand that some researchers may be tempted to say, “we observed a higher prevalence of COVID-19 among those with poor knowledge compared to those with good knowledge (18.03% vs. 7.69%), although our finding is not statistically significant at p=0.093.” I do not entirely agree with this. Once the evidence is not statistically significant, there is little point in building a claim on the proportional differences.
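The mechanics behind these p-values can be sketched in plain Python. The counts below are hypothetical, back-calculated from the quoted percentages (e.g., 6/91 ≈ 6.59% and 12/61 ≈ 19.67%), since the original tables are not reproduced here, and the helper `chi2_2x2` is an illustrative implementation of a chi-square test of independence with the Yates continuity correction (the usual default for 2×2 tables), not necessarily the exact test used in the article:

```python
from math import erfc, sqrt

def chi2_2x2(a, b, c, d, yates=True):
    """Chi-square test of independence for the 2x2 table [[a, b], [c, d]].
    Returns (statistic, p_value). Applies the Yates continuity
    correction by default, as is conventional for 2x2 tables."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = 0.0
    for obs, r, col in [(a, row1, col1), (b, row1, col2),
                        (c, row2, col1), (d, row2, col2)]:
        exp = r * col / n                 # expected count under H0
        diff = abs(obs - exp)
        if yates:
            diff = max(diff - 0.5, 0.0)   # continuity correction
        stat += diff * diff / exp
    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))

# Hypothetical counts consistent with the percentages quoted in the text.
# "Table 1": 6/91 positive with good knowledge vs. 12/61 with poor knowledge.
stat1, p1 = chi2_2x2(6, 85, 12, 49)
# "Table 2": 7/91 positive with good knowledge vs. 11/61 with poor knowledge.
stat2, p2 = chi2_2x2(7, 84, 11, 50)
print(f"Table 1: chi2={stat1:.2f}, p={p1:.3f}")  # p below 0.05 -> reject H0
print(f"Table 2: chi2={stat2:.2f}, p={p2:.3f}")  # p above 0.05 -> retain H0
```

With these assumed counts, the first p-value falls below 0.05 (reject the null) while the second stays above it (fail to reject), mirroring the two scenarios above.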
The evidence (p-value) keeps us from being sentimental about our proportional differences, mean differences, or median differences. No matter what we feel, the evidence says “no difference”, and we have to accept this. Unfortunately, with the intense drive to publish for academic promotion (Moosa, 2018), some researchers may be tempted simply to change the values in Table 2 to look more like those in Table 1 so that the p-value falls below 0.05, given the perception that publishers are more likely to publish research findings with statistically significant p-values.
Altering data in this way is academic malpractice, and manipulating an analysis until it yields significance is colloquially called p-hacking. Its statistical consequence is a “Type I error”: rejecting the null hypothesis (no significant difference) when the null hypothesis is in fact true (Figure 2).
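A quick way to see what a Type I error means in practice is to simulate many studies in which the null hypothesis really is true, and count how often a test at the 5% level rejects it anyway. This is an illustrative sketch (a two-sided one-sample z-test on simulated normal data, not the analysis from the tables above):

```python
import random
from math import erfc, sqrt

def type_i_error_rate(n=30, trials=10_000, alpha=0.05, seed=1):
    """Simulate repeated studies where the null hypothesis is TRUE
    (every sample comes from a population with mean 0, sd 1) and
    return the fraction of studies that wrongly reject it."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        mean = sum(sample) / n
        z = mean * sqrt(n)            # known sigma = 1, so a z-test applies
        p = erfc(abs(z) / sqrt(2))    # two-sided p-value
        if p < alpha:
            false_positives += 1
    return false_positives / trials

print(type_i_error_rate())  # close to 0.05 by construction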
And so we have to be guided…
Figure 2: Hypothesis errors
Do you have any questions or experiences to share? Post them in the comments below. Part 2 of this blog post will look at Tests of Significance (TOS) and how they differ from statistical significance; and when to apply a Fisher’s exact test (FET).
About the author: Felix Emeka Anyiam is a Research & Data Scientist, Centre for Health and Development, University of Port Harcourt, Port Harcourt, Nigeria. He is an accomplished Researcher, Analytics and Data Science professional with a demonstrated ability to develop and implement data-driven solutions in the Urban and Health sector; with over 10 years’ experience in teaching and leading Epidemiological, Biostatistical and Data Science projects, from descriptive to predictive analytics. He is also a trained Research Scientist, Scientific Writer and Epidemiologist in his day to day duties at the Centre for Health and Development (CHD), a Centre that evolved from several years of international research collaborations between the University of Port Harcourt and the Dalla Lana School of Public Health at the University of Toronto, Canada, and most recently, the University of Ottawa, Canada. CHD aims to develop human and organisational capacity for health-related research and quality health care provision in the Niger Delta region of Nigeria, built on sustainable local structure and international collaborations. He is one of the guest facilitators of AuthorAID online courses, a Biostatistician for the AuthorAID Online Journal Clubs pilot project, INASP, and also advises on a potential curriculum for a statistical/data analysis online course.
Brian, É., & Jaisson, M. (2007). The descent of human sex ratio at birth: A dialogue between mathematics, biology and sociology (Vol. 4). Springer Science & Business Media.
Evidence. (2021). In The Oxford Pocket Dictionary of Current English. Retrieved 4 June 2021, from https://www.lexico.com/definition/evidence
Fisher, R. A. (1992). Statistical methods for research workers. In Breakthroughs in statistics (pp. 66-70). Springer, New York, NY.
Moosa, I. A. (2018). Publish or perish: Perceived benefits versus unintended consequences. Edward Elgar Publishing.
Neyman, J., & Pearson, E. S. (1933). IX. On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 231(694-706), 289-337.