FACT: 100% of people reading this will continue to read beyond this sentence.

Perhaps the above statement isn’t true, but how would you, the reader of this blog, be able to test its validity and reliability anyway?

Simply labeling something as a “fact” and citing a statistic can have an enormously persuasive influence on audiences; not even professionals, journalists or those with excellent statistical reasoning skills are immune to what are commonly referred to as statistical fallacies.

One of the biggest problems with statistics shared by news stories, blogs, podcasts and other outlets is twofold: it’s difficult to determine how accurately the author interpreted the data, and we rarely know whether issues with the study’s methodology produced misleading data in the first place.

After all, some scientists have admitted to falsifying or fabricating data, perhaps because researchers feel enormous pressure to produce statistically “significant” findings in order to receive grant funding for their work or get published (often a requirement for tenure at academic institutions). Furthermore, a 2015 survey of 1,118 journalists by the Shorenstein Center on Media, Politics, and Public Policy at Harvard found that while 80% of respondents agreed that knowing how to interpret statistics from sources is important, just 25% said they felt “very” well-equipped to interpret data on their own.

So what does all of this mean?

In short, most of us are not well-equipped to interpret statistical information. No human being is completely free from cognitive biases, and the processes of motivated reasoning often lead us to quickly accept information that aligns with our preexisting beliefs while taking more time to scrutinize and criticize information that contradicts our beliefs.

None of us has the time or energy to double-check every statistic we encounter. But in moments when those numbers really matter to you – such as efficacy rates among Covid-19 vaccines or job salary ranges – there are some useful, time-saving strategies for evaluating the accuracy of statistical information.

For starters, go to the original source of the information to confirm that the author of whatever you’re reading or listening to is interpreting the data correctly. You don’t need to be a data scientist or mathematician to understand the basics of statistical findings. Some things to be on the lookout for in the original study include:

Who Supported the Research: Did this information come from a peer-reviewed academic journal funded by grants or was it produced and funded by a company with a financial conflict of interest? In other words, what is the purpose or incentive for the organization(s) involved to contribute to the study?

Let’s unpack this with an example.

PLOS Medicine* published an article in 2013 titled “Financial Conflicts of Interest and Reporting Bias Regarding the Association between Sugar-Sweetened Beverages and Weight Gain: A Systematic Review of Systematic Reviews.” The review found that studies with financial conflicts of interest (funded by companies like Coca-Cola and PepsiCo) were five times more likely to report no significant link between the consumption of sugary beverages and weight gain or obesity than studies with no conflicts of interest.

*To practice what I’m preaching here, I originally found this study cited in a New York Times piece but went to the original to confirm that the NYT’s depiction of it was accurate.

Who Were the Participants and How Many: In academic and scientific research, you can typically find information pertaining to the background and number of participants in the “Methods” or “Methodology” section of an article. Participants’ demographic information (e.g., gender, race, age, income level, geographic location) and the study’s sample size (number of participants surveyed/studied) can help you determine whether the researchers’ inferences are accurate.

One example of a study that produced misleading data is LendEDU’s 2017 survey of 1,217 college students, which found that “nearly a third of Millennials have used Venmo to pay for drugs.” A major problem with this survey is that it did not clearly define its participants: there is no universal definition of “Millennial,” and even if we define Millennials as people born between 1981 and 1996, not all college students fall in that age bracket. The study also claimed that its sample of 1,217 respondents was representative of college students in the U.S. (roughly 20.5 million at the time), yet the Pew Research Center estimates there are approximately 72.1 million Millennials.

So what are we supposed to believe: that almost ⅓ of college students use Venmo to buy drugs, or almost ⅓ of Millennials? The two terms are not synonymous, and this shows why the number of participants, and how they’re defined, are critically important for evaluating whether a study accurately portrays the attributes, attitudes and/or behaviors of a given group of people. Unfortunately, a Google search about this study reveals that dozens of journalists and bloggers hastily shared these findings without scrutinizing how the research was conducted in the first place. This is just one of many examples of how reporters lacking scientific backgrounds or statistical reasoning skills can (often inadvertently) spread misinformation to their audiences.
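To make the sample-size point concrete, here is a minimal back-of-the-envelope sketch in Python (my own illustration, not anything from LendEDU or the journalists who covered it) that computes the textbook margin of error for a reported proportion. It assumes a simple random sample and a 95% confidence level:

    import math

    def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
        """Approximate margin of error for a proportion p observed in a
        simple random sample of n respondents (z = 1.96 for 95% confidence)."""
        return z * math.sqrt(p * (1 - p) / n)

    n = 1_217   # LendEDU's reported sample size
    p = 1 / 3   # "nearly a third" of respondents

    print(f"Margin of error: +/- {margin_of_error(p, n):.1%}")
    # Prints roughly +/- 2.6 percentage points -- but only for the population
    # actually sampled (these particular college students), not for all
    # ~72.1 million Millennials, and only if the sample was truly random.

In other words, the arithmetic can be flawless and the headline can still be wrong: a margin of error only describes the group that was actually sampled, so it can’t stretch a survey of 1,217 college students into a claim about 72.1 million Millennials.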

Additional Resources for Developing Your Knowledge of Data Journalism and Statistical Reasoning Skills:

  • This 10-minute video from Crash Course Statistics is one of the most beginner-friendly tutorials on science journalism and how data can be misrepresented by news publications.
  • The Challenge of Developing Statistical Reasoning: This article was published in the Journal of Statistics Education (2002) and offers an eye-opening glimpse at the variety of correct and incorrect forms of statistical reasoning you’ve probably seen before.
  • Data Journalism, Impartiality and Statistical Claims: This BBC Trust-commissioned study was published in Journalism Practice (2017). While the researchers acknowledged that the “use of data is a potentially powerful democratic force in journalistic inquiry and storytelling, promoting the flow of information…enriching debates in the public sphere” (p. 1211), the study revealed that politicians and business leaders in the UK often cited statistics in the media, but few journalists or members of the public questioned or verified those claims.