This post originally appeared on the Data Values Digest on November 7, 2022.

This week world leaders are in Egypt for COP27—the annual UN climate conference—attempting to make progress on stemming rising global temperatures. This feels like an especially grim season: amid increasingly frequent climate-related events around the world, climate denialism remains as strong as ever. Though the scientific community is united around the data on the warming climate, the 2022 IPCC report pointed to misinformation and climate denialism as a key reason for the lack of political will to respond to climate change.

Against the backdrop of COP27, the Data Values Digest’s editorial team spoke to Ken Springer, who writes Statisfied?, a weekly newsletter on the context and misconceptions around popular statistics. We asked him why everyone, everywhere, needs the confidence to engage critically with and use data, and how to promote engagement with and understanding of statistics.

[The Data Values Digest] The use of stats in the news and public discourse is pervasive, and you’ve written about everything from vaccine effectiveness to critical race theory and mass shootings. What do you think is typically missing from conversations that involve statistics, or from media coverage of stats?

[Dr. Ken Springer] In a word, what's often missing is context. Statistics don't mean much in themselves.

Here's an anecdote that illustrates what I have in mind. In a recent press release, the CEO of a pharmaceutical company praised a new study for showing that psilocybin can help treat alcohol use disorder (AUD). The CEO concluded that "with a reported 83 percent average reduction in alcohol consumption among heavy drinkers, the results of [this study] point to the clear potential of psilocybin as a breakthrough in the way we treat AUD."

That 83 percent reduction in drinking looks impressive, but it's misleading without some additional context, such as the 51 percent reduction in drinking observed among the placebo group, or the fact that for both groups, most of the reduction in drinking occurred before they took psilocybin or a placebo because the researchers provided all participants with psychotherapy before administering the drugs. The actual benefits of psilocybin in this particular study were almost non-existent.

In short, further details from the study itself provide essential context for that 83 percent statistic. Meanwhile, discussions in news and social media often overlooked such details. Nor was attention consistently paid to the social context for the research – e.g., the enormous financial incentives if psilocybin receives the U.S. Food and Drug Administration’s approval for treating AUD. 
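To make the arithmetic concrete, here is a minimal Python sketch using hypothetical numbers chosen only to echo the headline figures, not the study's actual data. It shows why a within-group percent reduction can overstate a drug's effect when the placebo group improves too:

```python
# A minimal sketch of the arithmetic, using hypothetical numbers chosen
# only to echo the headline figures, not the study's actual data.

def percent_reduction(before, after):
    """Percent drop from a baseline value to a follow-up value."""
    return 100 * (before - after) / before

# Hypothetical heavy-drinking days per month for each group.
psilocybin_before, psilocybin_after = 20.0, 3.4   # ~83% reduction
placebo_before, placebo_after = 20.0, 9.8         # ~51% reduction

drug_change = percent_reduction(psilocybin_before, psilocybin_after)
placebo_change = percent_reduction(placebo_before, placebo_after)

print(f"Psilocybin group: {drug_change:.0f}% reduction")
print(f"Placebo group:    {placebo_change:.0f}% reduction")
# Both groups also received psychotherapy, so the part attributable to
# the drug itself is much smaller than the headline figure suggests.
print(f"Gap between groups: {drug_change - placebo_change:.0f} percentage points")
```

The meaningful comparison is between the two groups, not within the treated group alone.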

What are some examples of news coverage of stats being done well and not so well? 

[Springer] I think that some news organizations, like the New York Times, tend to be pretty reliable about providing context for the statistics they cite, and their writers (or the experts they select to interview) are good about viewing these statistics with a critical eye. For example, in covering the psilocybin study I described, the Times reporter commented on several methodological limitations. Along with premier news organizations, some outlets that focus on data (e.g., FiveThirtyEight) as well as those geared to specific niches (e.g., Emily Oster's ParentData newsletter) provide excellent, layperson-friendly treatments of statistics. There are lots of other examples.

At the same time, I find that many news organizations, blogs, newsletters, etc. fall short in their statistical coverage. I think a key issue is limited time or column space devoted to statistics, which might in turn reflect the assumption that the audience really doesn't want to get into the weeds. I agree, but only to a point. I think that a lot of people would actually enjoy the statistical weeds if they were presented in an interesting and personally relevant way.

What are the consequences of this or why does it matter? 

[Springer] Great question! The whole purpose of citing statistics is to provide the audience with accurate, useful, perhaps empowering information. If the statistics aren't handled properly, people will come away uninformed or misinformed. For example, I’ve written about evidence that misleading reports on vaccine efficacy and safety statistics from Fox News sources have led to higher COVID-19 mortality rates among Fox consumers, and may explain why Republicans have higher COVID-19 mortality rates overall than Democrats do. This is a tragic illustration of what can happen when statistics are misrepresented. You might call it an example of statistical misrepresentation influencing individual behavior.

There are also institutional-level examples. For instance, now that states like Oregon are beginning to legalize psilocybin for therapeutic use, I worry that voters and legislators haven’t fully grasped the methodological and statistical limitations in studies on its effectiveness. (I have nothing against people taking this drug, by the way. I just haven’t seen convincing evidence of its effectiveness as a mental health treatment.)

It strikes me that much of what your organization calls for in the #DataValues Manifesto would be supported by more careful treatment of statistical data by the media as well as those in positions of authority. People can be better informed about the design, collection, and use of data that affects them if they have a more solid grasp of what the data mean. As a former professor of education, I'm particularly concerned about how educational statistics (e.g., test scores) are used to classify, refer, track, and otherwise influence the education of students in ways that their parents don't fully understand. The problem isn't just that parents may not know their rights or understand what's being done with their children's data. It's also, in some cases, that they don't understand what the data mean. (This is assuming they see the data in the first place. Staff shortages and outmoded data management systems result in some information never reaching parents.)

All the same, I'm convinced that lay people can understand a lot about statistics, and that in some cases (e.g., educational statistics) they really want to. 

When it comes to reporting on research or studies that involve stats, what kind of standards or guidance exist for journalists or news media? 

[Springer] As far as I can tell, the standards and guidance that news organizations provide to journalists don't typically include a specific focus on statistics (although large organizations employ in-house staff who can provide support). Rather, journalists are expected to work from general principles, such as the need for accuracy, or the need to consult with experts. This is useful guidance. Given the narrow, technical focus of research studies, the best advice you could give any journalist is to consult with an expert in the field. But which experts should the journalist consult? What questions should be posed to the expert? Which studies merit reaching out to experts in the first place? More data literacy, including a better understanding of statistics, would help journalists with these questions.

National Public Radio (NPR) is a bit of an exception to what I've written here, in that the Accuracy section of their Ethics Handbook contains several paragraphs of guidance on presenting data. It's not much, but it's a step in the right direction. 

Likewise, the Associated Press (AP) Statement of News Values and Principles has three short paragraphs on reporting data. Although they're on the right track, what they provide is a list of points to consider rather than guidance on how to consider them. For example, AP writes, "We must distinguish carefully between correlations and causal relationships," and that's all they say about the topic. This is excellent advice, but not helpful unless you already know how to make that kind of distinction. (It's tricky enough for researchers to do so.)
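As a rough illustration of that distinction, a toy simulation (hypothetical variables, not drawn from AP's guidance or any real dataset) shows how two quantities driven by a shared third factor correlate strongly even though neither causes the other:

```python
# A toy simulation (hypothetical variables, not real data) showing how a
# shared third factor produces correlation without causation.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

summer_heat = rng.normal(size=n)                    # hidden common cause
ice_cream_sales = summer_heat + rng.normal(size=n)  # driven by the heat
drownings = summer_heat + rng.normal(size=n)        # also driven by the heat

r = np.corrcoef(ice_cream_sales, drownings)[0, 1]
print(f"Correlation between ice cream sales and drownings: {r:.2f}")
# Expect roughly 0.5: a substantial correlation, even though neither
# variable has any causal effect on the other.
```

Making that distinction in real reporting requires asking whether a plausible confounder like this was measured and accounted for, which is exactly the kind of follow-through a one-line principle cannot supply.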

Truly exceptional are the BBC Editorial Guidelines. The BBC modestly refers to their guidance on reporting statistics as a "guidance note," but it's actually an extensive, clearly written guide to everything from broad concepts (e.g., guidance on distinguishing correlation from causation) to relatively specific ones (e.g., the definition of margin of error).

Are there other fields or sectors where this type of guidance exists? If so, what are some ideas about how these principles could be applied to help people communicate about data?  

[Springer] I think it would be great if there were a single, authoritative guide to statistics, or data literacy more broadly, that provided support to journalists, policymakers, medical professionals, educators, advocacy groups, and others who engage with statistical data – and who may need to explain that data to others – but who may lack sufficient training to understand the statistics. 

Currently, I'm developing an educational website that includes, among other things, a prototype of the guide I just mentioned. Creating this guide is challenging. It's not always easy to translate statistical concepts into plain English. There are a staggering number of statistical paradigms and procedures in use, as well as many interpretive issues that require technical knowledge of study methods. The guide I'm envisioning needs to provide strategies for making sense out of statistical data in cases where the reader may not fully understand the details. For example, I think it would be immensely helpful for any journalist writing about research to understand statistical significance, effect sizes, and the difference between them. At the very least, fluency with these concepts would be useful when interviewing experts.
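To make that distinction concrete, here is a minimal sketch with simulated data (illustrative numbers only, not tied to any real study): in a large enough sample, a practically negligible difference can still be highly statistically significant.

```python
# A sketch with simulated data (illustrative only): in a very large sample,
# a practically negligible difference can still be highly "significant".
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 100_000  # very large sample per group

control = rng.normal(loc=100.0, scale=15.0, size=n)
treated = rng.normal(loc=100.5, scale=15.0, size=n)  # tiny true difference

t_stat, p_value = stats.ttest_ind(treated, control)

# Cohen's d: difference in means relative to the pooled spread.
pooled_sd = np.sqrt((control.var(ddof=1) + treated.var(ddof=1)) / 2)
cohens_d = (treated.mean() - control.mean()) / pooled_sd

print(f"p-value:   {p_value:.2g}  (significant by any usual threshold)")
print(f"Cohen's d: {cohens_d:.3f} (a negligible effect in practical terms)")
```

A reporter who asks about the size of an effect, and not just whether a result was "significant," is far better positioned to judge whether a finding matters in practice.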

Who bears the responsibility, in your opinion, to check the validity of research and statistics? Whose responsibility is it to communicate well and clearly about data? 

[Springer] Another great question! 

Primary responsibility falls to the researchers themselves, and to the culture in which they operate (the peer review process, the standards of funding agencies, etc.). There are signs that this culture is changing in positive ways – more pre-registration of studies, more data sharing, more thoughtful alternatives to the traditional significance testing game, and so on. Still, misuse of statistics is endemic in research – both in a technical sense, and in the way study findings are sometimes used to support existing social inequities.

Those who serve as a bridge between the research and the rest of us also play a critical role. By the "rest of us" I mean practitioners, policymakers, and advocates, as well as the lay public. When people encounter a statistic or hear about new research findings, they don't necessarily have the time or the expertise to read the research, and some studies are behind paywalls. So anyone who summarizes research for others is taking on a very important responsibility. 

What are some strategies that institutions, organizations, and individuals can use to act on the advice you’ve given above, including improving communication about data?

[Springer] I'm a strong advocate for greater data literacy. I think we can achieve that by developing and disseminating rigorous standards for news organizations, journalists, and others who communicate about data, and by making improvements to primary and secondary (K-12) education. Again, I think of the #DataValues Manifesto, particularly the call to "democratize data skills for greater equality." Public education is arguably one of the best places to do this. Currently, data literacy is part of the public school curriculum in every U.S. state, but there's a lot of talk about teachers being underprepared, or otherwise unable to consistently teach data literacy at a level that would genuinely benefit students. Greater federal, state, and district-level support is clearly needed.

Dr. Ken Springer is professor emeritus of psychology and education at Southern Methodist University. His weekly newsletter on Substack is called Statisfied?