This article is an excerpt from the Shortform book guide to "Naked Statistics" by Charles Wheelan.
Why is reliability important in data collection? What are the main challenges inherent in collecting reliable data?
Because we use statistical data to inform our lives and society, we need them to be both accurate and precise. The math is rarely the obstacle: collecting quality data is the true challenge and art of producing reliable, constructive statistics.
Keep reading to learn about the importance of reliability in data collection.
The Value of Reliable Data
The “math part” of statistics is the easy part: we do most statistical analyses on a computer, and the formulas themselves are unchanging and easy to look up. Once we know enough about statistics to understand which formulas to use and what the resulting statistics mean, the calculation is simply a matter of plugging data into our chosen equations.
Wheelan explains that because statistics are relatively “easy” to calculate, well-meaning people produce misleading statistics all the time. He notes that many of the statistics we encounter are mathematically precise (if you repeated your calculations, you’d get the same result) but factually inaccurate (even though your numbers are “tight,” they’re wrong). In other words, the arithmetic holds up to scrutiny, but the numbers don’t accurately explain the situation.
For example, you could use statistics to present a compelling link between cold weather and an increase in cold and flu cases. But if you published your results as “proving that the cold causes colds,” you’d be using precise figures to promote an inaccurate conclusion, because you never addressed the role of viruses.
Precise but inaccurate statistics happen when our calculations are correct, but the data that went into those calculations were inaccurate, incomplete, or not applicable to our research question.
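The distinction between precision and accuracy can be illustrated with a small simulation. The sketch below is not from Wheelan's book; the commute-time population, the sample sizes, and the function name `simulate` are all hypothetical choices made for illustration. It repeats the same "study" many times, once drawing from the whole population and once drawing only from an unrepresentative slice of it.

```python
import random
import statistics


def simulate(seed=0, sample_size=500, repeats=200):
    """Compare repeated sample means from a fair pool and a biased pool."""
    rng = random.Random(seed)

    # Hypothetical population: 100,000 commute times in minutes, mean near 30.
    population = [rng.gauss(30, 10) for _ in range(100_000)]
    true_mean = statistics.fmean(population)

    # Flawed data collection: we only ever reach people with short commutes.
    biased_pool = [x for x in population if x < 30]

    def repeated_means(pool):
        # Repeat the "study" many times and record each sample mean.
        return [
            statistics.fmean(rng.sample(pool, sample_size))
            for _ in range(repeats)
        ]

    fair = repeated_means(population)
    biased = repeated_means(biased_pool)

    return {
        "true_mean": true_mean,
        "fair_mean": statistics.fmean(fair),
        "fair_spread": statistics.stdev(fair),      # small spread = precise
        "biased_mean": statistics.fmean(biased),
        "biased_spread": statistics.stdev(biased),  # also small, yet off target
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.2f}")
```

In this setup, the biased samples should agree closely with one another from run to run (they are precise) while landing well below the true population mean (they are inaccurate). The calculation is identical in both cases; only the data going in differs.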
Data Is a Big Business
Data isn’t just the backbone of reliable research; it’s big business. Wheelan reminds us that in our technology-driven society, we, the technology users, are a constant source of data for companies like Facebook, which use the data we generate every day to increase their profits. We might not think of the data we create as individuals as having monetary value, but in 2019, Facebook made over $164 from each of its Canadian and American subscribers. This works out to roughly 10 cents per like! These numbers add up: In 2019 Facebook and Google earned $230 billion, mainly from running ads guided by user data.
Wheelan explains that “big data” isn’t inherently good or bad. The availability of data today opens doors to research and insight that wouldn’t have been possible just a few years ago. But the practice of collecting users’ data online and in public spaces also opens up a host of ethical considerations about privacy and the appropriate use of that data. Therefore, Wheelan notes that we need to collectively consider the role we want data to play in running our society.
Collecting Reliable Data
Obtaining reliable data in the real world can be complicated, time-consuming, and expensive.
The challenge of reliability in data collection is present at every level of a research project, from the minuscule details of the study to the overall research question itself. For example, Wheelan explains that even the wording of a survey question can skew the results. If we were surveying residents about a proposed dog park, we could ask “Do you support the construction of a dog park in town?” or “Do you support a tax increase to fund the construction of a dog park in town?” and get different results.
Timing is an additional challenge for medical and social science research, as we’re often interested in outcomes that happen months, years, decades, or even generations after a “treatment” or event. For example, if we were interested in the impact of a mother’s diet during pregnancy on her child’s food allergies, we might have to wait years to collect our data.
Collecting enough data to obtain a reliable dataset can also be expensive. Researchers often have to track down randomly selected people or sort through mountains of literature to obtain the data they are looking for. Since researchers rarely work for free, a commitment to collecting reliable data quickly becomes costly for those funding the research.
Paying for Research Participation
As Wheelan discusses, collecting data on medical research questions can be particularly problematic from an ethical perspective. Despite criticism, paying participants to be part of medical research studies is a historic and common practice in the US. Walter Reed paid volunteers to allow themselves to be bitten by mosquitoes, and even offered an additional stipend to any volunteer who subsequently contracted yellow fever.
Critics of financial incentives for research participation argue that paying people for participation can be seen as a form of coercion, and can lead people to accept risks that they wouldn’t otherwise find acceptable (particularly people who find themselves in financially vulnerable positions). Proponents of the practice argue that providing financial incentives may be the only way to get people to participate (especially healthy people) in potentially life-saving research studies.