The Importance of Reliability in Data Collection

This article is an excerpt from the Shortform book guide to "Naked Statistics" by Charles Wheelan.

Why is reliability important in data collection? What are the main challenges inherent in collecting reliable data?

Because we use statistical data to inform our lives and society, those data need to be both accurate and precise. Collecting quality data is therefore the true challenge and art of producing reliable, constructive statistics.

Keep reading to learn about the importance of reliability in data collection.

The Value of Reliable Data

The “math part” of statistics is the easy part, since we do most statistical analyses on a computer and the formulas themselves are unchanging and easy to look up. Once we know enough about statistics to understand which formulas to use and what the resulting statistics mean, the calculation itself is simply a matter of plugging data into our chosen equations.

Wheelan explains that because statistics are relatively “easy” to calculate, well-meaning people produce misleading statistics all the time. He notes that many of the statistics we encounter are mathematically precise (if you repeated the calculations, you’d get the same result) but factually inaccurate (even though the numbers are “tight,” they’re wrong). In other words, the arithmetic holds up to scrutiny, but the numbers don’t accurately explain the situation.

For example, you could use statistics to present a compelling link between cold weather and an increase in cold and flu cases. But if you were to publish your results as “proving that the cold causes colds,” you’d be using precise figures to promote an inaccurate conclusion, because you haven’t even addressed the role of viruses.

Precise but inaccurate statistics happen when our calculations are correct, but the data that went into those calculations were inaccurate, incomplete, or not applicable to our research question.
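
To make the distinction concrete, here is a minimal Python sketch of how a flawed collection method can produce results that are precise but inaccurate. Everything in it is hypothetical and chosen for illustration, not taken from Wheelan: the simulated population, the rule that only members with values above 50 can be reached, the sample sizes, and the helper name repeated_means.

```python
import random
import statistics

def repeated_means(pool, n, repeats, rng):
    """Run `repeats` surveys of `n` random draws from `pool`; return each survey's mean."""
    return [statistics.fmean(rng.sample(pool, n)) for _ in range(repeats)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 20,000 values centered on 50 (an assumption for this sketch).
population = [rng.gauss(50, 10) for _ in range(20_000)]
true_mean = statistics.fmean(population)

# Sound collection: every member of the population can be sampled.
fair = repeated_means(population, 1000, 200, rng)

# Flawed collection: only members with values above 50 can be reached.
reachable = [x for x in population if x > 50]
biased = repeated_means(reachable, 1000, 200, rng)

fair_center, fair_spread = statistics.fmean(fair), statistics.stdev(fair)
biased_center, biased_spread = statistics.fmean(biased), statistics.stdev(biased)

print(f"true mean:      {true_mean:.2f}")
print(f"fair surveys:   center {fair_center:.2f}, spread {fair_spread:.2f}")
print(f"biased surveys: center {biased_center:.2f}, spread {biased_spread:.2f}")
```

In this sketch, the biased surveys agree closely with one another, so they look precise, yet they all sit well above the true mean. Repeating the calculation would not expose the problem, because the error comes from how the data were collected rather than from the math.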

Data Is a Big Business

Data isn’t just the backbone of reliable research—it’s big business. Wheelan reminds us that in our technology-driven society, we, the technology users, are a constant source of data for companies like Facebook, which use the data we generate every day to increase their profits.

We might not think of the data we create as individuals as having monetary value, but in 2019, Facebook made over $164 from each of its Canadian and American users. This works out to roughly 10 cents per like! These numbers add up: In 2019, Facebook and Google earned a combined $230 billion, mainly from running ads guided by user data.

Wheelan explains that “big data” isn’t inherently good or bad. The availability of data today opens doors to research and insight that wouldn’t have been possible just a few years ago. But the practice of collecting users’ data online and in public spaces also opens up a host of ethical considerations about privacy and the appropriate use of that data. Therefore, Wheelan notes that we need to collectively consider the role we want data to play in running our society.
