PDF Summary: Naked Statistics, by Charles Wheelan
Below is a preview of the Shortform book summary of Naked Statistics by Charles Wheelan. Read the full comprehensive summary at Shortform.
1-Page PDF Summary of Naked Statistics
Statistics help us use data to make sense of the world, and statistical insights help guide modern society, informing medical practices, public and fiscal policy, education initiatives, business and marketing decisions, and so on. But statistics aren’t just for “experts.” Thanks to Charles Wheelan, statistics don’t need to be intimidating. Naked Statistics puts the math behind statistics into digestible terms and explains statistics concepts with relatable, relevant, and even humorous examples. Readers also benefit from additional socio-political insight from the book, as Wheelan uses real-world anecdotes to explore how statistics can inform collective decision-making.
As Wheelan does in Naked Statistics, this guide focuses on the meaning and context of commonly used statistics rather than on their calculations. After explaining the insight each statistics concept provides and how to interpret it, this guide uses real and fictional examples for added context.
(continued)...
Vigen’s work has been highlighted by the Harvard Business Review and received positive reviews from the Boston Globe and Washington Post, among others, in part because these ridiculous examples highlight the fact that equating causation with correlation is incorrect, no matter how close the relationship.
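Vigen-style spurious correlations are easy to manufacture: any two quantities that merely share a trend over time will correlate strongly. A minimal sketch with invented numbers (neither series is real data) shows how high the correlation coefficient can get:

```python
import math

# Two made-up series that both simply trend upward over time.
# Neither causes the other; they just share a trend.
cheese_lbs_per_capita = [29.8, 30.1, 30.5, 31.0, 31.2, 31.9, 32.4, 32.7, 33.1, 33.5]
engineering_phds      = [1200, 1250, 1310, 1330, 1400, 1450, 1500, 1530, 1600, 1640]

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from the definition."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson_r(cheese_lbs_per_capita, engineering_phds)
print(f"correlation: {r:.3f}")  # very close to 1.0
```

A correlation near 1.0 here tells us only that both series rise over the same period, not that either influences the other.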
Another statistics technique, regression analysis, goes beyond describing the relationship between two variables and allows us to make mathematical predictions based on those relationships. For example, the nursery owner above could generate an equation with regression analysis to predict how many flowers her plants would have based on the amount of sunlight she gave them.
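As a rough sketch of the nursery example, a simple linear regression fits a least-squares line and then uses it to predict; the data below are invented for illustration:

```python
import numpy as np

# Hypothetical nursery data: daily hours of sunlight vs. flowers per plant.
sunlight = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
flowers = np.array([4.0, 6.0, 7.0, 9.0, 10.0, 12.0])

# Fit a line (flowers = slope * sunlight + intercept) by least squares.
slope, intercept = np.polyfit(sunlight, flowers, 1)

# Predict the flower count for a plant given 5.5 hours of sunlight per day.
predicted = slope * 5.5 + intercept
print(f"flowers = {slope:.2f} * sunlight + {intercept:.2f}")
print(f"predicted flowers at 5.5 hours: {predicted:.1f}")
```

The fitted equation is the "mathematical prediction" regression provides: plug in any sunlight value and read off an estimated flower count.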
Regression Analysis for Smoking and Lung Cancer
As Wheelan explains, regression analysis is a staple in medical and social sciences research. A study in the National Library of Medicine used regression analysis to calculate that for every 1% increase in collective smoking among American adults, lung cancer rates rise by 164 cases per 1,000 citizens.
Statistics Help Answer Complicated Questions
Probability is one way statistics can help us make more informed decisions. It allows us to manage uncertainty, calculate risks, and put possible outcomes in perspective. Wheelan explains that understanding probability can be especially relevant to our daily lives because we make decisions based on our perception of probability all the time. However, our perception of likely outcomes is often mathematically irrational. For example, the probability of getting in a car accident while driving to a beach is far higher than the probability of being attacked by a shark there, but we often—irrationally—fear the shark risk more.
Probability Isn’t Intuitive
There are several reasons for our mathematically irrational perception of probability, including:
Confirmation Bias: When we focus on what we expect and ignore the rest. Using our shark attack example above, we might justify our fear of swimming at the beach with a statement like, “well, that one guy was bitten by a shark at Cape Cod last year!” ignoring the tens of thousands of swimmers who weren’t attacked.
Anecdotal Logic: Improbable events are statistically bound to happen, and people notice and talk about them when they do. These stories of improbable occurrences stick in our minds and shape our perception of what is likely. For example, say you have a friend diagnosed with an exceedingly rare cancer. Even though your friend’s diagnosis is an anomaly, the rare form of cancer suddenly feels more prevalent.
Short-Term Thinking: Humans are evolutionarily hardwired to think in the short and middle term, which can make us feel like we're witnessing statistically improbable events when we're not (for example, witnessing a 100-year flood) and leave us unable to process long-term data (for example, focusing on a cold snap over the last several days while ignoring climate change).
Our brains’ tendency to misunderstand probability makes it a useful subject to study if we want to use statistics to make more informed decisions.
People often use probability to assess risk when making financial decisions. Wheelan explains that a statistic called the “expected value” can help us determine whether we want to take a financial risk when we know the probability of each possible outcome and its respective payoff. Real-estate developers, for instance, can use this tool to make sure that their multiple investments are likely to make money as a whole. Even if one property loses money or underperforms in a given year, as long as the expected value of their portfolio is profitable overall, they are likely to make money.
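The expected-value calculation itself is just a probability-weighted sum. A small sketch with hypothetical probabilities and payoffs for a real-estate portfolio:

```python
# Hypothetical real-estate portfolio: each scenario pairs a probability
# with the combined annual payoff (in dollars) across all properties.
scenarios = [
    (0.50,  120_000),   # typical year: modest gains overall
    (0.30,  300_000),   # strong year: several properties outperform
    (0.20, -150_000),   # weak year: one property loses money
]

# Expected value = sum of (probability * payoff) over all scenarios.
expected_value = sum(p * payoff for p, payoff in scenarios)
print(f"expected annual value: ${expected_value:,.0f}")  # → $120,000
```

Because the expected value is positive, the developer is likely to make money over many years even though one scenario loses money.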
Probability and Purchasing Stock
As Wheelan explains, probability is an effective tool for managing risk. Unfortunately, many of us underutilize probability when investing in the stock market. Research shows that we tend to overestimate the probability of rare events and our ability to foresee them. For example, people will often invest in a single stock that they think will be the next Apple instead of spreading their investment across a diverse portfolio. Therefore, people tend to under-diversify their stock holdings, costing them an average of $2,500 per year.
Using statistics like the expected value is likely a good idea before investing in the stock market as it can temper our “gut feeling” about a stock with math and help us make smarter investments.
In addition to helping us make more informed decisions, statistics can offer insight into questions we couldn’t possibly design an experiment to answer. For example, say we wanted to know whether exposure to a certain chemical (we’ll call it chemical X) corresponds to higher rates of cancer. Ethics precludes purposefully exposing people to chemical X in a laboratory setting for the sake of science. Additionally, so many other variables impact a person’s personal cancer risk that we can’t possibly know if chemical X was the sole cause of anyone’s cancer diagnosis. Without statistics, complex but important questions like this would remain unanswered.
To answer the question of whether chemical X is associated with higher rates of cancer, researchers could collect a large dataset including people who were and were not exposed to chemical X and record their rates of cancer diagnoses. Then the researchers could use regression analysis to determine the association between exposure to chemical X and a cancer diagnosis, independent of other factors such as smoking, exercise, family history, and so on. Statistics can even tell us what percent of a person’s overall cancer risk is mathematically associated with exposure to chemical X rather than other factors.
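A toy version of this analysis can be sketched with synthetic data: regressing cancer risk on both a smoking indicator and a chemical-X exposure indicator at once lets the regression estimate the exposure effect independent of the confounder. All numbers here are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic data: exposure to "chemical X" plus a confounder (smoking).
smoking = rng.binomial(1, 0.3, n)   # 1 = smoker
exposed = rng.binomial(1, 0.2, n)   # 1 = exposed to chemical X
# True model: smoking adds 0.10 to risk, exposure adds 0.05, plus noise.
risk = 0.02 + 0.10 * smoking + 0.05 * exposed + rng.normal(0, 0.02, n)

# Multiple regression: include both predictors, so the coefficient on
# `exposed` reflects exposure *independent of* smoking status.
X = np.column_stack([np.ones(n), smoking, exposed])
coefs, *_ = np.linalg.lstsq(X, risk, rcond=None)
intercept, b_smoking, b_exposed = coefs
print(f"risk increase associated with exposure: {b_exposed:.3f}")
```

The fitted coefficient on `exposed` lands near the true 0.05 even though smokers are mixed into both groups, which is exactly the "mathematical separation" described above.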
As Wheelan explains, the ability to mathematically separate individual variables (like exposure to a particular chemical) in the complexity of the real world makes statistical analysis an invaluable part of medical and social sciences research.
Using Statistics to Assess Whether Money Can Buy Happiness
Researchers have even used statistics to try to answer the age-old question of whether money can buy happiness. In a 2010 study from Princeton University, researchers gathered data from 450,000 responses to a Gallup survey about day-to-day emotions and overall life evaluation (how people rate their life in the “big picture”). Next, researchers used multivariate regression analysis to analyze whether a high income is correlated with increased emotional well-being and better life evaluation.
The results of the study suggest that money can buy happiness to a point. Income was positively associated with life evaluation in general (people had a more favorable opinion of their lives overall if they made more money) and associated with emotional well-being up to an income of $75,000. Beyond $75,000, income no longer predicted emotional well-being.
Learning Statistics Is Empowering
Learning statistics is an exercise in self-empowerment. Wheelan explains that thanks to modern society’s affinity for and reliance on technology, we're constantly surrounded and impacted by data. This abundance of data is a blessing in that it gives researchers a chance to study society’s most pressing issues, for example, using student outcomes to highlight racial and social inequities in our education system. But, the amount of data we're bombarded with every day through targeted marketing, political campaigns, and social media can also be a challenge when we don’t know how to gauge its reliability. Studying statistics can give us a better sense of how much trust we should put in different sources of information and can help us interpret published statistics correctly.
Data Literacy
Learning statistics to be an informed citizen is part of a larger skill set called “data literacy.” Data literacy refers to the ability to analyze and interpret data correctly. Just as a literate person can understand a story by reading the words on a page, a data-literate person can look at a statistic, chart, graph, and so on and correctly interpret its “story.”
Data literacy is a critical yet neglected skill. Poor data literacy skills hamper our individual and societal ability to make informed decisions. For example, Wheelan cites mainstream confusion about the difference between correlation and causation, combined with a lack of awareness of modern vaccine-safety research, as the cause of the anti-vaccination movement.
The gap between data literacy skills and data literacy needs in the modern workplace is costly. Many jobs require working with and making decisions using data, yet many employees lack the skills to do this effectively. Estimates show that over $109 billion is lost to the US economy every year due to underdeveloped data literacy skills in the workforce. In response, many corporations now treat data literacy as a critical skill.
Studying statistics also makes us less susceptible to being purposefully misled. Unfortunately, Wheelan explains that the purposeful misuse of statistics is more common than we may think. While the numbers themselves can't lie, the statistical tests that people choose to use, the data they choose to calculate statistics with, and the choice to include or not include specific statistics from datasets can construct various versions of “the truth.” For example, consider the following statements based on the same hypothetical dataset:
- Vote for Mark Smith! His tax cuts have saved the people in this town an average of $1,000 per year!
- Don’t vote for Mark Smith! His “tax cuts” have saved the wealthiest 1% of town residents tons of money and have saved low-income residents almost nothing!
Neither of these statements is a lie. Instead, different uses of data and statistics construct versions of the truth that best suit differing perspectives. While we can't expect ourselves to dive into the underlying data for every statistic we read or hear, Wheelan explains that we can better spot incomplete or misleading information with a basic understanding of statistics.
Dishonest Statistics
In his 1954 book How to Lie With Statistics (republished in 1993), Darrell Huff explores several ways that statistics are used to deliberately mislead an audience. His examples include using small sample sizes to inflate results, taking biased samples, and omitting values that are critical for context.
As an example of the latter, take the following hypothetical marketing for a weight loss supplement:
Headline: “Supplement Users Lost Twice as Much Weight During Their First Month as Those Taking a Placebo!”
This sounds appealing and might tempt many people to spend big on the supplements. However, no context is given for what “twice as much” means because no actual weight-loss figures are included. Perhaps those on the supplement lost just one pound, while those on the placebo lost just half a pound. While it's true that one pound is twice as much as half a pound, the actual figures are far less impressive than the headline makes them appear.
Huff cautions that dishonest or incomplete statistics combined with a data-illiterate audience render many published statistics meaningless at best and harmful at worst.
Here's a preview of the rest of Shortform's Naked Statistics PDF summary:
PDF Summary Shortform Introduction
...
The Book’s Publication
Naked Statistics was published by W. W. Norton & Company in 2013, 11 years after Naked Economics and shortly after Wheelan joined the faculty of Dartmouth College.
The Book’s Context
Wheelan’s discussion of the dangers of misused and misunderstood statistics follows other books on the same topic, including Joel Best’s 2001 book Damned Lies and Statistics and Darrell Huff’s How to Lie With Statistics (first published in 1954). After Naked Statistics, Daniel J. Levitin published A Field Guide to Lies, which also teaches readers how to spot misleading statistics and became an international bestseller.
The Book’s Strengths and Weaknesses
Commentary on the Book’s...
PDF Summary Why Learn Statistics?
...
Dishonesty Disguised With Statistics
There is also a “dark side” of statistics that we become vulnerable to if we don't educate ourselves about basic statistics concepts. Wheelan explains that the purposeful misuse of statistics is more common than we think.
Even though statistics are math-based, they’re not always objective, and we should interpret them with critical thinking and, in some cases, skepticism. While the values themselves can't lie, the statistical tests that people choose to use, the data they choose to calculate statistics with, and the choice to include or not...
PDF Summary Using Descriptive Statistics to Describe Measures of Central Tendency
...
Diversity advocates stress the need to consider the story and nuance of the data used to build new tools and ensure that it represents all populations the algorithms are meant to serve. If the data we put into computer-based algorithms is biased, or misrepresents or under-represents certain populations, its results will be similarly flawed.
Using biased statistics to make medicine or social sciences decisions can exacerbate existing inequities because they will best serve the populations for which we've collected the most data (which often ends up being white populations). For medical conditions like diabetes, which are already more prevalent in minority populations, algorithms based on white patients are especially problematic because they will improve medical care for white patients while doing little for the populations they could benefit most.
Next, we'll look at some of the...
PDF Summary Using Descriptive Statistics to Summarize Data Distribution
...
Data that are skewed to the right have a longer “tail” to the right of the peak of the curve. In other words, a few values in the dataset are much larger than the others. These larger values make the mean larger than the median. This is also called a positive skew because it pulls the mean in the positive direction. An example of positive skew could be a few outstanding students inflating the class average on a difficult test.
Data that are skewed to the left have a longer “tail” to the left of the peak of the curve. In other words, a few values in the dataset are much smaller than the others. These smaller values make the mean smaller than the median. This is also called a negative skew because it pulls the mean in the negative direction. An example of negative skew could be a few students who skipped class bringing down the class average on an easy test.
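The mean-versus-median behavior of a skewed dataset is easy to verify directly. A sketch with invented test scores, where two outstanding students create a right (positive) skew:

```python
import statistics

# Hypothetical scores on a difficult test: most students score in the
# 60s, but two outstanding students pull the mean upward (right skew).
scores = [58, 60, 62, 63, 65, 66, 68, 70, 98, 100]

mean = statistics.mean(scores)
median = statistics.median(scores)
print(f"mean = {mean}, median = {median}")
# In a right-skewed distribution, the mean exceeds the median.
```

Here the mean (71) sits above the median (65.5), so the mean alone would overstate how a typical student performed.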
Standard Deviation
We can describe how...
PDF Summary Using Probability to Make Decisions
...
Basic Probability
We can determine how mathematically likely an outcome is by setting up a fraction with the outcome we're interested in on top and all possible outcomes on the bottom. For example, in a bag of 32 chess pieces, 16 will be pawns. The probability of pulling a pawn out of the bag will be 16/32, which reduces to 1/2, or 50%.
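This fraction-based definition can be expressed directly, and the chess-piece example reduces automatically:

```python
from fractions import Fraction

# A standard chess set has 32 pieces, 16 of which are pawns.
pawns = 16
total_pieces = 32

# Probability = outcome we're interested in / all possible outcomes.
p_pawn = Fraction(pawns, total_pieces)
print(p_pawn)                  # → 1/2
print(f"{float(p_pawn):.0%}")  # → 50%
```

Using `Fraction` keeps the probability exact and shows the reduced form (16/32 → 1/2) rather than a rounded decimal.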
The probability of multiple independent events happening is the product of their probabilities. For example, in a bag of 32 chess pieces, the probability of picking a pawn out of the bag twice in a row (provided you put...
PDF Summary “Safe Assumptions” With Inferential Statistics
...
Next, we'll look at some of the most basic terms and concepts of inferential statistics.
Hypothesis Testing
Inferential statistics test hypotheses, which are educated guesses about how the world works. Based on our statistical analyses, we can either accept these hypotheses as true or reject them as false with varying degrees of certainty.
There are common conventions around testing a hypothesis with inferential statistics. We'll give a general overview of some of these conventions and apply them to an example in the following sections.
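As an illustration of those conventions (this example is not from the book), here is a simple hypothesis test of whether a coin is fair, using an exact binomial calculation:

```python
from math import comb

# Hypothesis test sketch: is a coin fair?
# Null hypothesis: P(heads) = 0.5. We observe 60 heads in 80 flips.
n, heads = 80, 60

# p-value: probability of seeing 60 or more heads purely by chance
# if the null hypothesis is true.
p_value = sum(comb(n, k) for k in range(heads, n + 1)) / 2 ** n
print(f"p-value: {p_value:.6f}")

# Common convention: reject the null hypothesis when p < 0.05.
print("reject null" if p_value < 0.05 else "fail to reject null")
```

Because 60 heads in 80 flips would be extraordinarily unlikely for a fair coin, the p-value falls far below the conventional 0.05 threshold and we reject the null hypothesis.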
A Good Hypothesis Takes Work
The word hypothesis is often used colloquially to mean a guess. But this colloquial use can create misconceptions about what a scientific hypothesis is. A scientific hypothesis is based on background subject knowledge and research, a review of related studies, and a sound understanding of any statistics that will be performed during the study. Therefore, when researchers arrive at a quality scientific hypothesis, they have already put in a great deal of time and work.
The Null and Alternative Hypotheses
When we use inferential...
PDF Summary Finding Answers With Regression Analysis
...
Longitudinal studies follow the same individuals, often repeatedly measuring the same variables over the course of years or decades. Often, they simply involve observations and no intervention. Longitudinal studies can embrace the complexity of individual lives because they don’t attempt to standardize people’s experiences. Rather, focusing on a single variable (or group of variables) can provide strong evidence for the effect of a particular experience, treatment, or exposure across various circumstances over time.
While longitudinal studies require a great deal of commitment, their insights are invaluable to researchers in medical and social sciences. For example, another longitudinal study called the Baltimore Longitudinal Study of Aging began in 1958 and has informed how the medical field understands the aging process. Additionally, according to UNICEF, longitudinal studies are particularly valuable...
PDF Summary Quality Data: The Backbone of Reliable Statistics
...
The Value of Reliable Data
Data isn’t just the backbone of reliable research—it’s big business. Wheelan reminds us that in our technology-driven society, we, the technology users, are a constant source of data for companies like Facebook, which use the data we generate every day to increase their profits.
We might not think of the data we create as individuals as having monetary value, but in 2019, Facebook made over $164 from each of its Canadian and American subscribers. This works out to roughly 10 cents per like! These numbers add up: In 2019 Facebook and Google earned $230 billion, mainly from running ads guided by user data.
Wheelan explains that “big data” isn’t inherently good or bad. The availability of data today opens doors to research and insight that wouldn’t have been possible just a few years ago. But the practice of collecting users’ data online and in public spaces also opens up a host of ethical considerations...
PDF Summary Reducing Bias in Data
...
The effect size of a study is a measure of the difference in outcomes between the treatment and control groups. Statisticians argue that the effect size is at least as important as the p-value (which tells us whether a study is statistically significant) because a study can show very small differences between outcomes for treatment and control groups and still report statistically significant findings. As Wheelan explains, while the differences in outcomes may be mathematically significant, they may be negligible in the real world. Therefore, in addition to studying a large and random sample, one way to reduce bias in published research is to publish the effect size alongside measures of statistical significance.
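The gap between statistical and practical significance can be simulated: with a large enough sample, even a trivial difference produces a tiny p-value while the effect size (here Cohen's d, one common measure) stays negligible. All data below are simulated:

```python
import math
import numpy as np

rng = np.random.default_rng(7)

# Simulated trial (illustrative only): the treatment shifts outcomes by
# a tiny amount, but the huge sample makes the p-value look impressive.
n = 200_000
control = rng.normal(0.00, 1.0, n)
treated = rng.normal(0.03, 1.0, n)   # tiny true effect

diff = treated.mean() - control.mean()
pooled_sd = math.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)

# Effect size (Cohen's d): the difference in standard-deviation units.
cohens_d = diff / pooled_sd

# Two-sided z-test p-value for the difference in means.
se = pooled_sd * math.sqrt(2 / n)
p_value = math.erfc(abs(diff / se) / math.sqrt(2))

print(f"p-value ~ {p_value:.1e}   (statistically significant)")
print(f"Cohen's d = {cohens_d:.3f} (negligible in practice)")
```

The p-value clears any conventional significance threshold, yet Cohen's d sits far below 0.2 (a common rule-of-thumb floor for even a "small" effect), which is why reporting effect size alongside significance matters.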
A famous example of this phenomenon is the five-year study of 22,000 people that resulted in the recommendation that people take aspirin to prevent heart attacks. The p-value in the study was .00001, meaning that there was a .001% chance that the observed reduction in heart attack rates while taking aspirin was due to random chance....