PDF Summary: How to Lie With Statistics, by Darrell Huff


Below is a preview of the Shortform book summary of How to Lie With Statistics by Darrell Huff. Read the full comprehensive summary at Shortform.

1-Page PDF Summary of How to Lie With Statistics

Have you ever wondered how the people who publish statistics so often turn up numbers that support their position? Sometimes, the numbers really do support the person or group publishing them, but statistics aren’t as objective as we might like to believe.

In How to Lie With Statistics, author Darrell Huff explains how people with agendas manipulate figures and how they’re presented to create statistics that support their views. You’ll learn how to spot shady sampling and graphs that are technically accurate but visually misleading. You’ll also learn why suspiciously precise numbers really are suspicious. Finally, you’ll learn about a quick questionnaire you can use to evaluate the legitimacy of any statistic you’re presented with.

(continued)...

Technique #5: Omitting Statistical Qualifiers

The last way to fudge numbers is to leave out information that puts caveats on their accuracy or further explains them. There are four types of information liars often neglect to include with their figures:

1. Probable error. Probable error is a measure of how reliable a figure is, expressed as a range around the figure within which the true value has an even chance of falling. (It’s impossible to find the single number that represents the true result because measuring systems aren’t perfectly accurate.) Therefore, if you’re presented with a single figure, and aren’t given any indication of how accurate it is, it may not be accurate at all.

  • For example, if an IQ test has a probable error of 3 and you score 98, there’s an even chance that your true IQ is somewhere between 95 and 101 (98 - 3 = 95, and 98 + 3 = 101), and an even chance that it falls outside that range. So, simply telling someone that your IQ is 98 overstates how precisely it’s known.

2. Degree of significance. The degree of significance is a measure of how likely it is that results are due to chance. In most cases, for a figure to be statistically significant, the degree needs to be no more than 5%—this means that 95 out of 100 times, the results are real and not attributable to chance. If the degree isn’t given, it may be higher than 5%, which means the results could be due more to chance than anything else.

3. What the comparison is to. Some stats promise to “triple” the effectiveness of a product, or offer “25% more,” but don’t say what they’re compared against. A granola bar that contains 25% more protein than a competitor’s, versus a bar that contains 25% more protein than a rock, are two entirely different things.

4. Negligibility. While there may be mathematical differences between figures, sometimes, these differences are so small they don’t make any practical difference—but liars fail to point this out.

  • For example, one brand of cigarette may contain a slightly smaller amount of poisonous compounds than another. It’s still toxic.
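The first two qualifiers above can be made concrete with a little arithmetic. The sketch below is illustrative only: it takes “probable error” in its classical sense (half of all measurements fall inside the range) and assumes normally distributed measurement error. The coin-flip counts in the significance check are hypothetical, not from the book, and the function names are made up for this example.

```python
from math import comb
from statistics import NormalDist


def probable_error_range(score, probable_error):
    """Return the (low, high) range implied by a score and its probable error."""
    return score - probable_error, score + probable_error


def coverage_of_range(score, probable_error):
    """Chance that the true value lies inside the range.

    Assumption: errors are normally distributed, so one probable error
    equals about 0.6745 standard deviations.
    """
    z = NormalDist().inv_cdf(0.75)  # about 0.6745
    error = NormalDist(mu=score, sigma=probable_error / z)
    low, high = probable_error_range(score, probable_error)
    return error.cdf(high) - error.cdf(low)


def two_sided_p_value(heads, flips):
    """Exact chance of a result at least this lopsided from a fair coin."""
    distance = abs(heads - flips / 2)
    extreme = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - flips / 2) >= distance
    )
    return extreme / 2 ** flips


low, high = probable_error_range(98, 3)  # the IQ example: 95 and 101
coverage = coverage_of_range(98, 3)      # an even chance under the assumption

# Hypothetical result: 60 heads in 100 flips of a supposedly fair coin.
p_value = two_sided_p_value(60, 100)
significant = p_value <= 0.05            # the 5% threshold from the text
```

Sixty heads sounds lopsided, yet under these assumptions it narrowly misses the 5% threshold, which is the kind of detail a missing degree of significance hides.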

If liars can’t find a calculation that gives them figures they like, another technique they use is to focus on other figures that do seem to support what they have to say: in other words, to fudge the point. If they can’t prove something, sometimes, they’ll prove something else that sounds like it's the same as what they were trying to prove.

  • For example, if a cold medicine company can’t prove that their drug cures colds, but they can prove that it kills germs in a lab, they might advertise that their medicine “kills 15,000 germs.” Killing germs in a lab isn’t the same as curing colds (the germs killed may not even be the ones that cause colds), but the two sound close enough that people might think the medicine actually works.

Technique #7: Mistaking Correlation for Causation

This technique involves pushing the idea that if there’s a relationship between two factors, one of them caused the other, and whichever factor is most favorable to a liar’s argument is the cause.

  • For example, one study found that smokers got lower grades in college. An anti-smoking activist with an agenda might report this as “If you stop smoking, your grades will improve.”

This is misleading because:

1. It’s often impossible to know which factor is the cause and which is the effect.

  • For example, people struggling with the stress of bad grades could be driven to smoking for relief: In other words, bad grades could be the cause of smoking, not the effect of it.

2. Both factors may be effects of some other cause. While the relationship between the factors is real, the cause-and-effect is uncertain.

  • For example, maybe the same people who smoke are the same people who have low grades because they like socializing more than studying.

3. The relationship between the two factors may be only due to chance.

4. Even if there is a real cause-and-effect relationship, that doesn’t mean it applies to everyone. Correlations are tendencies.

  • For example, while it’s fairly conclusive that people who get a post-secondary education have higher incomes than those who don’t, that doesn’t mean that you will make more money if you go to college than if you don’t.

5. Correlations can be caused by humans and trends, rather than the factor you think they’re caused by.

  • For example, older women tend to walk with their toes farther apart than younger women. This is because posture trends changed over the years, not because women’s posture necessarily changes as they age (which is what some people may assume).
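The second reason above (both factors being effects of a third cause) can be shown with a small simulation. This is a sketch on made-up data, not a model of any real study: the “socializing” score, the noise levels, and the sample size are all assumptions chosen for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical hidden cause: how much each student prefers socializing (0 to 1).
socializing = [rng.random() for _ in range(2000)]

# Both effects depend on socializing plus independent noise;
# smoking never influences grades in this model.
smoking = [s + rng.gauss(0, 0.1) for s in socializing]
grades = [-s + rng.gauss(0, 0.1) for s in socializing]

raw_correlation = pearson(smoking, grades)

# Remove the hidden cause from each variable and correlate what is left.
smoking_residual = [x - s for x, s in zip(smoking, socializing)]
grades_residual = [g + s for g, s in zip(grades, socializing)]
adjusted_correlation = pearson(smoking_residual, grades_residual)
```

In this toy model, smoking and grades are strongly negatively correlated even though neither affects the other, and the correlation all but vanishes once the shared cause is removed.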

Techniques #8-10: Manipulating Images

Technique #8: Truncating Graphs or Adding More Divisions to the Y-Axis

To make changes look larger than they are, liars remove the empty space on a graph so that the part the data occupies is the only part shown. This will make the slope of a line look steeper, or the difference between bars look greater.

  • For example, from this graph, it’s obvious that there’s little difference in profit from year-to-year:

how-to-lie-with-statistics-truncate0.png

In this graph, which uses the same data as the first graph but has more divisions and has been truncated, profit looks significantly different from year to year:

how-to-lie-with-statistics-truncate1.png
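The effect of truncation can be put in numbers by comparing how tall two bars are drawn relative to each other. The profit figures and the function name below are hypothetical, chosen only to illustrate the arithmetic.

```python
def drawn_height_ratio(smaller, larger, axis_start=0.0):
    """Ratio of two bars' drawn heights when the y-axis starts at axis_start."""
    if axis_start >= smaller:
        raise ValueError("axis_start must be below the smaller value")
    return (larger - axis_start) / (smaller - axis_start)


# Hypothetical profits, in millions, for two consecutive years.
last_year, this_year = 100.0, 104.0

# Axis starts at zero: the second bar is drawn 1.04 times as tall.
honest = drawn_height_ratio(last_year, this_year)

# Axis starts at 98: the same data is drawn with a bar three times as tall.
truncated = drawn_height_ratio(last_year, this_year, axis_start=98.0)
```

A 4% increase is unchanged in the data, but on the truncated axis the second bar looks three times the size of the first.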

Technique #9: Failing to Include Labels and Numbers on Graphs

If diagrams and graphs don’t have labels or numbers, it’s impossible to know what they show.

  • For example, one advertising agency presented a graph that showed a steadily rising line. The x-axis showed time in years, but the y-axis had no label. Presumably, it was profit, but without further labeling, it was impossible to know if profits were jumping by millions or cents.

Technique #10: In Bar Graphs, Using Illustrations Instead of Bars

In a bar chart, the height of the bar is what indicates the measurement. If you replace a bar with an illustration, when you increase the height of the illustration, all the other dimensions scale proportionally. Increasing the width and depth (if 3-D) of the image makes the differences between the two images—and thus the differences between what the images represent—look much larger than they really are.

  • (Shortform example: In the illustration below, the skulls represent the death rate from a certain illness. Before a liar’s medication was adopted, the death rate was 60 out of 1 million, represented by a skull at height 60. After adoption, the death rate halved to 30, represented by a skull at height 30. However, visually, the rate appears to have dropped by far more than half because the image appears to have decreased by more than half: The whole image was scaled proportionally, rather than just the height being halved.)

Death rate pre-adoption: 💀

Death rate post-adoption: 💀
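The distortion in the skull example follows from simple geometry: when an image is scaled proportionally, its area changes with the square of the height ratio, and its apparent volume (if it is drawn in 3-D) with the cube. A minimal sketch, with a hypothetical function name:

```python
def pictogram_ratios(old_value, new_value):
    """How much smaller (or larger) a proportionally scaled image looks."""
    scale = new_value / old_value
    return {
        "height": scale,       # what the data actually says
        "area": scale ** 2,    # what a flat image shows
        "volume": scale ** 3,  # what a 3-D-looking image suggests
    }


# The death rate falls from 60 to 30 per million.
ratios = pictogram_ratios(60, 30)
```

Here the height halves, but the skull covers only a quarter of the area and suggests an eighth of the volume, so the drop looks far larger than the 50% in the data.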

Assessing the Legitimacy of Statistics

In the previous sections, you learned liars’ techniques for misrepresenting statistics. Now, you’ll learn about a five-question checklist you can go through every time you encounter a statistic to assess its legitimacy. The goal is to find balance—you don’t want to swallow statistics without thinking about them (it’s often worse to know something wrong than to be ignorant), but you also don’t want to be so suspicious that you ignore all statistics and miss out on important information.

Here are the evaluation questions:

1. What is the source of the statistic? The first thing to do when confronted with a statistic is to figure out where it’s coming from. If the source might have an agenda, you should be suspicious of the statistic. (Note that liars often borrow the numbers of reputable organizations, such as universities or labs, but come to their own conclusions using those numbers. Then, they try to make it look like their conclusion is the reputable organization's conclusion, to give their conclusion more credibility. Check if the organization that provided the numbers is the same one that provided the conclusions drawn from them.)

2. What was the data collection method? Any data that’s based on what respondents say, or how motivated they are to respond to a survey, can skew the truth. When confronted with a statistic that was calculated based on people’s responses, ask yourself if there were any reasons the respondents might have been motivated to lie.

  • For example, one census in China, for military and tax purposes, found the population of one region to be 28 million. The next census, for famine relief purposes, found the population of the same region to be 105 million. The population hadn’t changed much over the five years in between censuses—people were just a lot keener to be counted when it meant famine relief than when it meant getting taxed.

3. Is any relevant information omitted? Figures exist in context. If a figure is cited on its own, ask yourself if there is other relevant information that might qualify the figure further, and if leaving that information out would further anyone’s interests.

  • For example, an environmentalist who wants the government to regulate pollution might cite a high death rate during pollution-driven foggy weather in London and attribute the deaths to the fog. However, this doesn’t represent how the world works—people die for plenty of reasons that don’t have anything to do with the weather, and the high death rate could have been caused by something else. A more accurate statistic would be to cite the death rate accompanied by the cause of death: This would show how many people truly died due to fog.

4. Is the language surrounding the figures misleading? Study the words surrounding the figure and consider their definitions (to twist their results to suit their argument, liars may not use the most common definition of an everyday word, as you learned with “average”).

  • (Shortform example: Anything can be the “first,” “biggest,” or “best” of its kind, depending on how people define these words. For instance, the “biggest” waterfall in Canada is Niagara Falls (if “big” means the largest volume of water falling) or Della Falls (if “big” means highest).)

5. Does the statistic make sense? Ask yourself if whatever the statistic reveals seems right, if it conflicts with any well-known facts, or if it’s suspiciously precise.

  • For example, one urologist calculated that there are eight million cases of prostate cancer in the US. At the time, there were fewer than eight million men in the susceptible age group, which meant the figure couldn’t be accurate.


Here's a preview of the rest of Shortform's How to Lie With Statistics PDF summary:

PDF Summary Chapter 1: Misleading With Bad Sampling

...

The only way to get a perfectly accurate statistic is to count every entity that makes up the whole. For example, if you want to know how many red beans there are in a jar of red-and-white colored beans, the only way to find out for sure is to count all of the red beans in the jar.

However, in most cases, counting every single entity is impossibly expensive and impractical. For instance, imagine you were trying to know how many red beans there are in every jar on the planet—you’d have to count all the red beans in the world at any given time.

To get around this problem, statisticians count a sample instead of the whole, assuming the sample’s make-up proportionally represents the whole.

A sample must meet the following two criteria to actually be representative of the whole (and thus, be “good”):

Criterion #1: Large. This reduces the effects of chance. Chance affects every survey, poll, and experiment, but when the sample size is large, its effects are negligible.

  • For example, the probability of getting heads when flipping a coin is 50%. In practice, if you flip a coin 10 times, you’re unlikely to get heads exactly five times. You’ll probably get some other...
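The coin-flip claim can be checked exactly with the binomial formula. The sketch below assumes a fair coin; the 5-point tolerance and the 1,000-flip comparison are illustrative choices, not figures from the book.

```python
from math import comb


def prob_exact_heads(heads, flips):
    """Chance of exactly `heads` heads in `flips` flips of a fair coin."""
    return comb(flips, heads) / 2 ** flips


def prob_near_half(flips, tolerance_pct=5):
    """Chance that the share of heads lands within tolerance_pct points of 50%."""
    # Integer arithmetic avoids floating-point trouble at the boundaries.
    close = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(2 * k - flips) * 100 <= 2 * tolerance_pct * flips
    )
    return close / 2 ** flips


p_five_of_ten = prob_exact_heads(5, 10)  # 252 / 1024, about 24.6%
small_sample = prob_near_half(10)        # only exactly 5 heads qualifies
large_sample = prob_near_half(1000)      # 450 to 550 heads qualify
```

With 10 flips, landing within 5 points of 50% happens only about a quarter of the time. With 1,000 flips it is close to certain, which is the sense in which a large sample reduces the effects of chance.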

PDF Summary Chapter 2: Fudging the Numbers

...

Average Type #2: Median. This is the number that falls in the middle when the sample numbers are arranged in numerical order. The median is a useful number for you as a seeker of the truth because it gives you information about the data distribution—half of the sample numbers are above the median and half are below.

  • (Shortform example: The median of the five people who make $30,000, $50,000, $70,000, $80,000, and $80,000 is $70,000.)

Average Type #3: Mode. This is the number that appears most frequently in a data set.

  • (Shortform example: The mode of the above list is $80,000.)

When the distribution of a data set is normal (most of the values fall in the middle, with just a few on the extremes), all of the averages will be similar. However, when the distribution isn’t normal, the averages can be wildly different. In this case, nefarious people can pick the number that suits them best and simply label it the average.
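The gap between the averages can be checked with Python's standard `statistics` module. The first list is the five incomes from the example above. The second, skewed list is hypothetical. The arithmetic mean is included for comparison; treating it as the first average type is an assumption, since that part of the summary is elided here.

```python
from statistics import mean, median, mode


def averages(values):
    """Return the three common 'averages' of a list of numbers."""
    return {"mean": mean(values), "median": median(values), "mode": mode(values)}


# Incomes from the Shortform example.
incomes = [30_000, 50_000, 70_000, 80_000, 80_000]
summary = averages(incomes)

# Hypothetical skewed data: one very high earner among four modest incomes.
skewed = [30_000, 30_000, 40_000, 50_000, 1_000_000]
skewed_summary = averages(skewed)
```

For the example incomes, the three figures are $62,000, $70,000, and $80,000. For the skewed list, they are $230,000, $40,000, and $30,000, so someone free to choose which one to call “the average” can report very different pictures of the same data.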

Technique #2: Giving Precise Figures to Appear More Reputable

Another number-fudging technique is to include a decimal to make a figure look more precise and therefore reputable. (For example, reading that most people sleep...

PDF Summary Chapter 3: Fudging the Point

...

  • For example, people struggling with the stress of bad grades could be driven to smoking for relief: In other words, bad grades could be the cause of smoking, not the effect of it.

Sometimes, the two factors are so interrelated they both act as both cause and effect.

  • For example, stock ownership and income are probably both causes and effects at the same time. The higher your income, the more stocks you can afford, and since stocks make you money, the more stocks you have, the higher your income will be.

2. Both factors may be effects of some other cause. While the relationship between the factors is real, the cause-and-effect is uncertain.

  • For example, maybe the same people who smoke are the same people who have low grades because they like socializing more than studying.

3. The relationship between the two may be only due to chance.

4. Even if there is a real cause-and-effect relationship, that doesn’t mean it applies to everyone. Correlations are tendencies.

  • For example, while it’s fairly conclusive that people who get a post-secondary education have higher incomes than those who don’t, that doesn’t mean that you will make more money...


PDF Summary Chapter 4: Fudging the Graphics

...

4. In bar graphs, use illustrations instead of bars. In a bar chart, the height of the bar is what indicates the measurement. If you replace a bar with an illustration (say, a bag of money), then increasing the height of the moneybag scales all the other dimensions proportionally. Increasing the width and depth (if 3-D) of the image makes the differences between the two images look much larger.

  • (Shortform example: In the illustration below, the skulls represent the death rate from a certain illness. Before a liar’s medication was adopted, the death rate was 60 out of 1 million, represented by a skull at height 60. After adoption, the death rate halved to 30, represented by a skull at height 30. However, visually, the rate appears to have dropped by far more than half because the image appears to have decreased by more than half: The whole image was scaled down proportionally, rather than just the height being halved.)

how-to-lie-with-statistics-skulls.png

This trick can also give the false impression, depending on the illustration, that whatever’s represented in the image is now larger...

PDF Summary Chapter 5: Assessing the Legitimacy of Statistics

...

  • Unconscious. If the bias is unconscious, there won’t be obvious clues that the figures are inaccurate (for instance, the vague use of the word “average,” without explaining which average they mean). If there are no signs of obvious lying, consider whether the source’s agenda is furthered by the figures it gives and if this might have blinded them to certain ideas or further explorations of the data.

Question #2: What Was the Data Collection Method?

The second question addresses the data collection method. Any data that’s based on what respondents say, or how motivated they are to respond to something in a certain way, can skew the truth because people aren't always truthful. When confronted with a statistic that was calculated based on people’s responses, ask yourself if there’s any reason the respondents might have been motivated to lie.

  • For example, one census in China, for military and tax purposes, found the population of one region to be 28 million. The next census, for famine relief purposes, found the population of the same region to be 105 million. The population hadn’t changed much over the five years in between censuses—people were just a...
