This article is an excerpt from the Shortform book guide to "Naked Statistics" by Charles Wheelan.
What is the difference between the mean and the median? Which of the two is a more accurate measure?
Both the mean and the median are measures of central tendency. For non-skewed distributions, the mean is generally the more informative measure because it takes every value in the data set into account. For skewed data, the median is better because it isn’t influenced by outliers.
Keep reading to learn about the difference between the mean and the median and when to use which.
Mean vs. Median
Both the mean and the median describe the middle of a data set.
The average, or mean, of a data set is the sum of all of the values in the data set divided by the number of data points. For example: If you wanted to know the average number of cookies you eat each time you open a package, you would keep track of the number of cookies you eat at each sitting, add those numbers together, and divide the total by the number of cookie-eating events.
Number of Cookies Eaten per Sitting
15 | 8 | 6 | 10 | 9
The sum of the values in your data set: 15 + 8 + 6 + 10 + 9 = 48. Divided by the number of data points (5): 48 / 5 = 9.6. You average 9.6 cookies per sitting.
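The cookie arithmetic can be checked with a few lines of Python. This is a minimal sketch: the variable and function names (`cookies_per_sitting`, `mean`) are illustrative choices, not something from Wheelan's book.

```python
# Number of cookies eaten at each of five sittings (values from the table above).
cookies_per_sitting = [15, 8, 6, 10, 9]


def mean(values):
    """Return the sum of the values divided by the number of data points."""
    return sum(values) / len(values)


total = sum(cookies_per_sitting)      # 15 + 8 + 6 + 10 + 9
average = mean(cookies_per_sitting)   # total divided by 5 sittings

print(f"Total cookies: {total}")
print(f"Average per sitting: {average}")
```

Because every value enters the sum, changing any single sitting changes the result, which is exactly why the mean is sensitive to extreme values.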
The median is another way to measure central tendency and is not influenced by outliers. The median takes an ordered data set (where the values are organized into ascending order) and divides it in half. The median is the middle value of a data set (or the average of the two middle values if the data set has an even number of data points).
Suppose a store's chocolate Easter egg sales for each of the last 12 months, arranged in ascending order, look like this:
Monthly Chocolate Easter Egg Sales ($)
0 | 0 | 0 | 1 | 1 | 4 | 10 | 15 | 16 | 20 | 25 | 3000
With 12 data points, there is no single middle value, so we take the average of the two middle values (the sixth and seventh), $4 and $10, which is $7. So the median chocolate egg sales figure is $7, which is a very different figure from the mean of roughly $300 (about $258 for the exact values listed here), even though both are measures of central tendency.
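The median procedure described above (order the values, then take the middle one or the average of the two middle ones) can be sketched in Python as follows. The function name `median` and the variable names are illustrative assumptions; the data is the twelve monthly sales values listed above.

```python
def median(values):
    """Return the middle value of the ordered data, or the average of the
    two middle values when the number of data points is even."""
    ordered = sorted(values)          # the median is defined on ordered data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: a single middle value exists
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the two


# Monthly chocolate egg sales in dollars, as listed above.
monthly_sales = [0, 0, 0, 1, 1, 4, 10, 15, 16, 20, 25, 3000]

median_sales = median(monthly_sales)                 # average of 4 and 10
mean_sales = sum(monthly_sales) / len(monthly_sales)

print(f"Median: ${median_sales:.2f}")
print(f"Mean:   ${mean_sales:.2f}")
```

For these exact values, the mean works out to about $257.67, which is in the neighborhood of the $300 figure quoted in the text, while the median stays at $7.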
Limitations of Using the Mean
Wheelan cautions that the mean can be a misleading figure because it doesn’t convey the influence of outliers in a data set. (An outlier is a data point that is numerically far from others in the same data set.) In other words, a few “extreme” pieces of data can skew the mean in either direction, giving us a warped sense of the average.
For example, a store manager may report that her sales of Easter egg chocolates averaged about $300 a month over the last year. However, her monthly sales data shows that she sold $3,000 worth of chocolate eggs in April, while sales in each of the other 11 months ranged from zero to $25. In this data set, the month of April is an outlier, and the mean of roughly $300 doesn’t provide the truest picture of typical chocolate egg sales for the store.
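One way to see how much a single outlier matters is to compute both measures with and without the April figure. The sketch below uses Python's standard `statistics` module and the twelve monthly values from the ordered data set earlier in the article; the variable names are illustrative assumptions.

```python
from statistics import mean, median

# Monthly chocolate egg sales in dollars; 3000 is the April outlier.
monthly_sales = [0, 0, 0, 1, 1, 4, 10, 15, 16, 20, 25, 3000]

# The same data with the outlier month removed (11 remaining months).
without_april = [value for value in monthly_sales if value != 3000]

mean_all = mean(monthly_sales)
mean_without = mean(without_april)
median_all = median(monthly_sales)
median_without = median(without_april)

print(f"Mean with April:      ${mean_all:.2f}")
print(f"Mean without April:   ${mean_without:.2f}")
print(f"Median with April:    ${median_all:.2f}")
print(f"Median without April: ${median_without:.2f}")
```

Dropping April moves the mean from a few hundred dollars to under $10, while the median shifts by only a few dollars. Removing a data point is done here purely to illustrate sensitivity; it is not a recommendation to discard outliers when reporting.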
Choosing Between the Mean and Median
Because the mean and median can tell different stories about a data set, statistics has a convention for when to use one over the other:
- When a data set is evenly distributed, the convention is to use the mean.
- When a data set is not evenly distributed, or extreme outliers will skew the mean, the convention is to use the median.
However, the mean is the measure of central tendency that most statistical calculations build on, so it is used more often than the median.
Another way to add context in reporting central tendency is to provide the mean or median along with a sense of the spread of the data, such as the range (highest and lowest values) or the standard deviation (a measure of how far values typically fall from the mean). For instance, in our chocolate egg example, if the manager reported a sales median of $7/month and a range of zero to $3,000, readers would quickly see that sales aren’t evenly distributed.
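Reporting a measure of center together with a measure of spread can be sketched as a small summary helper. The function name `summarize` is hypothetical, and the choice of `statistics.pstdev` (population standard deviation) is an assumption that the 12 months are the whole data set of interest; `statistics.stdev` would give the sample version instead.

```python
from statistics import mean, median, pstdev


def summarize(values):
    """Return center and spread measures for a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        "range": (min(values), max(values)),   # lowest and highest values
        "std_dev": pstdev(values),             # population standard deviation
    }


# Monthly chocolate egg sales in dollars, from the example above.
monthly_sales = [0, 0, 0, 1, 1, 4, 10, 15, 16, 20, 25, 3000]

summary = summarize(monthly_sales)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A median of $7 next to a range of $0 to $3,000 and a standard deviation in the hundreds of dollars signals at a glance that the data is far from evenly distributed.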