Measures of Variability
Measures of central tendency give information about how data values are similar, or cluster together, whereas measures of variability give information about how much the data values differ, or vary, from one another.
The fact that measurements of people or objects vary reflects the concept of individual differences. A description of data is never complete without some indication of variability, and knowledge of variability is needed to interpret measures of central tendency correctly.
There are three major measures of variability: the range, the standard deviation, and the variance. Other measures of variability are the interquartile range, the interdecile range, and the quartile deviation.
The Range
A quick and easy way to describe the variability of any distribution is to calculate the range, R.
The range measures the width of the entire distribution and is always given as a single value. It is calculated as follows:
R = (highest data value) - (lowest data value)
Although the range is a useful device for providing some information about variability, it has the severe limitation of being based on only two data values: the highest and the lowest. Distributions can have identical means and ranges but differ widely in terms of other measures of variability.
Another problem with the range is that it can change when new data values are added to the distribution: new data values can never reduce the range, but they can increase it.
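A minimal Python sketch of the range, using two small made-up data sets that share the same mean and the same range but differ in how their values are spread. The helper name `data_range` is our own. `statistics.pstdev` appears only to show that the spread really differs. The last two lines illustrate that adding a value can leave the range unchanged or increase it, but can never shrink it.

```python
from statistics import mean, pstdev

def data_range(values):
    """R = (highest data value) - (lowest data value)."""
    return max(values) - min(values)

# Hypothetical data: same mean (5) and same range (8), different spread
spread_out = [1, 1, 1, 9, 9, 9]
bunched = [1, 5, 5, 5, 5, 9]

for name, values in [("spread_out", spread_out), ("bunched", bunched)]:
    print(name, mean(values), data_range(values), round(pstdev(values), 2))

# A new value inside the current span leaves R unchanged;
# a new value beyond it increases R. R can never shrink.
print(data_range(bunched + [6]))   # 8
print(data_range(bunched + [20]))  # 19
```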
Percentiles, Quartiles, and Deciles
Percentiles, quartiles, and deciles are descriptive terms used to locate specific data values in a distribution. Although they are not measures of variability themselves, they are used to create various forms of the range.
The nth percentile (or centile) is the data value in a distribution below which n% of the data values fall and above which (100 - n)% fall. A raw data value can be described by converting it into the percentile at which it falls.
For example, a data value at the 95th percentile (P95) is at the high end of the distribution, because 95% of the data values are below it. A data value at the 5th percentile (P5) is at the low end of the distribution, because only 5% of the data values are below it and 95% are above it.
Since the 50th percentile divides the distribution in half, the 50th percentile is always equal to the median.
Percentiles are also referred to as percentile ranks. For example, a percentile rank of 65 indicates a data value at the 65th percentile.
When referring to either percentiles or percentile ranks, the location on the distribution is a point: a hypothetical location with no width, or dimension.
There are two common sources of confusion about percentiles. First, scoring at the 95th percentile on a test does not mean answering 95% of the test correctly; it means scoring higher than 95% of the others who took the test. Second, the rank expressed by a percentile always refers to the entire group being measured and compared, so check which group serves as the reference before drawing conclusions.
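A minimal Python sketch of a percentile rank, assuming one common convention: count the data values below x, plus half of any values equal to x, and express that count as a percentage of all data values. The function name `percentile_rank` and the scores are hypothetical, and some textbooks count only the values strictly below x. With this convention, the median of the data lands at the 50th percentile, as the text states.

```python
def percentile_rank(values, x):
    """Percentage of data values below x, counting half of any ties at x.

    Counting ties as half is one common convention (an assumption here);
    other textbooks count only values strictly below x.
    """
    below = sum(1 for v in values if v < x)
    ties = sum(1 for v in values if v == x)
    return 100 * (below + 0.5 * ties) / len(values)

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical; the median is 5
print(percentile_rank(scores, 5))     # 50.0: the median is the 50th percentile
print(round(percentile_rank(scores, 9), 1))
```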
Quartiles divide a distribution into quarters. The first quartile, Q1, is the 25th percentile, the second quartile, Q2, is the 50th percentile, or the median, and the third quartile, Q3, is the 75th percentile.
Deciles divide a distribution into tenths. The first decile is the 10th percentile, and so on. Note that the fifth decile is the 50th percentile, which is the median.
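Quartiles and deciles can be computed with `statistics.quantiles` from Python's standard library (Python 3.8 or later). `quantiles(data, n=4)` returns the three quartile cut points, and `quantiles(data, n=10)` returns the nine decile cut points. The scores below are made up. There are several interpolation conventions for quantiles, and Python's default (`method='exclusive'`) may give slightly different values from a hand calculation that uses another convention.

```python
from statistics import median, quantiles

# Hypothetical test scores (11 values)
scores = [55, 61, 64, 68, 70, 72, 75, 78, 83, 88, 97]

q1, q2, q3 = quantiles(scores, n=4)   # three cut points -> Q1, Q2, Q3
deciles = quantiles(scores, n=10)     # nine cut points -> first to ninth decile

print(q1, q2, q3)                     # 64.0 72.0 83.0
print(deciles[0], deciles[-1])        # first and ninth deciles
print(q2 == median(scores) == deciles[4])  # median = Q2 = fifth decile
```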
The Interquartile Range
The interquartile range (IQR) is the difference between the first quartile and the third quartile.
IQR = Q3 - Q1
The interquartile range contains the middlemost 50% of the data values in the distribution: 25% of the data values fall between Q1 and the median, and another 25% fall between the median and Q3.
[Figure: percentiles, quartiles, deciles, and the interquartile range of a distribution]
Since the interquartile range is not affected by extreme data values, it can be used with skewed distributions.
The interquartile range is based on the positions of data values in ranked order, centered on the median, or second quartile. Therefore, it can be used not only with interval or ratio data but also with ordinal data. Even if the data values are given only as ranks, the interquartile range can identify which data values fall in the middlemost 50% of the distribution.
When the median is being used as the appropriate measure of central tendency, the interquartile range is generally used as its measure of variability.
The Quartile Deviation
When the interquartile range is divided in half, it produces a measure of variability called the quartile deviation (Q), or the semi-interquartile range:
Q = (Q3 - Q1) / 2
The quartile deviation gives a general measure of how much the data values vary around the median.
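A minimal Python sketch of the interquartile range and the quartile deviation, Q = (Q3 - Q1) / 2. It uses `statistics.quantiles` with Python's default quartile method. The helper names and the scores are hypothetical. Replacing the top score with an extreme value leaves the IQR unchanged, while the range jumps, which is why the IQR suits skewed distributions.

```python
from statistics import quantiles

def iqr(values):
    """IQR = Q3 - Q1, using Python's default ('exclusive') quartile method."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

def quartile_deviation(values):
    """Q = (Q3 - Q1) / 2, the semi-interquartile range."""
    return iqr(values) / 2

scores = [55, 61, 64, 68, 70, 72, 75, 78, 83, 88, 97]   # hypothetical
with_outlier = scores[:-1] + [300]                      # top score made extreme

print(iqr(scores), iqr(with_outlier))                   # 19.0 19.0
print(quartile_deviation(scores))                       # 9.5
print(max(scores) - min(scores), max(with_outlier) - min(with_outlier))  # 42 245
```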
The Interdecile Range
The interdecile range is the difference between the first decile and the ninth decile.
The interdecile range contains the middlemost 80% of the distribution, with 40% of the data values falling on each side of the median.
This measure of variability is unaffected by extreme data values, and so the interdecile range can be used with skewed distributions.
Like the interquartile range, the interdecile range can be calculated when the data is ordinal.
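A minimal Python sketch of the interdecile range as the ninth decile minus the first decile. It uses Python's default `statistics.quantiles` method, and the function name is hypothetical. A data set of 100 values (1 to 100, made up for illustration) is used because, with very small samples, the interpolated ninth decile can still be pulled by the largest value. Here 80 of the 100 values fall between the first and ninth deciles, and making the largest value extreme does not change the result.

```python
from statistics import quantiles

def interdecile_range(values):
    """Ninth decile minus first decile (Python's default quantile method)."""
    deciles = quantiles(values, n=10)
    return deciles[8] - deciles[0]

values = list(range(1, 101))          # hypothetical: 1, 2, ..., 100
with_outlier = values[:-1] + [1000]   # make the largest value extreme

d = quantiles(values, n=10)
inside = [v for v in values if d[0] <= v <= d[8]]
print(d[0], d[8], len(inside))        # the middle 80 of the 100 values
print(interdecile_range(values), interdecile_range(with_outlier))
```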
The Standard Deviation
The standard deviation is a very important part of the variability concept. Unlike the range, the standard deviation takes into account all of the data values in a distribution.
The standard deviation is a measure of variability that indicates how much all of the data values in a distribution deviate, or vary, from the mean.
Since the standard deviation is always computed with reference to the mean, never the median or the mode, it is appropriate only for interval or ratio data.
The standard deviation can be thought of as a kind of average deviation from the mean; strictly, it is the square root of the average squared deviation. The larger the standard deviation, the more the data values are spread out around the mean; the smaller the standard deviation, the more closely they cluster around it. A small standard deviation indicates that the group being measured is homogeneous, with data values close to the mean. A large standard deviation indicates that the group is heterogeneous, with data values spread far from the mean.
There are two methods for calculating the standard deviation: the deviation method, which helps in understanding the concept of the standard deviation, and the computational method, which is easier to carry out on a calculator.
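To illustrate homogeneous versus heterogeneous groups, the sketch below compares two made-up groups that have the same mean (50) but very different spread. It uses Python's `statistics.pstdev`, which divides by N and therefore matches a purely descriptive standard deviation. `statistics.stdev`, by contrast, divides by N - 1.

```python
from statistics import mean, pstdev

# Hypothetical groups with the same mean (50)
homogeneous = [48, 49, 50, 50, 51, 52]    # values cluster near the mean
heterogeneous = [20, 35, 50, 50, 65, 80]  # values spread far from the mean

for name, group in [("homogeneous", homogeneous), ("heterogeneous", heterogeneous)]:
    # pstdev divides by N, matching a descriptive standard deviation
    print(name, mean(group), round(pstdev(group), 2))
```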
Additionally, when the distribution is normal, the quartile deviation is approximately equal to two-thirds of the standard deviation.
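The two-thirds rule can be checked numerically with Python's `statistics.NormalDist` (Python 3.8 or later). The sketch takes a hypothetical normal distribution with mean 100 and standard deviation 15, finds its first and third quartiles with `inv_cdf`, and divides the resulting quartile deviation by the standard deviation. The ratio comes out to about 0.6745, close to two-thirds, and it does not depend on the mean or standard deviation chosen.

```python
from statistics import NormalDist

mu, sigma = 100, 15                     # hypothetical normal distribution
dist = NormalDist(mu=mu, sigma=sigma)

q1 = dist.inv_cdf(0.25)                 # first quartile
q3 = dist.inv_cdf(0.75)                 # third quartile
quartile_dev = (q3 - q1) / 2            # Q = (Q3 - Q1) / 2

print(round(quartile_dev / sigma, 4))   # 0.6745, roughly two-thirds
```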
The Deviation Method
The deviation method is based on the concept of the deviation data value. Any raw data value in a distribution is called x, and the mean of the distribution is called M (or x̄). Subtracting the mean from a raw data value gives its deviation data value, X:
X = x - M
Because the standard deviation must account for all the data values in the entire distribution, a deviation data value must be found for each and every raw data value.
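The symbol for the standard deviation is SD (or s.d.). Its formula is the following, where ΣX² is the sum of the squared deviation data values and N is the number of data values:
SD = √(ΣX² / N)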
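A minimal Python sketch of the deviation method, assuming the descriptive formula SD = √(ΣX² / N) that divides by N. It finds a deviation data value X = x - M for every raw data value, squares and sums them, divides by N, and takes the square root. The function name and data are hypothetical.

```python
from math import sqrt

def sd_deviation_method(values):
    """SD = sqrt(sum(X**2) / N), where X = x - M is each deviation data value."""
    n = len(values)
    m = sum(values) / n                        # the mean, M
    deviations = [x - m for x in values]       # X = x - M for every raw value
    return sqrt(sum(d * d for d in deviations) / n)

data = [2, 4, 4, 4, 5, 5, 7, 9]                # hypothetical data, M = 5
print(sd_deviation_method(data))               # 2.0
```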
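The Computational Method
The computational method is much easier and quicker to use with a calculator, because it works directly with the raw data values x and requires no deviation data values. The equation for the computational method is the following:
SD = √(Σx² / N - M²)
Unlike the computational method, the deviation method shows directly how the standard deviation depends on how far every data value lies from the mean.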
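A minimal Python sketch of the computational method, assuming the formula SD = √(Σx² / N - M²). This formula is algebraically equal to the deviation method that divides by N. The function name and data are hypothetical, and the result is checked against Python's `statistics.pstdev`, which also divides by N.

```python
from math import sqrt
from statistics import pstdev

def sd_computational_method(values):
    """SD = sqrt(sum(x**2) / N - M**2), using only the raw data values x."""
    n = len(values)
    m = sum(values) / n
    sum_of_squares = sum(x * x for x in values)
    # max() guards against a tiny negative result from floating-point rounding
    return sqrt(max(0.0, sum_of_squares / n - m * m))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sd_computational_method(data))   # 2.0
print(pstdev(data))                    # 2.0, the same result
```

On a computer, this shortcut can lose precision when the data values are large relative to their spread, because it subtracts two nearly equal numbers. This is one reason software tends to prefer deviation-based calculations.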
Using the Standard Deviation
The standard deviation is a very useful descriptive tool. It allows a variability analysis of the data, which can be the key to what the data are communicating.
For example, suppose a manufacturer of flashlight batteries tests large samples of two battery types and finds that both have the same average life of 25 hours. Battery A has a standard deviation of 2 hours, and Battery B has a standard deviation of 10 hours. If the manufacturer plans to guarantee that a customer's battery will last for 25 hours, it should choose Battery A, because its lifetimes cluster closely around the mean: a typical Battery A lasts between about 23 hours (25 - 2) and 27 hours (25 + 2). A typical Battery B, by contrast, lasts anywhere from about 15 hours (25 - 10) to 35 hours (25 + 10), so many customers would get far less than 25 hours, which could lead to complaints to the manufacturer.
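To make the battery comparison concrete, here is a small Python sketch. It assumes that battery lifetimes follow a normal distribution, which the example does not state. It also uses a hypothetical 20-hour cutoff to estimate what share of each battery type would die early. `statistics.NormalDist` requires Python 3.8 or later.

```python
from statistics import NormalDist

MEAN_LIFE = 25   # hours, from the example
THRESHOLD = 20   # hypothetical "too short" cutoff, in hours

for name, sd in [("A", 2), ("B", 10)]:
    # Assumption: battery life is normally distributed (not stated in the text)
    life = NormalDist(mu=MEAN_LIFE, sigma=sd)
    share_short = life.cdf(THRESHOLD)   # fraction expected to die before THRESHOLD
    print(f"Battery {name}: mean ± 1 SD = {MEAN_LIFE - sd}-{MEAN_LIFE + sd} h, "
          f"under {THRESHOLD} h: {share_short:.1%}")
```

Under these assumptions, roughly 0.6% of Battery A units but about 31% of Battery B units would die before 20 hours. That difference is the practical meaning of the larger standard deviation.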
The Variance
The variance is the third major measure of variability. Once the standard deviation is known, the variance (V) is easy to calculate, because it is simply the square of the standard deviation:
V = SD² = ΣX² / N
In other words, calculating the variance is the same as calculating the standard deviation without taking the square root.
Since the variance has a mathematical relationship with the standard deviation, the variance is also a measure of variability that tells how much all the data values in a distribution vary from the mean. Just like with the standard deviation, heterogeneous distributions with a lot of spread have large variances, and therefore large standard deviations. Homogeneous distributions with little spread have small variances, and therefore small standard deviations.
It may seem redundant to have two such measures of variability when one is simply the square root of the other. However, some calculations work only with the variance directly. For example, the variance of a sum of independent variables equals the sum of their variances, whereas standard deviations do not add this way.
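A minimal Python sketch showing that the variance is the standard deviation before the square root is taken. It computes V = ΣX² / N from the deviations of a made-up data set, takes its square root to get SD, and checks both values against Python's `statistics.pvariance` and `statistics.pstdev`, which divide by N.

```python
from statistics import pstdev, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]                          # hypothetical data, M = 5
m = sum(data) / len(data)
variance = sum((x - m) ** 2 for x in data) / len(data)   # V = sum(X**2) / N
sd = variance ** 0.5                                     # SD is the square root of V

print(variance, sd)                                      # 4.0 2.0
print(pvariance(data), pstdev(data))                     # library check (divides by N)
```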
Measures of Variability and Negativity
Every measure of variability must reflect either some variability or none at all; variability can never be less than zero. The range, the standard deviation, and the variance should never be negative.
Although it is unlikely, when all the data values are equal, the range, the standard deviation, and the variance are all zero, because there is no variability. Zero is the smallest value any measure of variability can have; a negative value means the calculations were done incorrectly.
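A quick Python check of these two facts. For a made-up data set in which every value is equal, the range, `statistics.pstdev`, and `statistics.pvariance` are all zero. For other data sets, none of the three is ever negative.

```python
from statistics import pstdev, pvariance

same = [7, 7, 7, 7]   # hypothetical: every data value equal
measures = {
    "range": max(same) - min(same),
    "standard deviation": pstdev(same),
    "variance": pvariance(same),
}
print(measures)       # all three are zero

# Other data sets give non-negative values; a negative result signals an error
for values in (same, [1, 5, 9], [2, 4, 4, 4, 5, 5, 7, 9]):
    assert max(values) - min(values) >= 0
    assert pstdev(values) >= 0 and pvariance(values) >= 0
```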














