Statistics: Basic Statistics II

Many undergraduate degree programs require that students have a basic understanding of descriptive statistics. Descriptive statistics are statistics that collect, summarize, classify and present data. This guide gives an overview of one type of descriptive statistics, measures of central tendency.

Measures of Central Tendency

Measures of central tendency are the methods of determining central values in a population. The following are the three main measures of central tendency.

  • Mean: the average score
  • Median: the middle score in a sequence of scores in ranked order
  • Mode: the most frequent score

Depending on the shape of a distribution, one of these measures may be more accurate than the others. In symmetrical, unimodal datasets, the mean is the most accurate measure of central tendency. For asymmetrical (skewed), unimodal datasets, the median is likely to be more accurate. For bimodal distributions, the only measure that can capture central tendency accurately is the mode.

 

Descriptive Statistics: statistics that collect, summarize, classify and present data.

 

Measures of Central Tendency: the methods of determining central values in a population.

 

Mean: the average of a sample or a population of scores.

 

Median: the middle score in a set of scores that have been ranked in numerical order.

 

Mode: the most frequently occurring number within a data set.

 

Bimodal: a frequency distribution that has two modes.

 

Multimodal: a frequency distribution that has two or more modes.

 

Skew: a distribution is skewed when its scores are not symmetric about the center, so the mean is pulled to the left or right of the median and the mode. Skew can be negative or positive. In a negatively skewed population, the mean is less than the median because some low scores shift the mean to the left. In a positively skewed population, the mode is typically less than the median, and the median is less than the mean.

 

Outlier: a number that is numerically distant from most of the data points in a set of data. 

 

 

 

The mode is the most frequently occurring number within a data set.

 

If two scores tie for the highest frequency within a data set, the set is called bimodal because it has two modes. Any data set that has two or more modes is multimodal.

 

There is no equation for finding the mode; you simply count the number of times each score occurs and pick the score that occurs most often. If the data set is multimodal, then you report all modes.


 

EXAMPLES:

 

1.    12 12 14 15 16 19 22 25 29 33 16 17 18 16 19 16

Mode = 16

 

 

2.  15 16 19 17 14 17 19 17 21 23 25 19 28 26 17 19

Mode = 17 and 19
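
The counting procedure described above can be sketched in Python. This is only a minimal illustration, not part of the original handout: the function name `modes` is made up for this example, and it assumes the data set is not empty. It counts how often each score occurs and reports every score that ties for the highest count, so it also handles bimodal and multimodal data sets.

```python
from collections import Counter


def modes(scores):
    """Return every score that ties for the highest frequency, in ranked order."""
    counts = Counter(scores)          # how many times each score occurs
    highest = max(counts.values())    # the largest count in the data set
    return sorted(score for score, count in counts.items() if count == highest)


# The two example data sets from this guide.
example_1 = [12, 12, 14, 15, 16, 19, 22, 25, 29, 33, 16, 17, 18, 16, 19, 16]
example_2 = [15, 16, 19, 17, 14, 17, 19, 17, 21, 23, 25, 19, 28, 26, 17, 19]

print(modes(example_1))  # [16]
print(modes(example_2))  # [17, 19]
```

Python's standard library offers `statistics.multimode`, which performs a similar count; the hand-written version is shown here so that each step is visible.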

The median is the middle score in a set of scores that have been ranked in numerical order.  

 

In sequences that have an even number of scores, the median lies between the two middle scores and is calculated as their average (the sum of the two scores divided by 2). If the two middle scores have the same value, the median is that value.
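
The rule for odd and even numbers of scores can be written out as a short Python sketch. The function below is a hypothetical helper written for this illustration, the three short lists are made-up data rather than data from the handout, and the sketch assumes the list is not empty.

```python
def median(scores):
    """Rank the scores, then return the middle score (or the average of the two middle scores)."""
    ranked = sorted(scores)       # rank from smallest to largest
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:                # odd number of scores: a single middle score
        return ranked[middle]
    # even number of scores: average the two middle scores
    return (ranked[middle - 1] + ranked[middle]) / 2


print(median([7, 3, 9]))        # 7
print(median([7, 3, 9, 5]))     # 6.0  (average of 5 and 7)
print(median([4, 8, 8, 10]))    # 8.0  (both middle scores are 8)
```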


 

EXERCISE: Order these sequences from smallest to largest to find the median

a. 12 15 48 23 56 22 21 41 57 52 22 46 41 62 34 

   12 15 21 22 22 23 34 41 41 46 48 52 56 57 62

    Median = 41

b. 11 5 7 32 56 41 23 22 17 18 42 6 27 31 42 8 7 11 (N = 18)

     5 6 7 7 8 11 11 17 18 22 23 27 31 32 41 42 42 56

    Median =

 


 

When should you use the median to describe your statistics?

The median is a measure of central tendency that should be used with distributions that are heavily skewed, because the median is resistant to outliers.


EXAMPLE: the following sequence of scores has been ranked to illustrate the skew within the distribution:

1 4 5 6 7 17 21 21 22 23 24 26 27 31 32 44 109

As you can see, the distribution is skewed, which is indicated by the abnormally large score (109) at the end of the sequence.
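
To see why the median is called resistant, you can compare the mean and the median of this sequence with and without the extreme score. The sketch below uses Python's standard `statistics` module; dropping the last score is done only for illustration and is not a step the handout recommends.

```python
from statistics import mean, median

# The ranked sequence from the example above.
scores = [1, 4, 5, 6, 7, 17, 21, 21, 22, 23, 24, 26, 27, 31, 32, 44, 109]
without_extreme = scores[:-1]   # same sequence with the score of 109 removed

print(median(scores), round(mean(scores), 2))                    # 22 24.71
print(median(without_extreme), round(mean(without_extreme), 2))  # 21.5 19.44
```

Removing the single extreme score moves the median by only half a point, while the mean drops by more than five points.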


 

In a sample mean, the scores are from a sample and the mean is denoted by M. When the scores are from an entire population, the mean is a population mean, which is denoted by μ (pronounced “mew”). Both are arithmetic means, so the respective equations for the sample mean and the population mean are as follows:

M = ΣX / N
μ = ΣX / N

Notice: the equations are the same; the only difference is the symbol used to represent what kind of mean you are looking at.

EXERCISE: find the mean of the following data sets

  1. 8 11 12 14 17 17 18 19 22 23 25 28
  2. 41 42 44 46 47 48 48 49 51 61 66 67

The mean is the average of a sequence of scores. The mean is calculated by summing scores and dividing that sum by the total number of scores. Σ, or “sigma”, is the Greek symbol for summing.

M = ΣX / N

 

 


EXAMPLE: this answer was found by summing the scores within a data set and then dividing by the number of scores.

4 + 5 + 5 + 5 + 5 + 8 + 8 + 9 + 11 + 11 + 11 + 12 + 12 + 14 + 15 = 135

135/15 = 9 
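
The same calculation can be checked with a few lines of Python. This is a minimal sketch of the formula M = ΣX / N; the variable names are chosen for this illustration only.

```python
# The scores from the example above.
scores = [4, 5, 5, 5, 5, 8, 8, 9, 11, 11, 11, 12, 12, 14, 15]

total = sum(scores)   # ΣX: the sum of all scores
n = len(scores)       # N: the number of scores
m = total / n         # M = ΣX / N

print(total, n, m)    # 135 15 9.0
```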


 

When should you use the mean?

The mean of a data set can be helpful when the scores are roughly normally distributed. However, the mean can be misleading if the distribution of scores is heavily skewed.

As you can see in Pearson’s diagram below, the mean is equal to the mode and the median in a normal or symmetrical distribution. In a negatively skewed distribution, the mean is to the left of the median and the mode; in a positively skewed distribution, the mean is to the right of the median and the mode.

[Figure: Pearson’s diagram of the mean, median, and mode in symmetrical and skewed distributions. ©Burak Mızrak]

The frequency distributions below show a normal distribution, a positively skewed distribution, and a negatively skewed distribution.

Symmetrical or Normal Distributions

In a normal (or symmetrical) distribution, the mean is in the center of a distribution.

Skewness

Skewness is a measure of the lack of symmetry in a distribution. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point. A distribution is skewed when its mean is pulled to the left or right of the median and the mode. Skew can be negative or positive. If there are outliers within the distribution, it may be skewed and the mean may not be representative of the group. An outlier is a number that is numerically distant from most of the other data points in the set.


EXAMPLE: In a distribution with an outlier or in a heavily skewed distribution (the data is not normally distributed), the mean is pulled in the direction of the outlier or skew, and is thus not the most accurate measure of central tendency. Under these circumstances, the median will better describe the dataset.
 

(2+2+2+3+3)/5 = 2.4 (without outlier)

(2+2+2+3+3+12)/6 = 4 (with outlier)
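
The effect of the outlier on the mean, compared with its effect on the median, can be reproduced with a short Python sketch. The mean is computed by hand as the sum divided by the count, and the median comes from the standard `statistics` module; the variable names are illustrative only.

```python
from statistics import median

without_outlier = [2, 2, 2, 3, 3]
with_outlier = without_outlier + [12]   # add the outlier from the example

for scores in (without_outlier, with_outlier):
    mean_value = sum(scores) / len(scores)
    print(mean_value, median(scores))
# 2.4 2
# 4.0 2.5
```

The outlier raises the mean from 2.4 to 4, which is larger than every score except the outlier itself, while the median moves only from 2 to 2.5.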


 

Negatively Skewed Distributions

The mean is less than the median in a negatively skewed population because there are some low scores that shift the mean to the left.

Positively Skewed Distributions

The mode is typically less than the median, and the median less than the mean, in a positively skewed population because there are some high scores that shift the mean to the right.
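
The direction of a skew can also be checked numerically. The sketch below assumes one common definition, the moment coefficient of skewness (the third central moment divided by the second central moment raised to the power 1.5); this formula does not appear in the handout, and the function name `skewness` is made up for this illustration. The positively skewed data are the ranked sequence used earlier in this guide, and the negatively skewed data are a hypothetical mirror image of that sequence.

```python
from statistics import mean, median


def skewness(scores):
    """Moment coefficient of skewness (population form): m3 / m2 ** 1.5."""
    n = len(scores)
    m = sum(scores) / n
    m2 = sum((x - m) ** 2 for x in scores) / n   # second central moment
    m3 = sum((x - m) ** 3 for x in scores) / n   # third central moment
    return m3 / m2 ** 1.5


positive = [1, 4, 5, 6, 7, 17, 21, 21, 22, 23, 24, 26, 27, 31, 32, 44, 109]
negative = [110 - x for x in positive]   # hypothetical mirror image of the data

print(skewness(positive), mean(positive) > median(positive))   # positive value, True
print(skewness(negative), mean(negative) < median(negative))   # negative value, True
```

A positive result goes with a mean to the right of the median, and a negative result goes with a mean to the left of the median.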

Kurtosis

Kurtosis is a measure of a distribution’s peak. It can be peaked or flat relative to a normal distribution. Leptokurtic data sets with high kurtosis have a distinct peak near the mean, decline rapidly, and have heavy tails. Platykurtic data sets with low kurtosis tend to have a flat top near the mean rather than a sharp peak. Finally, mesokurtic data sets have a moderate peak, similar to that of a normal distribution.
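
Kurtosis can be put into numbers in a similar way. The sketch below assumes one common definition, excess kurtosis, which is the fourth central moment divided by the square of the second central moment, minus 3 (the value for a normal distribution). This formula is not given in the handout, the function name is hypothetical, and both data sets are made up for this illustration.

```python
def excess_kurtosis(scores):
    """Excess kurtosis (population form): m4 / m2 ** 2 - 3."""
    n = len(scores)
    m = sum(scores) / n
    m2 = sum((x - m) ** 2 for x in scores) / n   # second central moment
    m4 = sum((x - m) ** 4 for x in scores) / n   # fourth central moment
    return m4 / m2 ** 2 - 3


flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]     # made-up data with a flat top
peaked = [0, 5, 5, 5, 5, 5, 5, 5, 5, 10]   # made-up data with a sharp peak and two extreme scores

print(excess_kurtosis(flat))     # negative: platykurtic
print(excess_kurtosis(peaked))   # positive: leptokurtic
```

Under this definition, a value near zero would indicate a mesokurtic data set.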

1. Answers to the mean exercise

  1. 8 11 12 14 17 17 18 19 22 23 25 28

     8+11+12+14+17+17+18+19+22+23+25+28 = 214

     M = 214/N = 214/12 = 17.83

  2. 41 42 44 46 47 48 48 49 51 61 66 67

     41+42+44+46+47+48+48+49+51+61+66+67 = 610

     M = 610/N = 610/12 = 50.83

2. Answers to the median exercise

a. 12 15 48 23 56 22 21 41 57 52 22 46 41 62 34

   12 15 21 22 22 23 34 41 41 46 48 52 56 57 62

   Median = 41

b. 11 5 7 32 56 41 23 22 17 18 42 6 27 31 42 8 7 11 (N = 18)

   5 6 7 7 8 11 11 17 18 22 23 27 31 32 41 42 42 56

   Median = (18 + 22)/2 = 20

 
Credits

This Libguide is the collaborative product of Learning Centre tutors and faculty at Douglas College, British Columbia.

 

Project Coordinator

 Mina Sedaghatjou

 

 

Handout Developers

Hailea Williams

 

 

LibGuide Designer

 Farzad Kooshyar 

 

 

Editor 

Kevin Kumagai