Mean and Variance

17.22 Mean and Variance

To start our presentation of descriptive statistics, we construct a data set using a spreadsheet program. The idea is to simulate the flipping of a two-sided coin. While you might think it would be easier just to flip a coin, doing this on a spreadsheet gives you access to the full range of tools embedded in that program. To generate the data set, we drew 10 random numbers using the spreadsheet program. In the program we used, the function was called RAND, and each call returned a random number between zero and one. Those numbers are listed in the second column of the table.

The third column creates the two events of heads and tails that we normally associate with a coin flip. To generate this last column, we adopted a rule: if the random number was less than 0.5, we termed this a “tail” and assigned a 0 to the draw; otherwise, we termed it a “head” and assigned a 1 to the draw. The choice of 0.5 as the cut-off for heads reflects the fact that we are considering the flips of a fair coin in which each side has the same probability of 0.5.

Table 17.10

Draw	Random Number	Heads (1) or Tails (0)
1	0.94	1
2	0.84	1
3	0.26	0
4	0.04	0
5	0.01	0
6	0.57	1
7	0.74	1
8	0.81	1
9	0.64	1
10	0.25	0
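The same experiment can be sketched outside a spreadsheet. The following is a minimal illustration in Python, not the procedure used to build Table 17.10: the function name `flip_coins` and the `seed` argument are our own assumptions, and `random.random()` stands in for the spreadsheet's RAND function. Because the numbers are random, the output will not reproduce the table.

```python
import random

def flip_coins(n_draws, seed=None):
    """Simulate n_draws fair-coin flips using the 0.5 cut-off rule from the text."""
    rng = random.Random(seed)  # seed is an assumption, added only for repeatability
    rows = []
    for draw in range(1, n_draws + 1):
        u = rng.random()  # uniform random number in [0, 1), playing the role of RAND
        outcome = 0 if u < 0.5 else 1  # below 0.5 is a tail (0); otherwise a head (1)
        rows.append((draw, u, outcome))
    return rows

rows = flip_coins(10, seed=0)
for draw, u, outcome in rows:
    # Same three columns as the table: draw, random number, heads (1) or tails (0)
    print(f"{draw}\t{u:.2f}\t{outcome}")
```

Each draw uses a fresh random number, so the outcome of one flip carries no information about any other.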

Keep in mind that the realization of the random number in draw i is independent of the realizations of the random numbers in both past and future draws. Whether a coin comes up heads or tails on any particular flip does not depend on other outcomes.

There are many ways to summarize the information contained in a sample of data. Even before you start to compute some complicated statistics, having a way to present the data is important. One possibility is a bar graph in which the fraction of observations of each outcome is easily shown. Alternatively, a pie chart is often used to display this fraction. Both the pie chart and bar diagram are commonly found in spreadsheet programs.
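The fractions that a bar graph or pie chart would display can be computed directly. The sketch below, written in Python as one possible illustration, counts the outcomes in the third column of Table 17.10; the variable names are our own.

```python
from collections import Counter

# Third column of Table 17.10: heads (1) or tails (0) for each of the 10 draws
outcomes = [1, 1, 0, 0, 0, 1, 1, 1, 1, 0]

counts = Counter(outcomes)  # number of times each outcome appears
fractions = {value: counts[value] / len(outcomes) for value in sorted(counts)}

print(fractions)  # {0: 0.4, 1: 0.6}
```

These two fractions are the heights of the bars in a bar graph or the sizes of the slices in a pie chart.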

Economists and statisticians often want to describe data in terms of numbers rather than figures. We use the data from Table 17.10 to define and illustrate two statistics that are commonly used in economics discussions. The first is the mean (or average) and is a measure of central tendency. Before you read any further, ask yourself what you think the average ought to be from the coin-flipping exercise. It is natural to say 0.5, since half of the time the outcome will be a head and thus have a value of one, while the remainder of the time the outcome will be a tail and thus have a value of zero.

Whether or not that guess holds can be checked by looking at Table 17.10 and calculating the mean of the outcome. We let k_i be the outcome of draw i. For example, in the heads/tails column of the table, k₁ = 1 and k₅ = 0. Then the formula for the mean if there are N draws is μ = Σ_i k_i/N. Here Σ_i k_i means the sum of the k_i outcomes. In words, the mean, denoted by μ, is calculated by summing the draws and dividing by the number of draws, N. Applying the formula first to the random numbers in the second column of the table, N = 10 and the sum of the draws is about 5.10. Thus the mean of the 10 draws is about 0.51.

We can also calculate the mean of the heads/tails column, and this is 0.6, since heads came up 6 times in our 10 draws. This mean differs from the mean of the random numbers because the entries in the two columns differ: the third column is a discrete, two-valued representation of the information in the second column.
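Both means can be checked with a few lines of code. The following Python sketch applies the formula μ = Σ_i k_i/N to each column of Table 17.10; the helper name `mean` and the list names are our own choices for illustration.

```python
# Second and third columns of Table 17.10
random_numbers = [0.94, 0.84, 0.26, 0.04, 0.01, 0.57, 0.74, 0.81, 0.64, 0.25]
heads_or_tails = [1, 1, 0, 0, 0, 1, 1, 1, 1, 0]

def mean(values):
    """Sum the draws and divide by the number of draws, N."""
    return sum(values) / len(values)

print(round(sum(random_numbers), 2))   # 5.1, the sum of the random numbers
print(round(mean(random_numbers), 2))  # 0.51
print(mean(heads_or_tails))            # 0.6
```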

A second commonly used statistic is a measure of dispersion of the data called the variance. The variance, denoted σ², is calculated as σ² = Σ_i (k_i − μ)²/(N − 1). From this formula, if all the draws were the same (thus equal to the mean), then the variance would be zero. As the draws spread out from the mean (both above and below), the variance increases. Since some observations are above the mean and others below, we square the difference between a single observation (k_i) and the mean (μ) when calculating the variance. This means that values above and below the mean both contribute a positive amount to the variance. Squaring also means that values that are a long way away from the mean have a big effect on the variance.

For the random numbers given in Table 17.10, the mean of the 10 draws is μ = 0.51. So to calculate the variance, we subtract the mean from each draw, square the difference, sum up the squared differences, and then divide by N − 1 = 9. The squared differences sum to about 1.058, which yields a variance of 0.118 for this sample. A closely related concept is that of the standard deviation, which is just the square root of the variance. For our example, the standard deviation is 0.34. The standard deviation is bigger than the variance here because the variance is less than 1.
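The variance and standard deviation can be verified in the same way. The Python sketch below implements σ² = Σ_i (k_i − μ)²/(N − 1) for the random numbers in Table 17.10; the function name `sample_variance` is our own, chosen for illustration.

```python
import math

# Second column of Table 17.10
random_numbers = [0.94, 0.84, 0.26, 0.04, 0.01, 0.57, 0.74, 0.81, 0.64, 0.25]

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by N - 1."""
    n = len(values)
    mu = sum(values) / n
    squared_deviations = [(k - mu) ** 2 for k in values]
    return sum(squared_deviations) / (n - 1)

variance = sample_variance(random_numbers)
std_dev = math.sqrt(variance)  # standard deviation is the square root of the variance

print(round(variance, 3))  # 0.118
print(round(std_dev, 2))   # 0.34
```

Python's standard library function `statistics.variance` also divides by N − 1, so it should give the same result for these data.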