Mean and Variance

16.12 Mean and Variance

To start our presentation of descriptive statistics, we construct a data set using a spreadsheet program. The idea is to simulate the flipping of a two-sided coin. Although you might think it would be easier just to flip a coin, doing this on a spreadsheet gives you access to the full range of tools embedded in that program. To generate the data set, we drew 10 random numbers. In the program we used, the function is called RAND, and each call returns a random number between zero and one. Those draws are listed in the second column of Table 16.9.

The third column creates the two events of heads and tails that we normally associate with a coin flip. To generate this last column, we adopted a rule: if the random number was less than 0.5, we termed this a “tail” and assigned a 0 to the draw; otherwise we termed it a “head” and assigned a 1 to the draw. The choice of 0.5 as the cutoff for heads reflects the fact that we are considering the flips of a fair coin in which each side has the same probability: 0.5.

Table 16.9

Draw	Random Number	Heads (1) or Tails (0)
1	0.94	1
2	0.84	1
3	0.26	0
4	0.04	0
5	0.01	0
6	0.57	1
7	0.74	1
8	0.81	1
9	0.64	1
10	0.25	0
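
The rule that turns each random number into a head or a tail can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet procedure itself: the names `to_flip` and `simulate_flips` are our own, and Python's `random.random()` stands in for the spreadsheet's RAND function.

```python
import random

# Random numbers copied from the second column of Table 16.9.
random_numbers = [0.94, 0.84, 0.26, 0.04, 0.01, 0.57, 0.74, 0.81, 0.64, 0.25]

def to_flip(draw, cutoff=0.5):
    """Return 0 (tail) if the draw is below the cutoff, otherwise 1 (head)."""
    return 0 if draw < cutoff else 1

# Applying the rule to the table's draws reproduces the third column.
flips = [to_flip(x) for x in random_numbers]
print(flips)  # [1, 1, 0, 0, 0, 1, 1, 1, 1, 0]

def simulate_flips(n, seed=None):
    """Draw n fresh random numbers in [0, 1) and convert each to a flip."""
    rng = random.Random(seed)
    return [to_flip(rng.random()) for _ in range(n)]

# A new set of 10 flips; the outcomes depend on the seed chosen.
print(simulate_flips(10, seed=1))
```

Each call to `rng.random()` is made without reference to earlier draws, which mirrors the independence of one coin flip from another.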

Keep in mind that the realization of the random number in draw i is independent of the realizations of the random numbers in both past and future draws. Whether a coin comes up heads or tails on any particular flip does not depend on other outcomes.

There are many ways to summarize the information contained in a sample of data. Even before you start to compute some complicated statistics, having a way to present the data is important. One possibility is a bar graph in which the fraction of observations of each outcome is easily shown. Alternatively, a pie chart is often used to display this fraction. Both the pie chart and the bar diagram are commonly found in spreadsheet programs.
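
As a rough stand-in for the bar graph described above, the following sketch counts each outcome in the heads/tails column of Table 16.9 and prints a simple text bar for each. The layout of the printed bars is our own choice; a spreadsheet chart would show the same fractions.

```python
from collections import Counter

# Heads (1) or tails (0), copied from the third column of Table 16.9.
flips = [1, 1, 0, 0, 0, 1, 1, 1, 1, 0]

counts = Counter(flips)
# Fraction of observations for each outcome.
fractions = {outcome: counts[outcome] / len(flips) for outcome in (0, 1)}

for outcome, label in ((1, "Heads"), (0, "Tails")):
    bar = "#" * counts[outcome]  # one '#' per observation
    print(f"{label:<5} {bar:<10} {fractions[outcome]:.1f}")
```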

Economists and statisticians often want to describe data in terms of numbers rather than figures. We use the data from the table to define and illustrate two statistics that are commonly used in economics discussions. The first is the mean (or average) and is a measure of central tendency. Before you read any further, ask, “What do you think the average ought to be from the coin flipping exercise?” It is natural to say 0.5, since half the time the outcome will be a tail and thus have a value of zero, whereas the remainder of the time the outcome will be a head and thus have a value of one.

Whether or not that guess holds can be checked by looking at Table 16.9 and calculating the mean of the outcome. We let k_i be the outcome of draw i. For example, in the heads/tails column of the table, k₁ = 1 and k₅ = 0; in the random-number column, k₁ = 0.94 and k₅ = 0.01. Then the formula for the mean if there are N draws is μ = (Σ_i k_i)/N. Here Σ_i k_i means the sum of the k_i outcomes. In words, the mean, denoted by μ, is calculated by adding together the draws and dividing by the number of draws (N). In the table, N = 10, and the sum of the draws of random numbers is about 5.10. Thus the mean of the 10 draws is about 0.51.
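
The arithmetic for the random-number column can be checked with a short script. This is a minimal sketch; the function name `mean` is our own.

```python
# Random numbers copied from the second column of Table 16.9.
random_numbers = [0.94, 0.84, 0.26, 0.04, 0.01, 0.57, 0.74, 0.81, 0.64, 0.25]

def mean(values):
    """Add the draws together and divide by the number of draws (N)."""
    return sum(values) / len(values)

total = sum(random_numbers)
mu = mean(random_numbers)

print(f"N = {len(random_numbers)}")  # N = 10
print(f"sum = {total:.2f}")          # sum = 5.10
print(f"mean = {mu:.2f}")            # mean = 0.51
```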

We can also calculate the mean of the heads/tails column, which is 0.6 since heads came up 6 times in our 10 draws. This differs from the mean of the random numbers because the two columns contain different numbers: the third column is a discrete (0 or 1) representation of the information in the second column.
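
Because heads are coded as 1 and tails as 0, the mean of the heads/tails column is simply the share of heads. The sketch below confirms this for Table 16.9 and then, as an illustration of our own that goes beyond the table, simulates a much larger number of flips to show the mean settling near the 0.5 we guessed earlier. The sample size and seed are arbitrary assumptions.

```python
import random

# Heads (1) or tails (0), copied from the third column of Table 16.9.
flips = [1, 1, 0, 0, 0, 1, 1, 1, 1, 0]

# With 0/1 coding, the mean equals the fraction of heads.
mean_flips = sum(flips) / len(flips)
print(mean_flips)  # 0.6

# Illustration only: many simulated fair-coin flips (seed chosen arbitrarily).
rng = random.Random(2024)
many_flips = [0 if rng.random() < 0.5 else 1 for _ in range(100_000)]
mean_many = sum(many_flips) / len(many_flips)
print(f"{mean_many:.3f}")  # close to 0.5; the exact value depends on the seed
```

With only 10 draws, a mean of 0.6 is not surprising; a sample this small can easily land some distance from 0.5.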

A second commonly used statistic is a measure of dispersion of the data called the variance. The variance, denoted σ², is calculated as σ² = Σ_i (k_i − μ)²/N. From this formula, if all the draws were the same (thus equal to the mean), then the variance would be zero. As the draws spread out from the mean (both above and below), the variance increases. Since some observations are above the mean and others below, the differences between the observations and the mean would cancel out if we simply added them up. We therefore square the difference between a single observation (k_i) and the mean (μ) when calculating the variance. This means that values above and below the mean both contribute a positive amount to the variance. Squaring also means that values that are a long way away from the mean have a big effect on the variance.
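
The properties just described can be seen in a small sketch of the variance formula. The function name `variance` and the three example data sets are our own, chosen only to illustrate the formula with N in the denominator.

```python
def variance(values):
    """Average squared difference from the mean, dividing by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((k - mu) ** 2 for k in values) / n

# All draws identical: every difference from the mean is zero.
print(variance([0.5, 0.5, 0.5, 0.5]))  # 0.0

# Draws slightly above and below the mean: a small positive variance.
print(variance([0.4, 0.6, 0.4, 0.6]))  # about 0.01

# Draws far from the mean: squaring makes the variance much larger.
print(variance([0.0, 1.0, 0.0, 1.0]))  # 0.25
```

All three data sets have the same mean (0.5), so the differences in variance come entirely from how far the draws lie from that mean.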

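For the data given in the table, the mean of the 10 random-number draws was given as μ = 0.51. So to calculate the variance, we subtract the mean from each draw, square the difference, add together the squared differences, and divide by the number of draws. The sum of the squared differences is about 1.058, so dividing by N = 10, as in the formula above, gives a variance of about 0.106. Dividing instead by N − 1 = 9, as many spreadsheet variance functions do, yields 0.118 for these draws. A closely related concept is that of the standard deviation, which is the square root of the variance. For our example, the standard deviation is about 0.33 (or 0.34 when dividing by N − 1). The standard deviation is greater than the variance here because the variance is less than 1, and the square root of a number between 0 and 1 is larger than the number itself.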
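
The calculation for the table can be traced step by step. The sketch below computes the sum of squared differences once and then divides it two ways: by N, as in the formula for σ² given earlier, and by N − 1, the convention used by many spreadsheet variance functions. The variable names are our own; the cross-check uses `pvariance` and `variance` from Python's standard `statistics` module.

```python
import math
import statistics

# Random numbers copied from the second column of Table 16.9.
random_numbers = [0.94, 0.84, 0.26, 0.04, 0.01, 0.57, 0.74, 0.81, 0.64, 0.25]

n = len(random_numbers)
mu = sum(random_numbers) / n

# Subtract the mean from each draw, square, and add up.
sum_sq = sum((k - mu) ** 2 for k in random_numbers)

var_n = sum_sq / n               # divide by N
var_n_minus_1 = sum_sq / (n - 1)  # divide by N - 1

print(f"sum of squared differences = {sum_sq:.3f}")       # 1.058
print(f"variance (divide by N)     = {var_n:.3f}")         # 0.106
print(f"variance (divide by N - 1) = {var_n_minus_1:.3f}")  # 0.118
print(f"std dev (divide by N)      = {math.sqrt(var_n):.2f}")          # 0.33
print(f"std dev (divide by N - 1)  = {math.sqrt(var_n_minus_1):.2f}")  # 0.34

# Cross-check against the standard library.
assert math.isclose(var_n, statistics.pvariance(random_numbers))
assert math.isclose(var_n_minus_1, statistics.variance(random_numbers))
```

The two conventions give noticeably different answers here only because the sample is so small; with many draws, dividing by N or by N − 1 makes little difference.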

The Main Uses of This Tool