9.3 Comparison of Two Population Means: Paired Samples

Learning Objectives

1. To learn the distinction between independent samples and paired samples.
2. To learn how to construct a confidence interval for the difference in the means of two distinct populations using paired samples.
3. To learn how to perform a test of hypotheses concerning the difference in the means of two distinct populations using paired samples.

Suppose chemical engineers wish to compare the fuel economy obtained by two different formulations of gasoline. Since fuel economy varies widely from car to car, if the mean fuel economy of two independent samples of vehicles run on the two types of fuel were compared, even if one formulation were better than the other the large variability from vehicle to vehicle might make any difference arising from difference in fuel difficult to detect. Just imagine one random sample having many more large vehicles than the other. Instead of independent random samples, it would make more sense to select pairs of cars of the same make and model and driven under similar circumstances, and compare the fuel economy of the two cars in each pair. Thus the data would look something like Table 9.1 "Fuel Economy of Pairs of Vehicles", where the first car in each pair is operated on one formulation of the fuel (call it Type 1 gasoline) and the second car is operated on the second (call it Type 2 gasoline).

Table 9.1 Fuel Economy of Pairs of Vehicles

Make and Model Car 1 Car 2
Buick LaCrosse 17.0 17.0
Dodge Viper 13.2 12.9
Honda CR-Z 35.3 35.4
Hummer H 3 13.6 13.2
Lexus RX 32.7 32.5
Mazda CX-9 18.4 18.1
Saab 9-3 22.5 22.5
Toyota Corolla 26.8 26.7
Volvo XC 90 15.1 15.0

The first column of numbers form a sample from Population 1, the population of all cars operated on Type 1 gasoline; the second column of numbers form a sample from Population 2, the population of all cars operated on Type 2 gasoline. It would be incorrect to analyze the data using the formulas from the previous section, however, since the samples were not drawn independently. What is correct is to compute the difference in the numbers in each pair (subtracting in the same order each time) to obtain the third column of numbers as shown in Table 9.2 "Fuel Economy of Pairs of Vehicles" and treat the differences as the data. At this point, the new sample of differences $d1=0.0,…,d9=0.1$ in the third column of Table 9.2 "Fuel Economy of Pairs of Vehicles" may be considered as a random sample of size n = 9 selected from a population with mean $μd=μ1−μ2.$ This approach essentially transforms the paired two-sample problem into a one-sample problem as discussed in the previous two chapters.

Table 9.2 Fuel Economy of Pairs of Vehicles

Make and Model Car 1 Car 2 Difference
Buick LaCrosse 17.0 17.0 0.0
Dodge Viper 13.2 12.9 0.3
Honda CR-Z 35.3 35.4 −0.1
Hummer H 3 13.6 13.2 0.4
Lexus RX 32.7 32.5 0.2
Mazda CX-9 18.4 18.1 0.3
Saab 9-3 22.5 22.5 0.0
Toyota Corolla 26.8 26.7 0.1
Volvo XC 90 15.1 15.0 0.1

Note carefully that although it does not matter what order the subtraction is done, it must be done in the same order for all pairs. This is why there are both positive and negative quantities in the third column of numbers in Table 9.2 "Fuel Economy of Pairs of Vehicles".

Confidence Intervals

When the population of differences is normally distributed the following formula for a confidence interval for $μd=μ1−μ2$ is valid.

$100(1−α)%$ Confidence Interval for the Difference Between Two Population Means: Paired Difference Samples

$d-±tα∕2sdn$

where there are n pairs, $d-$ is the mean and sd is the standard deviation of their differences.

The number of degrees of freedom is $df=n−1.$

The population of differences must be normally distributed.

Example 7

Using the data in Table 9.1 "Fuel Economy of Pairs of Vehicles" construct a point estimate and a 95% confidence interval for the difference in average fuel economy between cars operated on Type 1 gasoline and cars operated on Type 2 gasoline.

Solution:

We have referred to the data in Table 9.1 "Fuel Economy of Pairs of Vehicles" because that is the way that the data are typically presented, but we emphasize that with paired sampling one immediately computes the differences, as given in Table 9.2 "Fuel Economy of Pairs of Vehicles", and uses the differences as the data.

The mean and standard deviation of the differences are

$d-=Σdn=1.39=0.14-andsd=Σd2−1n(Σd)2n−1=0.41−19(1.3)28=0.16-$

The point estimate of $μ1−μ2=μd$ is

$d-=0.14$

In words, we estimate that the average fuel economy of cars using Type 1 gasoline is 0.14 mpg greater than the average fuel economy of cars using Type 2 gasoline.

To apply the formula for the confidence interval, we must find $tα∕2.$ The 95% confidence level means that $α=1−0.95=0.05$ so that $tα∕2=t0.025.$ From Figure 12.3 "Critical Values of ", in the row with the heading $df=9−1=8$ we read that $t0.025=2.306.$ Thus

$d-±tα∕2sdn=0.14±2.3060.16-9≈0.14±0.13$

We are 95% confident that the difference in the population means lies in the interval $[0.01,0.27]$, in the sense that in repeated sampling 95% of all intervals constructed from the sample data in this manner will contain $μd=μ1−μ2.$ Stated differently, we are 95% confident that mean fuel economy is between 0.01 and 0.27 mpg greater with Type 1 gasoline than with Type 2 gasoline.

Hypothesis Testing

Testing hypotheses concerning the difference of two population means using paired difference samples is done precisely as it is done for independent samples, although now the null and alternative hypotheses are expressed in terms of $μd$ instead of $μ1−μ2.$ Thus the null hypothesis will always be written

$H0:μd=D0$

The three forms of the alternative hypothesis, with the terminology for each case, are:

Form of $Ha$ Terminology
$Ha:μd Left-tailed
$Ha:μd>D0$ Right-tailed
$Ha:μd≠D0$ Two-tailed

The same conditions on the population of differences that was required for constructing a confidence interval for the difference of the means must also be met when hypotheses are tested. Here is the standardized test statistic that is used in the test.

Standardized Test Statistic for Hypothesis Tests Concerning the Difference Between Two Population Means: Paired Difference Samples

$T=d-−D0sd∕n$

where there are n pairs, $d-$ is the mean and sd is the standard deviation of their differences.

The test statistic has Student’s t-distribution with $df=n−1$ degrees of freedom.

The population of differences must be normally distributed.

Example 8

Using the data of Table 9.2 "Fuel Economy of Pairs of Vehicles" test the hypothesis that mean fuel economy for Type 1 gasoline is greater than that for Type 2 gasoline against the null hypothesis that the two formulations of gasoline yield the same mean fuel economy. Test at the 5% level of significance using the critical value approach.

Solution:

The only part of the table that we use is the third column, the differences.

• Step 1. Since the differences were computed in the order $Type 1 mpg− Type 2 mpg$, better fuel economy with Type 1 fuel corresponds to $μd=μ1−μ2>0.$ Thus the test is

$H0:μd=0 vs. Ha:μd>0@ α=0.05$

(If the differences had been computed in the opposite order then the alternative hypotheses would have been $Ha:μd<0.$)

• Step 2. Since the sampling is in pairs the test statistic is

$T=d-−D0sd∕n$
• Step 3. We have already computed $d-$ and sd in the previous example. Inserting their values and $D0=0$ into the formula for the test statistic gives

$T=d-−D0sd∕n=0.14-0.16-∕3=2.600$
• Step 4. Since the symbol in Ha is “>” this is a right-tailed test, so there is a single critical value, $tα=t0.05$ with 8 degrees of freedom, which from the row labeled $df=8$ in Figure 12.3 "Critical Values of " we read off as 1.860. The rejection region is $[1.860,∞).$
• Step 5. As shown in Figure 9.5 "Rejection Region and Test Statistic for " the test statistic falls in the rejection region. The decision is to reject H0. In the context of the problem our conclusion is:

Figure 9.5 Rejection Region and Test Statistic for Note 9.20 "Example 8"

The data provide sufficient evidence, at the 5% level of significance, to conclude that the mean fuel economy provided by Type 1 gasoline is greater than that for Type 2 gasoline.

Example 9

Perform the test of Note 9.20 "Example 8" using the p-value approach.

Solution:

The first three steps are identical to those in Note 9.20 "Example 8".

• Step 4. Because the test is one-tailed the observed significance or p-value of the test is just the area of the right tail of Student’s t-distribution, with 8 degrees of freedom, that is cut off by the test statistic T = 2.600. We can only approximate this number. Looking in the row of Figure 12.3 "Critical Values of " headed $df=8$, the number 2.600 is between the numbers 2.306 and 2.896, corresponding to t0.025 and t0.010.

The area cut off by t = 2.306 is 0.025 and the area cut off by t = 2.896 is 0.010. Since 2.600 is between 2.306 and 2.896 the area it cuts off is between 0.025 and 0.010. Thus the p-value is between 0.025 and 0.010. In particular it is less than 0.025. See Figure 9.6.

Figure 9.6 P-Value for Note 9.21 "Example 9"

• Step 5. Since 0.025 < 0.05, $p<α$ so the decision is to reject the null hypothesis:

The data provide sufficient evidence, at the 5% level of significance, to conclude that the mean fuel economy provided by Type 1 gasoline is greater than that for Type 2 gasoline.

The paired two-sample experiment is a very powerful study design. It bypasses many unwanted sources of “statistical noise” that might otherwise influence the outcome of the experiment, and focuses on the possible difference that might arise from the one factor of interest.

If the sample is large (meaning that n ≥ 30) then in the formula for the confidence interval we may replace $tα∕2$ by $zα∕2.$ For hypothesis testing when the number of pairs is at least 30, we may use the same statistic as for small samples for hypothesis testing, except now it follows a standard normal distribution, so we use the last line of Figure 12.3 "Critical Values of " to compute critical values, and p-values can be computed exactly with Figure 12.2 "Cumulative Normal Probability", not merely estimated using Figure 12.3 "Critical Values of ".

Key Takeaways

• When the data are collected in pairs, the differences computed for each pair are the data that are used in the formulas.
• A confidence interval for the difference in two population means using paired sampling is computed using a formula in the same fashion as was done for a single population mean.
• The same five-step procedure used to test hypotheses concerning a single population mean is used to test hypotheses concerning the difference between two population means using pair sampling. The only difference is in the formula for the standardized test statistic.

Basic

In all exercises for this section assume that the population of differences is normal.

1. Use the following paired sample data for this exercise.

$Population 135323535363536Population 228262726292729$
1. Compute $d-$ and sd.
2. Give a point estimate for $μ1−μ2=μd.$
3. Construct the 95% confidence interval for $μ1−μ2=μd$ from these data.
4. Test, at the 10% level of significance, the hypothesis that $μ1−μ2>7$ as an alternative to the null hypothesis that $μ1−μ2=7.$
2. Use the following paired sample data for this exercise.

$Population 110312796110Population 2811067388$ $Population 190118130106Population 2709510983$
1. Compute $d-$ and sd.
2. Give a point estimate for $μ1−μ2=μd.$
3. Construct the 90% confidence interval for $μ1−μ2=μd$ from these data.
4. Test, at the 1% level of significance, the hypothesis that $μ1−μ2<24$ as an alternative to the null hypothesis that $μ1−μ2=24.$
3. Use the following paired sample data for this exercise.

$Population 140275534Population 253426850$
1. Compute $d-$ and sd.
2. Give a point estimate for $μ1−μ2=μd.$
3. Construct the 99% confidence interval for $μ1−μ2=μd$ from these data.
4. Test, at the 10% level of significance, the hypothesis that $μ1−μ2≠−12$ as an alternative to the null hypothesis that $μ1−μ2=−12.$
4. Use the following paired sample data for this exercise.

$Population 1196165181201190Population 2212182199210205$
1. Compute $d-$ and sd.
2. Give a point estimate for $μ1−μ2=μd.$
3. Construct the 98% confidence interval for $μ1−μ2=μd$ from these data.
4. Test, at the 2% level of significance, the hypothesis that $μ1−μ2≠−20$ as an alternative to the null hypothesis that $μ1−μ2=−20.$

Applications

1. Each of five laboratory mice was released into a maze twice. The five pairs of times to escape were:

Mouse 1 2 3 4 5
First release 129 89 136 163 118
Second release 113 97 139 85 75
1. Compute $d-$ and sd.
2. Give a point estimate for $μ1−μ2=μd.$
3. Construct the 90% confidence interval for $μ1−μ2=μd$ from these data.
4. Test, at the 10% level of significance, the hypothesis that it takes mice less time to run the maze on the second trial, on average.
2. Eight golfers were asked to submit their latest scores on their favorite golf courses. These golfers were each given a set of newly designed clubs. After playing with the new clubs for a few months, the golfers were again asked to submit their latest scores on the same golf courses. The results are summarized below.

Golfer 1 2 3 4 5 6 7 8
Own clubs 77 80 69 73 73 72 75 77
New clubs 72 81 68 73 75 70 73 75
1. Compute $d-$ and sd.
2. Give a point estimate for $μ1−μ2=μd.$
3. Construct the 99% confidence interval for $μ1−μ2=μd$ from these data.
4. Test, at the 1% level of significance, the hypothesis that on average golf scores are lower with the new clubs.
3. A neighborhood home owners association suspects that the recent appraisal values of the houses in the neighborhood conducted by the county government for taxation purposes is too high. It hired a private company to appraise the values of ten houses in the neighborhood. The results, in thousands of dollars, are

House County Government Private Company
1 217 219
2 350 338
3 296 291
4 237 237
5 237 235
6 272 269
7 257 239
8 277 275
9 312 320
10 335 335
1. Give a point estimate for the difference between the mean private appraisal of all such homes and the government appraisal of all such homes.
2. Construct the 99% confidence interval based on these data for the difference.
3. Test, at the 1% level of significance, the hypothesis that appraised values by the county government of all such houses is greater than the appraised values by the private appraisal company.
4. In order to cut costs a wine producer is considering using duo or 1 + 1 corks in place of full natural wood corks, but is concerned that it could affect buyers’s perception of the quality of the wine. The wine producer shipped eight pairs of bottles of its best young wines to eight wine experts. Each pair includes one bottle with a natural wood cork and one with a duo cork. The experts are asked to rate the wines on a one to ten scale, higher numbers corresponding to higher quality. The results are:

Wine Expert Duo Cork Wood Cork
1 8.5 8.5
2 8.0 8.5
3 6.5 8.0
4 7.5 8.5
5 8.0 7.5
6 8.0 8.0
7 9.0 9.0
8 7.0 7.5
1. Give a point estimate for the difference between the mean ratings of the wine when bottled are sealed with different kinds of corks.
2. Construct the 90% confidence interval based on these data for the difference.
3. Test, at the 10% level of significance, the hypothesis that on the average duo corks decrease the rating of the wine.
5. Engineers at a tire manufacturing corporation wish to test a new tire material for increased durability. To test the tires under realistic road conditions, new front tires are mounted on each of 11 company cars, one tire made with a production material and the other with the experimental material. After a fixed period the 11 pairs were measured for wear. The amount of wear for each tire (in mm) is shown in the table:

Car Production Experimental
1 5.1 5.0
2 6.5 6.5
3 3.6 3.1
4 3.5 3.7
5 5.7 4.5
6 5.0 4.1
7 6.4 5.3
8 4.7 2.6
9 3.2 3.0
10 3.5 3.5
11 6.4 5.1
1. Give a point estimate for the difference in mean wear.
2. Construct the 99% confidence interval for the difference based on these data.
3. Test, at the 1% level of significance, the hypothesis that the mean wear with the experimental material is less than that for the production material.
6. A marriage counselor administered a test designed to measure overall contentment to 30 randomly selected married couples. The scores for each couple are given below. A higher number corresponds to greater contentment or happiness.

Couple Husband Wife
1 47 44
2 44 46
3 49 44
4 53 44
5 42 43
6 45 45
7 48 47
8 45 44
9 52 44
10 47 42
11 40 34
12 45 42
13 40 43
14 46 41
15 47 45
16 46 45
17 46 41
18 46 41
19 44 45
20 45 43
21 48 38
22 42 46
23 50 44
24 46 51
25 43 45
26 50 40
27 46 46
28 42 41
29 51 41
30 46 47
1. Test, at the 1% level of significance, the hypothesis that on average men and women are not equally happy in marriage.
2. Test, at the 1% level of significance, the hypothesis that on average men are happier than women in marriage.

Large Data Set Exercises

1. Large Data Set 5 lists the scores for 25 randomly selected students on practice SAT reading tests before and after taking a two-week SAT preparation course. Denote the population of all students who have taken the course as Population 1 and the population of all students who have not taken the course as Population 2.

http://www.gone.2012books.lardbucket.org/sites/all/files/data5.xls

1. Compute the 25 differences in the order $after− before$, their mean $d-$, and their sample standard deviation sd.
2. Give a point estimate for $μd=μ1−μ2$, the difference in the mean score of all students who have taken the course and the mean score of all who have not.
3. Construct a 98% confidence interval for $μd.$
4. Test, at the 1% level of significance, the hypothesis that the mean SAT score increases by at least ten points by taking the two-week preparation course.
2. Large Data Set 12 lists the scores on one round for 75 randomly selected members at a golf course, first using their own original clubs, then two months later after using new clubs with an experimental design. Denote the population of all golfers using their own original clubs as Population 1 and the population of all golfers using the new style clubs as Population 2.

http://www.gone.2012books.lardbucket.org/sites/all/files/data12.xls

1. Compute the 75 differences in the order $original clubs− new clubs$, their mean $d-$, and their sample standard deviation sd.
2. Give a point estimate for $μd=μ1−μ2$, the difference in the mean score of all golfers using their original clubs and the mean score of all golfers using the new kind of clubs.
3. Construct a 90% confidence interval for $μd.$
4. Test, at the 1% level of significance, the hypothesis that the mean golf score decreases by at least one stroke by using the new kind of clubs.
3. Consider the previous problem again. Since the data set is so large, it is reasonable to use the standard normal distribution instead of Student’s t-distribution with 74 degrees of freedom.

1. Construct a 90% confidence interval for $μd$ using the standard normal distribution, meaning that the formula is $d-±zα∕2sdn.$ (The computations done in part (a) of the previous problem still apply and need not be redone.) How does the result obtained here compare to the result obtained in part (c) of the previous problem?
2. Test, at the 1% level of significance, the hypothesis that the mean golf score decreases by at least one stroke by using the new kind of clubs, using the standard normal distribution. (All the work done in part (d) of the previous problem applies, except the critical value is now $zα$ instead of $tα$ (or the p-value can be computed exactly instead of only approximated, if you used the p-value approach).) How does the result obtained here compare to the result obtained in part (c) of the previous problem?
3. Construct the 99% confidence intervals for $μd$ using both the $t-$ and $z-$distributions. How much difference is there in the results now?

1. $d-=7.4286$, $sd=0.9759$,
2. $d-=7.4286$,
3. $(6.53,8.33)$,
4. T = 1.162, $df=6$, $t0.10=1.44$, do not reject H0
1. $d-=−14.25$, $sd=1.5$,
2. $d-=−14.25$,
3. $(−18.63,−9.87)$,
4. $T=−3.000$, $df=3$, $±t0.05=±2.353$, reject H0
1. $d-=25.2$, $sd=35.6609$,
2. 25.2,
3. $25.2±34.0$
4. T = 1.580, $df=4$, $t0.10=1.533$, reject H0 (takes less time)
1. 3.2,
2. $3.2±7.5$
3. T = 1.392, $df=9$, $t0.10=2.821$, do not reject H0 (government appraisals not higher)
1. 0.65,
2. $0.65±0.69$,
3. T = 3.014, $df=10$, $t0.01=2.764$, reject H0 (experimental material wears less)
1. $d-=16.68$ and $sd=10.77$
2. $d-=16.68$
3. $(11.31,22.05)$
4. $H0:μ1−μ2=10$ vs. $Ha:μ1−μ2>10.$ Test Statistic: T = 3.1014. $d.f.=24.$ Rejection Region: $[2.492,∞).$ Decision: Reject H0.
1. $(1.6266,2.6401).$ Endpoints change in the third deciaml place.
2. $H0:μ1−μ2=1$ vs. $Ha:μ1−μ2>1.$ Test Statistic: Z = 3.6791. Rejection Region: $[2.33,∞).$ Decision: Reject H0. The decision is the same as in the previous problem.
3. Using the $t−$distribution, $(1.3188,2.9478).$ Using the $z−$distribution, $(1.3401,2.9266).$ There is a difference.