
10.5 Statistical Inferences About β₁

Learning Objectives

To learn how to construct a confidence interval for $β_{1}$, the slope of the population regression line.
To learn how to test hypotheses regarding $β_{1}$.

The parameter $β_{1}$, the slope of the population regression line, is of primary importance in regression analysis because it gives the true rate of change in the mean $E(y)$ in response to a unit increase in the predictor variable x. For every unit increase in x, the mean of the response variable y changes by $β_{1}$ units, increasing if $β_{1} > 0$ and decreasing if $β_{1} < 0$. We wish to construct confidence intervals for $β_{1}$ and test hypotheses about it.

Confidence Intervals for β₁

The slope $\hat{β}_{1}$ of the least squares regression line is a point estimate of $β_{1}$. A confidence interval for $β_{1}$ is given by the following formula.

$100(1 - α)\%$ Confidence Interval for the Slope $β_{1}$ of the Population Regression Line

$$\hat{β}_{1} \pm t_{α/2} \frac{s_{ε}}{\sqrt{SS_{xx}}}$$

where $s_{ε} = \sqrt{\frac{SSE}{n - 2}}$ and the number of degrees of freedom is $df = n - 2$.

The assumptions listed in Section 10.3 "Modelling Linear Relationships with Randomness Present" must hold.
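As an illustration only, the formula can be written as a short Python function. This is a minimal sketch that assumes the summary statistics $\hat{β}_{1}$, $SSE$, $SS_{xx}$ and $n$ have already been computed, and that the critical value $t_{α/2}$ (with $df = n - 2$) is read from a t table and passed in; the function name `slope_confidence_interval` is hypothetical.

```python
import math


def slope_confidence_interval(beta1_hat, sse, ss_xx, n, t_crit):
    """Return (lower, upper) = beta1_hat -/+ t_crit * s_eps / sqrt(ss_xx).

    t_crit is the critical value t_{alpha/2} for df = n - 2, looked up in a
    t table by the caller; it is not computed here.
    """
    if n <= 2:
        raise ValueError("need n > 2 so that df = n - 2 is positive")
    s_eps = math.sqrt(sse / (n - 2))            # sample standard deviation of errors
    margin = t_crit * s_eps / math.sqrt(ss_xx)  # half-width of the interval
    return beta1_hat - margin, beta1_hat + margin
```

Passing the critical value in, rather than computing it, mirrors the table lookup used in the examples and keeps the sketch to the standard library. With the numbers of Example 6, `slope_confidence_interval(0.34375, 0.75, 51.2, 5, 3.182)` returns approximately (0.1214, 0.5661); the example rounds the margin to 0.2223 before subtracting, which is why it reports 0.1215 for the lower endpoint.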

Definition

The statistic $s_{ε}$ is called the sample standard deviation of errors. It estimates the standard deviation $σ$ of the errors in the population of y-values for each fixed value of x (see Figure 10.5 "The Simple Linear Model Concept" in Section 10.3 "Modelling Linear Relationships with Randomness Present").

Example 6

Construct the 95% confidence interval for the slope $β_{1}$ of the population regression line based on the five-point sample data set

$$\begin{array}{c|ccccc} x & 2 & 2 & 6 & 8 & 10 \\ y & 0 & 1 & 2 & 3 & 3 \end{array}$$

Solution:

The point estimate $\hat{β}_{1}$ of $β_{1}$ was computed in Note 10.18 "Example 2" in Section 10.4 "The Least Squares Regression Line" as $\hat{β}_{1} = 0.34375$. In the same example $SS_{xx}$ was found to be $SS_{xx} = 51.2$. The sum of the squared errors $SSE$ was computed in Note 10.23 "Example 4" in Section 10.4 "The Least Squares Regression Line" as $SSE = 0.75$. Thus

$$s_{ε} = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{0.75}{3}} = 0.50$$

Confidence level 95% means $α = 1 - 0.95 = 0.05$, so $α/2 = 0.025$. From the row labeled $df = 3$ in Figure 12.3, the table of critical values of $t$, we obtain $t_{0.025} = 3.182$. Therefore

$$\hat{β}_{1} \pm t_{α/2} \frac{s_{ε}}{\sqrt{SS_{xx}}} = 0.34375 \pm 3.182 \left(\frac{0.50}{\sqrt{51.2}}\right) = 0.34375 \pm 0.2223$$

which gives the interval $(0.1215, 0.5661)$. We are 95% confident that the slope $β_{1}$ of the population regression line is between 0.1215 and 0.5661.
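The intermediate quantities used in Example 6 can be recomputed directly from the five data points. The Python sketch below assumes the standard definitions $SS_{xx} = \sum (x - \bar{x})^{2}$ and $SS_{xy} = \sum (x - \bar{x})(y - \bar{y})$, computes $SSE$ as the sum of squared residuals about the least squares line, and copies $t_{0.025} = 3.182$ from the t table rather than computing it.

```python
import math

x = [2, 2, 6, 8, 10]
y = [0, 1, 2, 3, 3]
n = len(x)

# Least squares slope and intercept from the sums of squares
x_bar = sum(x) / n
y_bar = sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
beta1_hat = ss_xy / ss_xx
beta0_hat = y_bar - beta1_hat * x_bar

# SSE: sum of squared residuals about the least squares line
sse = sum((yi - (beta0_hat + beta1_hat * xi)) ** 2 for xi, yi in zip(x, y))
s_eps = math.sqrt(sse / (n - 2))  # sample standard deviation of errors

t_crit = 3.182  # t_{0.025} with df = 3, copied from the t table
margin = t_crit * s_eps / math.sqrt(ss_xx)
print(f"beta1_hat={beta1_hat:.5f} SS_xx={ss_xx:.1f} SSE={sse:.2f} s_eps={s_eps:.2f}")
print(f"95% CI: ({beta1_hat - margin:.4f}, {beta1_hat + margin:.4f})")
```

It prints $\hat{β}_{1} = 0.34375$, $SS_{xx} = 51.2$, $SSE = 0.75$ and $s_{ε} = 0.50$, matching the values quoted above, followed by the interval (0.1214, 0.5661). The lower endpoint differs from 0.1215 in the last digit only because the margin is not rounded to 0.2223 first.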

Example 7

Using the sample data in Table 10.3 "Data on Age and Value of Used Automobiles of a Specific Make and Model" construct a 90% confidence interval for the slope $β_{1}$ of the population regression line relating age and value of the automobiles of Note 10.19 "Example 3" in Section 10.4 "The Least Squares Regression Line". Interpret the result in the context of the problem.

Solution:

The point estimate $\hat{β}_{1}$ of $β_{1}$ was computed in Note 10.19 "Example 3", as was $SS_{xx}$. Their values are $\hat{β}_{1} = -2.05$ and $SS_{xx} = 14$. The sum of the squared errors $SSE$ was computed in Note 10.24 "Example 5" in Section 10.4 "The Least Squares Regression Line" as $SSE = 28.946$. Thus

$$s_{ε} = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{28.946}{8}} = 1.902169814$$

Confidence level 90% means $α = 1 - 0.90 = 0.10$, so $α/2 = 0.05$. From the row labeled $df = 8$ in Figure 12.3, the table of critical values of $t$, we obtain $t_{0.05} = 1.860$. Therefore

$$\hat{β}_{1} \pm t_{α/2} \frac{s_{ε}}{\sqrt{SS_{xx}}} = -2.05 \pm 1.860 \left(\frac{1.902169814}{\sqrt{14}}\right) = -2.05 \pm 0.95$$

which gives the interval $(-3.00, -1.10)$. We are 90% confident that the slope $β_{1}$ of the population regression line is between −3.00 and −1.10. In the context of the problem, this means that for vehicles of this make and model between two and six years old, we are 90% confident that each additional year of age decreases the average value of such a vehicle by between \$1,100 and \$3,000.
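Example 7 starts from summary statistics rather than raw data, so checking its arithmetic needs only those numbers. In this sketch $n = 10$ is inferred from $df = n - 2 = 8$, and $t_{0.05} = 1.860$ is copied from the t table.

```python
import math

beta1_hat = -2.05   # slope of the least squares line (Example 3)
ss_xx = 14          # from Example 3
sse = 28.946        # from Example 5
n = 10              # inferred from df = n - 2 = 8
t_crit = 1.860      # t_{0.05} with df = 8, copied from the t table

s_eps = math.sqrt(sse / (n - 2))             # sample standard deviation of errors
margin = t_crit * s_eps / math.sqrt(ss_xx)   # half-width of the interval
lower, upper = beta1_hat - margin, beta1_hat + margin
print(f"s_eps = {s_eps:.9f}, margin = {margin:.4f}")
print(f"90% CI for beta1: ({lower:.2f}, {upper:.2f})")
```

The margin comes out near 0.9456, which the example rounds to 0.95, and the endpoints round to −3.00 and −1.10.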

Testing Hypotheses About β₁

Hypotheses regarding $β_{1}$ can be tested using the same five-step procedures, either the critical value approach or the p-value approach, that were introduced in Section 8.1 "The Elements of Hypothesis Testing" and Section 8.3 "The Observed Significance of a Test" of Chapter 8 "Testing Hypotheses". The null hypothesis always has the form $H_{0}: β_{1} = B_{0}$, where $B_{0}$ is a number determined from the statement of the problem. The three forms of the alternative hypothesis, with the terminology for each case, are:

Form of $H_{a}$	Terminology
$H_{a} : β_{1} < B_{0}$	Left-tailed
$H_{a} : β_{1} > B_{0}$	Right-tailed
$H_{a} : β_{1} \neq B_{0}$	Two-tailed
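The rejection regions that go with the three forms of $H_{a}$, as they are used in the critical value approach of Examples 8 and 9, can be summarized in a small Python function. This is a sketch: the name `reject_h0` and the labels "left", "right" and "two" are hypothetical, and `t_crit` is assumed to be the positive table value, $t_{α}$ for a one-tailed test and $t_{α/2}$ for a two-tailed test, with $df = n - 2$.

```python
def reject_h0(t_stat, t_crit, tail):
    """Critical value approach for H0: beta1 = B0.

    tail is "left", "right" or "two", matching the three forms of Ha.
    t_crit is the positive table value (t_alpha or t_{alpha/2}).
    """
    if t_crit <= 0:
        raise ValueError("t_crit should be the positive table value")
    if tail == "left":    # Ha: beta1 < B0, rejection region (-inf, -t_alpha]
        return t_stat <= -t_crit
    if tail == "right":   # Ha: beta1 > B0, rejection region [t_alpha, inf)
        return t_stat >= t_crit
    if tail == "two":     # Ha: beta1 != B0, rejection region |T| >= t_{alpha/2}
        return abs(t_stat) >= t_crit
    raise ValueError(f"unknown tail: {tail!r}")
```

For instance, `reject_h0(4.919, 4.541, "two")` is `True`, in line with the decision reached in Example 8.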

The value zero for $B_{0}$ is of particular importance, since in that case the null hypothesis is $H_{0}: β_{1} = 0$, which corresponds to the situation in which x is not useful for predicting y. For if $β_{1} = 0$, then the population regression line is horizontal, so the mean $E(y)$ is the same for every value of x, and we are just as well off ignoring x completely and approximating y by its average value. Given two variables x and y, the burden of proof is on showing that x is useful for predicting y, not on showing that it is not. Thus the phrase “test whether x is useful for prediction of y,” or words to that effect, means to perform the test

$$H_{0}: β_{1} = 0 \quad \text{vs.} \quad H_{a}: β_{1} \neq 0$$

Standardized Test Statistic for Hypothesis Tests Concerning the Slope $β_{1}$ of the Population Regression Line

$$T = \frac{\hat{β}_{1} - B_{0}}{s_{ε} / \sqrt{SS_{xx}}}$$

The test statistic has Student’s t-distribution with $d f = n − 2$ degrees of freedom.

The assumptions listed in Section 10.3 "Modelling Linear Relationships with Randomness Present" must hold.
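The test statistic translates directly into code. In this hypothetical helper `slope_test_statistic`, $s_{ε}$ is recomputed from $SSE$ and $n$; the returned value is then compared with a critical value from the t table with $df = n - 2$.

```python
import math


def slope_test_statistic(beta1_hat, b0, sse, ss_xx, n):
    """Standardized test statistic T for H0: beta1 = b0 (df = n - 2)."""
    if n <= 2:
        raise ValueError("need n > 2 so that df = n - 2 is positive")
    s_eps = math.sqrt(sse / (n - 2))           # sample standard deviation of errors
    standard_error = s_eps / math.sqrt(ss_xx)  # estimated standard error of beta1_hat
    return (beta1_hat - b0) / standard_error
```

With the summary statistics of the two examples that follow, it gives $T \approx 4.919$ for $B_{0} = 0$ (five-point data set) and $T \approx -1.869$ for $B_{0} = -1.1$ (used-car data).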

Example 8

Test, at the 2% level of significance, whether the variable x is useful for predicting y based on the information in the five-point data set

$$\begin{array}{c|ccccc} x & 2 & 2 & 6 & 8 & 10 \\ y & 0 & 1 & 2 & 3 & 3 \end{array}$$

Solution:

We will perform the test using the critical value approach.

Step 1. Since x is useful for prediction of y precisely when the slope $β_{1}$ of the population regression line is nonzero, the relevant test is
$\begin{array}{l} H_{0}: β_{1} = 0 \\ \text{vs. } H_{a}: β_{1} \neq 0 \quad @\ α = 0.02 \end{array}$
Step 2. The test statistic is
$T = \frac{\hat{β}_{1}}{s_{ε}/\sqrt{SS_{xx}}}$
and has Student’s t-distribution with $n - 2 = 5 - 2 = 3$ degrees of freedom.
Step 3. From Note 10.18 "Example 2", $\hat{β}_{1} = 0.34375$ and $SS_{xx} = 51.2$. From Note 10.30 "Example 6", $s_{ε} = 0.50$. The value of the test statistic is therefore
$T = \frac{\hat{β}_{1} - B_{0}}{s_{ε}/\sqrt{SS_{xx}}} = \frac{0.34375 - 0}{0.50/\sqrt{51.2}} = 4.919$
Step 4. Since the symbol in $H_{a}$ is “≠”, this is a two-tailed test, so there are two critical values, $\pm t_{α/2} = \pm t_{0.01}$. Reading from the line labeled $df = 3$ in Figure 12.3, the table of critical values of $t$, we obtain $t_{0.01} = 4.541$. The rejection region is $(-\infty, -4.541] \cup [4.541, \infty)$.
Step 5. As shown in Figure 10.9, the test statistic falls in the rejection region. The decision is to reject H₀. In the context of the problem our conclusion is:

The data provide sufficient evidence, at the 2% level of significance, to conclude that the slope of the population regression line is nonzero, so that x is useful as a predictor of y.

Figure 10.9 Rejection Region and Test Statistic for Note 10.33 "Example 8"
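Example 8 can also be carried out with the p-value approach mentioned at the start of this subsection. The Python standard library has no t-distribution function, so this sketch relies on the closed-form cumulative distribution function that exists for exactly 3 degrees of freedom, $F(t) = \frac{1}{2} + \frac{1}{\pi}\left(\frac{t/\sqrt{3}}{1 + t^{2}/3} + \arctan\frac{t}{\sqrt{3}}\right)$. It does not apply to other values of df, and the helper name `t_cdf_df3` is hypothetical.

```python
import math


def t_cdf_df3(t):
    """CDF of Student's t-distribution with exactly 3 degrees of freedom."""
    u = t / math.sqrt(3)
    return 0.5 + (u / (1 + u * u) + math.atan(u)) / math.pi


beta1_hat, s_eps, ss_xx = 0.34375, 0.50, 51.2    # values from Examples 2 and 6
t_stat = beta1_hat / (s_eps / math.sqrt(ss_xx))  # B0 = 0
p_value = 2 * (1 - t_cdf_df3(abs(t_stat)))       # two-tailed test
alpha = 0.02
print(f"T = {t_stat:.3f}, p-value = {p_value:.4f}, reject H0: {p_value <= alpha}")
```

It reports $T \approx 4.919$ and a two-tailed p-value of about 0.016, which is below $α = 0.02$, so the decision agrees with the critical value approach. As a sanity check, `t_cdf_df3(4.541)` is about 0.99, consistent with $t_{0.01} = 4.541$ for $df = 3$.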

Example 9

A car salesman claims that automobiles between two and six years old of the make and model discussed in Note 10.19 "Example 3" in Section 10.4 "The Least Squares Regression Line" lose more than $1,100 in value each year. Test this claim at the 5% level of significance.

Solution:

We will perform the test using the critical value approach.

Step 1. In terms of the variables x and y, the salesman’s claim is that if x is increased by 1 unit (one additional year in age), then y decreases by more than 1.1 units (more than \$1,100). Thus his assertion is that the slope of the population regression line is negative, and that it is more negative than −1.1. In symbols, $β_{1} < -1.1$. Since it contains an inequality, this has to be the alternative hypothesis. The null hypothesis has to be an equality with the same number on the right-hand side, so the relevant test is
$\begin{array}{l} H_{0}: β_{1} = -1.1 \\ \text{vs. } H_{a}: β_{1} < -1.1 \quad @\ α = 0.05 \end{array}$
Step 2. The test statistic is
$T = \frac{\hat{β}_{1} - B_{0}}{s_{ε}/\sqrt{SS_{xx}}}$
and has Student’s t-distribution with 8 degrees of freedom.
Step 3. From Note 10.19 "Example 3", $\hat{β}_{1} = -2.05$ and $SS_{xx} = 14$. From Note 10.31 "Example 7", $s_{ε} = 1.902169814$. The value of the test statistic is therefore
$T = \frac{\hat{β}_{1} - B_{0}}{s_{ε}/\sqrt{SS_{xx}}} = \frac{-2.05 - (-1.1)}{1.902169814/\sqrt{14}} = -1.869$
Step 4. Since the symbol in $H_{a}$ is “<”, this is a left-tailed test, so there is a single critical value, $-t_{α} = -t_{0.05}$. Reading from the line labeled $df = 8$ in Figure 12.3, the table of critical values of $t$, we obtain $t_{0.05} = 1.860$. The rejection region is $(-\infty, -1.860]$.
Step 5. As shown in Figure 10.10, the test statistic falls in the rejection region. The decision is to reject H₀. In the context of the problem our conclusion is:

The data provide sufficient evidence, at the 5% level of significance, to conclude that vehicles of this make and model and in this age range lose more than $1,100 per year in value, on average.

Figure 10.10 Rejection Region and Test Statistic for Note 10.34 "Example 9"
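The arithmetic and decision of Example 9 can be checked the same way from the summary statistics. In this sketch the left-tailed rejection rule $T \le -t_{0.05}$ is written out directly, and $t_{0.05} = 1.860$ with $df = 8$ is copied from the t table.

```python
import math

beta1_hat = -2.05        # slope of the least squares line (Example 3)
b0 = -1.1                # hypothesized slope B0 under H0
s_eps = 1.902169814      # sample standard deviation of errors (Example 7)
ss_xx = 14               # from Example 3
t_alpha = 1.860          # t_{0.05} with df = 8, copied from the t table

t_stat = (beta1_hat - b0) / (s_eps / math.sqrt(ss_xx))
reject = t_stat <= -t_alpha      # rejection region (-inf, -1.860]
print(f"T = {t_stat:.3f}, reject H0: {reject}")
```

The computed $T \approx -1.869$ lies only slightly beyond the critical value $-1.860$, so the evidence clears the 5% threshold narrowly.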

Key Takeaways

The parameter $β_{1}$, the slope of the population regression line, is of primary interest because it describes the average change in y per unit increase in x.
The statistic $\hat{β}_{1}$, the slope of the least squares regression line, is a point estimate of $β_{1}$. A confidence interval for $β_{1}$ is $\hat{β}_{1} \pm t_{α/2}\, s_{ε}/\sqrt{SS_{xx}}$, with $df = n - 2$.
Hypotheses regarding $β_{1}$ are tested using the same five-step procedures introduced in Chapter 8 "Testing Hypotheses".

Exercises

Basic

For the Basic and Application exercises in this section use the computations that were done for the exercises with the same number in Section 10.2 "The Linear Correlation Coefficient" and Section 10.4 "The Least Squares Regression Line".

Construct the 95% confidence interval for the slope $β_{1}$ of the population regression line based on the sample data set of Exercise 1 of Section 10.2 "The Linear Correlation Coefficient".
Construct the 90% confidence interval for the slope $β_{1}$ of the population regression line based on the sample data set of Exercise 2 of Section 10.2 "The Linear Correlation Coefficient".
Construct the 90% confidence interval for the slope $β_{1}$ of the population regression line based on the sample data set of Exercise 3 of Section 10.2 "The Linear Correlation Coefficient".
Construct the 99% confidence interval for the slope $β_{1}$ of the population regression line based on the sample data set of Exercise 4 of Section 10.2 "The Linear Correlation Coefficient".
For the data in Exercise 5 of Section 10.2 "The Linear Correlation Coefficient" test, at the 10% level of significance, whether x is useful for predicting y (that is, whether $β_{1} \neq 0$ ).
For the data in Exercise 6 of Section 10.2 "The Linear Correlation Coefficient" test, at the 5% level of significance, whether x is useful for predicting y (that is, whether $β_{1} \neq 0$ ).
Construct the 90% confidence interval for the slope $β_{1}$ of the population regression line based on the sample data set of Exercise 7 of Section 10.2 "The Linear Correlation Coefficient".
Construct the 95% confidence interval for the slope $β_{1}$ of the population regression line based on the sample data set of Exercise 8 of Section 10.2 "The Linear Correlation Coefficient".
For the data in Exercise 9 of Section 10.2 "The Linear Correlation Coefficient" test, at the 1% level of significance, whether x is useful for predicting y (that is, whether $β_{1} \neq 0$ ).
For the data in Exercise 10 of Section 10.2 "The Linear Correlation Coefficient" test, at the 1% level of significance, whether x is useful for predicting y (that is, whether $β_{1} \neq 0$ ).

Applications

For the data in Exercise 11 of Section 10.2 "The Linear Correlation Coefficient" construct a 90% confidence interval for the mean number of new words acquired per month by children between 13 and 18 months of age.
For the data in Exercise 12 of Section 10.2 "The Linear Correlation Coefficient" construct a 90% confidence interval for the mean increased braking distance for each additional 100 pounds of vehicle weight.
For the data in Exercise 13 of Section 10.2 "The Linear Correlation Coefficient" test, at the 10% level of significance, whether age is useful for predicting resting heart rate.
For the data in Exercise 14 of Section 10.2 "The Linear Correlation Coefficient" test, at the 10% level of significance, whether wind speed is useful for predicting wave height.
For the situation described in Exercise 15 of Section 10.2 "The Linear Correlation Coefficient"
1. Construct the 95% confidence interval for the mean increase in revenue per additional thousand dollars spent on advertising.
2. An advertising agency tells the business owner that for every additional thousand dollars spent on advertising, revenue will increase by over $25,000. Test this claim (which is the alternative hypothesis) at the 5% level of significance.
3. Perform the test of part (b) at the 10% level of significance.
4. Based on the results in (b) and (c), how believable is the ad agency’s claim? (This is a subjective judgement.)
For the situation described in Exercise 16 of Section 10.2 "The Linear Correlation Coefficient"
1. Construct the 90% confidence interval for the mean increase in height per additional inch of length at age two.
2. It is claimed that for girls each additional inch of length at age two means more than an additional inch of height at maturity. Test this claim (which is the alternative hypothesis) at the 10% level of significance.
For the data in Exercise 17 of Section 10.2 "The Linear Correlation Coefficient" test, at the 10% level of significance, whether course average before the final exam is useful for predicting the final exam grade.
For the situation described in Exercise 18 of Section 10.2 "The Linear Correlation Coefficient", an agronomist claims that each additional million acres planted results in more than 750,000 additional acres harvested. Test this claim at the 1% level of significance.
For the data in Exercise 19 of Section 10.2 "The Linear Correlation Coefficient" test, at the 1/10th of 1% level of significance, whether, ignoring all other facts such as age and body mass, the amount of the medication consumed is a useful predictor of blood concentration of the active ingredient.
For the data in Exercise 20 of Section 10.2 "The Linear Correlation Coefficient" test, at the 1% level of significance, whether for each additional inch of girth the age of the tree increases by at least two and one-half years.
For the data in Exercise 21 of Section 10.2 "The Linear Correlation Coefficient"
1. Construct the 95% confidence interval for the mean increase in strength at 28 days for each additional hundred psi increase in strength at 3 days.
2. Test, at the 1/10th of 1% level of significance, whether the 3-day strength is useful for predicting 28-day strength.
For the situation described in Exercise 22 of Section 10.2 "The Linear Correlation Coefficient"
1. Construct the 99% confidence interval for the mean decrease in energy demand for each one-degree drop in temperature.
2. An engineer with the power company believes that for each one-degree increase in temperature, daily energy demand will decrease by more than 3.6 million watt-hours. Test this claim at the 1% level of significance.

Large Data Set Exercises

Large Data Set 1 lists the SAT scores and GPAs of 1,000 students.

http://www.gone.2012books.lardbucket.org/sites/all/files/data1.xls
1. Compute the 90% confidence interval for the slope $β_{1}$ of the population regression line with SAT score as the independent variable (x) and GPA as the dependent variable (y).
2. Test, at the 10% level of significance, the hypothesis that the slope of the population regression line is greater than 0.001, against the null hypothesis that it is exactly 0.001.
Large Data Set 12 lists the golf scores on one round of golf for 75 golfers first using their own original clubs, then using clubs of a new, experimental design (after two months of familiarization with the new clubs).

http://www.gone.2012books.lardbucket.org/sites/all/files/data12.xls
1. Compute the 95% confidence interval for the slope $β_{1}$ of the population regression line with scores using the original clubs as the independent variable (x) and scores using the new clubs as the dependent variable (y).
2. Test, at the 10% level of significance, the hypothesis that the slope of the population regression line is different from 1, against the null hypothesis that it is exactly 1.
Large Data Set 13 records the number of bidders and sales price of a particular type of antique grandfather clock at 60 auctions.

http://www.gone.2012books.lardbucket.org/sites/all/files/data13.xls
1. Compute the 95% confidence interval for the slope $β_{1}$ of the population regression line with the number of bidders present at the auction as the independent variable (x) and sales price as the dependent variable (y).
2. Test, at the 10% level of significance, the hypothesis that the average sales price increases by more than \$90 for each additional bidder at an auction, against the null hypothesis that it increases by exactly \$90.

Answers

$0.743 \pm 0.578$
$− 0.610 \pm 0.633$
$T = 1.732$ , $\pm t_{0.05} = \pm 2.353$ , do not reject H₀
$0.6 \pm 0.451$
$T = − 4.481$ , $\pm t_{0.005} = \pm 3.355$ , reject H₀

$4.8 \pm 1.7$ words
$T = 2.843$ , $\pm t_{0.05} = \pm 1.860$ , reject H₀
1. $42.024 \pm 28.011$ thousand dollars,
2. $T = 1.487$ , $t_{0.05} = 1.943$ , do not reject H₀;
3. $t_{0.10} = 1.440$ , reject H₀
$T = 4.096$ , $\pm t_{0.05} = \pm 1.771$ , reject H₀
$T = 25.524$ , $\pm t_{0.0005} = \pm 3.505$ , reject H₀
1. $2.550 \pm 0.127$ hundred psi,
2. $T = 41.072$, $\pm t_{0.0005} = \pm 3.674$, reject H₀

1. $(0.0014, 0.0018)$
2. $H_{0}: β_{1} = 0.001$ vs. $H_{a}: β_{1} > 0.001$. Test Statistic: $Z = 6.1625$. Rejection Region: $[1.28, +\infty)$. Decision: Reject H₀.
1. $(101.789, 131.4435)$
2. $H_{0}: β_{1} = 90$ vs. $H_{a}: β_{1} > 90$. Test Statistic: $T = 3.5938$. $df = 58$. Rejection Region: $[1.296, +\infty)$. Decision: Reject H₀.
