The following table gives examples of the kinds of pairs of variables which could be of interest from a statistical point of view.
x: predictor or independent variable | y: response or dependent variable |
---|---|
Temperature in degrees Celsius | Temperature in degrees Fahrenheit |
Area of a house (sq.ft.) | Value of the house |
Age of a particular make and model car | Resale value of the car |
Amount spent by a business on advertising in a year | Revenue received that year |
Height of a 25-year-old man | Weight of the man |
The first line in the table is different from all the rest because in that case, and no other, the relationship between the variables is deterministic: once the value of x is known, the value of y is completely determined. In fact there is a formula for y in terms of x: $y=\frac{9}{5}x+32.$ Choosing several values for x and computing the corresponding value of y for each one using the formula gives the table:
$$\begin{array}{rrrrrr}x & -40 & -15 & 0 & 20 & 50\\ y & -40 & 5 & 32 & 68 & 122\end{array}$$
We can plot these data by choosing a pair of perpendicular lines in the plane, called the coordinate axes, as shown in Figure 10.1 "Plot of Celsius and Fahrenheit Temperature Pairs". Then to each pair of numbers in the table we associate a unique point in the plane: the point that lies x units to the right of the vertical axis (to the left if $x<0$) and y units above the horizontal axis (below if $y<0$). The relationship between x and y is called a linear relationship because the points so plotted all lie on a single straight line. The number $\frac{9}{5}$ in the equation $y=\frac{9}{5}x+32$ is the slope of the line, and it measures the line's steepness. It describes how y changes in response to a change in x: if x increases by 1 unit, then y increases (since $\frac{9}{5}$ is positive) by $\frac{9}{5}$ units. If the slope had been negative, then y would have decreased in response to an increase in x. The number 32 in the formula $y=\frac{9}{5}x+32$ is the y-intercept of the line; it identifies where the line crosses the y-axis. You may recall from an earlier course that every non-vertical line in the plane is described by an equation of the form $y=mx+b$, where m is the slope of the line and b is its y-intercept.
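As a quick check on the temperature table, the formula $y=\frac{9}{5}x+32$ can be evaluated directly. The Python sketch below is only an illustration (the function name `celsius_to_fahrenheit` is our own choice); it reproduces the five $(x,y)$ pairs and confirms that raising x by 1 unit changes y by exactly the slope $\frac{9}{5}$, whatever the starting value of x.

```python
from fractions import Fraction

# Slope and y-intercept of y = (9/5)x + 32; Fraction keeps the arithmetic exact.
SLOPE = Fraction(9, 5)
INTERCEPT = 32


def celsius_to_fahrenheit(c):
    """Return the Fahrenheit temperature y for a Celsius temperature x."""
    return SLOPE * c + INTERCEPT


celsius = [-40, -15, 0, 20, 50]
fahrenheit = [celsius_to_fahrenheit(c) for c in celsius]

# Print each (x, y) pair of the table.
for c, f in zip(celsius, fahrenheit):
    print(f"x = {c:>4}  ->  y = {f}")

# Increasing x by 1 unit changes y by the slope, no matter which x we start from.
changes = {celsius_to_fahrenheit(c + 1) - celsius_to_fahrenheit(c) for c in celsius}
print(changes)
```

Using exact fractions rather than floating point keeps the computed y-values identical to the whole numbers in the table.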
Figure 10.1 Plot of Celsius and Fahrenheit Temperature Pairs
The relationship between x and y in the temperature example is deterministic because once the value of x is known, the value of y is completely determined. In contrast, all the other relationships listed in the table above have an element of randomness in them. Consider the relationship described in the last line of the table, the height x of a man aged 25 and his weight y. If we were to randomly select several 25-year-old men and measure the height and weight of each one, we might obtain a collection of $\left(x,y\right)$ pairs something like this:
$$\begin{array}{cccccc}(68,151) & (69,146) & (70,157) & (70,164) & (71,171) & (72,160)\\ (72,163) & (72,180) & (73,170) & (73,175) & (74,178) & (75,188)\end{array}$$
A plot of these data is shown in Figure 10.2 "Plot of Height and Weight Pairs". Such a plot is called a scatter diagram or scatter plot. Looking at the plot, it is evident that there exists a linear relationship between height x and weight y, but not a perfect one. The points appear to be following a line, but not exactly. There is an element of randomness present.
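The difference between a perfect linear relationship and a linear relationship with randomness can be checked numerically. The Python sketch below uses a hypothetical helper, `all_on_one_line`, that tests whether every point lies on the line through the first two distinct points, using an exact integer cross-product test. It applies the test to the temperature table and to the height and weight pairs. It also lists the heights that occur more than once, which shows directly that the weight y is not determined by the height x.

```python
def all_on_one_line(points):
    """Return True if every (x, y) pair lies on a single straight line."""
    x0, y0 = points[0]
    # Use the first point that differs from points[0] to fix the direction of the line.
    others = [p for p in points if p != (x0, y0)]
    if not others:
        return True  # all points coincide
    x1, y1 = others[0]
    # (x, y) is on the line through (x0, y0) and (x1, y1) exactly when this
    # cross product is zero; integer inputs keep the test exact.
    return all((x1 - x0) * (y - y0) - (y1 - y0) * (x - x0) == 0 for x, y in points)


temperature = [(-40, -40), (-15, 5), (0, 32), (20, 68), (50, 122)]
height_weight = [(68, 151), (69, 146), (70, 157), (70, 164), (71, 171), (72, 160),
                 (72, 163), (72, 180), (73, 170), (73, 175), (74, 178), (75, 188)]

print(all_on_one_line(temperature))    # True
print(all_on_one_line(height_weight))  # False

# Group the weights by height: men of the same height have different weights.
weights_by_height = {}
for x, y in height_weight:
    weights_by_height.setdefault(x, []).append(y)
print({x: ys for x, ys in weights_by_height.items() if len(ys) > 1})
# {70: [157, 164], 72: [160, 163, 180], 73: [170, 175]}
```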
Figure 10.2 Plot of Height and Weight Pairs
In this chapter we will analyze situations in which variables x and y exhibit such a linear relationship with randomness. The level of randomness will vary from situation to situation. In the introductory example connecting an electric current and the level of carbon monoxide in air, the relationship is almost perfect. In other situations, such as the heights and weights of individuals, the connection between the two variables involves a high degree of randomness. In the next section we will see how to quantify the strength of the linear relationship between two variables.
A line has equation $y=0.5x+2.$
A line has equation $y=x-0.5.$
A line has equation $y=-2x+4.$
A line has equation $y=-1.5x+1.$
Based on the information given about a line, determine how y will change (increase, decrease, or stay the same) when x is increased, and explain. In some cases it might be impossible to tell from the information given.
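The rule described earlier, that a positive slope makes y increase as x increases, a negative slope makes it decrease, and a zero slope leaves it unchanged, can be written as a small function. In this Python sketch the name `direction_of_change` is our own, and passing `None` stands for a slope whose sign cannot be determined from the information given. The function is then applied to the four lines listed above.

```python
def direction_of_change(slope):
    """Describe how y = slope * x + b responds when x is increased."""
    if slope is None:
        return "cannot tell"  # the sign of the slope is unknown
    if slope > 0:
        return "increase"
    if slope < 0:
        return "decrease"
    return "stay the same"  # zero slope: a horizontal line


# (slope, y-intercept) for y = 0.5x + 2, y = x - 0.5, y = -2x + 4, y = -1.5x + 1
lines = [(0.5, 2), (1, -0.5), (-2, 4), (-1.5, 1)]
for m, b in lines:
    print(f"slope {m}, intercept {b}: y will {direction_of_change(m)} as x increases")
```

The y-intercept b is never consulted: only the sign of the slope decides the direction of change.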
A data set consists of eight $(x,y)$ pairs of numbers:
$$\begin{array}{cccc}(0,12) & (4,16) & (8,22) & (15,28)\\ (2,15) & (5,14) & (13,24) & (20,30)\end{array}$$
A data set consists of ten $(x,y)$ pairs of numbers:
$$\begin{array}{ccccc}(3,20) & (6,9) & (11,0) & (14,1) & (18,9)\\ (5,13) & (8,4) & (12,0) & (17,6) & (20,16)\end{array}$$
A data set consists of nine $(x,y)$ pairs of numbers:
$$\begin{array}{ccccc}(8,16) & (10,4) & (12,0) & (14,4) & (16,16)\\ (9,9) & (11,1) & (13,1) & (15,9) & \end{array}$$
A data set consists of five $(x,y)$ pairs of numbers:
$$(0,1)\quad(2,5)\quad(3,7)\quad(5,11)\quad(8,17)$$
At 60°F a particular blend of automotive gasoline weighs 6.17 lb/gal. The weight y of gasoline on a tank truck that is loaded with x gallons of gasoline is given by the linear equation
$$y=6.17x$$
The rate for renting a motor scooter for one day at a beach resort area is \$25 plus 30 cents for each mile the scooter is driven. The total cost y in dollars for renting a scooter and driving it x miles is
$$y=0.30x+25$$
The pricing schedule for labor on a service call by an elevator repair company is \$150 plus \$50 per hour on site.
The cost of a telephone call made through a leased line service is 2.5 cents per minute.
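Each of these word problems describes a linear relationship $y=mx+b$: the fixed charge is the y-intercept b and the per-unit rate is the slope m. The Python sketch below encodes them that way using a hypothetical helper, `linear`. The gasoline and scooter equations are the ones given above. The elevator and telephone equations are our own reading of the stated rates (a fixed charge of 150 dollars plus 50 dollars per hour, and 2.5 cents per minute with no fixed charge, kept in cents here).

```python
def linear(slope, intercept):
    """Return the function x -> slope * x + intercept."""
    return lambda x: slope * x + intercept


gasoline_weight = linear(6.17, 0)   # pounds, for x gallons
scooter_cost = linear(0.30, 25)     # dollars, for x miles
elevator_labor = linear(50, 150)    # dollars, for x hours on site (our reading)
phone_cost = linear(2.5, 0)         # cents, for x minutes (our reading)

print(round(gasoline_weight(1000), 2))  # 6170.0
print(round(scooter_cost(40), 2))       # 37.0
print(elevator_labor(3))                # 300
print(phone_cost(12))                   # 30.0
```

Rounding the floating-point results guards against tiny representation errors in values such as 6.17 and 0.30.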
Large Data Set 1 lists the SAT scores and GPAs of 1,000 students. Plot the scatter diagram with SAT score as the independent variable (x) and GPA as the dependent variable (y). Comment on the appearance and strength of any linear trend.
http://www.gone.2012books.lardbucket.org/sites/all/files/data1.xls
Large Data Set 12 lists the golf scores on one round of golf for 75 golfers first using their own original clubs, then using clubs of a new, experimental design (after two months of familiarization with the new clubs). Plot the scatter diagram with golf score using the original clubs as the independent variable (x) and golf score using the new clubs as the dependent variable (y). Comment on the appearance and strength of any linear trend.
http://www.gone.2012books.lardbucket.org/sites/all/files/data12.xls
Large Data Set 13 records the number of bidders and sales price of a particular type of antique grandfather clock at 60 auctions. Plot the scatter diagram with the number of bidders at the auction as the independent variable (x) and the sales price as the dependent variable (y). Comment on the appearance and strength of any linear trend.
http://www.gone.2012books.lardbucket.org/sites/all/files/data13.xls
There appears to be a hint of some positive correlation.
There appears to be clear positive correlation.