10.1 Linear Relationships Between Variables

Learning Objective

  1. To learn what it means for two variables to exhibit a relationship that is close to linear but which contains an element of randomness.

The following table gives examples of the kinds of pairs of variables which could be of interest from a statistical point of view.

x (predictor or independent variable) | y (response or dependent variable)
Temperature in degrees Celsius | Temperature in degrees Fahrenheit
Area of a house (sq.ft.) | Value of the house
Age of a particular make and model car | Resale value of the car
Amount spent by a business on advertising in a year | Revenue received that year
Height of a 25-year-old man | Weight of the man

The first line in the table is different from all the rest because in that case, and no other, the relationship between the variables is deterministic: once the value of x is known, the value of y is completely determined. In fact there is a formula for y in terms of x: y=(9/5)x+32. Choosing several values for x and computing the corresponding value of y for each one using the formula gives the table

x | -40 | -15 | 0 | 20 | 50
y | -40 | 5 | 32 | 68 | 122

We can plot these data by choosing a pair of perpendicular lines in the plane, called the coordinate axes, as shown in Figure 10.1 "Plot of Celsius and Fahrenheit Temperature Pairs". Then to each pair of numbers in the table we associate a unique point in the plane: the point that lies x units to the right of the vertical axis (to the left if x<0) and y units above the horizontal axis (below if y<0). The relationship between x and y is called a linear relationship because the points so plotted all lie on a single straight line. The number 9/5 in the equation y=(9/5)x+32 is the slope of the line and measures its steepness. It describes how y changes in response to a change in x: if x increases by 1 unit, then y increases (since 9/5 is positive) by 9/5 units. If the slope had been negative, then y would have decreased in response to an increase in x. The number 32 in the formula y=(9/5)x+32 is the y-intercept of the line; it identifies where the line crosses the y-axis. You may recall from an earlier course that every non-vertical line in the plane is described by an equation of the form y=mx+b, where m is the slope of the line and b is its y-intercept.

Figure 10.1 Plot of Celsius and Fahrenheit Temperature Pairs
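The table of Celsius and Fahrenheit pairs, and the meaning of the slope as "change in y per 1-unit change in x," can be checked with a few lines of Python. This is a minimal sketch: the function name fahrenheit is our own illustrative choice, and Python's standard fractions module is used so that the slope 9/5 stays exact instead of being rounded to a decimal.

```python
from fractions import Fraction

# Slope and y-intercept of the line y = (9/5)x + 32.
# Fraction keeps 9/5 exact, so no floating-point rounding creeps in.
SLOPE = Fraction(9, 5)
INTERCEPT = 32

def fahrenheit(celsius):
    """Return the Fahrenheit temperature y for a Celsius temperature x."""
    return SLOPE * celsius + INTERCEPT

# Reproduce the table of (x, y) pairs from the text.
xs = [-40, -15, 0, 20, 50]
table = [(x, int(fahrenheit(x))) for x in xs]
print(table)  # [(-40, -40), (-15, 5), (0, 32), (20, 68), (50, 122)]

# Slope as a rate of change: raising x by 1 unit raises y by 9/5 units.
print(fahrenheit(21) - fahrenheit(20))  # 9/5

# y-intercept: the value of y when x = 0.
print(fahrenheit(0))  # 32
```

Because the relationship is deterministic, the difference between consecutive whole-number x values is the same 9/5 no matter which x you start from.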

The relationship between x and y in the temperature example is deterministic because once the value of x is known, the value of y is completely determined. In contrast, all the other relationships listed in the table above have an element of randomness in them. Consider the relationship described in the last line of the table, the height x of a man aged 25 and his weight y. If we were to randomly select several 25-year-old men and measure the height and weight of each one, we might obtain a collection of (x,y) pairs something like this:

(68,151)(69,146)(70,157)(70,164)(71,171)(72,160)(72,163)(72,180)(73,170)(73,175)(74,178)(75,188)

A plot of these data is shown in Figure 10.2 "Plot of Height and Weight Pairs". Such a plot is called a scatter diagram or scatter plot. Looking at the plot, we see that there is a linear relationship between height x and weight y, but not a perfect one. The points appear to follow a line, but not exactly. There is an element of randomness present.

Figure 10.2 Plot of Height and Weight Pairs
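One way to make "all lie on a single straight line" precise is a collinearity test: fix the line through two distinct data points, then check that every other point falls on it. The Python sketch below is our own illustration (the function name lie_on_one_line is hypothetical, not from the text). It uses exact integer arithmetic, so it suits whole-number data like the pairs above; measured decimal data would need a tolerance.

```python
def lie_on_one_line(points):
    """Return True if every (x, y) pair lies on one straight line.

    Cross-product test: P = (x, y) is on the line through P0 = (x0, y0)
    and P1 = (x1, y1) exactly when (x1 - x0)*(y - y0) == (y1 - y0)*(x - x0).
    Note that this also accepts points on a vertical line.
    """
    x0, y0 = points[0]
    # Pick a second point distinct from the first to fix the line's direction.
    others = [(x, y) for x, y in points if (x, y) != (x0, y0)]
    if not others:
        return True  # all points coincide
    x1, y1 = others[0]
    return all((x1 - x0) * (y - y0) == (y1 - y0) * (x - x0) for x, y in others)

# Deterministic example: the Celsius/Fahrenheit table.
celsius_fahrenheit = [(-40, -40), (-15, 5), (0, 32), (20, 68), (50, 122)]

# Example with randomness: heights (in) and weights (lb) of 25-year-old men.
heights_weights = [(68, 151), (69, 146), (70, 157), (70, 164), (71, 171),
                   (72, 160), (72, 163), (72, 180), (73, 170), (73, 175),
                   (74, 178), (75, 188)]

print(lie_on_one_line(celsius_fahrenheit))  # True
print(lie_on_one_line(heights_weights))     # False
```

The height and weight data fail the test even though they trend upward. For example, the two men of height 70 have different weights, so no single line y=mx+b can pass through both points.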

In this chapter we will analyze situations in which variables x and y exhibit such a linear relationship with randomness. The level of randomness will vary from situation to situation. In the introductory example connecting an electric current and the level of carbon monoxide in air, the relationship is almost perfect. In other situations, such as the heights and weights of individuals, the connection between the two variables involves a high degree of randomness. In the next section we will see how to quantify the strength of the linear relationship between two variables.

Key Takeaways

  • Two variables x and y have a deterministic linear relationship if points plotted from (x,y) pairs lie exactly along a single straight line.
  • In practice it is common for two variables to exhibit a relationship that is close to linear but which contains an element, possibly large, of randomness.

Exercises

    Basic

  1. A line has equation y=0.5x+2.

    1. Pick five distinct x-values, use the equation to compute the corresponding y-values, and plot the five points obtained.
    2. Give the value of the slope of the line; give the value of the y-intercept.
  2. A line has equation y=x-0.5.

    1. Pick five distinct x-values, use the equation to compute the corresponding y-values, and plot the five points obtained.
    2. Give the value of the slope of the line; give the value of the y-intercept.
  3. A line has equation y=2x+4.

    1. Pick five distinct x-values, use the equation to compute the corresponding y-values, and plot the five points obtained.
    2. Give the value of the slope of the line; give the value of the y-intercept.
  4. A line has equation y=1.5x+1.

    1. Pick five distinct x-values, use the equation to compute the corresponding y-values, and plot the five points obtained.
    2. Give the value of the slope of the line; give the value of the y-intercept.
  5. Based on the information given about a line, determine how y will change (increase, decrease, or stay the same) when x is increased, and explain. In some cases it might be impossible to tell from the information given.

    1. The slope is positive.
    2. The y-intercept is positive.
    3. The slope is zero.
  6. Based on the information given about a line, determine how y will change (increase, decrease, or stay the same) when x is increased, and explain. In some cases it might be impossible to tell from the information given.

    1. The y-intercept is negative.
    2. The y-intercept is zero.
    3. The slope is negative.
  7. A data set consists of eight (x,y) pairs of numbers:

    (0,12)(4,16)(8,22)(15,28)(2,15)(5,14)(13,24)(20,30)
    1. Plot the data in a scatter diagram.
    2. Based on the plot, explain whether the relationship between x and y appears to be deterministic or to involve randomness.
    3. Based on the plot, explain whether the relationship between x and y appears to be linear or not linear.
  8. A data set consists of ten (x,y) pairs of numbers:

    (3,20)(6,9)(11,0)(14,1)(18,9)(5,13)(8,4)(12,0)(17,6)(20,16)
    1. Plot the data in a scatter diagram.
    2. Based on the plot, explain whether the relationship between x and y appears to be deterministic or to involve randomness.
    3. Based on the plot, explain whether the relationship between x and y appears to be linear or not linear.
  9. A data set consists of nine (x,y) pairs of numbers:

    (8,16)(10,4)(12,0)(14,4)(16,16)(9,9)(11,1)(13,1)(15,9)
    1. Plot the data in a scatter diagram.
    2. Based on the plot, explain whether the relationship between x and y appears to be deterministic or to involve randomness.
    3. Based on the plot, explain whether the relationship between x and y appears to be linear or not linear.
  10. A data set consists of five (x,y) pairs of numbers:

    (0,1)(2,5)(3,7)(5,11)(8,17)
    1. Plot the data in a scatter diagram.
    2. Based on the plot, explain whether the relationship between x and y appears to be deterministic or to involve randomness.
    3. Based on the plot, explain whether the relationship between x and y appears to be linear or not linear.
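If graphing software is not at hand, a rough scatter diagram can be drawn in plain text. The following is a minimal Python sketch, not a substitute for a real plotting library such as matplotlib: the function name text_scatter and its width and height parameters are our own hypothetical choices. Each (x, y) pair is scaled into a character grid and marked with "*", with larger y values printed higher up. It is applied here to the data of Exercise 7.

```python
def text_scatter(points, width=40, height=12):
    """Return a rough character-grid scatter diagram of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Scale each coordinate into the grid; "or 1" guards against a zero range.
        col = round((x - x_min) / ((x_max - x_min) or 1) * (width - 1))
        row = round((y - y_min) / ((y_max - y_min) or 1) * (height - 1))
        # Row 0 of the grid is printed first, so flip y: larger y appears higher.
        grid[height - 1 - row][col] = "*"
    body = "\n".join("|" + "".join(cells) for cells in grid)
    return body + "\n+" + "-" * width  # left axis and bottom axis

# Data from Exercise 7.
exercise7 = [(0, 12), (4, 16), (8, 22), (15, 28),
             (2, 15), (5, 14), (13, 24), (20, 30)]
print(text_scatter(exercise7))
```

Points that fall into the same grid cell share one "*", so on a coarse grid a crowded data set can appear to have fewer points than it does.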

    Applications

  1. At 60°F a particular blend of automotive gasoline weighs 6.17 lb/gal. The weight y of gasoline on a tank truck that is loaded with x gallons of gasoline is given by the linear equation

    y=6.17x
    1. Explain whether the relationship between the weight y and the amount x of gasoline is deterministic or contains an element of randomness.
    2. Predict the weight of gasoline on a tank truck that has just been loaded with 6,750 gallons of gasoline.
  2. The rate for renting a motor scooter for one day at a beach resort area is $25 plus 30 cents for each mile the scooter is driven. The total cost y in dollars for renting a scooter and driving it x miles is

    y=0.30x+25
    1. Explain whether the relationship between the cost y of renting the scooter for a day and the distance x that the scooter is driven that day is deterministic or contains an element of randomness.
    2. A person intends to rent a scooter one day for a trip to an attraction 17 miles away. Assuming that the total distance the scooter is driven is 34 miles, predict the cost of the rental.
  3. The pricing schedule for labor on a service call by an elevator repair company is $150 plus $50 per hour on site.

    1. Write down the linear equation that relates the labor cost y to the number of hours x that the repairman is on site.
    2. Calculate the labor cost for a service call that lasts 2.5 hours.
  4. The cost of a telephone call made through a leased line service is 2.5 cents per minute.

    1. Write down the linear equation that relates the cost y (in cents) of a call to its length x.
    2. Calculate the cost of a call that lasts 23 minutes.
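The prediction parts of these exercises amount to evaluating y=mx+b at a given value of x. As a minimal Python sketch, with a helper linear_value of our own (hypothetical, not from the text), the two predictions whose answers appear in the answer key (Exercises 1 and 3) can be checked like this:

```python
def linear_value(slope, intercept, x):
    """Evaluate the line y = slope * x + intercept at x."""
    return slope * x + intercept

# Exercise 1: weight y (lb) of x gallons at 6.17 lb/gal, so y = 6.17x (intercept 0).
truck_weight = linear_value(6.17, 0, 6750)

# Exercise 3: labor cost y = 50x + 150 for x hours on site.
labor_cost = linear_value(50, 150, 2.5)

print(round(truck_weight, 2), labor_cost)  # 41647.5 275.0
```

Rounding the first result guards against small floating-point errors, since 6.17 has no exact binary representation.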

    Large Data Set Exercises

  1. Large Data Set 1 lists the SAT scores and GPAs of 1,000 students. Plot the scatter diagram with SAT score as the independent variable (x) and GPA as the dependent variable (y). Comment on the appearance and strength of any linear trend.

    http://www.gone.2012books.lardbucket.org/sites/all/files/data1.xls

  2. Large Data Set 12 lists the golf scores on one round of golf for 75 golfers first using their own original clubs, then using clubs of a new, experimental design (after two months of familiarization with the new clubs). Plot the scatter diagram with golf score using the original clubs as the independent variable (x) and golf score using the new clubs as the dependent variable (y). Comment on the appearance and strength of any linear trend.

    http://www.gone.2012books.lardbucket.org/sites/all/files/data12.xls

  3. Large Data Set 13 records the number of bidders and sales price of a particular type of antique grandfather clock at 60 auctions. Plot the scatter diagram with the number of bidders at the auction as the independent variable (x) and the sales price as the dependent variable (y). Comment on the appearance and strength of any linear trend.

    http://www.gone.2012books.lardbucket.org/sites/all/files/data13.xls

Answers

    Basic

  1.
    1. Answers vary.
    2. Slope m=0.5; y-intercept b=2.
  3.
    1. Answers vary.
    2. Slope m=2; y-intercept b=4.
  5.
    1. y increases.
    2. Impossible to tell.
    3. y does not change.
  7.
    1. Scatter diagram needed.
    2. Involves randomness.
    3. Linear.
  9.
    1. Scatter diagram needed.
    2. Deterministic.
    3. Not linear.

    Applications

  1.
    1. Deterministic.
    2. 41,647.5 pounds.
  3.
    1. y=50x+150.
    2. $275.
    Large Data Set Exercises

  1. There appears to be a hint of some positive correlation.

  3. There appears to be clear positive correlation.