Inference for Regression - Review the Knowledge You Need to Score High

5 Steps to a 5 AP Statistics 2017 (2016)

STEP 4

Review the Knowledge You Need to Score High

CHAPTER 13 Inference for Regression

IN THIS CHAPTER

Summary: In the last two chapters, we’ve considered inference for population means and proportions and for the difference between two population means or two population proportions. In this chapter, we extend the study of linear regression begun in Chapter 7 to include inference for the slope of a regression line, including both confidence intervals and significance testing. Finally, we will look at the use of technology when doing inference for regression.

Key Ideas

Simple Linear Regression (Review)

Significance Test for the Slope of a Regression Line

Confidence Interval for the Slope of a Regression Line

Inference for Regression Using Technology

Simple Linear Regression

When we studied data analysis earlier in this text, we distinguished between statistics and parameters. Statistics are measurements or values that describe samples, and parameters are measurements that describe populations. We have also seen that statistics can be used to estimate parameters. Thus, we have used x̄ to estimate the population mean μ, s to estimate the population standard deviation σ, etc. In Chapter 7, we introduced the least-squares regression line (ŷ = a + bx), which was based on a set of ordered pairs. ŷ is actually a statistic because it is based on sample data. In this chapter, we study the parameter, μ_y, that is estimated by ŷ.

Before we look at the model for linear regression, let’s consider an example to remind us of what we did in Chapter 7:

example: The following data are pulse rates and heights for a group of 10 female statistics students (The scatterplot of the data and a residual plot indicate that a linear model is appropriate):

What is the least-squares regression line for predicting pulse rate from height?
What is the correlation coefficient between height and pulse rate? Interpret the correlation coefficient in the context of the problem.
What is the predicted pulse rate of a 67″ tall student?
Interpret the slope of the regression line in the context of the problem.

solution:

ŷ = 47.17 + 0.302(Height), where ŷ is the predicted pulse rate. (Done on the TI-83/84 with Height in L1 and Pulse in L2, the LSRL can be found with STAT CALC LinReg(a+bx) L1,L2,Y1.)
r = 0.21. There is a weak, positive, linear relationship between Height and Pulse rate.
ŷ = 47.17 + 0.302(67) = 67.4. (On the TI-83/84: Y1(67) = 67.42. Remember that you can paste Y1 to the home screen by entering VARS Y-VARS Function Y1.)
For each increase in height of one inch, the pulse rate is predicted to increase by 0.302 beats per minute (or: the pulse rate will increase, on average, by 0.302 beats per minute).

When doing inference for regression, we use ŷ = a + bx to estimate the true population regression line. Similar to what we have done with other statistics used for inference, we use a and b as estimators of population parameters α and β, the intercept and slope of the population regression line, respectively. The conditions necessary for doing inference for regression are:

For each given value of x, the values of the response variable y are independent and normally distributed.
For each given value of x, the standard deviation, σ, of the y-values is the same.
The mean responses of the y-values for the fixed values of x are linearly related by the equation μ_y = α + βx.

example: Consider a situation in which we are interested in how well a person scores on an agility test after a fixed number of 3-oz. glasses of wine. Let x be the number of glasses consumed. Let x take on the values 1, 2, 3, 4, 5, and 6. Let y be the score on the agility test (scale: 1–100). Then for any given value x_i, there will be a distribution of y-values with mean μ_yi. The conditions for inference for regression are that (i) each of these distributions of y-values is normally distributed, (ii) each of these distributions of y-values has the same standard deviation σ, and (iii) each of the means μ_yi lies on a line.

Remember that a residual was the error involved when making a prediction from a regression equation (residual = actual value of y – predicted value of y = y_i – ŷ_i). Not surprisingly, the standard error of the predictions is a function of the squared residuals:

$$s = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$$

s is an estimator of σ, the standard deviation of the residuals. Thus, there are actually three parameters to worry about in regression: α, β, and σ, which are estimated by a, b, and s, respectively.

The final statistic we need to do inference for regression is the standard error of the slope of the regression line, given by the following equation. You will not need to use this formula on the exam:

$$s_b = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$$

In summary, inference for regression depends upon estimating μ_y = α + βx with ŷ = a + bx. For each x, the response values of y are independent and follow a normal distribution, each distribution having the same standard deviation. Inference for regression depends on the following statistics:

a, the estimate of the y intercept, α , of μ _y
b, the estimate of the slope, β , of μ _y
s, the standard error of the residuals
s_b, the standard error of the slope of the regression line
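The four statistics listed above can be computed directly from a set of ordered pairs. The following is a minimal Python sketch, not a replacement for the calculator or Minitab; the function name `regression_summary` and the agility-test data are hypothetical and chosen only for illustration.

```python
import math

def regression_summary(x, y):
    """Return (a, b, s, s_b) for the least-squares line y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squared deviations of x, and sum of cross-products
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                 # slope
    a = y_bar - b * x_bar         # y-intercept
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Standard error of the residuals uses n - 2 degrees of freedom
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    # Standard error of the slope
    s_b = s / math.sqrt(sxx)
    return a, b, s, s_b

# Hypothetical data (NOT from the text): glasses of wine vs. agility score
glasses = [1, 2, 3, 4, 5, 6]
score = [88, 81, 77, 70, 62, 58]
a, b, s, s_b = regression_summary(glasses, score)
print(f"a = {a:.3f}, b = {b:.3f}, s = {s:.3f}, s_b = {s_b:.3f}")
```

Note that s divides by n – 2 rather than n – 1 because two parameters (α and β) are estimated from the data.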

In the section that follows, we explore inference for the slope of a regression line in terms of a significance test and a confidence interval for the slope.

Inference for the Slope of a Regression Line

Inference for regression consists of either a significance test or a confidence interval for the slope of a regression line. The null hypothesis in a significance test is usually H ₀ : β = 0, although it is possible to test H ₀ : β = β ₀ . Our interest is the extent to which a least-squares regression line is a good model for the data. That is, the significance test is a test of a linear model for the data.

We note that in theory we could test whether the slope of the regression line is equal to any specific value. However, the usual test is whether the slope of the regression line is zero or not. If the slope of the line is zero, then there is no linear relationship between the x and y variables (remember: b = r(s_y/s_x); if r = 0, then b = 0).

The alternative hypothesis is often two sided (i.e., H _A: β ≠ 0). We could do a one-sided test if we believed that the data were positively or negatively related.

Significance Test for the Slope of a Regression Line

The basic details of a significance test for the slope of a regression line are given in the following table:

example: The data in the following table give the top 15 states in terms of per pupil expenditure in 1985 and the average teacher salary in the state for that year.

Test the hypothesis, at the 0.01 level of significance, that there is no straight-line relationship between per pupil expenditure and teacher salary. Assume that the conditions necessary for inference for linear regression are present.

solution:

I. Let β = the true slope of the regression line for predicting salary from per pupil expenditure. H₀: β = 0. H_A: β ≠ 0.

II . We will use the t -test for the slope of the regression line. The problem states that the conditions necessary for linear regression are present.

III. The regression equation is
ŷ = 12027 + 3.34(PPE), where ŷ is the predicted salary
(s = 2281, s_b = 0.5536)
t = 6.04, df = 15 – 2 = 13 ⇒ P-value = 0.0000.

(To do this significance test for the slope of a regression line on the TI-83/84, first enter Per Pupil Expenditure (the explanatory variable) in L1 and Salary (the response variable) in L2. Then go to STAT TESTS LinRegTTest and enter the information requested. The calculator will return the values of t, p (the P-value), df, a, b, s, r², and r. Minitab, and some other computer software packages, will not give the value of r (you’ll have to take the appropriate square root of r²) but will give you the value of s_b. If you need s_b for some reason, such as constructing a confidence interval for the slope of the regression line, and only have access to a calculator, you can find it by noting that, since t = b/s_b, then s_b = b/t. Note that Minitab reports the P-value as 0.0000.)

IV . Because P < α, we reject H ₀ . We have evidence that the true slope of the regression line is not zero. We have evidence that there is a linear relationship between amount of per pupil expenditure and teacher salary.
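The mechanics of step III can be sketched in Python from the summary values alone. This is an illustrative sketch using only the standard library: the helper names are hypothetical, and the tail area is found by numerically integrating the t density with Simpson's rule rather than by any calculator routine. Because it uses the rounded slope 3.34, the computed t is about 6.03 instead of the reported 6.04.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution (intended for modest df)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=20000):
    """P(T > t) for t >= 0, via Simpson's rule on [0, t] (steps must be even)."""
    if t == 0:
        return 0.5
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def slope_t_test(b, s_b, n, beta0=0.0):
    """Return (t, df, two-sided P-value) for H0: beta = beta0."""
    df = n - 2
    t = (b - beta0) / s_b
    return t, df, 2 * t_upper_tail(abs(t), df)

# Values from the teacher-salary example: b = 3.34, s_b = 0.5536, n = 15
t, df, p = slope_t_test(3.34, 0.5536, 15)
print(f"t = {t:.2f}, df = {df}, two-sided P-value = {p:.5f}")
print("reject H0 at the 0.01 level" if p < 0.01 else "fail to reject H0")
```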

A significance test that the slope of a regression line equals zero is closely related to a test that there is no correlation between the variables. That is, if ρ is the population correlation coefficient, then the test statistic for H₀: β = 0 is equal to the test statistic for H₀: ρ = 0. You aren’t required to know it for the AP exam, but the t-test statistic for H₀: ρ = 0, where r is the sample correlation coefficient, is

$$t = r\sqrt{\frac{n-2}{1-r^2}}, \qquad df = n - 2.$$

Because this and the test for a nonzero slope are equivalent, it should come as no surprise that

$$\frac{b}{s_b} = r\sqrt{\frac{n-2}{1-r^2}}.$$
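For readers who want to see why the slope test and the correlation test give the same statistic, here is a short derivation. It is a sketch that relies on two standard facts not derived in this chapter: b = r(s_y/s_x), and the identity that the sum of squared residuals equals (1 – r²) times the total variation in y (the same idea behind interpreting r² as the proportion of variation explained).

```latex
\begin{align*}
b   &= r\,\frac{s_y}{s_x}, \\
s^2 &= \frac{\sum (y_i-\hat{y}_i)^2}{n-2}
     = \frac{(1-r^2)(n-1)\,s_y^2}{n-2}, \\
s_b &= \frac{s}{\sqrt{\sum (x_i-\bar{x})^2}}
     = \frac{s}{s_x\sqrt{n-1}}
     = \frac{s_y}{s_x}\sqrt{\frac{1-r^2}{n-2}}, \\
t   &= \frac{b}{s_b}
     = \frac{r\,s_y/s_x}{(s_y/s_x)\sqrt{(1-r^2)/(n-2)}}
     = r\sqrt{\frac{n-2}{1-r^2}}.
\end{align*}
```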

Confidence Interval for the Slope of a Regression Line

In addition to doing hypothesis tests on H ₀ : β = β ₀ , we can construct a confidence interval for the true slope of a regression line. The details follow:

example: Consider once again the earlier example on predicting teacher salary from per pupil expenditure. Construct a 95% confidence interval for the slope of the population regression line.

solution: When we were doing a test of H₀: β = 0 for that problem, we found that ŷ = 12027 + 3.34(PPE). The slope of the regression line for the 15 points, and hence our estimate of β, is b = 3.34. We also had t = 6.04.

Our confidence interval is of the form b ± t* s_b. We need to find t* and s_b. For C = 0.95, df = 15 – 2 = 13, we have t* = 2.160 (from Table B; if you have a TI-84 with the invT function, use invT(0.975,13)). Now, as mentioned earlier, s_b = b/t = 3.34/6.04 = 0.5530.

Hence, b ± t* s_b = 3.34 ± 2.160(0.5530) = (2.15, 4.53). We are 95% confident that the true slope of the regression line is between 2.15 and 4.53. Note that, since 0 is not in this interval, this finding is consistent with our earlier rejection of the hypothesis that the slope equals 0. This is another way of saying that we have statistically significant evidence of a predictive linear relationship between PPE and Salary.
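The interval arithmetic is simple enough to check in a few lines of Python. This is a small sketch with a hypothetical function name; the critical value t* is taken from Table B as in the text rather than computed.

```python
def slope_confidence_interval(b, s_b, t_star):
    """Return (low, high) for the interval b +/- t_star * s_b."""
    margin = t_star * s_b
    return b - margin, b + margin

# Teacher-salary example: b = 3.34, s_b = 0.5530, t* = 2.160 (df = 13, C = 0.95)
low, high = slope_confidence_interval(3.34, 0.5530, 2.160)
print(f"({low:.2f}, {high:.2f})")  # (2.15, 4.53)

# An interval that excludes 0 agrees with rejecting H0: beta = 0
print("0 is inside the interval" if low <= 0 <= high else "0 is outside the interval")
```

Using the Minitab value s_b = 0.5536 instead of the back-calculated 0.5530 changes the endpoints only in the third decimal place.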

Inference for Regression Using Technology

If you had to do them from the raw data, the computations involved in doing inference for the slope of a regression line would be daunting.

For example, how would you like to compute

$$s_b = \frac{\sqrt{\sum (y_i - \hat{y}_i)^2 / (n-2)}}{\sqrt{\sum (x_i - \bar{x})^2}}$$

by hand?

Fortunately, you probably will never have to do this by hand, but instead can rely on computer output you are given, or you will be able to use your calculator to do the computations.

Consider the following data that were gathered by counting the number of cricket chirps in 15 seconds and noting the temperature.

We want to use technology to test the hypothesis that the slope of the regression line is 0 and to construct a confidence interval for the true slope of the regression line.

First let us look at the Minitab regression output for these data.

You should be able to read most of this table, but you are not responsible for all of it. You see the following table entries:

The regression equation, ŷ = 44.0 + 0.993(Number), is the least-squares regression line (LSRL) for predicting temperature from the number of cricket chirps.
Under “Predictor” are the y-intercept and explanatory variable of the regression equation, called “Constant” and “Number” in this example.
Under “Coef” are the values of the “Constant” (which equals the y-intercept, the a in ŷ = a + bx; here, a = 44.013) and the slope of the regression line (which is the coefficient of “Number” in this example, the b in ŷ = a + bx; here, b = 0.99340).
For the purposes of this book, we are not concerned with the “Stdev,” “t-ratio,” or “P” for “Constant”; therefore, only the “44.013” is meaningful for us.
“Stdev” of “Number” is the standard error of the slope (what we have called s_b, the variability of the estimates of the slope of the regression line; here, s_b = 0.06523). “t-ratio” is the value of the t-test statistic (t = b/s_b, df = n – 2; here, t = 0.9934/0.06523 = 15.23), and P is the P-value associated with the test statistic assuming a two-sided test (here, P = 0.000; if you were doing a one-sided test, you would need to divide the given P-value by 2).
s is the standard error of the residuals, which is the variability of the vertical distances of the y-values from the regression line (here, s = 1.538).
“R-sq” is the coefficient of determination (or r²; here R-sq = 95.9%, so 95.9% of the variation in temperature is explained by the regression on the number of chirps in 15 seconds; note that, here, r = √0.959 = 0.979, which is positive since b = 0.9934 is positive). You don’t need to worry about “R-sq(adj).”

Thankfully, all of the mechanics needed to do a t -test for the slope of a regression line are contained in this printout. You need only to quote the appropriate values in your write-up. Thus, for the problem given above, we see that t = 15.23 ⇒ P -value = 0.000.
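As a quick consistency check on a printout, the t-ratio and r can be recovered from the other entries. The sketch below uses the values quoted in the text for the cricket data (Coef and Stdev for “Number,” R-sq, and n = 12); the variable names are illustrative only.

```python
import math

coef = 0.99340    # "Coef" for Number: the slope b
stdev = 0.06523   # "Stdev" for Number: the standard error of the slope, s_b
r_sq = 0.959      # "R-sq" written as a proportion
n = 12            # number of data points

t_ratio = coef / stdev                      # t = b / s_b
df = n - 2                                  # degrees of freedom for the t-test
r = math.copysign(math.sqrt(r_sq), coef)    # r takes the sign of the slope

print(f"t-ratio = {t_ratio:.2f}, df = {df}, r = {r:.3f}")
```

If the recomputed t-ratio does not match the printout, the most likely cause is reading the “Constant” row instead of the explanatory-variable row.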

Exam Tip: You may be given a problem that has both the raw data and the computer printout based on the data. If so, there is no advantage to doing the computations all over again because they have already been done for you.

A confidence interval for the slope of a regression line follows the same pattern as all confidence intervals (estimate ± (critical value) × (standard error)): b ± t* s_b, based on n – 2 degrees of freedom. A 99% confidence interval for the slope in this situation (df = 10 ⇒ t* = 3.169 from Table B) is 0.9934 ± 3.169(0.06523) = (0.787, 1.200).

If you have to do a confidence interval using the calculator and do not have a TI-84 with the LinRegTInt function, you first need to determine s_b. Because you know that t = b/s_b, it follows that s_b = b/t = 0.9934/15.23 = 0.0652, which agrees with the standard error of the slope (“Stdev” of “Number”) given in the computer printout.

A 95% confidence interval for the slope of the regression line for predicting temperature from the number of chirps in 15 seconds is then given by 0.9934 ± 2.228(0.0652) = (0.848, 1.139). Here t* = 2.228 is based on C = 0.95 and df = 12 – 2 = 10. Using LinRegTInt, if you have it, results in the following (note that the “s” given in the printout is the standard error of the residuals, not the standard error of the slope).

Rapid Review

The regression equation for predicting grade point average from number of hours studied is determined to be ŷ = 1.95 + 0.05(Hours). Interpret the slope of the regression line.

Answer: For each additional hour studied, the GPA is predicted to increase by 0.05 points.

Which of the following is not a necessary condition for doing inference for the slope of a regression line?
(a) For each given value of the independent variable, the response variable is normally distributed.
(b) The values of the predictor and response variables are independent.
(c) For each given value of the independent variable, the distribution of the response variable has the same standard deviation.
(d) The mean response values lie on a line.

Answer: (b) is not a condition for doing inference for the slope of a regression line. In fact, we are trying to find out the degree to which they are not independent.

True–False: Significance tests for the slope of a regression line are always based on the hypothesis H₀: β = 0 versus the alternative H_A: β ≠ 0.

Answer: False. While the stated null and alternative may be the usual hypotheses in a test about the slope of the regression line, it is possible to test that the slope has some particular nonzero value or to use a one-sided alternative (H_A: β > 0 or H_A: β < 0). Note that most computer programs will test only the two-sided alternative by default. The TI-83/84 will test either a one- or two-sided alternative.

Consider the following Minitab printout:
What is the slope of the regression line?
What is the standard error of the residuals?
What is the standard error of the slope?
Do the data indicate a predictive linear relationship between x and y?

Answer:

0.634
9.282
0.07039
Yes, the t-test statistic = 9.00 ⇒ P-value = .000. That is, the probability is close to zero of getting a slope of 0.634 if, in fact, the true slope was zero.

A t-test for the slope of a regression line is to be conducted at the 0.02 level of significance based on 18 data values. As usual, the test is two sided. What is the upper critical value for this test (that is, find the minimum positive value of t* for which a finding would be considered significant)?

Answer: There are 18 – 2 = 16 degrees of freedom. Since the alternative is two sided, the rejection region has 0.01 in each tail. Using Table B, we find the value at the intersection of the df = 16 row and the 0.01 column: t * = 2.583. If you have a TI-84 with the invT function, invT(0.99,16)=2.583 . This is, of course, the same value of t * you would use to construct a 98% confidence interval for the slope of the regression line.
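If neither Table B nor invT is at hand, the critical value can be approximated numerically. The following is a rough Python sketch of what an inverse-t lookup does, using only the standard library: it integrates the t density with Simpson's rule and then inverts the cumulative probability by bisection. The function names and the search range of ±100 are assumptions made for this illustration, and the method is meant for the modest degrees of freedom seen in this chapter.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution (intended for modest df)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|], then symmetry about 0."""
    if t == 0:
        return 0.5
    h = abs(t) / steps
    total = t_pdf(0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area

def inv_t(p, df, lo=-100.0, hi=100.0):
    """Value t with P(T <= t) = p, found by bisection on [lo, hi]."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Two-sided test at the 0.02 level with df = 16: 0.01 in each tail
print(round(inv_t(0.99, 16), 3))
# 95% confidence with df = 10, as in the cricket example
print(round(inv_t(0.975, 10), 3))
```

The results should agree with the Table B entries quoted in the text (2.583 and 2.228) to about three decimal places.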

In the printout from question #4, we were given the regression equation ŷ = 282 + 0.634x. The t-test for H₀: β = 0 yielded a P-value of 0.000. What is the conclusion you would arrive at based on these data?

Answer: Because P is very small, we would reject the null hypothesis that the slope of the regression line is 0. We have strong evidence of a predictive linear relationship between x and y .

Suppose the computer output for regression reports P = 0.036. What is the P-value for H_A: β > 0 (assuming the test was in the correct direction for the data)?

Answer: 0.018. Computer output for regression assumes the alternative is two sided (H _A: β ≠ 0). Hence the P -value reported assumes the finding could have been in either tail of the t -distribution. The correct P -value for the one-sided test is one-half of this value.
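The phrase “in the correct direction for the data” matters: halving the reported P-value is right only when the sign of the t statistic matches the one-sided alternative. A small Python sketch of that rule follows; the function name and the sample t values of ±2.2 are hypothetical.

```python
def one_sided_p(two_sided_p, t_stat, alternative):
    """Convert a two-sided P-value to a one-sided one.

    alternative: "greater" for H_A: beta > 0, "less" for H_A: beta < 0.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    # Does the observed statistic point in the direction of the alternative?
    in_direction = t_stat > 0 if alternative == "greater" else t_stat < 0
    half = two_sided_p / 2
    return half if in_direction else 1 - half

# Reported two-sided P = 0.036 with a positive t statistic (hypothetical t = 2.2)
print(one_sided_p(0.036, t_stat=2.2, alternative="greater"))   # 0.018
# Same printout, but the slope is in the wrong direction for H_A: beta > 0
print(one_sided_p(0.036, t_stat=-2.2, alternative="greater"))
```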

Practice Problems

Multiple-Choice

1. Which of the following statements is (are) true?

I. In the computer output for regression, s is the estimator of σ, the standard deviation of the residuals.

II. The t-test statistic for H₀: β = 0 has the same value as the t-test statistic for H₀: ρ = 0.

III. The t-test for the slope of a regression line is always two sided (H_A: β ≠ 0).

(a) I only
(b) II only
(c) III only
(d) I and II only
(e) I and III only

Use the following output in answering questions 2–4:

A study attempted to establish a linear relationship between IQ score and musical aptitude. The following table is a partial printout of the regression analysis and is based on a sample of 20 individuals.

2. The value of the t-test statistic for H₀: β = 0 is
(a) 4.05
(b) –1.72
(c) 0.4925
(d) 6.143
(e) 0.0802
3. A 99% confidence interval for the slope of the regression line is
(a) 0.4925 ± 2.878(6.143)
(b) 0.4925 ± 2.861(0.1215)
(c) 0.4925 ± 2.861(6.143)
(d) 0.4925 ± 2.845(0.1215)
(e) 0.4925 ± 2.878(0.1215)
4. Which of the following best interprets the slope of the regression line?
(a) A student with an IQ one point above another student has a Musical Aptitude score 0.4925 points higher.
(b) As IQ score increases, so does the Musical Aptitude score.
(c) A student with an IQ one point above another student is predicted to have a Musical Aptitude score 0.4925 points higher.
(d) For each additional point of Musical Aptitude, IQ is predicted to increase by 0.4925 points.
(e) There is a strong predictive linear relationship between IQ score and Musical Aptitude.
5. A group of 12 students take both the SAT Math and the SAT Verbal. The least-squares regression line for predicting Verbal Score from Math Score is determined to be ŷ = 106.56 + 0.74(Math Score). Further, s_b = 0.11. Determine a 95% confidence interval for the slope of the regression line.
(a) 0.74 ± 0.245
(b) 0.74 ± 0.242
(c) 0.74 ± 0.240
(d) 0.74 ± 0.071
(e) 0.74 ± 0.199

Free-Response

1–5. The following table gives the ages in months of a sample of children and their mean height (in inches) at that age.

1. Find the correlation coefficient and the least-squares regression line for predicting height (in inches) from age (in months).
2. Draw a scatterplot of the data and the LSRL on the plot. Does the line appear to be a good model for the data?
3. Construct a residual plot for the data. Does the line still appear to be a good model for the data?
4. Use your LSRL to predict the height of a child of 35 months. How confident should you be in this prediction?
5. Interpret the slope of the regression line found in question #1 in the context of the problem.
6. In 2002, there were 23 states in which more than 50% of high school graduates took the SAT test. The following printout gives the regression analysis for predicting SAT Math from SAT Verbal from these 23 states.
(a) What is the equation of the least-squares regression line for predicting Math SAT score from Verbal SAT score?
(b) Determine the slope of the regression line and interpret in the context of the problem.
(c) Identify the standard error of the slope of the regression line and interpret it in the context of the problem.
(d) Identify the standard error of the residuals and interpret it in the context of the problem.
(e) Assuming that the conditions needed for doing inference for regression are present, what are the hypotheses being tested in this problem, what test statistic is used in the analysis, what is its value, and what conclusion would you make concerning the hypothesis?
7. For the regression analysis of question #6:
(a) Construct and interpret a 95% confidence interval for the true slope of the regression line.
(b) Explain what is meant by “95% confidence interval” in the context of the problem.
8. It has been argued that the average score on the SAT test drops as more students take the test (nationally, about 46% of graduating students took the SAT). The following data are the Minitab output for predicting SAT Math score from the percentage taking the test (PCT) for each of the 50 states. Assuming that the conditions for doing inference for regression are met, test the hypothesis that scores decline as the proportion of students taking the test rises. That is, test to determine if the slope of the regression line is negative. Test at the 0.01 level of significance.
9. Some bored researchers got the idea that they could predict a person’s pulse rate from his or her height (earlier studies had shown a very weak linear relationship between pulse rate and weight). They collected data on 20 college-age women. The following table is part of the Minitab output of their findings.
(a) Determine the t-ratio and the P-value for the test.
(b) Construct a 99% confidence interval for the slope of the regression line used to predict pulse rate from height.
(c) Do you think there is a predictive linear relationship between height and pulse rate? Explain.
(d) Suppose the researchers were hoping to show that there was a positive linear relationship between pulse rate and height. Are the t-ratio and P-value the same as in Part (a)? If not, what are they?

Cumulative Review Problems

You are testing the hypothesis H₀: p = 0.6. You sample 75 people as part of your study and calculate that p̂ = 0.7.
What is s_p̂ for a significance test for p?
What is s_p̂ for a confidence interval for p?
A manufacturer of lightbulbs claims a mean life of 1500 hours. A mean of 1450 hours would represent a significant departure from this claim. Suppose, in fact, the mean life of bulbs is only 1450 hours. In this context, what is meant by the power of the test (no calculation is required)?
Complete the following table by filling in the shape of the sampling distribution of x̄ for each situation.
The following is most of a probability distribution for a discrete random variable.

Find mean and standard deviation of this distribution.

Consider the following scatterplot and regression line.
Would you describe the point marked with a box as an outlier, influential point, neither, or both?
What would be the effect on the correlation coefficient of removing the box-point?
What would be the effect on the slope of the regression line of removing the box-point?

Solutions to Practice Problems

Multiple-Choice

The correct answer is (d). II is true since it can be shown that t = b/s_b = r√((n – 2)/(1 – r²)). III is not true since, although we often use the alternative H_A: β ≠ 0, we can certainly test a null with an alternative that states that there is a positive or a negative association between the variables.
The correct answer is (a). t = b/s_b = 0.4925/0.1215 = 4.05.
The correct answer is (e). For n = 20, df = 20 – 2 = 18 ⇒ t* = 2.878 for C = 0.99.
The correct answer is (c). Note that (a) is not correct since it doesn’t have “predicted” or “on average” to qualify the increase. (b) is a true statement but is not the best interpretation of the slope. (d) has mixed up the response and explanatory variables. (e) is also true (t = 4.05 ⇒ P-value = 0.0008) but is not an interpretation of the slope.
The correct answer is (a). A 95% confidence interval at 12 – 2 = 10 degrees of freedom has a critical value of t * = 2.228 (from Table B; if you have a TI-84 with the invT function, invT(0.975,10)=2.228). The required interval is 0.74 ± (2.228)(0.11) = 0.74 ± 0.245.

Free-Response

r = 0.9817, height = 25.41 + 0.261(age)

(Assuming that you have put the age data in L1 and the height data in L2 , remember that this can be done on the TI-83/84 as follows: STAT CALC LinReg(a+bx) L1,L2,Y1 .)

The line does appear to be a good model for the data.

(After the regression equation was calculated on the TI-83/84 and the LSRL stored in Y1 , this was constructed in STAT PLOT by drawing a scatterplot with Xlist:L1 and Ylist:L2 .)

The residual pattern seems quite random. A line still appears to be a good model for the data.

(This scatterplot was constructed on the TI-83/84 using STAT PLOT with Xlist:L1 and Ylist:RESID . Remember that the list of residuals for the most recent regression is saved in a list named RESID .)

ŷ = 25.41 + 0.261(35) = 34.545 (Y1(35) = 34.54). You probably shouldn’t be too confident in this prediction. 35 is well outside of the data on which the LSRL was constructed and, even though the line appears to be a good fit for the data, there is no reason to believe that a linear pattern is going to continue indefinitely. (If it did, a 25-year-old would have a predicted height of 25.41 + 0.261(12 × 25) = 103.71″, or 8.64 feet!)
The slope of the regression line is 0.261. This means that, for an increase in age of 1 month, height is predicted to increase by 0.261 inches. You could also say, that, for an increase in age of 1 month, height will increase on average by 0.261 inches.
a. ŷ = 185.77 + 0.6419(Verbal), where ŷ is the predicted SAT Math score.
b. b = 0.6419. For each additional point scored on the SAT Verbal test, the score on the SAT Math test is predicted to increase by 0.6419 points (or: will increase on average by 0.6419 points). (Very important on the AP exam: be very sure to say “is predicted” or “on average” if you’d like maximum credit for the problem!)
c. The standard error of the slope is s_b = 0.1420. This is an estimate of the standard deviation of the estimated slope for predicting SAT Math from SAT Verbal.
d. The standard error of the residuals is s = 7.457. This value is a measure of variation in SAT Math for a fixed value of SAT Verbal.
e. The hypotheses being tested are H₀: β = 0 (which is equivalent to H₀: ρ = 0) and H_A: β ≠ 0, where β is the slope of the regression line for predicting SAT Math from SAT Verbal.

The test statistic used in the analysis is t = b/s_b = 0.6419/0.1420 = 4.52, df = 23 – 2 = 21.

a. df = 23 – 2 = 21⇒ t * = 2.080. The 95% confidence interval is: 0.6419 ± 2.080(0.1420) = (0.35, 0.94). We are 95% confident that, for each 1 point increase in SAT Verbal, the true increase in SAT Math is between 0.35 points and 0.94 points.
The procedure used to generate the confidence interval would produce intervals that contain the true slope of the regression line, on average, 0.95 of the time.
I . Let β = the true slope of the regression line for predicting SAT Math score from the percentage of graduating seniors taking the test.

II . We use a linear regression t test with α = 0.01. The problem states that the conditions for doing inference for regression are met.

III . We see from the printout that

based on 50 – 2 = 48 degrees of freedom. The P -value is 0.000. (Note: The P -value in the printout is for a two-sided test. However, since the P -value for a one-sided test would only be half as large, it is still 0.000.)

IV . Because P < 0.01, we reject the null hypothesis. We have very strong evidence that there is a negative linear relationship between the proportion of students taking SAT math and the average score on the test.

a. t = b/s_b = 0.2647/0.5687 = 0.47, df = 20 – 2 = 18 ⇒ P-value = 0.644.
df = 18 ⇒ t* = 2.878; 0.2647 ± 2.878(0.5687) = (–1.37, 1.90).
No. The P -value is very large, giving no grounds to reject the null hypothesis that the slope of the regression line is 0. Furthermore, the correlation coefficient is only , which is very close to 0. Finally, the confidence interval constructed in part (b) contains the value 0 as a likely value of the slope of the population regression line.
The t-ratio would still be 0.47. The P-value, however, would be half of the 0.644, or 0.322, because the computer output assumes a two-sided test. This is a lower P-value but is still much too large to infer any significant linear relationship between pulse rate and height.

Solutions to Cumulative Review Problems

a.
The power of the test is the probability of correctly rejecting a false hypothesis against a particular alternative. In other words, the power of this test is the probability of rejecting the claim that the true mean is 1500 hours against the alternative that the true mean is only 1450 hours.
P (7) = 1 – (0.15 + 0.25 + 0.40) = 0.20.

μ _x= 2(0.15) + 6(0.25) + 7(0.20) + 9(0.40) = 6.8.

(Remember that this can be done by putting the X -values in L1 , the p (x )-values in L2 , and doing STAT CALC 1-Var Stats L1,L2 .)
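The same mean, along with the standard deviation the problem asks for, can be checked with a few lines of Python. This is only an illustrative sketch of the 1-Var Stats computation for a discrete random variable; the variable names are arbitrary.

```python
import math

values = [2, 6, 7, 9]
probs = [0.15, 0.25, 0.20, 0.40]   # P(7) = 0.20 fills in the missing probability

# Mean: sum of x * p(x)
mean = sum(x * p for x, p in zip(values, probs))
# Variance: sum of (x - mean)^2 * p(x); the standard deviation is its square root
variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
sd = math.sqrt(variance)

print(f"mean = {mean:.1f}, variance = {variance:.2f}, sd = {sd:.2f}")
```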

a. The point is both an outlier and an influential point. It is an outlier because it is removed from the general pattern of the data. It is an influential observation because it is an outlier in the x direction and its removal would have an impact on the slope of the regression line.
Removing the point would increase the correlation coefficient. That is, the remaining data are better modeled by a line without the box-point than with it.
Removing the point would make the slope of the regression line more positive (steeper) than it is already.

CHAPTER 13

Inference for Regression

The following regression output is for predicting the number of calories per serving from the number of grams of fat per serving for ten fast-food hamburgers.

Dependent variable is Calories

No selector

R squared = 96.8% R squared (adjusted) = 96.4%

s = 57.61 with 10 – 2 = 8 degrees of freedom (df)

Assume conditions for inference have been met. The 95% confidence interval for the slope of the regression line is

(A)

(B)

(C)

(D)

(E)

A researcher wants to do a test of significance to see if the slope of his regression line is significantly different from 0. The scatterplot, residual plot, and a univariate plot of the residuals are shown here. What advice should you give the researcher?

(A) The scatterplot looks linear and there is no curve in the residual plot, so a significance test is appropriate.

(B) The scatterplot has a curve in it, so a significance test is not appropriate.

(D) The variation in the residuals is not uniform, so a significance test is not appropriate.

(E) The univariate plot of the residuals is strongly skewed, so a significance test is not appropriate.

When constructing a confidence interval for the slope of a regression line for a data set with n points, which is the correct consideration for the critical value?

(A) Because the sampling distribution of the slope is approximately normal, z should be used as the critical value.

(B) Because the standard deviation of the slope is unknown, t with n degrees of freedom should be used as the critical value.

(C) Because the standard deviation of the slope is unknown, t with n – 1 degrees of freedom should be used as the critical value.

(D) Because the standard deviation of the slope is unknown, t with n – 2 degrees of freedom should be used as the critical value.

(E) Because the sample size is already considered in the formula for the standard error of the slope, no critical value is needed.

A teacher wondered if there was a relationship between students’ ACT math scores and their ACT reading scores. He selected a random sample of students who took the ACT, created a scatterplot, and saw that there appears to be a positive linear relationship. A significance test for the slope for predicting the ACT math score from the ACT reading score showed a P-value very close to 0. Which of the following is a valid conclusion?

(A) Students who have higher ACT reading scores tend to have higher ACT math scores.

(B) To improve their ACT math scores, students should work to improve their ACT reading scores.

(D) There is strong evidence that a higher ACT reading score will result in a higher ACT math score.

(E) There is some evidence that a higher ACT reading score will result in a higher ACT math score.

The following table shows the number of cylinders in the engine and the gas mileage for a random sample of automobiles by a certain manufacturer. Assume all conditions for inference are met.

The appropriate critical value to construct a 90% confidence interval for the slope of the regression line is

(A) 1.960.

(B) 1.943.

(D) 1.645.

(E) 1.282.

Answers