## 5 Steps to a 5 AP Statistics 2017 (2016)

### STEP __5__

### Build Your Test-Taking Confidence

### AP Statistics Practice Test 2

**SECTION II**

Time: 1 hour and 30 minutes

Number of questions: 6

Percentage of total grade: 50

**General Instructions**

There are two parts to this section of the examination. Part A consists of five equally weighted problems that represent 75% of the total weight of this section. Spend about 65 minutes on this part of the exam. Part B consists of one longer problem that represents 25% of the total weight of this section. Spend about 25 minutes on this part of the exam. You are not necessarily expected to complete all parts of every question. Statistical tables and formulas are provided.

- Be sure to write clearly and legibly. If you make an error, you may save time by crossing it out rather than trying to erase it. Erased or crossed-out work will not be graded.
- Show all your work. Indicate clearly the methods you use because you will be graded on the correctness of your methods as well as the accuracy of your final answers. Correct answers without support work may not receive credit.

**Statistics, Section II, Part A, Questions 1–5**

Spend about 65 minutes on this part of the exam; percentage of Section II grade: 75.

**Directions:** Show all your work. Indicate clearly the methods you use because you will be graded on the correctness of your methods as well as on the accuracy of your results and explanation.

1. Universities generally post the rosters of all their student athletes, including which sport they play and their heights. The data in the table represent the height data for members of one university’s fencing teams.
   - (a) Construct an appropriate comparative graph for the heights of the members of the fencing teams.
   - (b) Using your plot in part (a), describe the differences and similarities in the distributions of the heights of the two teams.
2. In the fictional story “The Legend of Sleepy Hollow,” the Headless Horseman chases Ichabod Crane through the New England countryside. A video game based on this legend has Crane choosing his route randomly, with a probability of 15% that he stays on the road and 85% that he cuts through the forest. If he stays on the road, there is a 90% chance of the Horseman catching Ichabod. If he cuts through the forest, there is only a 64% chance of being caught. Find the probability that the next game has Mr. Crane cut through the forest, given that the Horseman will indeed catch him.
3. A local farmers’ market sells grapefruit and claims that the mass of these citrus fruits is approximately normally distributed with a mean of 428.0 grams and a standard deviation of 21.1 grams. You purchase 10 grapefruit, and the average mass of these 10 fruits is 446.2 grams.
   - (a) You wish to test the claim of the farmers’ market, so you begin by using a computer model to simulate drawing 300 samples of size 10 from a population approximately *N*(428.0, 21.1) grams. Above is the dotplot of the means from these 300 samples.

     Explain in simple language how the dotplot shows that the sample mean you are using is an unbiased estimator of the population mean.
   - (b) Given your bag, with its average mass per grapefruit of 446.2 grams, do you think that the farmers’ market is correct in its claim? Justify your answer using the dotplot above.
4. Cortisol is a hormone produced by the body to control inflammation. People with chronic inflammation and, hence, chronically elevated cortisol levels can develop problems with their immune system. To explore this, 100 adult volunteers with chronic inflammation will participate in a study to compare the effect of black tea versus coffee on cortisol levels. Each volunteer will be assigned at random to one of the two groups and provided with daily capsules that contain a concentrated form of either black tea or coffee. Each will also have his or her cortisol levels measured at the beginning of the study and then 20 weeks later.
   - (a) Describe how you would assign the 100 volunteers to the two groups in such a way as to allow a statistically valid comparison of the two treatments.
   - (b) Explain a *statistical* advantage to using capsules rather than having participants actually drink coffee or tea.
   - (c) Is it reasonable to generalize the findings of this study to all adults with chronic inflammation? Explain.
5. A large school district is planning an enrichment program for next summer. It is planning four course options for students: Music, Sports, Drama, or Academic Enrichment. For planning purposes, the district selected a random sample of 100 freshmen, a random sample of 100 sophomores, and a random sample of 100 juniors. (Since seniors will have graduated, they were not surveyed.) The selected students were asked which program they would choose to attend. The results for each class are shown in the graph below:
   - (a) Describe any associations you see between year in school and choice of program.
   - (b) District administrators want to determine if there is convincing evidence that students in different years have different preferences for programs throughout the whole district. Identify the hypothesis test they should use and state the degrees of freedom.
   - (c) The *P*-value from the test is 0.0082. What conclusion should the administrators reach?

**Statistics, Section II, Part B, Question 6**

Spend about 25 minutes on this part of the exam; percentage of Section II grade: 25.

**Directions:** Show all of your work. Indicate clearly the methods you use because you will be graded on the correctness of your methods as well as on the accuracy of your results and explanation.

6. In 1937, the United States passed the Wagner-Steagall Act, also known as the National Housing Act. This piece of legislation was intended to remedy the unsafe housing conditions in which many low-income families were living. The National Housing Act contributed to today’s common wisdom that when a household devotes more than 30 percent of its income to housing expenses, that household is said to be burdened. A household that devotes more than 50 percent of its income to housing expenses is said to be severely burdened.

A graduate student interning at a nonprofit organization that addresses affordable housing has received a grant to study this issue. However, the grant will only cover in-depth investigation of eight households. The table below represents the percentage of income devoted to housing expenses (rent, utilities, etc.) for each of eight randomly selected low-income households.

- (a) Local officials say that for low-income households in this community the median percentage of income spent on housing is 48 percent. The graduate student would like to test the hypothesis that the median percentage is actually higher than 48. Explain why the graduate student should not use a *t*-test for this hypothesis.
- (b) Rather than a test using the numeric values, the graduate student decides to turn the data into categorical data by noting whether each subject’s percentage is above or below the hypothesized median percentage of 48. Fill out the table appropriately to reflect the graduate student’s change of the data:
- (c) Explain how the graduate student’s decision to change the nature of the data addresses any issues raised in part (a).
- (d) If it were true that the median percentage of income devoted to housing expenses was 48 percent, then we would expect half of the population to spend less than 48 percent on housing expenses. Using the information in the table above, calculate the approximate probability that one or fewer clients would have a housing expense percentage of less than 48 percent.
- (e) Based on your answer to part (d), do you have convincing evidence that the graduate student’s hypothesis is correct? Explain your answer.

**END OF SECTION II**

**Multiple-Choice Answers to Practice Test 2, Section I**

1. D
2. B
3. A
4. C
5. D
6. E
7. B
8. D
9. D
10. B
11. C
12. D
13. D
14. C
15. B
16. A
17. B
18. A
19. D
20. E
21. D
22. D
23. C
24. B
25. B
26. C
27. C
28. A
29. B
30. B
31. E
32. C
33. B
34. C
35. E
36. C
37. D
38. C
39. A
40. D

**Multiple-Choice Answers Explained, Practice Test 2, Section I**

1. (D) The location of the tail indicates skew. The tail is located toward the smaller values of time; therefore, we call this distribution skewed left (since smaller values are to the left of larger values).
2. (B) Slope, traditionally written as Δ*y*/Δ*x*, becomes Δ*CaloriesPredicted*/Δ*Sodium* = 0.2457/1. So, every item with 1 additional mg of sodium tends to have 0.2457 more calories.
3. (A) Since water and sunlight also influence plant growth, we block on those variables. Plots 1, 2, 3, and 4 all get more direct sunlight. Plots 5, 6, 7, and 8 all get increased water.
4. (C) The middle 80% of durations symmetrically straddles the mean of 167 minutes. That leaves 10% of the durations in each tail. Using Standard Normal Probabilities (Table A in the Appendix to this book), we find that the *z*-scores corresponding to a 10% tail area are ±1.28. Solving the equations ±1.28 = (*x* − 167)/76 gives us approximately 70 minutes and 264 minutes.
5. (D) Aliana’s customers were not randomly selected. Nor did she randomly assign customers to her treatments: she split her shift into three parts and applied the treatment to all customers during each time period.
6. (E) You can tell that the surface area estimates tend to be higher than the count estimates, but you cannot tell how either of them compares to the correct value.
7. (B) Measurement errors are values calculated using the expression Thermometer Reading − 20°C. If the reading is 18°C, then the error would be −2°C. We then need to calculate the probability that the error is less than −2°C, which can be found using Table A (see Appendix).
8. (D) The definition of a *P*-value is the probability, in repeated sampling, of obtaining results at least as extreme (large or small) as ours when the null hypothesis is actually true. Therefore, the *P*-value in this case says that there is a 7% chance of a difference at least as large if the new keyboard is no better than the old.
9. (D) Changing the largest value in a data set, in this case increasing it by two inches, would not affect either Q1 or Q3; therefore, the IQR remains unchanged.
10. (B) The sampling distribution of the sample mean is approximately normal because *n* = 50 > 30.
11. (C) The median for Hendersonville is 71 degrees. The third quartile for Sheboygan is 64 degrees.
12. (D) The 71% refers to *r*². So *r* = ±√0.71 ≈ ±0.84. Because the association is positive, *r* = +0.84.
13. (D) In this case, 12/43 = 0.28 of the women were promoted and 9/24 = 0.38 of the men were promoted. The table does provide some evidence since these values are so different. It remains to be seen whether this difference is statistically significant.
14. (C) The values 2 and 18.5 represent Q1 and Q3, respectively. Approximately 50% of the values would be between Q1 and Q3.
15. (B) Point A “pulls up” on the left end of the line. Removing it would drop the left end, increasing the slope. Point B is pulling down near the mean value of *x*. Removing it would have little impact on the slope.
16. (A) Bias is defined as any process that systematically over- or underestimates. A process that creates estimates that are, on average, too high or too low is, by definition, biased.
17. (B) A Type I error means the null hypothesis is true, but you reject it. That means pizza places, on average, do not make over $9,000 per month, but you believe they do. You might open a business and do poorly.
18. (A) We need to find the *z*-score that corresponds to a lower-tail area/probability of 0.2800. Using Table A (see Appendix), that *z*-score is −0.58. Since −0.58 · 2.1 ounces = −1.22 ounces, this onion is 1.22 ounces below the mean.
19. (D) Because there is an association between exercise and cholesterol, we need to block on level of exercise.
20. (E) If we are to use the data that were returned, then we have to find a way to overcome nonresponse by getting survey results from those who did not return the survey.
21. (D) A factor is an explanatory variable. Replication means that, within an experiment, each treatment is applied to more than one experimental unit. Treatments are combinations of levels from different factors. The variable controlled by researchers is the explanatory variable, not the response variable. (D) is the only correct option.
22. (D) The key phrase in this question is “most representative sample of its customers.” Choice (D) ensures that the sample selects customers from each state and that selection is proportional to the number of customers from each state. For example, if 25% of the customers are from California, then 25% of the sample will be from California.
23. (C) In order to use a normal approximation to a binomial model such as this, we should see if *np* and *n*(1 − *p*) are both at least 10. (Note: some authors check whether both are at least 5, and others check whether both are at least 15.)
24. (B) We need to calculate *P*(Serenity in first box, Blackstar in second box) as well as *P*(Blackstar in first box, Serenity in second box) and then add them. This would be (0.12)(0.18) + (0.18)(0.12) = 0.0432.
25. (B) Using the expected value formula for a probability function, we get (0.05)(0.45) + (0.10)(0.25) + (0.15)(0.15) + (0.25)(0.10) + (0.50)(0.05) = 0.12, or 12%.
26. (C) The probability of selecting a player above the third quartile is 0.25. Because we are sampling with replacement, this situation meets the conditions of a binomial variable with *n* = 5 and probability of success = 0.25. Therefore, *P*(*x* ≥ 3) = *P*(3) + *P*(4) + *P*(5). Using the binomial formula, we get *P*(3) + *P*(4) + *P*(5) = 0.1035.
27. (C) The mean of the total weight is the same as the sum of the individual means, or 62 grams + 456 grams = 518 grams. We are told that the shipping is secured in an independent part of the factory, so the variance of the total weight is given by (1.0 grams)² + (6 grams)². We find the standard deviation of the total weight by taking the square root of that variance, or √(1.0² + 6²) = √37 ≈ 6.1 grams.
28. (A) Recall that two events A and B are independent if *P*(A|B) = *P*(A). The proportion of juniors in the group who vote for pizza is 48/144 = 0.3333. The proportion of all students who vote for pizza is 195/585 = 0.3333. Since these are the same value, *Student is a junior* and *Student chooses pizza* are independent.
29. (B) Because *x̄* is an unbiased estimator of the mean, the sampling distribution of those sample means has the same mean as the population. Therefore mean = 210. The standard deviation of the sampling distribution is given by σ/√*n* = 3.75. However, the population is substantially skewed right and the sample size is very small. Therefore we cannot say that the shape of the sampling distribution is approximately normal.
30. (B) The *z*-score for the first quartile (*p* = 0.25) is −0.674.
31. (E) The correct option provides the explanation.
32. (C) The question asks which is least likely to reduce bias in a sample survey. Simple random sampling is unbiased, so using a stratified random sample would not improve on that. You can take a representative sample but still introduce bias unless you address the actions in choices (A), (B), (D), and (E).
33. (B) Degrees of freedom are calculated by (rows − 1)(columns − 1). In this case, (5 − 1)(2 − 1) = 4.
34. (C) This can be done using a two-way table. The required probability is 0.0049/0.03475 = 0.1410.
35. (E) That is the definition of power.
36. (C) The number of successes and failures each needs to be at least 10 for both males and females. Since the success/failure numbers for females are 66/4, we do not meet this condition.
37. (D) We did not change the significance level, which is the probability of making a Type I error. Increasing the sample size is one way to increase the power of the test.
38. (C) Each volunteer behaves as his or her own block. Therefore, a matched pairs test is appropriate.
39. (A) This choice correctly interprets the confidence level of 95%. When you take a great many samples of the same size, 95% of the confidence intervals you build around your sample results will contain the parameter you hope to estimate.
40. (B) Since zero is contained in the interval, zero (representing no difference in the means) is a plausible value. So, we do not have convincing evidence that there is a difference.
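Several of the explanations above rest on short calculations that are easy to check by machine. The following is a minimal sketch using only the Python standard library; the variable names are illustrative, and every number is taken from the explanations themselves (items 4, 12, 24, 25, 26, 27, and 34).

```python
from math import comb, sqrt
from statistics import NormalDist

# Item 4: middle 80% of a normal model with mean 167 and SD 76.
z = NormalDist().inv_cdf(0.90)          # z-score leaving 10% in the upper tail
low, high = 167 - z * 76, 167 + z * 76

# Item 12: r from r^2 = 0.71, taking the positive root.
r = sqrt(0.71)

# Item 24: the two orderings of the same pair of outcomes.
p_both = 0.12 * 0.18 + 0.18 * 0.12

# Item 25: expected value of a discrete distribution.
values = [0.05, 0.10, 0.15, 0.25, 0.50]
probs = [0.45, 0.25, 0.15, 0.10, 0.05]
expected = sum(v * p for v, p in zip(values, probs))

# Item 26: binomial P(X >= 3) with n = 5 and p = 0.25.
def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_at_least_3 = sum(binom_pmf(k, 5, 0.25) for k in range(3, 6))

# Item 27: SD of a sum of independent weights (variances add).
sd_total = sqrt(1.0**2 + 6**2)

# Item 34: conditional probability read from the two-way table.
cond = 0.0049 / 0.03475

print(round(low), round(high), round(r, 2), round(p_both, 4))
print(round(expected, 2), round(p_at_least_3, 4), round(sd_total, 1), round(cond, 4))
```

The software quantile for item 4 is slightly more precise than the table value of 1.28, but the endpoints still round to 70 and 264 minutes.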

**Free-Response Answers to Practice Test 2, Section II**

1. **Fencing Team**
   - (a) A back-to-back stemplot with split stems will do nicely:

     Key: 6 | 2 = 62 inches

   - (b) The distribution of women’s heights seems slightly skewed to the right, while that of the men seems somewhat symmetric. Both the center and the spread of the distribution of men’s heights are greater than those of the women’s.
2. Using a tree diagram:

   *P*(forest | caught) = (0.85)(0.64) / [(0.15)(0.90) + (0.85)(0.64)] = 0.544/0.679 ≈ 0.801.
3. **Grapefruit at the farmers’ market**
   - (a) It appears that the center of this symmetric distribution is approximately 428 grams. In a symmetric distribution, the mean is approximately located at this center. Therefore, because the mean of this distribution is approximately equal to the population mean of 428 grams, we have evidence that we are using an unbiased estimator.
   - (b) There are only two simulated bags out of the 300 simulated samples that have a mean mass of more than 446.2 grams. This is a simulated probability of only 2/300 ≈ 0.0067. Given how small this probability is, it seems unlikely that the farmers’ market is correct in its claim about the population of grapefruit.
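The simulation described in the grapefruit question can be reproduced in a few lines. This is a minimal sketch, not the computer model used to produce the book's dotplot. It assumes a normal population with the claimed mean of 428.0 grams and standard deviation of 21.1 grams, and the seed is an arbitrary choice made only so the run is repeatable.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

MU, SIGMA = 428.0, 21.1   # claimed population mean and SD, in grams
N, REPS = 10, 300         # sample size and number of simulated samples
OBSERVED = 446.2          # mean mass of the purchased bag, in grams

rng = random.Random(2017)  # arbitrary seed (an assumption, not from the source)

# Draw REPS samples of size N and record each sample mean.
sample_means = [
    mean([rng.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(REPS)
]

# Part (a): the centre of the simulated means should sit near MU.
center = mean(sample_means)

# Part (b): share of simulated means at least as large as the observed one.
prop_extreme = sum(m >= OBSERVED for m in sample_means) / REPS

# Theoretical counterpart: sample means follow N(MU, SIGMA / sqrt(N)).
tail = 1 - NormalDist(MU, SIGMA / sqrt(N)).cdf(OBSERVED)

print(f"mean of sample means: {center:.1f} g")
print(f"simulated P(mean >= {OBSERVED}): {prop_extreme:.4f}")
print(f"theoretical tail probability: {tail:.4f}")
```

A different seed gives a slightly different dotplot, so the simulated proportion will not necessarily equal the 2/300 read from the book's plot. The theoretical tail probability is roughly 0.003, which supports the same conclusion.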
4. **Cortisol**
   - (a) Put all 100 volunteers’ names on equally sized pieces of paper into a hat and mix thoroughly. Then draw the names one at a time out of the hat. The first 50 names drawn are the volunteers who will receive the black tea capsules. The remaining 50 names are the volunteers who will receive coffee capsules.
   - (b) Using capsules keeps the volunteers blind to which treatment they are receiving so that any effect can be attributed to the treatment (tea or coffee concentrate) rather than perhaps the placebo effect.
   - (c) It is not reasonable to generalize the results of this study to all adults with chronic inflammation. The participants were volunteers, not a random sample from that population.
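The names-in-a-hat procedure in part (a) of the cortisol answer amounts to shuffling the list of volunteers and splitting it in half. The sketch below shows one way to do that in software. The function name `assign_groups`, the placeholder volunteer labels, and the seed are all hypothetical and serve only as illustration.

```python
import random

def assign_groups(volunteers, seed=None):
    """Randomly split volunteers into two equal-sized treatment groups."""
    rng = random.Random(seed)
    shuffled = list(volunteers)   # copy so the caller's list is untouched
    rng.shuffle(shuffled)         # equivalent to mixing the names in the hat
    half = len(shuffled) // 2
    return {
        "black tea": shuffled[:half],   # first 50 names drawn
        "coffee": shuffled[half:],      # remaining 50 names
    }

# Placeholder labels standing in for the 100 volunteers' names.
volunteers = [f"volunteer_{i:03d}" for i in range(1, 101)]
groups = assign_groups(volunteers, seed=42)

print(len(groups["black tea"]), len(groups["coffee"]))
```

Every volunteer lands in exactly one group, and every possible 50/50 split is equally likely, which is what makes the comparison statistically valid.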
5. **Summer enrichment program**
   - (a) There appears to be an association between choice of program and year in school, as the distributions are so different. Specifically, the proportion of sophomores who chose music is higher than that of juniors and much higher than that of the freshmen. The proportion of juniors who chose academics is about the same as that of sophomores, but much lower than that of the freshmen.
   - (b) The administrators should use a chi-square test of homogeneity with (3 − 1)(4 − 1) = 6 degrees of freedom.

     *H*₀: Choice of enrichment program is the same for the populations of freshmen, sophomores, and juniors.

     *H*ₐ: Choice of enrichment program is not the same for all three populations.

   - (c) Since the *P*-value is so low, lower than any reasonable value of alpha, the district administrators should reject the null hypothesis. It appears that the choice of enrichment program is not the same for all three populations of students.
6. **Affordable housing**
   - (a) The graduate student should not use a *t*-test for this hypothesis because the sample is so small and contains one outlier of 70.4 percent.
   - (b) The table should be filled in as follows:
   - (c) Once the variable has been changed to a categorical variable (above/below), the outlier ceases to be an issue.
   - (d) This is now a binomial probability with *n* = 8 and *P*(below the median) = 0.50. The approximate probability that 1 or fewer clients would have a housing expense percentage of less than 48 is *P*(0) + *P*(1), with each term calculated using the binomial formula.

     This is 0.00391 + 0.03125 = 0.03516.

   - (e) Since the probability is greater than an alpha level of 0.01, I do not have convincing evidence that the graduate student’s hypothesis is correct. (Alternatively: Since the probability is less than an alpha level of 0.05, I do have convincing evidence that the graduate student’s hypothesis is correct.)
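The binomial calculation in part (d) of the affordable housing answer, and the two alternative decisions in part (e), can be checked with a short script. This is a minimal sketch; the helper name `binom_cdf` is illustrative, while *n* = 8, *p* = 0.50, and the cutoff of one household come from the answer above.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for a binomial count with n trials and success probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Under the null hypothesis, each of the 8 households falls below 48%
# with probability 0.50; the answer uses the chance of 1 or fewer below.
p_value = binom_cdf(1, 8, 0.50)   # (1 + 8) / 256

print(f"P(X <= 1) = {p_value:.5f}")
for alpha in (0.01, 0.05):
    decision = "reject" if p_value < alpha else "fail to reject"
    print(f"alpha = {alpha}: {decision} the null hypothesis")
```

The result sits between the two common significance levels, which is why the printed answer accepts either conclusion as long as it is tied to a stated alpha.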