## MCAT Physics and Math Review

## Chapter 12: Data-Based and Statistical Reasoning

### Practice Questions

1. Which of the following outliers would most likely be the easiest to correct?

A. A typographical error in data transfer

B. A measurement error in instrument calibration

C. A heavily skewed distribution

D. A correctly measured anomalous result

2. In a sample of hospital patients, the mean age is found to be significantly lower than the median. Which of the following best describes this distribution?

A. Skewed right

B. Skewed left

C. Normal distribution

D. Bimodal distribution

3. What is the median of the following data set?

7, 17, 53, 23, 4, 2, 4

A. 4

B. 7

C. 15.7

D. 23

4. A hypothesis test was correctly conducted and the experimenter failed to reject the null hypothesis. Which of the following must be true?

I. The *p*-value was greater than *α*.

II. A type I error did not occur.

III. The power of the study was too small.

A. I only

B. II only

C. I and II only

D. I and III only

5. A 95% confidence interval will fall within what distance from the mean?

A. ±*σ*

B. ±2*σ*

C. ±3*σ*

D. ±4*σ*

6. Are there any outliers on the following box plot?

A. Yes; 1575 is an outlier.

B. Yes; 2600 is an outlier.

C. Yes; both 1575 and 2600 are outliers.

D. No; there are no outliers.

7. The following titration curve is an example of:

A. a sigmoidal relationship on a log–log graph.

B. a sigmoidal relationship on a linear graph.

C. a logarithmic relationship on a semilog graph.

D. a logarithmic relationship on a log–log graph.

8. Assume that having blonde hair and blue eyes are independent recessive traits. If one parent is a carrier for each gene while the other parent is homozygous recessive for both genes, what is the probability that the first two offspring will both have blonde hair and blue eyes?

A. 6.25%

B. 25%

C. 43.75%

D. 50%

9. Based on the county-level map below, which of the following statements best represents the data about elderly individuals? (Note: The darker the shade of green, the higher the percentage of elderly persons in the county.)

A. Most of the elderly people in the United States live in the center of the country.

B. Most of the people living in the center of the United States are elderly.

C. The center of the United States tends to have a larger proportion of elderly people.

D. There are more elderly people moving to the center of the country than elsewhere.

10. As the confidence level increases, a confidence interval:

A. becomes wider.

B. becomes thinner.

C. shifts to higher values.

D. shifts to lower values.

11. Which of the following measures of distribution is most useful for determining probabilities?

A. Range

B. Average distance from mean

C. Interquartile range

D. Standard deviation

12. It is known that crickets increase their rate of chirping in a direct linear relationship with temperature until a maximum chirping rate is reached. Which of the following graphs best represents this relationship?

A.

B.

C.

D.

13. A new medication for heart failure is being developed and has had a statistically significant effect on contractility in clinical trials. Which of the following would NOT likely cause the drug to be held back from common use?

A. The value of *α* used was 0.5.

B. Similar compounds display toxicity.

C. The effect size is clinically insignificant.

D. The study had low power to detect a difference.

14. The following histogram:

I. contains a bimodal distribution.

II. should be analyzed as two separate distributions.

III. contains one mode.

A. II only

B. I and II only

C. I and III only

D. I, II, and III

15. Which of the following values corresponds to the probability of a type I error?

A. *α*

B. *β*

C. Power

D. Confidence

### Answers and Explanations

· **A** Because the error is in data transfer, the original source of data can be consulted to allow for the inclusion of the correct data point. An error in instrument calibration may introduce bias; while this should not affect the standard deviation of a sample, it would certainly affect the mean. The instrument would have to be recalibrated, and the relevant data points would have to be measured again to correct for this type of outlier, eliminating **choice (B)**. A skewed distribution is one that has a long tail. In this case, it may be more challenging to determine if a particular value is an outlier or simply a value in the long tail of the distribution. Repeated sampling or a large sample size is usually required to determine if a sample is truly skewed, eliminating **choice (C)**. An anomalous result is challenging to interpret, and how to correct for the result may be unclear. In some cases, the result should be inflated or weighted more heavily to reflect its significance; in other cases, it should be interpreted as a regular value. In still other cases, it is appropriate to drop the anomalous result. This decision should ideally be made before the study even begins, but it still requires more consideration than simply checking a result against one’s original data set, eliminating **choice (D)**.

· **B** The mean is lower than the median (to its left on a number line), which implies that the tail of the distribution is on the left side; therefore, this distribution is skewed left. It would be expected that there would be a low plateau on the left side of the distribution, which accounts for the shift in the mean.
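The pull of a left-hand tail on the mean can be seen with a small numerical sketch. The ages below are invented for illustration only (they are not from the question); the point is simply that a few low values drag the mean below the median.

```python
from statistics import mean, median

# Hypothetical patient ages: most are older, with a few much younger
# patients forming a tail on the left side of the distribution.
ages = [5, 12, 60, 65, 68, 70, 72, 75, 78]

mean_age = mean(ages)      # pulled toward the low-value tail
median_age = median(ages)  # middle value, resistant to the tail

# For a left-skewed (negatively skewed) sample, mean < median.
is_left_skewed = mean_age < median_age

print(f"mean = {mean_age:.1f}, median = {median_age}")
print("mean below median:", is_left_skewed)
```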

· **B** The median is the central data point in an ordered list. Because this data set has seven numbers, the central point will be in the fourth position. Reordered, the list reads: 2, 4, 4, 7, 17, 23, 53. Thus, the median is 7. **Choice (A)**, 4, is the mode, while **choice (C)**, 15.7, is the mean.
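The three measures of central tendency named in this explanation can be checked with Python's standard `statistics` module. This is only a verification sketch of the arithmetic above, using the data set from the question.

```python
from statistics import mean, median, mode

data = [7, 17, 53, 23, 4, 2, 4]

ordered = sorted(data)       # [2, 4, 4, 7, 17, 23, 53]
data_median = median(data)   # fourth of seven ordered values
data_mode = mode(data)       # most frequent value
data_mean = mean(data)       # 110 / 7

print("ordered:", ordered)
print("median:", data_median)         # 7
print("mode:", data_mode)             # 4
print("mean:", round(data_mean, 1))   # 15.7
```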

· **B** A type I error occurs when the null hypothesis is incorrectly rejected. Because we failed to reject the null hypothesis, this could not have occurred. Statement I is incorrect because in a two-sided test, the *p*-value only needs to exceed *α*/2. Statement III is incorrect because we lack information about power in the question stem. In addition, a study could be extremely well-powered and still fail to reject the null hypothesis if no difference truly exists between two populations.

· **B** Approximately 95% of values fall within two standard deviations (±2*σ*) of the mean for a normal distribution. A confidence interval is constructed using the same values. Approximately 68% of the values are within one standard deviation, and 99.7% are within three standard deviations, eliminating the other answer choices.
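These percentages can be reproduced from the cumulative distribution function of a standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `fraction_within` is introduced here for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k}σ: {fraction_within(k):.4f}")
# within ±1σ: 0.6827
# within ±2σ: 0.9545
# within ±3σ: 0.9973
```

The ±2*σ* figure is a rounded rule of thumb; the multiplier that captures exactly 95% is about 1.96.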

· **C** Outliers can be determined with respect to the interquartile range, *Q*_{3} − *Q*_{1}. The interquartile range for this box plot is 2280 − 2075, or 205. Values that are more than 1.5 × IQR below *Q*_{1} or above *Q*_{3} are considered outliers. 2075 − 1.5 × 205 is approximately 2075 − 300, or 1775 (actual = 1767.5). Therefore, 1575 is an outlier. 2280 + 1.5 × 205 is approximately 2580 (actual = 2587.5). Therefore, 2600 is also an outlier.
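The fence calculation in this explanation can be written out as a short sketch. The quartile values are the ones quoted in the explanation; the function name `is_outlier` is chosen here for illustration.

```python
# Quartiles as quoted in the explanation of the box plot.
q1, q3 = 2075, 2280

iqr = q3 - q1                   # 205
lower_fence = q1 - 1.5 * iqr    # 1767.5
upper_fence = q3 + 1.5 * iqr    # 2587.5

def is_outlier(x):
    """True if x lies more than 1.5 x IQR below Q1 or above Q3."""
    return x < lower_fence or x > upper_fence

for value in (1575, 2600):
    print(value, "outlier" if is_outlier(value) else "not an outlier")
```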

· **B** The first term in the answer choices describes the shape of the curve. While we did not discuss sigmoidal curves in this chapter specifically, they do show up in other places in science, in particular for enzymes, cooperative binding, and titrations. Sigmoidal curves are S-shaped. The second term refers to the type of plot. Because the axes have the same scale throughout, this is a linear graph. Note that even though the *y*-axis represents logarithmic changes in H^{+} concentration (pH = −log [H^{+}]), the actual unit that is used is pH points, which increase linearly in this graph.

· **A** Because one parent is homozygous recessive for both traits and will always transmit the recessive alleles, we are only concerned with the other parent. This parent has a 50% chance of transmitting each independent trait, and thus a 25% chance of transmitting both (0.5 × 0.5 = 0.25). This probability is the same for both pregnancies because they are independent events; thus, the probability that both children exhibit both traits is 0.25 × 0.25 = 0.0625, or 6.25%.
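The multiplication rule for independent events used in this explanation can be verified with exact fractions. This is a sketch of the arithmetic only; the variable names are illustrative.

```python
from fractions import Fraction

# The carrier parent passes on the recessive allele for each gene
# with probability 1/2; the other parent always passes it on.
p_recessive_allele = Fraction(1, 2)

# Independent traits: multiply to get both traits in one child.
p_one_child = p_recessive_allele * p_recessive_allele   # 1/4

# Independent pregnancies: multiply again for two children.
p_two_children = p_one_child * p_one_child               # 1/16

print(p_one_child, p_two_children, float(p_two_children))  # 1/4 1/16 0.0625
```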

· **C** With data about percentages, we can only draw conclusions about percentages. Thus, any information about the number of people, as in **choice (A)**, is incorrect. This map shows us that a higher percentage of the residents in the middle of the country are elderly in comparison to other parts of the country. There are, of course, exceptions to this rule, including Florida, the Pacific Coast, and parts of Appalachia, which are all in the top category. Even so, there appears to be a clustering of counties with a high percentage of elderly individuals in the middle of the country. We also cannot say that most of the population is elderly in any place on this map because we are not given actual values for the percentages. There may be a plurality, but there is insufficient information to posit a majority, eliminating **choice (B)**. The map gives no indication of migration patterns, so we can also eliminate **choice (D)**.

· **A** To increase the confidence level, one must increase the size of the confidence interval to make it more likely that the true value of the mean is within the range. Therefore, the confidence interval must become wider.
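The widening can be illustrated with the critical *z* value for a two-sided interval. The sketch below assumes a normally distributed sample mean with known standard deviation, which the question does not specify; the function name `z_half_width` and its default arguments are hypothetical.

```python
from statistics import NormalDist

def z_half_width(confidence, sigma=1.0, n=1):
    """Half-width of a two-sided z-interval for a mean (assumes sigma is known)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # critical value
    return z * sigma / n ** 0.5

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} interval: mean ± {z_half_width(level):.3f}σ")
# 90% interval: mean ± 1.645σ
# 95% interval: mean ± 1.960σ
# 99% interval: mean ± 2.576σ
```

The center of the interval stays at the sample mean in every case; only the width changes.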

· **D** Standard deviation is the most common measure of distribution. It is the most closely linked to the mean of a distribution and can be used to calculate *p*-values, which are probabilities (specifically, a *p*-value is the probability of observing a difference at least as large as the one found if chance alone were responsible, that is, if the null hypothesis were true).

· **B** The question stem indicates that there is a linear relationship, so we know that we are looking for a straight line before a plateau. We also know that linear relationships are represented on linear plots. **Choice (B)** matches both criteria because the axes show constant intervals. Constant ratios, as shown in **choices (C)** and **(D)**, are seen in semilog plots like these, as well as log–log plots.

· **D** If a study has low power, it is more difficult to get results that are statistically significant. Therefore, if the results are still statistically significant even with low power, then there is likely a large effect size that makes the effect clinically significant. If the value of *α* used in the study was 0.5, then statistically significant results do not mean much; traditionally, *α* = 0.05 or a smaller probability is used, eliminating **choice (A)**. Concerns about toxicity should always limit the use of a drug, eliminating **choice (B)**. A statistically significant result is only of interest if it also represents a clinically significant improvement, eliminating **choice (C)**.

· **D** Because the histogram contains two peaks with a valley in between, it is a bimodal distribution. The color separation of two distinct populations provides evidence that there is a qualitative difference in the data between the two peaks; thus, the data should be analyzed according to gender. There is indeed only one mode, at 5'6". This is the measurement with the largest number of corresponding data points.

· **A** A type I error is the mistaken rejection of a true null hypothesis. We set the probability of a type I error by selecting a significance level (*α*).
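The link between *α* and the type I error rate can be illustrated with a small simulation. This is a sketch under stated assumptions, not part of the original explanation: it assumes a two-sided *z*-test on normally distributed data with known standard deviation, and the function name, sample size, trial count, and seed are all hypothetical choices.

```python
import random
from statistics import NormalDist

def type_i_error_rate(alpha=0.05, n=20, trials=5000, seed=0):
    """Estimate how often a true null hypothesis (mean = 0) is rejected."""
    rng = random.Random(seed)                     # fixed seed for repeatability
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    rejections = 0
    for _ in range(trials):
        # The null hypothesis is true here: data come from N(0, 1).
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = (sum(sample) / n) * n ** 0.5          # z statistic with sigma = 1
        if abs(z) > z_crit:
            rejections += 1                       # a type I error
    return rejections / trials

print(type_i_error_rate(alpha=0.05))  # should land close to 0.05
print(type_i_error_rate(alpha=0.5))   # should land close to 0.5
```

With *α* = 0.5, roughly half of all true null hypotheses are rejected, which is why a "significant" result at that level carries little weight.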