Medium · 1 mark · Multiple Choice
AQA GCSE · Question 08.2 · Data Visualization and Representation
The scatter diagram shows the results for 25 customers Rachel sampled. Which of these is a possible equation for a regression line for the data shown?
y = -0.06 + 0.689x
y = -0.06 + 0.023x
y = 4.8 - 0.689x
y = 4.8 - 0.023x
Answer options:
A. y = -0.06 + 0.689x
B. y = -0.06 + 0.023x
C. y = 4.8 - 0.689x
D. y = 4.8 - 0.023x
How to approach this question
1. **Check the correlation**: Observe the general trend of the points on the scatter diagram. They go upwards from left to right, indicating a **positive correlation**.
2. **Analyse the equation form (y = a + bx)**: In this equation, 'b' is the gradient. For a positive correlation, the gradient 'b' must be positive. This immediately eliminates options C and D, which have negative gradients (-0.689 and -0.023).
3. **Analyse the y-intercept (a)**: The y-intercept is the value of y when x=0. A line of best fit would cross the y-axis at a low value, close to zero. Both remaining options A and B have a y-intercept of -0.06, which is plausible.
4. **Analyse the gradient (b)**: Compare the gradients of options A (0.689) and B (0.023).
- A gradient of 0.689 means that for every additional 100 minutes on the x-axis, the predicted waiting time increases by 68.9 minutes. This line is far too steep to fit the data.
- A gradient of 0.023 means that for every additional 100 minutes on the x-axis, the predicted waiting time increases by 2.3 minutes. This much shallower slope matches the visual trend of the data points.
5. **Conclusion**: The equation with the positive gradient and the most plausible slope is y = -0.06 + 0.023x.
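The elimination steps above can be sketched in Python. This is only an illustration: the function name `pick_option` is hypothetical, and the value 0.02 is an assumed rough reading of the slope from the diagram, not a figure given in the question.

```python
# Candidate regression lines y = a + b*x, stored as (intercept a, gradient b).
OPTIONS = {
    "A": (-0.06, 0.689),
    "B": (-0.06, 0.023),
    "C": (4.8, -0.689),
    "D": (4.8, -0.023),
}

def pick_option(options, estimated_gradient):
    """Keep lines with a positive gradient, then return the label
    whose gradient is closest to the visual estimate."""
    positive = {label: ab for label, ab in options.items() if ab[1] > 0}
    return min(positive, key=lambda label: abs(positive[label][1] - estimated_gradient))

# 0.02 is an assumed rough slope read from the scatter diagram.
print(pick_option(OPTIONS, 0.02))
```

With that assumed estimate, the sign check leaves A and B, and B has the closer gradient.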
Full Answer
B. y = -0.06 + 0.023x ✓ Correct
The correct answer is y = -0.06 + 0.023x.
1. **Correlation**: The points on the scatter diagram generally go up from left to right, showing a positive correlation. This means the gradient (the coefficient of x) in the regression equation y = a + bx must be positive. This eliminates the two options with negative gradients (y = 4.8 - 0.689x and y = 4.8 - 0.023x).
2. **Y-intercept**: A regression line passes through the mean point of the data (mean of x, mean of y). Visually, a line of best fit through these points would cross the y-axis (where x=0) at a value close to 0, possibly slightly negative. Both remaining options, y = -0.06 + 0.689x and y = -0.06 + 0.023x, have a y-intercept of -0.06, which is plausible.
3. **Gradient**: We need to choose between a steep gradient (0.689) and a shallow gradient (0.023). Let's estimate the gradient from the graph. The points go from roughly (x=30, y=1) to (x=210, y=4.5).
Change in y ≈ 4.5 - 1 = 3.5
Change in x ≈ 210 - 30 = 180
Gradient ≈ 3.5 / 180 ≈ 0.0194.
The value 0.023 is much closer to our estimate than 0.689. Therefore, y = -0.06 + 0.023x is the most plausible equation.
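As a quick check of the arithmetic, the sketch below recomputes the gradient estimate and the rise each remaining option implies across the same x-range. The two points are the approximate read-offs used above, and the helper name `gradient` is hypothetical.

```python
def gradient(p1, p2):
    """Gradient of the straight line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1) / (x2 - x1)

# Approximate end points read from the scatter diagram (assumed values).
start, end = (30, 1.0), (210, 4.5)

estimate = gradient(start, end)   # rise over run between the read-off points
span = end[0] - start[0]          # width of the data in x

rise_a = 0.689 * span             # rise predicted by option A over the span
rise_b = 0.023 * span             # rise predicted by option B over the span

print(round(estimate, 4), round(rise_a, 2), round(rise_b, 2))
```

The estimate is about 0.019. Over the 180-unit span, option A would rise by about 124 and option B by about 4.1, against an observed rise of about 3.5.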
To determine the correct regression line equation, we assess its key features against the scatter plot.
1. **Direction of Correlation**: The points on the graph show a clear trend of increasing waiting time as the minutes after 9am increase. This is a **positive correlation**. The equation for the line of best fit, y = a + bx, must therefore have a positive gradient (b > 0). This eliminates options C and D, as they both have negative gradients.
2. **Y-intercept (a)**: The y-intercept is where the line crosses the y-axis (x=0). A line of best fit for this data would start near the origin, so a y-intercept close to zero is expected. Both options A and B have a y-intercept of a = -0.06, which is reasonable.
3. **Gradient (b)**: We must now decide between the gradient in option A (b = 0.689) and option B (b = 0.023).
* Let's estimate the gradient from the graph. The line seems to pass near (x=0, y=0) and (x=210, y=4.5).
* Gradient = (change in y) / (change in x) ≈ (4.5 - 0) / (210 - 0) ≈ 4.5 / 210 ≈ 0.021.
* The gradient from option B (0.023) is very close to our estimate. The gradient from option A (0.689) is far too large and would represent a much steeper line.
Therefore, y = -0.06 + 0.023x is the most suitable equation.
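Another way to see the same result is to evaluate all four options at the two points read from the graph and compare the total error. This is a rough sketch, not a least-squares fit: the points (0, 0) and (210, 4.5) are the approximate read-offs used above, and `total_abs_error` is a hypothetical helper.

```python
# Candidate regression lines y = a + b*x, stored as (intercept a, gradient b).
OPTIONS = {
    "A": (-0.06, 0.689),
    "B": (-0.06, 0.023),
    "C": (4.8, -0.689),
    "D": (4.8, -0.023),
}

# Approximate points the line of best fit appears to pass near (assumed).
READ_OFF = [(0, 0.0), (210, 4.5)]

def total_abs_error(a, b, points):
    """Sum of absolute differences between the line's prediction and each point."""
    return sum(abs((a + b * x) - y) for x, y in points)

errors = {label: total_abs_error(a, b, READ_OFF) for label, (a, b) in OPTIONS.items()}
best = min(errors, key=errors.get)
print(best)
```

Option B has by far the smallest total error at these two points; D is next, and A and C miss by well over 100.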
Common mistakes
✗ Incorrectly identifying the correlation as negative.
✗ Not understanding which part of the equation represents the gradient and which represents the y-intercept.
✗ Being unable to visually estimate the steepness of the gradient.
Practice the full AQA GCSE Statistics Higher Tier Paper 1