Medium · 1 mark · Multiple Choice
Data Visualization and Representation · Regression · Scatter Diagram · Line of Best Fit · Correlation

AQA GCSE · Question 08.2 · Data Visualization and Representation

[Scatter diagram: Waiting time (minutes), 0 to 5, on the vertical axis against Minutes after 9am, 0 to 210, on the horizontal axis]

The scatter diagram shows the results for 25 customers Rachel sampled. Which of these is a possible equation for a regression line for the data shown?

Answer options:

A. y = -0.06 + 0.689x

B. y = -0.06 + 0.023x

C. y = 4.8 - 0.689x

D. y = 4.8 - 0.023x

How to approach this question

1. **Check the correlation**: Observe the general trend of the points on the scatter diagram. They go upwards from left to right, indicating a **positive correlation**.
2. **Analyse the equation form (y = a + bx)**: In this equation, 'b' is the gradient. For a positive correlation, the gradient 'b' must be positive. This immediately eliminates options C and D, which have negative gradients (-0.689 and -0.023).
3. **Analyse the y-intercept (a)**: The y-intercept is the value of y when x = 0. A line of best fit would cross the y-axis at a low value, close to zero. Both remaining options, A and B, have a y-intercept of -0.06, which is plausible.
4. **Analyse the gradient (b)**: Compare the gradients of options A (0.689) and B (0.023).
   - A gradient of 0.689 means that for every 100 minutes, the waiting time increases by 68.9 minutes. This is a very steep line and does not fit the data.
   - A gradient of 0.023 means that for every 100 minutes, the waiting time increases by 2.3 minutes. This is a much shallower slope and visually fits the trend of the data points much better.
5. **Conclusion**: The equation with the positive gradient and the most plausible slope is y = -0.06 + 0.023x.
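The elimination steps above can be sketched as a quick numerical check. This is only an illustrative sketch: it assumes the axis ranges shown on the diagram (x up to 210, y between 0 and 5) and takes the positive correlation as read from the diagram. The names `candidates`, `predict` and `plausible` are made up for this example.

```python
# Candidate regression lines y = a + b*x from the answer options: (a, b).
candidates = {
    "A": (-0.06, 0.689),
    "B": (-0.06, 0.023),
    "C": (4.8, -0.689),
    "D": (4.8, -0.023),
}

X_MAX = 210          # largest x value on the diagram (minutes after 9am)
Y_MIN, Y_MAX = 0, 5  # range of the waiting-time axis (minutes)


def predict(a, b, x):
    """Waiting time predicted by the line y = a + b*x."""
    return a + b * x


def plausible(a, b):
    """Assumed test: positive gradient, and the prediction at X_MAX
    still lies within the plotted waiting-time range."""
    y_end = predict(a, b, X_MAX)
    return b > 0 and Y_MIN <= y_end <= Y_MAX


plausible_options = [label for label, (a, b) in candidates.items() if plausible(a, b)]

for label, (a, b) in candidates.items():
    print(label, round(predict(a, b, X_MAX), 2), plausible(a, b))
```

Under these assumptions, option A predicts a waiting time of well over 100 minutes at x = 210, far off the top of the diagram, while option B predicts just under 5 minutes.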

Full Answer

B. y = -0.06 + 0.023x ✓ Correct
The correct answer is y = -0.06 + 0.023x.

1. **Correlation**: The points on the scatter diagram generally go up from left to right, showing a positive correlation. This means the gradient (the coefficient of x) in the regression equation y = a + bx must be positive. This eliminates the two options with negative gradients (y = 4.8 - 0.689x and y = 4.8 - 0.023x).
2. **Y-intercept**: The regression line should pass through the mean point of the data. Visually, a line of best fit through that point would cross the y-axis (where x = 0) at a value close to 0, possibly slightly negative. Both remaining options, y = -0.06 + 0.689x and y = -0.06 + 0.023x, have a y-intercept of -0.06, which is plausible.
3. **Gradient**: We need to choose between a steep gradient (0.689) and a shallow gradient (0.023). Let's estimate the gradient from the graph. The points go from roughly (x = 30, y = 1) to (x = 210, y = 4.5).
   - Change in y ≈ 4.5 - 1 = 3.5
   - Change in x ≈ 210 - 30 = 180
   - Gradient ≈ 3.5 / 180 ≈ 0.0194

   The value 0.023 is much closer to our estimate than 0.689. Therefore, y = -0.06 + 0.023x is the most plausible equation.
To determine the correct regression line equation, we assess its key features against the scatter plot.

1. **Direction of Correlation**: The points on the graph show a clear trend of increasing waiting time as the minutes after 9am increase. This is a **positive correlation**. The equation for the line of best fit, y = a + bx, must therefore have a positive gradient (b > 0). This eliminates options C and D, as they both have negative gradients.
2. **Y-intercept (a)**: The y-intercept is where the line crosses the y-axis (x = 0). A line of best fit for this data would start near the origin, so a y-intercept close to zero is expected. Both options A and B have a y-intercept of a = -0.06, which is reasonable.
3. **Gradient (b)**: We must now decide between the gradient in option A (b = 0.689) and option B (b = 0.023).
   * Let's estimate the gradient from the graph. The line seems to pass near (x = 0, y = 0) and (x = 210, y = 4.5).
   * Gradient = (change in y) / (change in x) ≈ (4.5 - 0) / (210 - 0) ≈ 4.5 / 210 ≈ 0.021.
   * The gradient from option B (0.023) is very close to our estimate. The gradient from option A (0.689) is far too large and would represent a much steeper line.

Therefore, y = -0.06 + 0.023x is the most suitable equation.
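The two gradient estimates used in the explanations can be reproduced with a few lines of Python. This is a minimal sketch, assuming the approximate points read off the graph in the text; the helper names `gradient` and `closest_option` are hypothetical and chosen only for this example.

```python
def gradient(p, q):
    """Gradient of the straight line through points p and q, each given as (x, y)."""
    (x1, y1), (x2, y2) = p, q
    return (y2 - y1) / (x2 - x1)


# Approximate points read off the scatter diagram in the explanations.
estimate_1 = gradient((30, 1.0), (210, 4.5))  # 3.5 / 180
estimate_2 = gradient((0, 0.0), (210, 4.5))   # 4.5 / 210

# Gradients of the two options that survive the correlation check.
option_gradients = {"A": 0.689, "B": 0.023}


def closest_option(estimate):
    """Label of the option whose gradient is nearest to the estimate."""
    return min(option_gradients, key=lambda k: abs(option_gradients[k] - estimate))


print(round(estimate_1, 4), closest_option(estimate_1))
print(round(estimate_2, 4), closest_option(estimate_2))
```

Both estimates are rough readings from the graph, so only their order of magnitude matters: each is around 0.02, which is close to option B and roughly thirty times smaller than option A.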

Common mistakes

✗ Incorrectly identifying the correlation as negative.
✗ Not understanding which part of the equation represents the gradient and which represents the y-intercept.
✗ Being unable to visually estimate the steepness of the gradient.

Practice the full AQA GCSE Statistics Higher Tier Paper 1
