
Try it

Identify the potential outlier in the scatter plot. The standard deviation of the residuals or errors is approximately 8.6.

The outlier appears to be at (6, 58). The expected y value on the line for the point (6, 58) is approximately 82. Fifty-eight is 24 units from 82. Twenty-four is more than two standard deviations (2 s = (2)(8.6) = 17.2 ). So 82 is more than two standard deviations from 58, which makes (6, 58) a potential outlier.
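The two-standard-deviation check in this answer is simple arithmetic, and a minimal Python sketch can confirm it. The variable names are illustrative; the predicted value 82 is the approximate reading from the line quoted above, not a computed value.

```python
# Values taken from the worked answer above.
observed_y = 58
predicted_y = 82   # approximate value on the line at x = 6 (read from the plot)
s = 8.6            # standard deviation of the residuals

residual = observed_y - predicted_y   # observed minus predicted
threshold = 2 * s                     # two standard deviations

# A point is a potential outlier when its residual is more than 2s from zero.
is_potential_outlier = abs(residual) > threshold
print(residual, threshold, is_potential_outlier)
```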


Numerical identification of outliers

In [link], the first two columns are the third-exam and final-exam data. The third column shows the predicted ŷ values calculated from the line of best fit: ŷ = –173.5 + 4.83x. The residuals, or errors, have been calculated in the fourth column of the table: observed y value – predicted y value = y – ŷ.

s is the standard deviation of all the y – ŷ = ε values, where n = the total number of data points. If each residual is calculated and squared, and the results are added, we get the SSE. The standard deviation of the residuals is calculated from the SSE as:

$$s = \sqrt{\frac{SSE}{n - 2}}$$

Note

We divide by (n – 2) because the regression model involves two estimates: the slope and the y-intercept of the line.

Rather than calculate the value of s ourselves, we can find s using the computer or calculator. For this example, the calculator function LinRegTTest found s = 16.4 as the standard deviation of the residuals 35; –17; 16; –6; –19; 9; 3; –1; –10; –9; –1.
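For readers without a calculator at hand, the same quantity can be computed directly from the formula for s. The following is a minimal sketch in plain Python, not the LinRegTTest routine itself. It assumes the third-exam and final-exam scores from the first two columns of the table, and the helper names `fit_line` and `residual_std` are hypothetical.

```python
import math

# Third-exam (x) and final-exam (y) scores from the first two table columns.
x = [65, 67, 71, 71, 66, 75, 67, 70, 71, 69, 69]
y = [175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 159]

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def residual_std(xs, ys):
    """Return s = sqrt(SSE / (n - 2)) for the least-squares line."""
    intercept, slope = fit_line(xs, ys)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))

a, b = fit_line(x, y)
s = residual_std(x, y)
print(f"y-hat = {a:.2f} + {b:.2f}x, s = {s:.2f}")
```

Because this sketch works from unrounded predicted values, its result can differ slightly from a hand calculation based on the rounded residuals listed above.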

x     y      ŷ      y – ŷ
65    175    140    175 – 140 = 35
67    133    150    133 – 150 = –17
71    185    169    185 – 169 = 16
71    163    169    163 – 169 = –6
66    126    145    126 – 145 = –19
75    198    189    198 – 189 = 9
67    153    150    153 – 150 = 3
70    163    164    163 – 164 = –1
71    159    169    159 – 169 = –10
69    151    160    151 – 160 = –9
69    159    160    159 – 160 = –1

We are looking for all data points for which the residual is greater than 2 s = 2(16.4) = 32.8 or less than –32.8. Compare these values to the residuals in column four of the table. The only such data point is the student who had a grade of 65 on the third exam and 175 on the final exam; the residual for this student is 35.
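The comparison against ±2s can be sketched in a few lines of Python. The rows are copied from the table (x, y, and the rounded residual), s = 16.4 is the value reported above, and the variable names are illustrative.

```python
# (x, y, residual) rows copied from the table above.
rows = [
    (65, 175, 35), (67, 133, -17), (71, 185, 16), (71, 163, -6),
    (66, 126, -19), (75, 198, 9), (67, 153, 3), (70, 163, -1),
    (71, 159, -10), (69, 151, -9), (69, 159, -1),
]

s = 16.4            # standard deviation of the residuals reported by LinRegTTest
threshold = 2 * s   # 2s cutoff

# Keep the points whose residual is greater than 2s or less than -2s.
outliers = [(xi, yi) for xi, yi, res in rows if abs(res) > threshold]
print(outliers)
```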

How does the outlier affect the best fit line?

Numerically and graphically, we have identified the point (65, 175) as an outlier. We should re-examine the data for this point to see if there are any problems with the data. If there is an error, we should fix the error if possible, or delete the data. If the data is correct, we would leave it in the data set. For this problem, we will suppose that we examined the data and found that this data point was an error. We will therefore delete the outlier so that we can explore how it affects the results, as a learning experience.

Compute a new best-fit line and correlation coefficient using the ten remaining points:

On the TI-83, TI-83+, and TI-84+ calculators, delete the outlier from L1 and L2. Using LinRegTTest, the new line of best fit and the correlation coefficient are:

ŷ = –355.19 + 7.39 x and r = 0.9121

The new line, with r = 0.9121, shows a stronger correlation than the original (r = 0.6631) because r = 0.9121 is closer to one. This means that the new line is a better fit to the ten remaining data values. The line can better predict the final exam score given the third exam score.
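The effect of removing the outlier can also be checked without a calculator. The sketch below, in plain Python, assumes the eleven exam-score pairs from the table and refits the least-squares line with and without the point (65, 175). The helper name `fit_and_r` is hypothetical, and this is an illustration of the calculation, not the calculator's own routine.

```python
import math

# Third-exam (x) and final-exam (y) scores from the table.
x = [65, 67, 71, 71, 66, 75, 67, 70, 71, 69, 69]
y = [175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 159]

def fit_and_r(xs, ys):
    """Return (intercept, slope, r) for the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    syy = sum((yi - mean_y) ** 2 for yi in ys)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r

# Drop the outlier (65, 175) and keep the ten remaining points.
kept = [(xi, yi) for xi, yi in zip(x, y) if (xi, yi) != (65, 175)]
x_kept, y_kept = zip(*kept)

original = fit_and_r(x, y)
refit = fit_and_r(x_kept, y_kept)
print("all 11 points:   a=%.2f b=%.2f r=%.4f" % original)
print("outlier removed: a=%.2f b=%.2f r=%.4f" % refit)
```

Comparing the two printed rows shows how strongly a single point can pull both the slope and the correlation coefficient.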

Source:  OpenStax, Introductory statistics. OpenStax CNX. May 06, 2016 Download for free at http://legacy.cnx.org/content/col11562/1.18