# 12.6 Outliers

## Try it

Identify the potential outlier in the scatter plot. The standard deviation of the residuals or errors is approximately 8.6.

The outlier appears to be at (6, 58). The expected y value on the line at x = 6 is approximately 82. Fifty-eight is 24 units from 82, and 24 is more than two standard deviations (2s = (2)(8.6) = 17.2). So 58 is more than two standard deviations from 82, which makes (6, 58) a potential outlier.
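The comparison in this solution can be sketched in a few lines of Python. The variable names are illustrative, and the predicted value 82 is the approximate reading taken from the line in the solution above.

```python
# Two-standard-deviation check for the point (6, 58).
s = 8.6            # standard deviation of the residuals (given in the problem)
observed_y = 58    # y value of the suspected outlier
predicted_y = 82   # approximate y value on the line at x = 6 (from the solution)

residual = observed_y - predicted_y   # 58 - 82 = -24
threshold = 2 * s                     # 2s = 17.2

# A point is a potential outlier when it lies more than 2s from the line.
is_potential_outlier = abs(residual) > threshold
print(residual, threshold, is_potential_outlier)
```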

## Numerical identification of outliers

In the table below, the first two columns are the third-exam and final-exam data. The third column shows the predicted ŷ values calculated from the line of best fit: ŷ = –173.5 + 4.83x. The residuals, or errors, have been calculated in the fourth column of the table: observed y value – predicted y value = y – ŷ.

s is the standard deviation of all the y – ŷ = ε values, where n = the total number of data points. If each residual is calculated and squared, and the results are added, we get the SSE. The standard deviation of the residuals is calculated from the SSE as:

$s=\sqrt{\frac{SSE}{n-2}}$

## Note

We divide by (n – 2) because the regression model involves two estimates: the slope and the y-intercept of the line.
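As a minimal sketch of the formula, the Python function below computes s from a list of residuals. The function name `residual_std` is an assumption made for illustration. It is applied here to the whole-number residuals of the exam example in this section; because those values are rounded, the result (about 16.5) comes out slightly above the calculator's 16.4, which works from unrounded residuals.

```python
import math

def residual_std(residuals):
    """Return s = sqrt(SSE / (n - 2)) for a list of regression residuals."""
    n = len(residuals)
    if n <= 2:
        raise ValueError("need more than two residuals")
    sse = sum(e ** 2 for e in residuals)   # sum of squared errors
    return math.sqrt(sse / (n - 2))

# Rounded residuals from the exam example (11 data points).
residuals = [35, -17, 16, -6, -19, 9, 3, -1, -10, -9, -1]
sse = sum(e ** 2 for e in residuals)
s = residual_std(residuals)
print(sse, round(s, 2))
```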

Rather than calculate the value of s ourselves, we can find s using the computer or calculator. For this example, the calculator function LinRegTTest found s = 16.4 as the standard deviation of the residuals 35, –17, 16, –6, –19, 9, 3, –1, –10, –9, –1.

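| x (third exam) | y (final exam) | ŷ (predicted) | y – ŷ (residual) |
|---|---|---|---|
| 65 | 175 | 140 | 175 – 140 = 35 |
| 67 | 133 | 150 | 133 – 150 = –17 |
| 71 | 185 | 169 | 185 – 169 = 16 |
| 71 | 163 | 169 | 163 – 169 = –6 |
| 66 | 126 | 145 | 126 – 145 = –19 |
| 75 | 198 | 189 | 198 – 189 = 9 |
| 67 | 153 | 150 | 153 – 150 = 3 |
| 70 | 163 | 164 | 163 – 164 = –1 |
| 71 | 159 | 169 | 159 – 169 = –10 |
| 69 | 151 | 160 | 151 – 160 = –9 |
| 69 | 159 | 160 | 159 – 160 = –1 |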
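The table's third and fourth columns can be reproduced with a short Python sketch. It fits the line by ordinary least squares using only the standard library; the helper name `fit_line` and the variable names are assumptions for illustration, not part of the calculator procedure.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

third_exam = [65, 67, 71, 71, 66, 75, 67, 70, 71, 69, 69]
final_exam = [175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 159]

a, b = fit_line(third_exam, final_exam)
predicted = [a + b * x for x in third_exam]                      # y-hat column
residuals = [y - p for y, p in zip(final_exam, predicted)]       # y - y-hat column

# Print one row per student, rounded to whole numbers as in the table.
for x, y, p, e in zip(third_exam, final_exam, predicted, residuals):
    print(x, y, round(p), round(e))
```

The predictions are computed from the unrounded coefficients. Using the rounded line ŷ = –173.5 + 4.83x instead can shift a rounded prediction by one unit.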

We are looking for all data points for which the residual is greater than 2s = 2(16.4) = 32.8 or less than –32.8. Compare these values to the residuals in column four of the table. The only such data point is the student who had a grade of 65 on the third exam and 175 on the final exam; the residual for this student is 35.
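The same screening can be sketched in Python. The line of best fit and s = 16.4 are the values stated in the text; the function name `flag_outliers` is hypothetical.

```python
# Values stated in the text: line of best fit and s from LinRegTTest.
INTERCEPT, SLOPE = -173.5, 4.83
S = 16.4

# (third-exam score, final-exam score) for the eleven students.
data = [(65, 175), (67, 133), (71, 185), (71, 163), (66, 126), (75, 198),
        (67, 153), (70, 163), (71, 159), (69, 151), (69, 159)]

def flag_outliers(points, intercept, slope, s):
    """Return the points whose residual is more than 2s above or below zero."""
    flagged = []
    for x, y in points:
        residual = y - (intercept + slope * x)
        if abs(residual) > 2 * s:
            flagged.append((x, y))
    return flagged

outliers = flag_outliers(data, INTERCEPT, SLOPE, S)
print(outliers)  # [(65, 175)]
```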

## How does the outlier affect the best fit line?

Numerically and graphically, we have identified the point (65, 175) as an outlier. We should re-examine the data for this point to see if there are any problems with the data. If there is an error, we should fix the error if possible, or delete the data. If the data is correct, we would leave it in the data set. For this problem, we will suppose that we examined the data and found that this outlier was an error. We will therefore delete the outlier so that we can explore how it affects the results.

## Compute a new best-fit line and correlation coefficient using the ten remaining points

On the TI-83, TI-83+, TI-84+ calculators, delete the outlier from L1 and L2. Using the LinRegTTest, the new line of best fit and the correlation coefficient are:

ŷ = –355.19 + 7.39 x and r = 0.9121

With r = 0.9121, the new line shows a stronger correlation than the original (r = 0.6631) because 0.9121 is closer to one. This means that the new line is a better fit to the ten remaining data values. The line can better predict the final exam score given the third exam score.
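For readers without a TI calculator, the comparison can be reproduced with a standard-library Python sketch. The helper name `fit_and_r` is an assumption for illustration; it applies the usual least-squares and correlation formulas rather than the calculator's LinRegTTest routine.

```python
import math

def fit_and_r(points):
    """Return (intercept, slope, r) for a list of (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    syy = sum((y - y_bar) ** 2 for _, y in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r

points = [(65, 175), (67, 133), (71, 185), (71, 163), (66, 126), (75, 198),
          (67, 153), (70, 163), (71, 159), (69, 151), (69, 159)]

# Drop the outlier identified above and fit both data sets.
remaining = [p for p in points if p != (65, 175)]
original = fit_and_r(points)
refit = fit_and_r(remaining)

print("all eleven points:", [round(v, 4) for v in original])
print("outlier removed:  ", [round(v, 4) for v in refit])
```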
