This module provides an overview of Linear Regression and Correlation: The Regression Equation as a part of R. Bloom's custom Collaborative Statistics collection col10617. It has been modified from the original module m17090 in the Collaborative Statistics collection (col10522) by Barbara Illowsky and Susan Dean. This module now includes instructions for finding and graphing the regression equation and scatterplot using the LinRegTTest on the TI-83, 83+, and 84+ calculators.

Understanding the regression equation

Data rarely fit a straight line exactly. Usually, you must be satisfied with rough predictions. Typically, you have a set of data whose scatter plot appears to "fit" a straight line. The line that best describes such data is called a Line of Best Fit or Least Squares Line.

A random sample of 11 statistics students produced the following data where x is the third exam score, out of 80, and y is the final exam score, out of 200. Can you predict the final exam score of a random student if you know the third exam score?

x (third exam score)    y (final exam score)
65                      175
67                      133
71                      185
71                      163
66                      126
75                      198
67                      153
70                      163
71                      159
69                      151
69                      159
Table showing the scores on the final exam based on scores from the third exam.
Figure 1. Scatter plot of the exam scores, with the third exam score on the x-axis and the final exam score on the y-axis.

The third exam score, x, is the independent variable and the final exam score, y, is the dependent variable. We will plot a regression line that best "fits" the data. If each of you were to fit a line "by eye", you would draw different lines. We can use what is called a least-squares regression line to obtain the best-fit line.

Consider the following diagram. Each data point has the form $(x, y)$, and each point on the line of best fit obtained by least-squares linear regression has the form $(x, \hat{y})$.

The $\hat{y}$ is read "y hat" and is the estimated value of y. It is the value of y obtained using the regression line. It is generally not equal to the observed y from the data.

Figure 2. Scatter plot of the exam scores with the line of best fit relating the third exam and final exam scores. One data point, the corresponding point on the line, and the vertical distance between them are marked to show how a single term in the sum of squared errors is computed.

The term $y - \hat{y}$ is called the residual. It is the observed y value minus the predicted $\hat{y}$ value. It can also be called the "error". It is not an error in the sense of a mistake, but measures the vertical distance between the observed value y and the estimated value $\hat{y}$. In other words, it measures the vertical distance between the actual data point and the predicted point on the line.

If the observed data point lies above the line, the residual is positive, and the line underestimates the actual data value for y. If the observed data point lies below the line, the residual is negative, and the line overestimates the actual data value for y.

In the diagram above (Figure 2), $y_0 - \hat{y}_0 = \varepsilon_0$ is the residual for the point shown. Here the point lies above the line and the residual is positive.

ε = the Greek letter epsilon

For each data point, you can calculate the residual or error, $y_i - \hat{y}_i = \varepsilon_i$ for $i = 1, 2, 3, \dots, 11$.

Each |ε| is a vertical distance.
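The residual calculation can be sketched in Python. This is a minimal illustration rather than part of the original module: the function name `residuals` is hypothetical, the candidate line $\hat{y} = -170 + 4.8x$ is a hand-picked line of the kind you might draw "by eye" (it is not the least-squares line), and writing the line as $\hat{y} = a + bx$ is an assumption about notation.

```python
# Third exam scores (x) and final exam scores (y) from the table above.
x = [65, 67, 71, 71, 66, 75, 67, 70, 71, 69, 69]
y = [175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 159]


def residuals(xs, ys, a, b):
    """Return the residuals y_i - y_hat_i for the line y_hat = a + b*x."""
    return [yi - (a + b * xi) for xi, yi in zip(xs, ys)]


# Hypothetical "by eye" line, chosen only for illustration.
A_GUESS, B_GUESS = -170.0, 4.8
eps = residuals(x, y, A_GUESS, B_GUESS)

for xi, yi, ei in zip(x, y, eps):
    position = "above" if ei > 0 else "below" if ei < 0 else "on"
    print(f"({xi}, {yi}): residual = {ei:+.1f}, point lies {position} the line")
```

With this candidate line, the first student (65, 175) has a positive residual, so the line underestimates that final exam score, while the second student (67, 133) has a negative residual, so the line overestimates it.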

For the example about the third exam scores and the final exam scores for the 11 statistics students, there are 11 data points. Therefore, there are 11 ε values. If you square each ε and add, you get

$$(\varepsilon_1)^2 + (\varepsilon_2)^2 + \dots + (\varepsilon_{11})^2 = \sum_{i=1}^{11} \varepsilon_i^2$$

This is called the Sum of Squared Errors (SSE).
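To see how the SSE compares candidate lines, here is a short Python sketch. The function name `sse` and both candidate lines are hypothetical choices made for illustration, and the form $\hat{y} = a + bx$ is assumed.

```python
# Third exam scores (x) and final exam scores (y) from the table above.
x = [65, 67, 71, 71, 66, 75, 67, 70, 71, 69, 69]
y = [175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 159]


def sse(xs, ys, a, b):
    """Sum of squared errors for the line y_hat = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys))


# Two hypothetical candidate lines, chosen only for illustration.
sse_sloped = sse(x, y, -170.0, 4.8)  # a line drawn "by eye"
sse_flat = sse(x, y, 160.0, 0.0)     # a horizontal line at y = 160

print(f"SSE for y_hat = -170 + 4.8x: {sse_sloped:.2f}")
print(f"SSE for y_hat = 160:         {sse_flat:.2f}")
```

Of these two candidates, the sloped line has the smaller SSE (about 2453 versus 4329), so by this measure it fits the data better.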

Using calculus, you can determine the values of a and b that make the SSE a minimum. When you make the SSE a minimum, you have determined the points that are on the line of best fit. It turns out that the line of best fit has the equation:
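Assuming the line is written in the form $\hat{y} = a + bx$, with a the intercept and b the slope (the notation suggested by the a and b above), the minimizing values have a standard closed form: $b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ and $a = \bar{y} - b\bar{x}$. The Python sketch below applies these formulas to the exam data instead of carrying out the calculus; the function names are hypothetical.

```python
# Third exam scores (x) and final exam scores (y) from the table above.
x = [65, 67, 71, 71, 66, 75, 67, 70, 71, 69, 69]
y = [175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 159]


def least_squares_line(xs, ys):
    """Return (a, b) minimizing the SSE for the line y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    b = s_xy / s_xx        # slope
    a = y_bar - b * x_bar  # intercept
    return a, b


def sse(xs, ys, a, b):
    """Sum of squared errors for the line y_hat = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys))


a, b = least_squares_line(x, y)
best = sse(x, y, a, b)
print(f"y_hat = {a:.2f} + {b:.2f}x, SSE = {best:.1f}")

# Nudging either coefficient away from the fitted value should not lower the SSE.
for da, db in [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    assert sse(x, y, a + da, b + db) >= best
```

For the exam data this gives approximately $\hat{y} = -173.51 + 4.83x$.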

Source:  OpenStax, Collaborative statistics: custom version modified by v moyle. OpenStax CNX. Nov 14, 2010 Download for free at http://legacy.cnx.org/content/col11238/1.2