So suppose you want to evaluate your hypothesis H at a certain query point, lowercase X. Okay? And let’s say you want to know what’s the predicted value of Y at this position of X, right? So for linear regression, what we were doing was we would fit theta to minimize sum over I of YI minus theta transpose XI, squared, and return theta transpose X. Okay? So that was linear regression. In contrast, in locally weighted linear regression you’re going to do things slightly differently. You’re going to look at this point X, and then look in your data set and take into account only the data points that are, sort of, in the little vicinity of X. Okay? So I’m going to look only in the vicinity of this point where I want to evaluate my hypothesis, and then I’m going to take, let’s say, just these few points, and I will apply linear regression to fit a straight line just to this subset of the data. Okay? I’m using this term subset loosely – well, let’s come back to that later. So we take this data set and I fit a straight line to it, and maybe I get a straight line like that. And what I’ll do is then evaluate that straight line at this particular value of X, and that will be the value my algorithm returns. This would be the value that my hypothesis outputs in locally weighted regression. Okay?
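As a reference point for the contrast drawn above, here is a minimal sketch of ordinary linear regression: fit theta to minimize the sum of squared errors over all training points, then return theta transpose x at the query point. The function names (`fit_linear`, `predict`) and the use of the normal equations are my own choices for illustration, not something specified in the lecture.

```python
import numpy as np

def fit_linear(X, y):
    """Fit theta minimizing sum_i (y_i - theta^T x_i)^2.

    Solved here via the normal equations: (X^T X) theta = X^T y.
    X is an (m, n) design matrix, y an (m,) target vector.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)

def predict(theta, x):
    """Return theta^T x, the hypothesis value at query point x."""
    return x @ theta

# Toy data lying exactly on y = 2x, so the fit recovers theta = 2.
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
theta = fit_linear(X, y)
print(predict(theta, np.array([4.0])))
```

Note that this single theta is fit once over the whole data set; locally weighted regression, by contrast, re-fits theta for every query point.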
So let me go ahead and formalize that. In locally weighted regression, we’re going to fit theta to minimize sum over I of WI times YI minus theta transpose XI, squared, where these terms W superscript I are called weights. There are many possible choices for the weights; I’m just going to write one down: WI equals E to the minus XI minus X, squared, over two. So let’s look at what these weights really are, right? Suppose you have a training example XI that is very close to X, so that XI minus X is small, right? Then if XI minus X is close to zero, this is E to the minus zero, and E to the zero is one. So if XI is close to X, then WI will be close to one. In other words, the weight associated with the Ith training example will be close to one if XI and X are close to each other. Conversely, if XI minus X is large, then – I don’t know, what would WI be?
Student: Zero.
Instructor (Andrew Ng): Zero, right. Close to zero. Right. So if XI is very far from X, then this is E to the minus of some large number, and E to the minus some large number will be close to zero. Okay? So the picture is: if I’m querying at a certain point X, shown on the X axis, and if my data set, say, looks like that, then I’m going to give the points close to this a large weight and give the points far away a small weight. So for the points that are far away, WI will be close to zero, and so the points that are far away will not contribute much at all to this summation, right? So it’s as if this is a sum over I of one times the quadratic term for nearby points, plus zero times the quadratic term for faraway points. And so the effect of using this weighting is that locally weighted linear regression fits a set of parameters theta paying much more attention to fitting the points close by accurately, while ignoring the contribution from faraway points. Okay? Yeah?
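Putting the pieces together, the whole procedure can be sketched as one function: compute the weights at the query point, solve the weighted least-squares problem for theta, and return theta transpose x. The closed-form weighted normal equations used here (X^T W X) theta = X^T W y are a standard way to minimize the weighted objective; the lecture describes the objective but not this particular solution method, and the `tau` bandwidth parameter is again my addition.

```python
import numpy as np

def lwr_predict(X, y, x_query, tau=1.0):
    """Locally weighted linear regression prediction at x_query.

    Fits theta minimizing sum_i w_i (y_i - theta^T x_i)^2 with
    w_i = exp(-||x_i - x_query||^2 / (2 tau^2)), then returns
    theta^T x_query. A fresh theta is fit for every query point.
    """
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2.0 * tau ** 2))
    W = np.diag(w)  # faraway points contribute ~0 to these sums
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return x_query @ theta

# Data on the line y = x: the local fit recovers it exactly,
# so the prediction at 2.5 is 2.5.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
pred = lwr_predict(X, y, np.array([2.5]))
print(pred)
```

Because theta is re-fit per query, locally weighted regression is a non-parametric method: the cost of each prediction grows with the size of the training set, unlike plain linear regression where theta is computed once.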