
So some people actually do this: they apply linear regression to classification problems, and sometimes it’ll work okay. But in general it’s a pretty bad idea to apply linear regression to classification problems like these, and here’s why. Let’s say I change my training set by giving you just one more training example all the way up there, right? Given this training set, it’s actually still entirely obvious what the relationship between X and Y is, right? It’s just: if X is greater than this value then Y is one, and if it’s less then Y is zero. Giving you this additional training example really shouldn’t change anything. I mean, I didn’t really convey much new information. There’s no surprise that this corresponds to Y equals one. But if you now fit linear regression to this data set you end up with a line that, I don’t know, maybe looks like that, right? And now the predictions of your hypothesis have changed completely if you threshold your hypothesis at 0.5. Okay? So –
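The effect described above can be reproduced numerically. The following is a minimal sketch, assuming a made-up one-dimensional training set (the values of `x`, `y`, and the extra point at 40 are illustrative, not the ones drawn in the lecture) and two hypothetical helpers, `fit_line` and `threshold_crossing`. It fits a least-squares line, finds where the line crosses 0.5, and repeats after adding one far-away positive example.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least-squares fit of y ~ theta[0] + theta[1] * x."""
    X = np.column_stack([np.ones_like(x), x])  # add intercept column
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def threshold_crossing(theta, level=0.5):
    """Value of x at which the fitted line equals `level`."""
    return (level - theta[0]) / theta[1]

# Illustrative training set (an assumption): small x -> y = 0, large x -> y = 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])

theta_before = fit_line(x, y)
boundary_before = threshold_crossing(theta_before)

# One more positive example, far to the right; it carries no real surprise.
x_more = np.append(x, 40.0)
y_more = np.append(y, 1.0)

theta_after = fit_line(x_more, y_more)
boundary_after = threshold_crossing(theta_after)

print("0.5 crossing before the extra point:", boundary_before)
print("0.5 crossing after the extra point: ", boundary_after)
```

With these particular numbers the 0.5 crossing moves to the right of x = 5, so a training example that was labeled one now falls on the "zero" side of the threshold, even though the extra point agreed with the original pattern.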

Student: In between there might be an interval where it’s zero, right? For that far off point?

Instructor (Andrew Ng): Oh, you mean, like that?

Student: Right.

Instructor (Andrew Ng): Yeah, yeah, fine. Yeah, sure. A theta set like that so. So, I guess, these just – yes, you’re right, but this is an example and this example works. This –

Student: [Inaudible] that will change it even more if you gave it all –

Instructor (Andrew Ng): Yeah. Then I think this actually would make it even worse. You would actually get a line that pulls out even further, right? So this is my example. I get to make it whatever I want, right? But the point of this is that there’s not a deep meaning to this. The point of this is just that it could be a really bad idea to apply linear regression to classification problems. Sometimes it works fine, but usually I wouldn’t do it.

So a couple of problems with this. One is that, well – so what do you want to do for classification? If you know the value of Y lies between zero and one, then to kind of fix this problem let’s just start by changing the form of our hypothesis so that my hypothesis always lies in the unit interval between zero and one. Okay? So if I know Y is either zero or one, then let’s at least not have my hypothesis predict values much larger than one or much smaller than zero. And so instead of choosing a linear function for my hypothesis I’m going to choose something slightly different. In particular, I’m going to choose this function: H subscript theta of X is going to equal G of theta transpose X, where G is going to be this function, and so this becomes one over one plus e to the negative theta transpose X. And G of Z is called the sigmoid function, and it is often also called the logistic function. It goes by either of these names.
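As a rough sketch of the hypothesis just described, the code below computes H subscript theta of X as G of theta transpose X, with G of Z equal to one over one plus e to the negative Z. The names `sigmoid` and `hypothesis` and the parameter values in `theta` are assumptions chosen for illustration. The first component of each input is a constant 1 so that `theta[0]` acts as an intercept.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x); the output always lies between 0 and 1."""
    return sigmoid(np.dot(theta, x))

# Hypothetical parameters: intercept -4.5, slope 1.0 (not from the lecture).
theta = np.array([-4.5, 1.0])

x_low = np.array([1.0, -10.0])   # theta^T x = -14.5
x_mid = np.array([1.0, 4.5])     # theta^T x = 0
x_high = np.array([1.0, 10.0])   # theta^T x = 5.5

for x in (x_low, x_mid, x_high):
    print(x[1], hypothesis(theta, x))
```

Unlike a straight line, this hypothesis cannot return values far above one or far below zero, however extreme the input is.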

And what G of Z looks like is the following. On the horizontal axis I’m going to plot Z, and so G of Z will look like this. Okay? I didn’t draw that very well. Okay. So G of Z tends towards zero as Z becomes very negative, and G of Z will ascend towards one as Z becomes large, and it crosses the vertical axis at 0.5. So this is what the sigmoid function, also called the logistic function, looks like. Yeah? Question?
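The shape described here can be checked directly from the formula. Writing the sigmoid as one over one plus e to the negative Z, the three properties mentioned (the value at zero and the two limits) work out as follows:

```latex
g(z) = \frac{1}{1 + e^{-z}}, \qquad
g(0) = \frac{1}{1 + e^{0}} = \frac{1}{2}, \qquad
\lim_{z \to -\infty} g(z) = 0, \qquad
\lim_{z \to +\infty} g(z) = 1 .
```

The limits follow because e to the negative Z grows without bound as Z goes to minus infinity and shrinks to zero as Z goes to plus infinity.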





Source:  OpenStax, Machine learning. OpenStax CNX. Oct 14, 2013 Download for free at http://cnx.org/content/col11500/1.4
