
MachineLearning-Lecture11

Instructor (Andrew Ng): Okay. Good morning. Welcome back.

What I want to do today is actually wrap up our discussion on learning theory, and I’m gonna start by talking about Bayesian statistics and regularization, and then take a very brief digression to tell you about online learning. And most of today’s lecture will actually be on various pieces of advice for applying machine learning algorithms to problems like, you know, the class project, or other problems you may go work on after you graduate from this class.

But let’s start by talking about Bayesian statistics and regularization. So you remember, from last week we started to talk about learning theory, and we learned about bias and variance. And we spent most of the previous lecture talking about algorithms for model selection and for feature selection. We talked about cross-validation. Right?

So most of the methods we talked about in the previous lecture were ways for you to try to simplify the model. So for example, the feature selection algorithms we talked about give you a way to eliminate a number of features, so as to reduce the number of parameters you need to fit, and thereby reduce overfitting. Right? You remember that? So feature selection algorithms choose a subset of the features so that you have fewer parameters and you may be less likely to overfit. Right?
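To make that concrete, here is a minimal sketch of greedy forward feature selection scored by cross-validation, in the spirit of the algorithms from the previous lecture. This is an illustrative paraphrase, not the exact procedure from the lecture notes; the function name and the toy data are made up for the example.

```python
# A minimal sketch of greedy forward feature selection (illustrative,
# not the exact algorithm from the lecture notes).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def forward_select(X, y, max_features):
    """Greedily add whichever feature most improves the cross-validated score."""
    selected = []
    while len(selected) < max_features:
        best_score, best_j = -np.inf, None
        for j in range(X.shape[1]):
            if j in selected:
                continue
            # 5-fold cross-validated R^2 using only the candidate subset of columns
            score = cross_val_score(
                LinearRegression(), X[:, selected + [j]], y, cv=5
            ).mean()
            if score > best_score:
                best_score, best_j = score, j
        selected.append(best_j)
    return selected

# Toy usage: 100 examples, 10 features, only the first 3 actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X[:, 0] + 2 * X[:, 1] - X[:, 2] + 0.1 * rng.normal(size=100)
print(forward_select(X, y, max_features=3))
```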

What I want to do today is to talk about a different way to prevent overfitting. There’s a method called regularization, and it’s a way that lets you keep all the parameters. So here’s the idea, and I’m gonna illustrate this example with, say, linear regression. So you take the linear regression model, the very first model we learned about, right, and we said that we would choose the parameters via maximum likelihood. Right? And that meant that, you know, you would choose the parameters theta that maximize the probability of the data we observe. Right?
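Writing that maximum likelihood step out, in the notation this course has been using (my paraphrase, not a formula quoted from this lecture), the estimate over m training examples is:

```latex
% Maximum likelihood estimate for the parameters of linear regression
\theta_{\mathrm{ML}} \;=\; \arg\max_{\theta} \prod_{i=1}^{m} p\!\left( y^{(i)} \mid x^{(i)};\, \theta \right)
```

Under the usual Gaussian noise model, maximizing this product is equivalent to minimizing the least-squares cost.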

And so, to give this sort of procedure a name, this is one example of a frequentist procedure, and frequentist statistics you can think of, sort of, as maybe one school of statistics. And the philosophical view behind writing this down was that we envisioned there was some true parameter theta out there that generated, you know, the Xs and the Ys. There’s some true parameter theta that governs housing prices, Y as a function of X, and we don’t know what the value of theta is, and we’d like to come up with some procedure for estimating the value of theta. Okay? And so, maximum likelihood is just one possible procedure for estimating the unknown value of theta.

And the way you formulated this, you know, theta was not a random variable. Right? That’s what we said: theta is just some true value out there. It’s not random or anything, we just don’t know what it is, and we have a procedure called maximum likelihood for estimating the value of theta. So this is one example of what’s called a frequentist procedure.
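To preview where the regularization idea goes from here, below is a minimal sketch contrasting plain maximum likelihood (ordinary least squares) with L2-regularized linear regression, which keeps all the parameters but shrinks them toward zero. The two closed forms are standard; the toy data and the choice lam=1.0 are my own illustrative assumptions, not values from the lecture.

```python
# A minimal sketch: ordinary least squares vs. L2 ("ridge") regularization.
# The value lam=1.0 and the toy data are illustrative assumptions.
import numpy as np

def fit_least_squares(X, y):
    # Maximum likelihood under Gaussian noise: theta = (X^T X)^{-1} X^T y
    return np.linalg.solve(X.T @ X, X.T @ y)

def fit_ridge(X, y, lam):
    # Keep all the parameters, but penalize large ones:
    # theta = (X^T X + lam * I)^{-1} X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Toy comparison: few examples relative to features, so plain maximum
# likelihood tends to overfit and ridge usually lands closer to the truth.
rng = np.random.default_rng(0)
X = rng.normal(size=(15, 10))
theta_true = np.zeros(10)
theta_true[:2] = [1.0, -2.0]
y = X @ theta_true + 0.5 * rng.normal(size=15)

print(np.linalg.norm(fit_least_squares(X, y) - theta_true))  # parameter error, OLS
print(np.linalg.norm(fit_ridge(X, y, lam=1.0) - theta_true))  # parameter error, ridge
```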

