
And then after seeing the i-th training example, you'd update the parameters using the standard perceptron learning rule, which you've seen a lot of times now, right? And the same thing if you were using logistic regression: after seeing each training example, you can, again, essentially run one step of stochastic gradient descent on just the example you saw. Okay?
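To make those two updates concrete, here is a minimal Python sketch of the online steps being described: the perceptron rule with a threshold hypothesis, and a single stochastic gradient step for logistic regression. The learning rate alpha, the {0, 1} label convention, and the function names are illustrative assumptions, not something fixed by the lecture.

```python
import numpy as np

def perceptron_step(theta, x, y, alpha=1.0):
    """One online perceptron update on a single example (x, y), with y in {0, 1}."""
    h = 1.0 if theta @ x >= 0 else 0.0      # threshold hypothesis g(theta^T x)
    return theta + alpha * (y - h) * x      # theta is unchanged when the example is classified correctly

def logistic_sgd_step(theta, x, y, alpha=0.1):
    """One stochastic gradient step for logistic regression on a single example (x, y)."""
    h = 1.0 / (1.0 + np.exp(-(theta @ x)))  # sigmoid hypothesis h_theta(x)
    return theta + alpha * (y - h) * x      # gradient of the log-likelihood on this one example

# Online setting: examples arrive one at a time and theta is updated after each one.
theta = np.zeros(3)
stream = [(np.array([1.0, 2.0, -1.0]), 1), (np.array([1.0, -0.5, 0.3]), 0)]
for x, y in stream:
    theta = perceptron_step(theta, x, y)
```

The point of the online setting is that theta is updated immediately after each example arrives, rather than after a full pass over a stored training set.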

And so the reason I've put this into the "learning theory" section of this class is that it turns out you can sometimes prove fairly amazing results on your total online error using algorithms like these. I don't want to spend the time in the main lecture to prove this, but, for example, you can prove results about the perceptron algorithm even when the features x^(i) are infinite-dimensional feature vectors, like we saw for support vector machines, where infinite-dimensional feature vectors may be used via kernel representations. Okay?

But so it turns out that when you use the perceptron algorithm, even when the data is extremely high-dimensional and it seems like you'd be prone to overfitting, you can still prove a guarantee, so long as the positive and negative examples are separated by a margin.

So in this infinite-dimensional space, so long as there is some margin separating the positive and negative examples, you can prove that the perceptron algorithm will converge to a hypothesis that perfectly separates them. Okay? And so after seeing only a finite number of examples, it'll converge to a decision boundary that perfectly separates the positive and negative examples, even though you may be in an infinite-dimensional space. Okay?
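For reference, this margin guarantee is usually stated as a mistake bound, from which convergence after finitely many examples follows; a common form (with labels taken in {-1, +1}), roughly as it appears in standard treatments, is:

```latex
% Perceptron mistake bound (labels y^{(i)} \in \{-1,+1\}).
% Assume every example is bounded, \|x^{(i)}\| \le D, and that some unit vector u
% separates the data with margin \gamma, i.e. y^{(i)} (u^\top x^{(i)}) \ge \gamma for all i.
% Then the online perceptron makes at most
\[
  \text{number of mistakes} \;\le\; \left( \frac{D}{\gamma} \right)^{2}.
\]
% The bound does not depend on the dimension of x, which is why the guarantee
% survives even in infinite-dimensional (kernel) feature spaces.
```

Since the bound is finite, the algorithm can only change its hypothesis a bounded number of times, which is the sense in which it converges to a separating decision boundary.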

So let's see. The proof itself would take me almost an entire lecture to do, and there are other things that I want to do more than that. So if you want to see the proof of this yourself, it's actually written up in the lecture notes that I posted online. For the purposes of this class's syllabus, you can treat the proof of this result as optional reading. And by that I mean it won't appear on the midterm and you won't be asked about it specifically in the problem sets, but I thought it'd be worth mentioning.

I know some of you were curious after the previous lecture about why you can prove that SVMs have bounded VC dimension even in these infinite-dimensional spaces, and more generally how you prove learning theory results in these infinite-dimensional feature spaces. The perceptron bound that I just talked about is the simplest instance I know of that you can read in like half an hour and understand.

So if you're interested, there are lecture notes online showing how this perceptron bound is actually proved. It's a very [inaudible]; you can prove it in like a page or so, so go ahead and take a look at that if you're interested. Okay?

Source:  OpenStax, Machine learning. OpenStax CNX. Oct 14, 2013 Download for free at http://cnx.org/content/col11500/1.4