How can we find the value of $\gamma^{(i)}$? Well, $w/\|w\|$ is a unit-length vector pointing in the same direction as $w$. Since $A$ represents $x^{(i)}$, we therefore find that the point $B$ is given by $x^{(i)} - \gamma^{(i)} \cdot w/\|w\|$. But this point lies on the decision boundary, and all points $x$ on the decision boundary satisfy the equation $w^T x + b = 0$. Hence,

$$w^T\left(x^{(i)} - \gamma^{(i)}\frac{w}{\|w\|}\right) + b = 0.$$
Solving for $\gamma^{(i)}$ yields

$$\gamma^{(i)} = \frac{w^T x^{(i)} + b}{\|w\|} = \left(\frac{w}{\|w\|}\right)^T x^{(i)} + \frac{b}{\|w\|}.$$
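As a quick numerical sanity check (a Python sketch; the values of $w$, $b$, and the example point are made up for illustration), we can compute the geometric margin from this formula and verify that the projected point $B = x^{(i)} - \gamma^{(i)}\, w/\|w\|$ really does land on the decision boundary:

```python
import numpy as np

# Illustrative parameters (not from the text): ||w|| = 5 here.
w = np.array([3.0, 4.0])
b = -2.0
x = np.array([2.0, 1.0])  # an example input on the positive side

norm_w = np.linalg.norm(w)

# Geometric margin: signed distance from x to the hyperplane w^T x + b = 0.
gamma = (w / norm_w) @ x + b / norm_w

# Project x onto the decision boundary along w.
B = x - gamma * (w / norm_w)

print(gamma)      # distance from x to the boundary
print(w @ B + b)  # ~0: B lies on the decision boundary
```

The second printed value is zero (up to floating-point error), confirming the derivation above.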
This was worked out for the case of a positive training example at $A$ in the figure, where being on the “positive” side of the decision boundary is good. More generally, we define the geometric margin of $(w, b)$ with respect to a training example $(x^{(i)}, y^{(i)})$ to be

$$\gamma^{(i)} = y^{(i)}\left(\left(\frac{w}{\|w\|}\right)^T x^{(i)} + \frac{b}{\|w\|}\right).$$
Note that if $\|w\| = 1$, then the functional margin equals the geometric margin; this gives us a way of relating these two different notions of margin. Also, the geometric margin is invariant to rescaling of the parameters; i.e., if we replace $w$ with $2w$ and $b$ with $2b$, then the geometric margin does not change. This will in fact come in handy later. Specifically, because of this invariance to the scaling of the parameters, when trying to fit $w$ and $b$ to training data, we can impose an arbitrary scaling constraint on $w$ without changing anything important; for instance, we can demand that $\|w\| = 1$, or $|w_1| = 5$, or $|w_1 + b| + |w_2| = 2$, and any of these can be satisfied simply by rescaling $w$ and $b$.
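The scaling invariance is easy to verify numerically. In this Python sketch (with made-up values for $w$, $b$, and one labeled example), doubling both parameters doubles the functional margin but leaves the geometric margin unchanged:

```python
import numpy as np

def functional_margin(w, b, x, y):
    # y^(i) * (w^T x^(i) + b)
    return y * (w @ x + b)

def geometric_margin(w, b, x, y):
    # Functional margin normalized by ||w||.
    return functional_margin(w, b, x, y) / np.linalg.norm(w)

# Illustrative values only.
w = np.array([3.0, 4.0])
b = -2.0
x = np.array([2.0, 1.0])
y = 1

print(functional_margin(w, b, x, y))          # 8.0
print(functional_margin(2 * w, 2 * b, x, y))  # 16.0: scales with the parameters
print(geometric_margin(w, b, x, y))           # 1.6
print(geometric_margin(2 * w, 2 * b, x, y))   # 1.6: invariant to rescaling
```

Note also that with $\|w\| = 1$ the normalization is a no-op, which is exactly the sense in which the two margins coincide.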
Finally, given a training set $S = \{(x^{(i)}, y^{(i)});\; i = 1, \ldots, m\}$, we also define the geometric margin of $(w, b)$ with respect to $S$ to be the smallest of the geometric margins on the individual training examples:

$$\gamma = \min_{i=1,\ldots,m} \gamma^{(i)}.$$
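In code, the geometric margin with respect to a training set is just the minimum of the per-example margins. A short Python sketch, using a made-up, linearly separable toy set with labels in $\{-1, +1\}$:

```python
import numpy as np

def geometric_margin(w, b, X, y):
    """Smallest geometric margin of (w, b) over the training set (X, y)."""
    # Per-example margins: y^(i) * ((w/||w||)^T x^(i) + b/||w||),
    # computed for all examples at once.
    margins = y * (X @ w + b) / np.linalg.norm(w)
    return margins.min()

# Toy training set (illustrative values only).
X = np.array([[ 2.0,  2.0],
              [ 3.0,  1.0],
              [-1.0, -1.0],
              [-2.0,  0.0]])
y = np.array([1, 1, -1, -1])

w = np.array([1.0, 1.0])
b = 0.0

print(geometric_margin(w, b, X, y))  # the worst-case (closest) example sets the margin
```

Here the margin is determined by the examples closest to the hyperplane $x_1 + x_2 = 0$, which is the quantity the max-margin formulation below tries to push as large as possible.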
Given a training set, it seems from our previous discussion that a natural desideratum is to try to find a decision boundary that maximizes the (geometric) margin, since this would reflect a very confident set of predictions on the training set and a good “fit” to the training data. Specifically, this will result in a classifier that separates the positive and the negative training examples with a “gap” (geometric margin).
For now, we will assume that we are given a training set that is linearly separable; i.e., that it is possible to separate the positive and negative examples using some separating hyperplane. How do we find the one that achieves the maximum geometric margin? We can pose the following optimization problem:

$$\begin{aligned} \max_{\gamma, w, b} \quad & \gamma \\ \text{s.t.} \quad & y^{(i)}(w^T x^{(i)} + b) \geq \gamma, \quad i = 1, \ldots, m \\ & \|w\| = 1. \end{aligned}$$
I.e., we want to maximize $\gamma$, subject to each training example having functional margin at least $\gamma$. The $\|w\| = 1$ constraint moreover ensures that the functional margin equals the geometric margin, so we are also guaranteed that all the geometric margins are at least $\gamma$. Thus, solving this problem will result in $(w, b)$ with the largest possible geometric margin with respect to the training set.
If we could solve the optimization problem above, we'd be done. But the “$\|w\| = 1$” constraint is a nasty (non-convex) one, and this problem certainly isn't in any format that we can plug into standard optimization software to solve. So, let's try transforming the problem into a nicer one. Consider:

$$\begin{aligned} \max_{\hat{\gamma}, w, b} \quad & \frac{\hat{\gamma}}{\|w\|} \\ \text{s.t.} \quad & y^{(i)}(w^T x^{(i)} + b) \geq \hat{\gamma}, \quad i = 1, \ldots, m. \end{aligned}$$