$$\max_{\hat{\gamma},\,w,\,b}\ \frac{\hat{\gamma}}{\|w\|} \quad \text{s.t.}\quad y^{(i)}(w^T x^{(i)} + b) \ge \hat{\gamma},\quad i = 1,\ldots,m$$

Here, we're going to maximize $\hat{\gamma}/\|w\|$, subject to the functional margins all being at least $\hat{\gamma}$. Since the geometric and functional margins are related by $\gamma = \hat{\gamma}/\|w\|$, this will give us the answer we want. Moreover, we've gotten rid of the constraint $\|w\| = 1$ that we didn't like. The downside is that we now have a nasty (again, non-convex) objective function $\hat{\gamma}/\|w\|$; and we still don't have any off-the-shelf software that can solve this form of an optimization problem.

Let's keep going. Recall our earlier discussion that we can add an arbitrary scaling constraint on w and b without changing anything. This is the key idea we'll use now. We will introduce the scaling constraint that the functional margin of w , b with respect to the training set must be 1:

$$\hat{\gamma} = 1.$$

Since multiplying $w$ and $b$ by some constant results in the functional margin being multiplied by that same constant, this is indeed a scaling constraint, and can be satisfied by rescaling $w, b$. Plugging this into our problem above, and noting that maximizing $\hat{\gamma}/\|w\| = 1/\|w\|$ is the same thing as minimizing $\|w\|^2$, we now have the following optimization problem:

$$\min_{\gamma,\,w,\,b}\ \frac{1}{2}\|w\|^2 \quad \text{s.t.}\quad y^{(i)}(w^T x^{(i)} + b) \ge 1,\quad i = 1,\ldots,m$$
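To make the rescaling step concrete (a short check added here, not in the original text): if $(w, b)$ has functional margin $\hat{\gamma} = \min_i y^{(i)}(w^T x^{(i)} + b) > 0$, then dividing both parameters by $\hat{\gamma}$ gives

$$\min_i\, y^{(i)}\!\left(\left(\tfrac{w}{\hat{\gamma}}\right)^T x^{(i)} + \tfrac{b}{\hat{\gamma}}\right) = \frac{\hat{\gamma}}{\hat{\gamma}} = 1,$$

while the geometric margin $\hat{\gamma}/\|w\| = 1/\|w/\hat{\gamma}\|$ is unchanged, so imposing $\hat{\gamma} = 1$ loses no generality.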

We've now transformed the problem into a form that can be efficiently solved. The above is an optimization problem with a convex quadratic objective and only linear constraints. Its solution gives us the optimal margin classifier. This optimization problem can be solved using commercial quadratic programming (QP) code. (You may be familiar with linear programming, which solves optimization problems with linear objectives and linear constraints; QP software, which allows convex quadratic objectives with linear constraints, is also widely available.)
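As a concrete illustration (not part of the original notes), here is a minimal sketch of handing this QP to an off-the-shelf solver. It assumes the open-source cvxopt package and a small made-up, linearly separable 2-D dataset; the optimization variable is $z = [w; b]$, and the quadratic term penalizes only $w$:

```python
import numpy as np
from cvxopt import matrix, solvers

# Hypothetical toy data: 3 positive and 3 negative points in 2-D, linearly separable.
X = np.array([[2.0, 2.0], [2.5, 1.0], [3.0, 3.0],
              [0.0, 0.0], [1.0, 0.5], [0.5, 1.5]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
m, d = X.shape

# Decision variable z = [w_1, ..., w_d, b].
# Objective (1/2) z^T P z with P = diag(1, ..., 1, 0): this is (1/2)||w||^2, ignoring b.
P = matrix(np.diag([1.0] * d + [0.0]))
q = matrix(np.zeros(d + 1))

# Constraints y^(i) (w^T x^(i) + b) >= 1, rewritten in the solver's form G z <= h:
#   -y^(i) [x^(i); 1]^T z <= -1   for each i.
G = matrix(-y[:, None] * np.hstack([X, np.ones((m, 1))]))
h = matrix(-np.ones(m))

solvers.options["show_progress"] = False
sol = solvers.qp(P, q, G, h)
z = np.array(sol["x"]).ravel()
w, b = z[:d], z[d]
print("w =", w, "b =", b)
print("geometric margin =", 1.0 / np.linalg.norm(w))
```

Generic QP code like this does solve the problem, but as the text notes next, the dual formulation will lead to a more specialized and typically much faster algorithm.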

While we could call the problem solved here, what we will instead do is make a digression to talk about Lagrange duality. This will lead us to our optimization problem's dual form, which will play a key role in allowing us to use kernels to get optimal margin classifiers to work efficiently in very high dimensional spaces. The dual form will also allow us to derive an efficient algorithm for solving the above optimization problem that will typically do much better than generic QP software.

Lagrange duality

Let's temporarily put aside SVMs and maximum margin classifiers, and talk about solving constrained optimization problems.

Consider a problem of the following form:

$$\min_{w}\ f(w) \quad \text{s.t.}\quad h_i(w) = 0,\quad i = 1,\ldots,l.$$

Some of you may recall how the method of Lagrange multipliers can be used to solve it. (Don't worry if you haven't seen it before.) In this method, we define the Lagrangian to be

$$\mathcal{L}(w, \beta) = f(w) + \sum_{i=1}^{l} \beta_i h_i(w)$$

Here, the $\beta_i$'s are called the Lagrange multipliers. We would then find and set $\mathcal{L}$'s partial derivatives to zero:

$$\frac{\partial \mathcal{L}}{\partial w_i} = 0; \qquad \frac{\partial \mathcal{L}}{\partial \beta_i} = 0,$$

and solve for w and β .
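As a tiny worked example (added here for illustration, not part of the original notes), consider minimizing $f(w) = w_1^2 + w_2^2$ subject to the single constraint $h_1(w) = w_1 + w_2 - 1 = 0$:

$$\mathcal{L}(w, \beta) = w_1^2 + w_2^2 + \beta_1 (w_1 + w_2 - 1)$$

$$\frac{\partial \mathcal{L}}{\partial w_1} = 2w_1 + \beta_1 = 0, \qquad \frac{\partial \mathcal{L}}{\partial w_2} = 2w_2 + \beta_1 = 0, \qquad \frac{\partial \mathcal{L}}{\partial \beta_1} = w_1 + w_2 - 1 = 0,$$

which gives $w_1 = w_2 = 1/2$ and $\beta_1 = -1$: the closest point to the origin on the line $w_1 + w_2 = 1$.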

In this section, we will generalize this to constrained optimization problems in which we may have inequality as well as equality constraints. Due to time constraints, we won't really be able to do the theory of Lagrange duality justice in this class. (Readers interested in learning more about this topic are encouraged to read, e.g., R. T. Rockafellar (1970), Convex Analysis, Princeton University Press.) But we will give the main ideas and results, which we will then apply to our optimal margin classifier's optimization problem.





Source:  OpenStax, Machine learning. OpenStax CNX. Oct 14, 2013 Download for free at http://cnx.org/content/col11500/1.4