Thus, if we want to update some subset of the $\alpha_i$'s, we must update at least two of them simultaneously in order to keep satisfying the constraints. This motivates the SMO algorithm, which simply does the following: repeat until convergence: (1) select some pair $\alpha_i$ and $\alpha_j$ to update next (using a heuristic that tries to pick the two that will allow us to make the biggest progress towards the global maximum); (2) reoptimize $W(\alpha)$ with respect to $\alpha_i$ and $\alpha_j$, while holding all the other $\alpha_k$'s ($k \neq i, j$) fixed.
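As a rough sketch of this outer loop's structure (the function and parameter names here are illustrative, not from the notes; the pair-selection and pair-reoptimization steps are passed in as callables):

```python
def smo(alpha, select_pair, reoptimize_pair, converged, max_iters=1000):
    """Skeleton of the SMO outer loop: repeatedly pick a pair (i, j)
    and reoptimize over alpha_i, alpha_j with the rest held fixed."""
    for _ in range(max_iters):
        if converged(alpha):
            break
        i, j = select_pair(alpha)          # heuristic pair choice
        alpha = reoptimize_pair(alpha, i, j)  # two-variable subproblem
    return alpha
```

The two-variable subproblem inside `reoptimize_pair` is what the rest of this section derives.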
To test for convergence of this algorithm, we can check whether the KKT conditions ( [link] ) are satisfied to within some $tol$. Here, $tol$ is the convergence tolerance parameter, and is typically set to around 0.01 to 0.001. (See the paper and pseudocode for details.)
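A convergence check of this kind might look as follows; this is a sketch, not the notes' own pseudocode, where `margin_i` denotes $y^{(i)} f(x^{(i)})$ (the functional margin of example $i$) and the helper name is hypothetical:

```python
def kkt_satisfied(alpha_i, margin_i, C, tol=1e-2):
    """Check the KKT conditions for one example, to within tol."""
    if alpha_i < tol:          # effectively alpha_i == 0
        return margin_i >= 1 - tol
    if alpha_i > C - tol:      # effectively alpha_i == C
        return margin_i <= 1 + tol
    return abs(margin_i - 1) <= tol  # 0 < alpha_i < C: margin should be 1
```

The algorithm has converged when every training example satisfies this check.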
The key reason that SMO is an efficient algorithm is that the update to $\alpha_i$, $\alpha_j$ can be computed very efficiently. Let's now briefly sketch the main ideas for deriving the efficient update.
Let's say we currently have some setting of the $\alpha_i$'s that satisfy the constraints in [link] , and suppose we've decided to hold $\alpha_3, \ldots, \alpha_m$ fixed, and want to reoptimize $W(\alpha_1, \alpha_2, \ldots, \alpha_m)$ with respect to $\alpha_1$ and $\alpha_2$ (subject to the constraints). From the final equation in [link] , we require that

$$\alpha_1 y^{(1)} + \alpha_2 y^{(2)} = -\sum_{i=3}^{m} \alpha_i y^{(i)}.$$
Since the right hand side is fixed (as we've fixed $\alpha_3, \ldots, \alpha_m$), we can just let it be denoted by some constant $\zeta$:

$$\alpha_1 y^{(1)} + \alpha_2 y^{(2)} = \zeta.$$
We can thus picture the constraints on $\alpha_1$ and $\alpha_2$ as follows:

[Figure: the box $[0, C] \times [0, C]$, with the line $\alpha_1 y^{(1)} + \alpha_2 y^{(2)} = \zeta$ passing through it.]
From the constraints [link] , we know that $\alpha_1$ and $\alpha_2$ must lie within the box $[0, C] \times [0, C]$ shown. Also plotted is the line $\alpha_1 y^{(1)} + \alpha_2 y^{(2)} = \zeta$, on which we know $\alpha_1$ and $\alpha_2$ must lie. Note also that, from these constraints, we know $L \leq \alpha_2 \leq H$; otherwise, $(\alpha_1, \alpha_2)$ can't simultaneously satisfy both the box and the straight line constraint. In this example, $L = 0$. But depending on what the line $\alpha_1 y^{(1)} + \alpha_2 y^{(2)} = \zeta$ looks like, this won't always necessarily be the case; more generally, there will be some lower-bound $L$ and some upper-bound $H$ on the permissible values for $\alpha_2$ that will ensure that $\alpha_1$, $\alpha_2$ lie within the box $[0, C] \times [0, C]$.
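Concretely, the bounds $L$ and $H$ depend on whether $y^{(1)}$ and $y^{(2)}$ are equal (the line has slope $-1$ or $+1$ in the $(\alpha_2, \alpha_1)$ plane). A small sketch of this computation, with the standard case split as given in Platt's paper (the function name is ours):

```python
def alpha2_bounds(alpha1, alpha2, y1, y2, C):
    """Feasible interval [L, H] for alpha2, given the box [0, C] x [0, C]
    and the line constraint alpha1*y1 + alpha2*y2 = zeta."""
    if y1 != y2:
        # Opposite labels: alpha1 - alpha2 is constant along the line.
        L = max(0.0, alpha2 - alpha1)
        H = min(C, C + alpha2 - alpha1)
    else:
        # Same labels: alpha1 + alpha2 is constant along the line.
        L = max(0.0, alpha1 + alpha2 - C)
        H = min(C, alpha1 + alpha2)
    return L, H
```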
Using Equation [link] , we can also write $\alpha_1$ as a function of $\alpha_2$:

$$\alpha_1 = (\zeta - \alpha_2 y^{(2)}) y^{(1)}.$$
(Check this derivation yourself; we again used the fact that $y^{(1)} \in \{-1, 1\}$, so that $(y^{(1)})^2 = 1$.) Hence, the objective $W(\alpha)$ can be written

$$W(\alpha_1, \alpha_2, \ldots, \alpha_m) = W((\zeta - \alpha_2 y^{(2)}) y^{(1)}, \alpha_2, \ldots, \alpha_m).$$
Treating $\alpha_3, \ldots, \alpha_m$ as constants, you should be able to verify that this is just some quadratic function in $\alpha_2$. I.e., this can also be expressed in the form $a\alpha_2^2 + b\alpha_2 + c$ for some appropriate $a$, $b$, and $c$. If we ignore the "box" constraints [link] (or, equivalently, that $L \leq \alpha_2 \leq H$), then we can easily maximize this quadratic function by setting its derivative to zero and solving. We'll let $\alpha_2^{new,unclipped}$ denote the resulting value of $\alpha_2$. You should also be able to convince yourself that if we had instead wanted to maximize $W$ with respect to $\alpha_2$ but subject to the box constraint, then we can find the resulting optimal value simply by taking $\alpha_2^{new,unclipped}$ and "clipping" it to lie in the $[L, H]$ interval, to get

$$\alpha_2^{new} = \begin{cases} H & \text{if } \alpha_2^{new,unclipped} > H \\ \alpha_2^{new,unclipped} & \text{if } L \leq \alpha_2^{new,unclipped} \leq H \\ L & \text{if } \alpha_2^{new,unclipped} < L \end{cases}$$
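The clipping step is a direct translation of this case analysis into code (the function name is ours; it works because a concave quadratic restricted to an interval attains its maximum either at the unconstrained maximizer or at the nearer endpoint):

```python
def clip_alpha2(alpha2_unclipped, L, H):
    """Clip the unconstrained maximizer of the quadratic into [L, H]."""
    if alpha2_unclipped > H:
        return H
    if alpha2_unclipped < L:
        return L
    return alpha2_unclipped
```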
Finally, having found $\alpha_2^{new}$, we can use Equation [link] to go back and find the optimal value of $\alpha_1^{new}$.
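Since $\zeta$ is unchanged by the update, the constraint $\alpha_1 y^{(1)} + \alpha_2 y^{(2)} = \zeta$ gives $\alpha_1^{new} = \alpha_1 + y^{(1)} y^{(2)} (\alpha_2 - \alpha_2^{new})$, again using $y^{(1)} \in \{-1, 1\}$ so that $1/y^{(1)} = y^{(1)}$. A one-line sketch (function name is ours):

```python
def update_alpha1(alpha1_old, alpha2_old, alpha2_new, y1, y2):
    """Recover alpha1 so that alpha1*y1 + alpha2*y2 stays equal to zeta."""
    return alpha1_old + y1 * y2 * (alpha2_old - alpha2_new)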
There are a couple more details that are quite easy but that we'll leave you to read about yourself in Platt's paper: one is the choice of the heuristics used to select the next $\alpha_i$, $\alpha_j$ to update; the other is how to update $b$ as the SMO algorithm is run.