But to compute K, to compute the kernel function, all you need is order N time, because the kernel function is defined as X transpose Z, squared. So you just take the inner product between X and Z, which is order N time, and you square that, and you’ve computed this kernel function. So you’ve just computed the inner product between two vectors, where each vector has N squared elements, but you did it in order N time.
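As a minimal sketch of the point above, the following compares the explicit N squared feature map with the order N kernel computation. The function names (`phi_quadratic`, `kernel_quadratic`, `dot`) and the sample vectors are illustrative assumptions, not something given in the lecture.

```python
def dot(a, b):
    """Plain inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))

def phi_quadratic(x):
    """Explicit feature map (hypothetical name): all n*n products x_i * x_j."""
    return [xi * xj for xi in x for xj in x]

def kernel_quadratic(x, z):
    """K(x, z) = (x^T z)^2, computed with a single O(n) inner product."""
    return dot(x, z) ** 2

# Sample inputs chosen only for illustration.
x = [1.0, 2.0, 3.0]
z = [4.0, 5.0, 6.0]

explicit = dot(phi_quadratic(x), phi_quadratic(z))  # O(n^2) work
implicit = kernel_quadratic(x, z)                   # O(n) work
print(explicit, implicit)
```

Both numbers agree, because the sum over all i, j of x_i x_j z_i z_j factors into the square of the sum over i of x_i z_i.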
Student: For any kernel you find for X and Z, does Phi exist for X and Z?
Instructor (Andrew Ng): Let me talk about that later. We’ll talk about what is a valid kernel later. Please raise your hand if this makes sense. So let me just describe a couple of quick generalizations to this. One is that if you define K of X, Z to be equal to X transpose Z plus C, all squared, so again, you can compute this kernel in order N time, then that turns out to correspond to a feature vector where I’m just going to add a few more elements at the bottom, where you add the root 2C terms. Let me read that. That was root 2C times X1, root 2C times X2, root 2C times X3, and C.
And so this is a way of creating a feature vector with both the monomials, meaning the first order terms, as well as the quadratic or the inner product terms between XI and XJ, and the parameter C here allows you to control the relative weighting between the monomial terms, so the first order terms, and the quadratic terms. Again, this is still an inner product between vectors of length N squared [inaudible] in order N time.
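A small sketch of this generalization, assuming the feature vector is the N squared quadratic terms followed by root 2C times each XI and then C itself, as read out above. The function names and the values of x, z, and c are illustrative assumptions.

```python
import math

def dot(a, b):
    """Plain inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))

def phi_with_offset(x, c):
    """Explicit features (hypothetical name) for K(x, z) = (x^T z + c)^2."""
    quadratic = [xi * xj for xi in x for xj in x]   # x_i * x_j terms
    linear = [math.sqrt(2 * c) * xi for xi in x]    # sqrt(2c) * x_i terms
    return quadratic + linear + [c]                 # constant term c

def kernel_with_offset(x, z, c):
    """Same quantity, computed in O(n) time without building the features."""
    return (dot(x, z) + c) ** 2

# Sample inputs chosen only for illustration.
x = [1.0, 2.0, 3.0]
z = [4.0, 5.0, 6.0]
c = 2.0

explicit = dot(phi_with_offset(x, c), phi_with_offset(z, c))
implicit = kernel_with_offset(x, z, c)
print(explicit, implicit)
```

Expanding (x^T z + c)^2 gives (x^T z)^2 + 2c x^T z + c^2, which matches the three groups of features term by term. A larger c puts more weight on the first order terms relative to the quadratic ones.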
More generally, here are some other examples of kernels. Actually, a generalization of the one I just derived right now would be the following kernel [X transpose Z plus C, to the power of D]. And so this corresponds to using all N plus D choose D features of all monomials. Monomials just mean the products of XI XJ XK. Just all the polynomial terms up to degree D and plus [inaudible] so on the order of N plus D to the power of D, so this grows exponentially in D.
This is a very high dimensional feature vector, but again, you can implicitly construct the feature vector and take inner products between them. It’s very computationally efficient, because you just compute the inner product between X and Z, add C, and take that real number to the power of D, and by plugging this in as a kernel, you’re implicitly working in an extremely high dimensional feature space.
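To make the contrast concrete, here is a rough sketch that computes the degree D kernel in order N time and separately counts how many monomial features the implicit feature vector would have. The function names and the choice N = 100 are assumptions for illustration only.

```python
import math

def polynomial_kernel(x, z, c, d):
    """K(x, z) = (x^T z + c)^d: one O(n) inner product, one add, one power."""
    return (sum(xi * zi for xi, zi in zip(x, z)) + c) ** d

def num_monomial_features(n, d):
    """Number of monomials of degree at most d in n variables: (n + d) choose d."""
    return math.comb(n + d, d)

# The implicit feature space grows quickly with d, while the kernel cost
# stays linear in n. n = 100 is an arbitrary illustrative input size.
n = 100
for d in (2, 3, 5):
    print(d, num_monomial_features(n, d))

# Kernel value for small sample vectors (illustrative values).
print(polynomial_kernel([1, 2, 3], [4, 5, 6], c=1, d=3))
```

The count uses `math.comb`, which is available in Python 3.8 and later.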
So what I’ve given is just a few specific examples of how to create kernels. So let’s ask more generally: if you’re faced with a new machine-learning problem, how do you come up with a kernel? There are many ways to think about it, but here’s one intuition that’s sort of useful. So given a set of attributes X, you’re going to use a feature vector Phi of X, and given a set of attributes Z, you’re going to use an input feature vector Phi of Z, and so the kernel is computing the inner product between Phi of X and Phi of Z.
And so one intuition, and this is a partial intuition, this isn’t a rigorous intuition, is that if X and Z are very similar, then Phi of X and Phi of Z will be pointing in the same direction, and therefore the inner product would be large. Whereas in contrast, if X and Z are very dissimilar, then Phi of X and Phi of Z may be pointing in different directions, and so the inner product may be small. That intuition is not a rigorous one, but it’s sort of a useful one to think about.
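As a loose illustration of this intuition (not a proof of anything), the sketch below evaluates the quadratic kernel from earlier on one pair of similar inputs and one pair of dissimilar inputs. The vectors are made-up values, with the dissimilar one chosen to be orthogonal to X.

```python
def dot(a, b):
    """Plain inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))

def kernel_quadratic(x, z):
    """K(x, z) = (x^T z)^2, the kernel used earlier in the lecture."""
    return dot(x, z) ** 2

# Made-up inputs: z_similar is close to x, z_dissimilar is orthogonal to x.
x = [1.0, 2.0, 3.0]
z_similar = [1.1, 1.9, 3.2]
z_dissimilar = [3.0, -3.0, 1.0]

k_similar = kernel_quadratic(x, z_similar)
k_dissimilar = kernel_quadratic(x, z_dissimilar)
print(k_similar, k_dissimilar)
```

Here the similar pair gives a large kernel value and the orthogonal pair gives zero, which fits the intuition, though as noted it is only a heuristic.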