And in the scenario that I want to consider, sometimes Phi of X will be very high dimensional. For example, Phi of X may contain very high degree polynomial features. Sometimes Phi of X will actually even be an infinite dimensional vector of features. The question is, if Phi of X is extremely high dimensional, then it seems you can’t compute these inner products very efficiently, because the computer needs to represent an extremely high dimensional feature vector and then take the inner product, which looks inefficient.
It turns out that in many important special cases, we can write down what we’ll call the kernel function, denoted by K, which is the inner product between those feature vectors: K of XI, XJ equals the inner product of Phi of XI and Phi of XJ. It turns out there will be important special cases where computing Phi of X is computationally very expensive, maybe even impossible.
If Phi of X is an infinite dimensional vector, you can’t compute it explicitly. So there will be important special cases where Phi of X is very expensive to represent because it is so high dimensional, but nonetheless, you can actually compute the kernel between XI and XJ. You can compute the inner product between these two feature vectors very inexpensively.
And so the idea of the support vector machine is that everywhere in the algorithm that you see these inner products, we’re going to replace them with a kernel function that you can compute efficiently, and that lets you work in feature spaces Phi of X even if Phi of X is very high dimensional. Let me now say how that’s done. A little bit later today, we’ll actually see some concrete examples of Phi of X and of kernels. For now, let’s just think about constructing kernels explicitly. This is best illustrated by an example.
Let’s say you have two inputs, X and Z. Normally I should write those as XI and XJ, but I’m just going to write X and Z to save on writing. Let’s say my kernel is K of X, Z equals X transpose Z squared. And so this is the sum over I of X sub I times Z sub I, times the sum over J of X sub J times Z sub J, right? Each of those two sums is X transpose Z, so the product is X transpose Z squared. And that’s equal to the sum over I and J of X sub I X sub J, times Z sub I Z sub J. And so this kernel corresponds to the feature mapping where Phi of X is the vector of all the products X sub I X sub J. I’ll write this down for the case of N equals three, I guess: Phi of X has the entries X1 X1, X1 X2, X1 X3, X2 X1, X2 X2, X2 X3, X3 X1, X3 X2, X3 X3.
And so with this definition of Phi of X, you can verify for yourself that this thing becomes the inner product between Phi of X and Phi of Z, because to get the inner product between two vectors, you just multiply the corresponding elements and take the sum. So if this is Phi of X, then the inner product between Phi of X and Phi of Z will be the sum over all the elements of this vector times the corresponding elements of Phi of Z, and what you get is exactly X transpose Z squared.
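To check that claim numerically, here is a minimal sketch in plain Python. The helper names `phi`, `dot`, and `kernel` and the sample vectors are illustrative assumptions, not something from the lecture; the feature map is taken to be the vector of all ordered products of pairs of components of X.

```python
def dot(u, v):
    """Inner product: multiply corresponding elements and sum."""
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Assumed explicit feature map: all ordered products x_i * x_j."""
    return [xi * xj for xi in x for xj in x]

def kernel(x, z):
    """K(x, z) = (x^T z)^2, computed without ever building phi."""
    return dot(x, z) ** 2

# Hypothetical sample inputs with n = 3.
x = [1.0, 2.0, 3.0]
z = [4.0, 5.0, 6.0]

via_kernel = kernel(x, z)             # (x^T z)^2
via_features = dot(phi(x), phi(z))    # <phi(x), phi(z)>
print(via_kernel, via_features)
```

Both numbers agree for these inputs, which is what the identity predicts.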
And so the cool thing about this is the following. If N is the dimension of X and Z, then Phi of X is a vector of all pairs X sub I X sub J multiplied by each other, and so the length of Phi of X is N squared. You need order N squared time just to compute Phi of X.
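The size argument can be made concrete with a small sketch. The function names, the choice of N equals 50, and the integer test vectors are assumptions made for illustration. The explicit route builds N squared features for each input, whereas squaring X transpose Z needs only one pass over the N components.

```python
def explicit_route(x, z):
    """Build phi(x) and phi(z) explicitly, then take their inner product."""
    phi_x = [xi * xj for xi in x for xj in x]   # n^2 entries
    phi_z = [zi * zj for zi in z for zj in z]   # n^2 entries
    value = sum(a * b for a, b in zip(phi_x, phi_z))
    return value, len(phi_x)

def kernel_route(x, z):
    """Compute (x^T z)^2 directly: one pass over n components."""
    return sum(a * b for a, b in zip(x, z)) ** 2

n = 50                                # hypothetical input dimension
x = [i % 7 for i in range(n)]         # integer entries keep the comparison exact
z = [i % 5 for i in range(n)]

explicit_value, feature_count = explicit_route(x, z)
print(feature_count)                  # number of features built: n * n
print(explicit_value == kernel_route(x, z))
```

The two routes return the same value; only the amount of work differs.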