
Principal components analysis

In our discussion of factor analysis, we gave a way to model data $x \in \mathbb{R}^n$ as "approximately" lying in some $k$-dimensional subspace, where $k \ll n$. Specifically, we imagined that each point $x^{(i)}$ was created by first generating some $z^{(i)}$ lying in the $k$-dimensional affine space $\{\Lambda z + \mu;\; z \in \mathbb{R}^k\}$, and then adding $\Psi$-covariance noise. Factor analysis is based on a probabilistic model, and parameter estimation used the iterative EM algorithm.

In this set of notes, we will develop a method, Principal Components Analysis (PCA), that also tries to identify the subspace in which the data approximately lies. However, PCA will do so more directly: it requires only an eigenvector calculation (easily done with the eig function in Matlab) and does not need to resort to EM.
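As a preview of that computation, here is a minimal numpy sketch (the function name pca_directions and the choice of numpy.linalg.eigh are mine, not part of the notes). It assumes the data has already been normalized as described below, and uses the fact, derived later in these notes, that the principal directions are the top eigenvectors of the empirical covariance matrix.

```python
import numpy as np

def pca_directions(X, k):
    """Return the top-k principal directions of the rows of X.

    Assumes X is an (m, n) array that has already been normalized
    (zero mean and, where appropriate, unit variance per coordinate).
    """
    m, n = X.shape
    # Empirical covariance matrix: (1/m) * sum_i x^{(i)} x^{(i)T}
    Sigma = (X.T @ X) / m
    # Sigma is symmetric, so eigh is the appropriate eigensolver
    # (this is the "eigenvector calculation" the text refers to).
    eigvals, eigvecs = np.linalg.eigh(Sigma)
    # eigh returns eigenvalues in ascending order; keep the largest k.
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order]   # columns are the principal directions u_1, ..., u_k
```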

Suppose we are given a dataset $\{x^{(i)};\; i = 1, \ldots, m\}$ of attributes of $m$ different types of automobiles, such as their maximum speed, turn radius, and so on. Let $x^{(i)} \in \mathbb{R}^n$ for each $i$ ($n \ll m$). But unknown to us, two different attributes, say $x_i$ and $x_j$, respectively give a car's maximum speed measured in miles per hour and the maximum speed measured in kilometers per hour. These two attributes are therefore almost linearly dependent, up to only small differences introduced by rounding off to the nearest mph or kph. Thus, the data really lies approximately on an $(n-1)$-dimensional subspace. How can we automatically detect, and perhaps remove, this redundancy?
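To make the mph/kph redundancy concrete, here is a small, entirely hypothetical numerical illustration (the data is synthetic): because the two columns differ only by a unit conversion and rounding, the empirical covariance matrix has one eigenvalue that is orders of magnitude smaller than the other, so the data is essentially one-dimensional.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic example: the same maximum speed recorded in both mph and kph.
mph = rng.uniform(80, 160, size=500)
kph = np.round(mph * 1.609344)   # unit conversion, rounded to the nearest kph
mph = np.round(mph)              # rounded to the nearest mph

X = np.column_stack([mph, kph])
X = X - X.mean(axis=0)           # zero out the mean

# Eigenvalues of the empirical covariance matrix: the smaller one reflects
# only rounding noise, so the data lies almost exactly on a 1-D subspace.
print(np.linalg.eigvalsh(X.T @ X / len(X)))
```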

For a less contrived example, consider a dataset resulting from a survey of pilots for radio-controlled helicopters, where $x_1^{(i)}$ is a measure of the piloting skill of pilot $i$, and $x_2^{(i)}$ captures how much he/she enjoys flying. Because RC helicopters are very difficult to fly, only the most committed students, ones that truly enjoy flying, become good pilots. So, the two attributes $x_1$ and $x_2$ are strongly correlated. Indeed, we might posit that the data actually lies along some diagonal axis (the $u_1$ direction) capturing the intrinsic piloting "karma" of a person, with only a small amount of noise lying off this axis. (See figure.) How can we automatically compute this $u_1$ direction?

[Figure: scatter plot of $x_1$ (skill) versus $x_2$ (enjoyment), showing a positive correlation along the diagonal $u_1$ direction.]

We will shortly develop the PCA algorithm. But prior to running PCA per se, we typically first pre-process the data to normalize its mean and variance, as follows:

  1. Let $\mu = \frac{1}{m} \sum_{i=1}^m x^{(i)}$.
  2. Replace each $x^{(i)}$ with $x^{(i)} - \mu$.
  3. Let $\sigma_j^2 = \frac{1}{m} \sum_i (x_j^{(i)})^2$.
  4. Replace each $x_j^{(i)}$ with $x_j^{(i)} / \sigma_j$.

Steps (1-2) zero out the mean of the data, and may be omitted for data known to have zero mean (for instance, time series corresponding to speech or other acoustic signals). Steps (3-4) rescale each coordinate to have unit variance, which ensures that different attributes are all treated on the same "scale." For instance, if $x_1$ was cars' maximum speed in mph (taking values in the high tens or low hundreds) and $x_2$ were the number of seats (taking values around 2-4), then this renormalization rescales the different attributes to make them more comparable. Steps (3-4) may be omitted if we had a priori knowledge that the different attributes are all on the same scale. One example of this is if each data point represented a grayscale image, and each $x_j^{(i)}$ took a value in $\{0, 1, \ldots, 255\}$ corresponding to the intensity value of pixel $j$ in image $i$.
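The four normalization steps above translate directly into code; the following is a minimal numpy sketch (the function name normalize is mine, and it assumes no coordinate has zero variance).

```python
import numpy as np

def normalize(X):
    """Steps (1)-(4): zero the mean and rescale each coordinate to unit variance.

    X is an (m, n) array whose i-th row is the point x^{(i)}.
    """
    mu = X.mean(axis=0)                      # step 1: mu = (1/m) sum_i x^{(i)}
    X = X - mu                               # step 2: subtract the mean
    sigma = np.sqrt((X ** 2).mean(axis=0))   # step 3: sigma_j^2 = (1/m) sum_i (x_j^{(i)})^2
    return X / sigma                         # step 4: divide coordinate j by sigma_j
```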

