
Factor analysis

When we have data $x^{(i)} \in \mathbb{R}^n$ that comes from a mixture of several Gaussians, the EM algorithm can be applied to fit a mixture model. In this setting, we usually imagine problems where we have sufficient data to be able to discern the multiple-Gaussian structure in the data. For instance, this would be the case if our training set size $m$ was significantly larger than the dimension $n$ of the data.

Now, consider a setting in which $n \gg m$. In such a problem, it might be difficult to model the data even with a single Gaussian, much less a mixture of Gaussians. Specifically, since the $m$ data points span only a low-dimensional subspace of $\mathbb{R}^n$, if we model the data as Gaussian, and estimate the mean and covariance using the usual maximum likelihood estimators,

$$\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}, \qquad \Sigma = \frac{1}{m}\sum_{i=1}^{m} \left(x^{(i)} - \mu\right)\left(x^{(i)} - \mu\right)^T,$$

we would find that the matrix $\Sigma$ is singular. This means that $\Sigma^{-1}$ does not exist, and $1/|\Sigma|^{1/2} = 1/0$. But both of these terms are needed in computing the usual density of a multivariate Gaussian distribution. Another way of stating this difficulty is that maximum likelihood estimates of the parameters result in a Gaussian that places all of its probability in the affine space spanned by the data (this is the set of points $x$ satisfying $x = \sum_{i=1}^m \alpha_i x^{(i)}$, for some $\alpha_i$'s such that $\sum_{i=1}^m \alpha_i = 1$), and this corresponds to a singular covariance matrix.
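To see this concretely, here is a minimal NumPy sketch (the synthetic data, dimensions, and variable names are our own illustration, not part of the original notes) that computes the maximum likelihood estimates above with $n > m$ and checks that the resulting $\Sigma$ is indeed singular:

```python
import numpy as np

# Hypothetical setup: m = 5 data points in n = 20 dimensions (n >> m).
rng = np.random.default_rng(0)
m, n = 5, 20
X = rng.normal(size=(m, n))           # rows are the x^{(i)}

# Usual maximum likelihood estimates.
mu = X.mean(axis=0)                   # mu = (1/m) sum_i x^{(i)}
Sigma = (X - mu).T @ (X - mu) / m     # Sigma = (1/m) sum_i (x - mu)(x - mu)^T

# The centered rows sum to zero, so rank(Sigma) <= m - 1 < n:
# Sigma is singular, its determinant is 0, and Sigma^{-1} does not exist.
print(np.linalg.matrix_rank(Sigma))   # at most m - 1 = 4
print(np.linalg.det(Sigma))           # 0 (up to floating-point error)
```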

More generally, unless $m$ exceeds $n$ by some reasonable amount, the maximum likelihood estimates of the mean and covariance may be quite poor. Nonetheless, we would still like to be able to fit a reasonable Gaussian model to the data, and perhaps capture some interesting covariance structure in the data. How can we do this?

In the next section, we begin by reviewing two possible restrictions on $\Sigma$, ones that allow us to fit $\Sigma$ with small amounts of data but neither of which will give a satisfactory solution to our problem. We next discuss some properties of Gaussians that will be needed later; specifically, how to find marginal and conditional distributions of Gaussians. Finally, we present the factor analysis model, and EM for it.

Restrictions of $\Sigma$

If we do not have sufficient data to fit a full covariance matrix, we may place some restrictions on the space of matrices $\Sigma$ that we will consider. For instance, we may choose to fit a covariance matrix $\Sigma$ that is diagonal. In this setting, the reader may easily verify that the maximum likelihood estimate of the covariance matrix is given by the diagonal matrix $\Sigma$ satisfying

$$\Sigma_{jj} = \frac{1}{m}\sum_{i=1}^{m} \left(x_j^{(i)} - \mu_j\right)^2.$$

Thus, $\Sigma_{jj}$ is just the empirical estimate of the variance of the $j$-th coordinate of the data.
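For illustration, a short sketch of this diagonal estimate (again on made-up data of our own choosing) might look like the following; note that only $n$ parameters are being fit, so the estimate is well defined even when $m < n$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 20
X = rng.normal(size=(m, n))
mu = X.mean(axis=0)

# Diagonal restriction: Sigma_jj = (1/m) sum_i (x_j^{(i)} - mu_j)^2,
# i.e. the per-coordinate empirical variance, placed on the diagonal.
Sigma_diag = np.diag(((X - mu) ** 2).mean(axis=0))

# Invertible as long as no coordinate is constant across the data.
print(np.linalg.matrix_rank(Sigma_diag))   # n = 20
```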

Recall that the contours of a Gaussian density are ellipses. A diagonal Σ corresponds to a Gaussian where the major axes of these ellipses are axis-aligned.

Sometimes, we may place a further restriction on the covariance matrix: not only must it be diagonal, but its diagonal entries must all be equal. In this setting, we have $\Sigma = \sigma^2 I$, where $\sigma^2$ is the parameter under our control. The maximum likelihood estimate of $\sigma^2$ can be found to be:

$$\sigma^2 = \frac{1}{mn}\sum_{j=1}^{n}\sum_{i=1}^{m} \left(x_j^{(i)} - \mu_j\right)^2.$$

This model corresponds to using Gaussians whose densities have contours that are circles (in 2 dimensions; or spheres/hyperspheres in higher dimensions).
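The spherical estimate is a single pooled variance over all $mn$ coordinate values. A minimal sketch, under the same made-up setup as above:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 20
X = rng.normal(size=(m, n))
mu = X.mean(axis=0)

# Spherical restriction: sigma^2 = (1/(mn)) sum_j sum_i (x_j^{(i)} - mu_j)^2,
# the squared deviations averaged over both data points and coordinates.
sigma2 = ((X - mu) ** 2).mean()      # mean over all i and j at once
Sigma = sigma2 * np.eye(n)           # Sigma = sigma^2 * I
```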

Source:  OpenStax, Machine learning. OpenStax CNX. Oct 14, 2013 Download for free at http://cnx.org/content/col11500/1.4