<< Chapter < Page | Chapter >> Page > |
Applying the PCA procedure as outlined before to a set of HIV-1 samples from simulation produces 1,782-dimensional principal components. Since the physical interpretation of the PCs is quite intuitive in this case, the PC coordinates can be split in groups of 3 to obtain the $\mathrm{(x,y,z)}$ components for each of the 594 atoms. These components are 3-dimensional vectors that point in the direction each atom would follow along the first PC. In figure 6 a), the per-atom components of the first PC have been superimposed in purple.
Figure 6 b) shows the effect of interpolating HIV-1 conformations along the first PC, or first mode of motion. Starting from an aligned conformation from the original data set, multiples of the first PC can be added to produce interpolated conformations. Note that the first mode of motion corresponds mostly to the "opening" and "closing" of the flaps, as can be inferred from the relative magnitued of the first PC components in the flap region. Thus, interpolating in the direction of the first PC produces an approximation of this motion, but using only one degree of freedom. This way, the complex dynamics of the system and the 1,782 apparent degrees of freedom have been approximated by just one, effectively reducing the dimensionality of the representation.
Figure 7 (solid line) shows the residual variance left unexplained by discarding the lower-ranked principal components. Residual variance plots always decrease, and in this case, the first PC accounts for approximately 40% of the total variance along the motion (the dashed line shows the percentage of total variance explained up to the given PC). Also, the first 10 PCs account for more than 70% of the total data variance. Given the dominance of only a few degrees of freedom it is possible to represent the flexibility of the protein in a drastically reduced search space.
PCA falls in the category of what is called a linear method, since mathematically, the PCs are computed as a series of linear operations on the input coordinates. Linear methods such as PCA work well only when the collective atom motions are small (or linear), which is hardly the case for most interesting biological processes. Non-linear dimensionality reduction methods do exist, but are normally much more computationally expensive and have other disadvantages as well. However, non-linear methods are much more effective in describing complex processes using much fewer parameters.
Notification Switch
Would you like to follow the 'Geometric methods in structural computational biology' conversation and receive update notifications?