Dimensionality Reduction

The main aim of dimensionality reduction is to reduce the number of dimensions (attributes/features) used to learn from the data. Learning from too many dimensions can cause the model to fit irrelevant or noisy attributes, which degrades learning. This is the curse of dimensionality.

Principal Components Analysis (PCA)

For a given attribute pair, we determine a line that combines them and use the projection onto this line as a new attribute in place of the original pair. The next attribute is then chosen as the direction that maximizes the remaining variance.

Say we have 3 attributes $x_1, x_2, x_3$. We need to find the weights of a linear combination of these attributes that maximizes the variance.

  • First, center the data around the origin. To do so, subtract the attribute-wise mean of the data points from each example. This creates a new dataset with attributes $x_1', x_2', ..., x_d'$.

  • Then, we choose the line through the origin that maximizes the variance, using the formula shown below. This line is a linear combination of the $x_i'$'s; we call it $z_1$. This preserves the variance in the horizontal direction.

    For example, consider a YouTube video dataset: if the attributes are #comments and #clicks, then $z_1$ may model the popularity.

  • Now, we choose another line $z_2$ that is orthogonal to $z_1$. This allows us to preserve the variance in the vertical direction.

Therefore, PCA produces new attributes (the principal components) that capture the variance in the data better than the original attributes.
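As a concrete illustration, here is a minimal sketch of these steps using scikit-learn's PCA on a toy two-attribute dataset in the spirit of the #comments/#clicks example above. The numbers and the choice of scikit-learn are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy dataset: each row is a video, columns are (#comments, #clicks).
# The values are made up purely for illustration.
X = np.array([
    [10,  200],
    [25,  480],
    [40,  850],
    [55, 1100],
    [70, 1500],
], dtype=float)

# PCA centers the data internally, then finds orthogonal directions
# z1, z2 ordered by how much variance they preserve.
pca = PCA(n_components=2)
Z = pca.fit_transform(X)                # projections of each video onto z1 and z2

print(pca.components_)                  # the directions (one row per component)
print(pca.explained_variance_ratio_)    # fraction of variance preserved by z1, z2
```

In a dataset like this, most of the variance is expected to fall along $z_1$, which plays the role of the "popularity" attribute.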

How to find $z_1$?

Compute the sample covariance matrix of the centered data points $[x_1', x_2', ..., x_d']$; for two attributes this is $\begin{bmatrix} \sigma_1^2 & \mathrm{cov}(x_1,x_2)\\ \mathrm{cov}(x_1,x_2) & \sigma_2^2\end{bmatrix}$. Then compute its eigenvalues and eigenvectors. The largest eigenvalue is called the principal eigenvalue ($\lambda_1$).

$\lambda_1 \geq \lambda_2 \geq ... \geq \lambda_d$ are the eigenvalues associated with eigenvectors $\vec{V_1}, \vec{V_2}, ..., \vec{V_d}$.

Then, use this formula: $z_1 = \vec{V_1}^T X'$, where $X'$ is the centered data.
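The same computation can be sketched directly with NumPy, following the steps above (center, covariance, eigendecomposition, project). This is a minimal illustration assuming the data matrix has one example per row; the function and variable names are my own.

```python
import numpy as np

def first_principal_component(X):
    """Return z1 (projection onto the principal eigenvector) for data X,
    where X has one example per row and one attribute per column."""
    # 1. Center the data: subtract the attribute-wise mean from each example.
    X_centered = X - X.mean(axis=0)

    # 2. Sample covariance matrix of the centered attributes.
    S = np.cov(X_centered, rowvar=False)

    # 3. Eigenvalues/eigenvectors; eigh handles symmetric matrices and
    #    returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(S)

    # 4. Principal eigenvector V1 = the one with the largest eigenvalue.
    v1 = eigenvectors[:, -1]

    # 5. z1 = V1^T X' : project every centered example onto V1.
    z1 = X_centered @ v1
    return z1

# Example with three attributes x1, x2, x3 (values made up for illustration).
X = np.random.default_rng(0).normal(size=(100, 3))
print(first_principal_component(X)[:5])
```

Projecting onto the remaining eigenvectors, in decreasing order of eigenvalue, gives $z_2, z_3, ...$ in the same way.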
