Dimensionality Reduction
The main aim of dimensionality reduction is to reduce the number of dimensions (attributes/features) used to learn from the data. Learning from too many dimensions can cause the model to fit irrelevant or redundant attributes and makes learning much harder; this is the curse of dimensionality.
For a given attribute pair, we determine a line that combines them into a single new attribute and use it in place of the original pair. Each subsequent new attribute is then chosen as the direction that captures the most remaining variance.
Say we have 3 attributes $x_1, x_2, x_3$. We need to find weights $w_1, w_2, w_3$ that maximize the variance of the linear combination $z = w_1 x_1 + w_2 x_2 + w_3 x_3$.
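Written out as an optimization problem (a standard formulation of this step; the covariance matrix symbol $\boldsymbol{\Sigma}$ and the unit-norm constraint are assumptions not spelled out above):

$$
\mathbf{w}_1 = \arg\max_{\|\mathbf{w}\| = 1} \operatorname{Var}\!\left(\mathbf{w}^{\top}\mathbf{x}\right)
             = \arg\max_{\|\mathbf{w}\| = 1} \mathbf{w}^{\top} \boldsymbol{\Sigma}\, \mathbf{w}
$$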
First, center the data around the origin: subtract the attribute-wise mean of the data points from each example. This creates a new dataset whose mean is $\mathbf{0}$.
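A minimal NumPy sketch of this centering step (the toy data and the variable names `X`, `X_centered` are illustrative assumptions):

```python
import numpy as np

# Toy data: one example per row, one attribute per column (assumed layout)
X = np.array([[2.0, 40.0],
              [4.0, 55.0],
              [6.0, 70.0]])

# Subtract the attribute-wise mean from every example
X_centered = X - X.mean(axis=0)

print(X_centered.mean(axis=0))  # each attribute now has (numerically) zero mean
```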
Then, we choose the line through the origin that maximizes the variance of the projected data, using the procedure shown below. This line is a linear combination of the attributes $x_i$; it defines the first new attribute $z_1$ and preserves the variance along the first (horizontal) direction.
For example, in a YouTube video dataset where the attributes are #comments and #clicks, $z_1$ may model the popularity of a video.
Now, we choose another line that is orthogonal to the first one; this defines the second new attribute $z_2$ and preserves the variance in the vertical direction.
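In symbols (assuming unit-length direction vectors $\mathbf{w}_1$ and $\mathbf{w}_2$, which the notes do not name explicitly):

$$
z_1 = \mathbf{w}_1^{\top}\mathbf{x}, \qquad
z_2 = \mathbf{w}_2^{\top}\mathbf{x}, \qquad
\mathbf{w}_1^{\top}\mathbf{w}_2 = 0, \qquad
\|\mathbf{w}_1\| = \|\mathbf{w}_2\| = 1
$$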
Therefore, PCA produces new attributes (the principal components) that capture more of the variance in the data than the same number of original attributes.
Compute the sample covariance matrix of the centered data points $\mathbf{x}_1, \dots, \mathbf{x}_n$, given by $\boldsymbol{\Sigma} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{x}_i \mathbf{x}_i^{\top}$, and compute its eigenvalues and eigenvectors. The largest eigenvalue is called the principal eigenvalue ($\lambda_1$).
$\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_d$ are the eigenvalues (one per attribute) associated with the eigenvectors $\mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_d$.
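A NumPy sketch of this computation (the toy data and variable names are assumptions; `np.linalg.eigh` is used because the covariance matrix is symmetric):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # toy data: 100 examples, 3 attributes
X_centered = X - X.mean(axis=0)        # center first, as described above

# Sample covariance matrix: Sigma = (1/n) * sum_i x_i x_i^T = (1/n) * X^T X
n = X_centered.shape[0]
Sigma = (X_centered.T @ X_centered) / n

# Eigen-decomposition of the symmetric covariance matrix
eigvals, eigvecs = np.linalg.eigh(Sigma)

# eigh returns eigenvalues in ascending order; reverse so the principal
# eigenvalue (lambda_1) and its eigenvector (w_1) come first
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
print("principal eigenvalue:", eigvals[0])
```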
Then, project each centered data point onto the top $k$ eigenvectors to obtain its reduced representation:

$$
\mathbf{z}_i = W_k^{\top} \mathbf{x}_i, \qquad W_k = [\mathbf{w}_1 \ \mathbf{w}_2 \ \cdots \ \mathbf{w}_k]
$$
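In practice, the same projection can be obtained with scikit-learn's PCA (shown here as a sanity check, not as the method described above; it centers the data and computes the components via SVD, which gives the same directions, up to sign, as the eigen-decomposition of the covariance matrix):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # same kind of toy data as above

pca = PCA(n_components=2)              # keep the top k = 2 principal components
Z = pca.fit_transform(X)               # rows of Z are the reduced representations z_i

print(Z.shape)                         # (100, 2)
print(pca.components_)                 # the directions w_1, w_2 as rows (up to sign)
print(pca.explained_variance_ratio_)   # fraction of total variance each z_j preserves
```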