Dimensionality Reduction
The main aim of dimensionality reduction is to reduce the number of dimensions (attributes/features) used to learn from the data. Learning from too many dimensions can cause the model to fit irrelevant or redundant attributes and makes learning much harder; this is the curse of dimensionality.
For a given attribute pair, we determine a line that combines them into a single new attribute and use it in place of the original pair. Each subsequent new attribute is then chosen as the direction that captures the most remaining variance.
Say we have 3 attributes $x_1, x_2, x_3$. We need to find weights $w_1, w_2, w_3$ that maximize the variance of the linear combination $z = w_1 x_1 + w_2 x_2 + w_3 x_3$.
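Written out as an optimization problem (a standard formulation of this step; the covariance matrix symbol $\boldsymbol{\Sigma}$ and the unit-norm constraint are assumptions not spelled out above):

$$
\mathbf{w}_1 = \arg\max_{\|\mathbf{w}\| = 1} \operatorname{Var}\!\left(\mathbf{w}^{\top}\mathbf{x}\right)
             = \arg\max_{\|\mathbf{w}\| = 1} \mathbf{w}^{\top} \boldsymbol{\Sigma}\, \mathbf{w}
$$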
First, center the data around the origin: subtract the attribute-wise mean of the data points from each example. This creates a new dataset whose mean is $\mathbf{0}$.
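A minimal NumPy sketch of this centering step (the toy data and the variable names `X`, `X_centered` are illustrative assumptions):

```python
import numpy as np

# Toy data: one example per row, one attribute per column (assumed layout)
X = np.array([[2.0, 40.0],
              [4.0, 55.0],
              [6.0, 70.0]])

# Subtract the attribute-wise mean from every example
X_centered = X - X.mean(axis=0)

print(X_centered.mean(axis=0))  # each attribute now has (numerically) zero mean
```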
Then, we choose the line through the origin that maximizes the variance of the projected data, using the procedure shown below. This line is a linear combination of the attributes $x_i$; it defines the first new attribute $z_1$ and preserves the variance along the first (horizontal) direction.
For example, in a YouTube video dataset where the attributes are #comments and #clicks, $z_1$ may model the popularity of a video.
Now, we choose another line that is orthogonal to the first one; this defines the second new attribute $z_2$ and preserves the variance in the vertical direction.
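In symbols (assuming unit-length direction vectors $\mathbf{w}_1$ and $\mathbf{w}_2$, which the notes do not name explicitly):

$$
z_1 = \mathbf{w}_1^{\top}\mathbf{x}, \qquad
z_2 = \mathbf{w}_2^{\top}\mathbf{x}, \qquad
\mathbf{w}_1^{\top}\mathbf{w}_2 = 0, \qquad
\|\mathbf{w}_1\| = \|\mathbf{w}_2\| = 1
$$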
Therefore, PCA produces new attributes (the principal components) that capture more of the variance in the data than the same number of original attributes.
Compute the sample covariance matrix of the centered data points $\mathbf{x}_1, \dots, \mathbf{x}_n$, given by $\boldsymbol{\Sigma} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{x}_i \mathbf{x}_i^{\top}$, and compute its eigenvalues and eigenvectors. The largest eigenvalue is called the principal eigenvalue ($\lambda_1$).
$\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_d$ are the eigenvalues (one per attribute) associated with the eigenvectors $\mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_d$.
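A NumPy sketch of this computation (the toy data and variable names are assumptions; `np.linalg.eigh` is used because the covariance matrix is symmetric):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # toy data: 100 examples, 3 attributes
X_centered = X - X.mean(axis=0)        # center first, as described above

# Sample covariance matrix: Sigma = (1/n) * sum_i x_i x_i^T = (1/n) * X^T X
n = X_centered.shape[0]
Sigma = (X_centered.T @ X_centered) / n

# Eigen-decomposition of the symmetric covariance matrix
eigvals, eigvecs = np.linalg.eigh(Sigma)

# eigh returns eigenvalues in ascending order; reverse so the principal
# eigenvalue (lambda_1) and its eigenvector (w_1) come first
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
print("principal eigenvalue:", eigvals[0])
```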
Then, project each centered data point onto the top $k$ eigenvectors to obtain its reduced representation:

$$
\mathbf{z}_i = W_k^{\top} \mathbf{x}_i, \qquad W_k = [\mathbf{w}_1 \ \mathbf{w}_2 \ \cdots \ \mathbf{w}_k]
$$
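In practice, the same projection can be obtained with scikit-learn's PCA (shown here as a sanity check, not as the method described above; it centers the data and computes the components via SVD, which gives the same directions, up to sign, as the eigen-decomposition of the covariance matrix):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # same kind of toy data as above

pca = PCA(n_components=2)              # keep the top k = 2 principal components
Z = pca.fit_transform(X)               # rows of Z are the reduced representations z_i

print(Z.shape)                         # (100, 2)
print(pca.components_)                 # the directions w_1, w_2 as rows (up to sign)
print(pca.explained_variance_ratio_)   # fraction of total variance each z_j preserves
```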