CS-GY 6923: Machine Learning
Dimensionality Reduction

The main aim of dimensionality reduction is to reduce the number of dimensions (attributes/features) used to learn from the data. Learning from too many dimensions can lead the model to fit irrelevant attributes and generalize poorly; this is the curse of dimensionality.

Principal Components Analysis (PCA)

For a given attribute pair, we determine a line that combines them and then use this new attribute in place of the original pair. Each subsequent attribute is chosen as the direction that maximizes the remaining variance.

Say we have 3 attributes $x_1, x_2, x_3$. We need to find weights that maximize the variance.

  • First, center the data around the origin. To do so, subtract the attribute-wise mean of the data points from each example. This will create a new dataset with attributes $x_1', x_2', \dots, x_d'$.

  • Then, we choose the line through the origin that maximizes the variance, using a formula (shown below). This line is a linear combination of the $x_i'$s; we call it $z_1$. This preserves the variance in the horizontal direction.

    For example, consider a YouTube video dataset: if the attributes are #comments and #clicks, then $z_1$ may model the popularity of a video.

  • Now, we choose another line, $z_2$, that is orthogonal to $z_1$. This allows us to preserve the variance in the vertical direction.

Therefore, PCA produces attributes that model the data better than the original attributes do.
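
To make this concrete, here is a minimal sketch using scikit-learn (a library not used elsewhere in these notes; the #comments/#clicks numbers below are invented for illustration). `PCA` centers the data internally and returns the new attributes $z_1, z_2$ ordered by the variance they capture.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical YouTube dataset: each row is a video, columns are [#comments, #clicks].
X = np.array([[120, 1500],
              [300, 4000],
              [ 50,  700],
              [450, 6100],
              [200, 2600]], dtype=float)

# PCA centers the data, then finds orthogonal directions of maximal variance.
pca = PCA(n_components=2)
Z = pca.fit_transform(X)

print(pca.components_)          # rows: the directions defining z1 and z2
print(pca.explained_variance_)  # variance captured by z1 and by z2
print(Z[:, 0])                  # z1: roughly a "popularity" score for each video
```

Here $z_1$ combines #comments and #clicks into a single attribute, exactly as described above.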

How to find $z_1$?

Compute the sample covariance matrix of the centered data points $[x_1', x_2', \dots, x_d']$. For two attributes this is $\begin{bmatrix} \sigma_1^2 & \mathrm{cov}(x_1, x_2) \\ \mathrm{cov}(x_1, x_2) & \sigma_2^2 \end{bmatrix}$; in general it is the $d \times d$ matrix of attribute variances and covariances. Compute its eigenvalues and eigenvectors. The largest eigenvalue is called the principal eigenvalue ($\lambda_1$).

$\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_d$ are the eigenvalues associated with eigenvectors $\vec{V_1}, \vec{V_2}, \dots, \vec{V_d}$.

Then, use this formula: $z_1 = \vec{V_1}^T X'$
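
As a sanity check of this recipe, here is a minimal NumPy sketch (my own illustration, using randomly generated data) that centers the data, builds the sample covariance matrix, takes its eigendecomposition, and projects onto the principal eigenvector $\vec{V_1}$ to obtain $z_1$. The sample variance of $z_1$ comes out equal to the principal eigenvalue $\lambda_1$.

```python
import numpy as np

# Toy dataset: n examples x d attributes (any small dataset works for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.5, 0.2, 0.3]])

# 1. Center the data: x' = x - mean
X_centered = X - X.mean(axis=0)

# 2. Sample covariance matrix of the centered attributes (d x d)
S = np.cov(X_centered, rowvar=False)

# 3. Eigendecomposition (eigh, since S is symmetric), sorted lambda_1 >= ... >= lambda_d
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Project onto the principal eigenvector: z1 = V1^T x'
V1 = eigvecs[:, 0]
z1 = X_centered @ V1
z2 = X_centered @ eigvecs[:, 1]   # z2: projection onto the second eigenvector, orthogonal to z1

# The variance captured by z1 equals the principal eigenvalue lambda_1.
print(np.var(z1, ddof=1), eigvals[0])   # these two numbers match
```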