CS-GY 6923: Machine Learning
Parametric Estimation for Multivariate Linear Regression

Suppose each input is a $d$-dimensional vector

$$x = \begin{bmatrix} x_1 \\ \vdots \\ x_d \end{bmatrix}$$

We need to find the parameters

$$W = \begin{bmatrix} w_0 \\ \vdots \\ w_d \end{bmatrix}$$

so that the linear function

$$g(x \mid w_d, w_{d-1}, \ldots, w_1, w_0) = w_d x_d + w_{d-1} x_{d-1} + \cdots + w_1 x_1 + w_0$$

minimizes the squared error on the dataset $\{x^t, r^t\}_{t=1}^N$, where

$$x^t = \begin{bmatrix} x_1^t \\ x_2^t \\ \vdots \\ x_d^t \end{bmatrix}$$
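Written out, the objective being minimized is the usual least-squares error (spelled out here for concreteness; the page states it only in words):

$$E(W \mid \mathcal{X}) = \sum_{t=1}^{N} \left( r^t - g(x^t \mid w_d, \ldots, w_1, w_0) \right)^2$$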

Let

$$D = \begin{bmatrix} 1 & x_1^1 & x_2^1 & \cdots & x_d^1 \\ 1 & x_1^2 & x_2^2 & \cdots & x_d^2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^N & x_2^N & \cdots & x_d^N \end{bmatrix}_{N \times (d+1)} \quad \text{and} \quad r = \begin{bmatrix} r^1 \\ r^2 \\ \vdots \\ r^N \end{bmatrix}_{N \times 1}$$

Then the least-squares solution is

$$W = (D^T D)^{-1} D^T r$$
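A minimal NumPy sketch of this closed-form solution, using a made-up toy dataset ($N = 5$ samples, $d = 2$ features; the numbers are purely illustrative):

```python
import numpy as np

# Toy dataset: N = 5 samples, d = 2 features (values invented for illustration)
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.0],
              [4.0, 3.0],
              [5.0, 2.5]])
r = np.array([3.0, 3.5, 5.0, 8.0, 9.0])

# Build the N x (d+1) design matrix D by prepending a column of ones
N = X.shape[0]
D = np.hstack([np.ones((N, 1)), X])

# Normal equation: W = (D^T D)^{-1} D^T r
W = np.linalg.inv(D.T @ D) @ D.T @ r

# np.linalg.lstsq solves the same least-squares problem with a more
# numerically stable factorization; the two answers should agree here
W_lstsq, *_ = np.linalg.lstsq(D, r, rcond=None)
```

In practice one calls `lstsq` (or an equivalent solver) rather than forming $(D^T D)^{-1}$ explicitly, since the explicit inverse is slower and less numerically stable.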

Sometimes the inverse does not exist. This happens when the columns of $D$ are linearly dependent, for example when the number of samples $N$ is smaller than $d+1$, or when some features are exact linear combinations of others. In that case the Moore-Penrose pseudo-inverse (or a regularized solution) is used instead.
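A small sketch of the degenerate case, with an invented underdetermined example ($N = 2$ samples but $d + 1 = 3$ columns, so $D^T D$ is rank-deficient):

```python
import numpy as np

# Degenerate design matrix: 2 rows but 3 columns, so the 3x3 matrix
# D^T D has rank at most 2 and is singular.
D = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 4.0]])
r = np.array([1.0, 2.0])

# np.linalg.inv(D.T @ D) is unreliable here (singular matrix); the
# Moore-Penrose pseudo-inverse returns the minimum-norm least-squares
# solution instead.
W = np.linalg.pinv(D) @ r
```

Since the system is underdetermined and consistent, this minimum-norm `W` still fits the two training points exactly; the pseudo-inverse simply picks one solution out of the infinitely many that do.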


