CS-GY 6923: Machine Learning
Parametric Estimation for Simple Linear Regression

r = f(X) + \epsilon, \quad \epsilon \sim N(\mu, \sigma^2)

Our estimate of f is a line that minimizes squared error.

g(x^t \mid w_0, w_1) = w_1 x^t + w_0

is the line defined by the parameters w_0, w_1. To find the line that minimizes squared error, we compute the values of w_0 and w_1 that minimize it.

We have X = \{x^t, r^t\}_{t=1}^N.

We need to compute \arg\min_{w_0, w_1} \sum_t \left(r^t - (w_1 x^t + w_0)\right)^2

To solve for w_0, w_1, take the partial derivatives with respect to w_0 and w_1 and set them to 0. This gives two equations:

\sum_t r^t = N w_0 + w_1 \sum_t x^t

\sum_t r^t x^t = w_0 \sum_t x^t + w_1 \sum_t (x^t)^2
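For reference, these come from setting the partial derivatives of the error E(w_0, w_1) = \sum_t (r^t - (w_1 x^t + w_0))^2 to zero; a brief sketch of that step, in the same notation:

\frac{\partial E}{\partial w_0} = -2 \sum_t \left(r^t - (w_1 x^t + w_0)\right) = 0

\frac{\partial E}{\partial w_1} = -2 \sum_t \left(r^t - (w_1 x^t + w_0)\right) x^t = 0

Dividing by -2 and rearranging the sums gives the two equations above.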

To solve for w_0, w_1, we write these two equations in matrix form as AW = Y and use the closed-form solution:

W = A^{-1} Y

where W = \begin{bmatrix} w_0 \\ w_1 \end{bmatrix}, \quad A = \begin{bmatrix} N & \sum_t x^t \\ \sum_t x^t & \sum_t (x^t)^2 \end{bmatrix}, \quad Y = \begin{bmatrix} \sum_t r^t \\ \sum_t r^t x^t \end{bmatrix}
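To make the closed-form fit concrete, here is a minimal NumPy sketch of it; the function name fit_simple_linear and the toy data are illustrative choices, not part of the course notes.

```python
import numpy as np

def fit_simple_linear(x, r):
    """Fit g(x | w0, w1) = w1*x + w0 via the closed-form solution W = A^{-1} Y."""
    x = np.asarray(x, dtype=float)
    r = np.asarray(r, dtype=float)
    N = len(x)

    # A and Y collect the sums appearing in the two normal equations above.
    A = np.array([[N,       x.sum()],
                  [x.sum(), (x ** 2).sum()]])
    Y = np.array([r.sum(), (r * x).sum()])

    # Solve A W = Y rather than forming A^{-1} explicitly.
    w0, w1 = np.linalg.solve(A, Y)
    return w0, w1

# Toy data lying near the line r = 2x + 1 (illustrative, not from the notes).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
r = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
w0, w1 = fit_simple_linear(x, r)
print(f"w0 = {w0:.3f}, w1 = {w1:.3f}")  # approximately w0 ≈ 1, w1 ≈ 2
```

Solving the linear system with np.linalg.solve is the usual numerically safer way to apply W = A^{-1} Y, since it avoids explicitly inverting A.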
