CS-GY 6923: Machine Learning

Bias and Variance of an Estimator

Consider the following estimators of the mean of a distribution, where $X = \{x^t\}_{t=1}^N$ is an i.i.d. sample from that distribution.

  1. $m_1 = \frac{\sum_t x^t}{N}$ (this is the MLE)

  2. $m_2 = \frac{x^1 + x^N}{2}$

  3. $m_3 = 5$

Now, draw a sample of size $N$ (say $N = 3$): $X = \{6, 1, 5\}$

$m_1 = 4, \quad m_2 = 11/2, \quad m_3 = 5$
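As a quick check, the three estimates for this sample can be computed directly (a minimal NumPy sketch; the array name `X` mirrors the notation above):

```python
import numpy as np

X = np.array([6, 1, 5])       # the sample above, N = 3
N = len(X)

m1 = X.sum() / N              # sample mean (the MLE): (6 + 1 + 5) / 3
m2 = (X[0] + X[-1]) / 2       # average of the first and last observations
m3 = 5.0                      # the constant estimator

print(m1, m2, m3)             # 4.0 5.5 5.0
```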

Each estimate is a function of the random sample, so each estimator is itself a random variable with its own mean and variance.

Say we want to estimate $\Theta$ (here, $\Theta = \mu$, the mean of the distribution from which $X$ is drawn).

A desirable property of an estimator $d$ of $\Theta$ is that its expected value equal the quantity we want to estimate, i.e. $E[d] = \Theta$. Such a $d$ is called an unbiased estimator.

The bias of an estimator $d$ is given by:

$b_\Theta(d) = E[d] - \Theta$

If $b_\Theta(d) = 0$, $d$ is an unbiased estimator.

Is $m_2$ an unbiased estimator of $\mu$?

$E[m_2] = E\left[\frac{x^1 + x^N}{2}\right] = \frac{1}{2}E[x^1 + x^N] = \frac{1}{2}E[x^1] + \frac{1}{2}E[x^N]$

Since $E[x^t] = \mu$ (by definition), $E[m_2] = \frac{1}{2}\mu + \frac{1}{2}\mu = \mu$.

Therefore, $m_2$ is an unbiased estimator of the mean $\mu$.
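Unbiasedness can also be checked empirically: averaging $m_2$ over many simulated samples should recover $\mu$. A minimal sketch, assuming (arbitrarily, for illustration) normally distributed data with $\mu = 3$, $\sigma = 2$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, N = 3.0, 2.0, 10
trials = 200_000

# Draw many independent samples of size N; compute m2 for each one.
samples = rng.normal(mu, sigma, size=(trials, N))
m2 = (samples[:, 0] + samples[:, -1]) / 2

# The average of m2 across trials approximates E[m2], which equals mu.
print(m2.mean())
```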

Is $m_3$ an unbiased estimator of $\mu$?

$E[m_3] = E[5] = 5$

Clearly, $E[m_3] - \mu = 0$ iff $\mu = 5$, so unless $\mu$ happens to equal 5, $m_3$ is a biased estimator of the mean $\mu$.

The variance of an estimator $d$ is given by:

$\mathrm{Var}(d) = E[(d - E[d])^2]$

More data leads to lower variance: $\mathrm{Var}(m_1) = \sigma^2 / N$, which shrinks as $N$ grows.

$m_3$ has the least variance (it is always 5, so its variance is zero). For $N > 2$, $m_1$ has a lower variance than $m_2$, since $\mathrm{Var}(m_2) = \sigma^2 / 2$ regardless of the sample size.
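The variance ordering can be verified by simulation. A sketch under illustrative assumptions (normal data, $\mu = 3$, $\sigma = 2$, $N = 10$), where the empirical variances should come out near $\sigma^2/N = 0.4$ for $m_1$, near $\sigma^2/2 = 2$ for $m_2$, and exactly $0$ for $m_3$:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, N = 3.0, 2.0, 10
trials = 100_000

samples = rng.normal(mu, sigma, size=(trials, N))
m1 = samples.mean(axis=1)                  # Var(m1) = sigma^2 / N
m2 = (samples[:, 0] + samples[:, -1]) / 2  # Var(m2) = sigma^2 / 2
m3 = np.full(trials, 5.0)                  # Var(m3) = 0

print(m1.var(), m2.var(), m3.var())
```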

The mean square error of an estimator decomposes into bias and variance:

$E[(d - \Theta)^2] = (E[d] - \Theta)^2 + E[(d - E[d])^2] = \mathrm{Bias}^2 + \mathrm{Variance}$
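This decomposition holds exactly when the expectations are replaced by averages over simulated trials, which makes it easy to check numerically. A sketch using $m_2$ as the estimator $d$ (normal data with $\mu = 3$, $\sigma = 2$ is again an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, N = 3.0, 2.0, 10
trials = 100_000

samples = rng.normal(mu, sigma, size=(trials, N))
d = (samples[:, 0] + samples[:, -1]) / 2   # the estimator m2

mse = ((d - mu) ** 2).mean()               # E[(d - Theta)^2]
bias_sq = (d.mean() - mu) ** 2             # (E[d] - Theta)^2
variance = d.var()                         # E[(d - E[d])^2]

print(mse, bias_sq + variance)             # the two sides agree
```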
