CS-GY 6923: Machine Learning

Bias and Variance of a Regression Algorithm

Here, we discuss the bias and variance of a regression algorithm that, given a random training sample, outputs a hypothesis g.

The expected error of a fixed g (w.r.t. the random noise $\epsilon$) at $x$ is given by:

$E[(r-g(x))^2 \mid x]$ (at a fixed $x$)

where $r = f(x) + \epsilon$

By linearity of expectation, i.e. $E[X+Y] = E[X] + E[Y]$, we can show that:

$$E[(r-g(x))^2 \mid x] = \underbrace{E[(r-E[r \mid x])^2 \mid x]}_{\text{noise}} + \underbrace{(E[r \mid x]-g(x))^2}_{\text{squared error}}$$
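
As a quick sanity check (this expansion is not written out above), add and subtract $E[r \mid x]$ inside the square; since g is fixed given $x$, the term $(E[r \mid x] - g(x))$ is a constant and the cross term vanishes because $E[r - E[r \mid x] \mid x] = 0$:

$$\begin{aligned} E[(r-g(x))^2 \mid x] &= E\big[\big((r-E[r \mid x]) + (E[r \mid x]-g(x))\big)^2 \,\big|\, x\big] \\ &= E[(r-E[r \mid x])^2 \mid x] + 2\,(E[r \mid x]-g(x))\,E[r-E[r \mid x] \mid x] + (E[r \mid x]-g(x))^2 \\ &= E[(r-E[r \mid x])^2 \mid x] + (E[r \mid x]-g(x))^2 \end{aligned}$$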

The expected value, over the random sample $X$ from which our regression algorithm builds g, of the squared-error term above decomposes further:

$$E_X[(E[r \mid x]-g(x))^2 \mid x] = \underbrace{(E[r \mid x]-E_X[g(x)])^2}_{\text{bias}} + \underbrace{E_X[(g(x)-E_X[g(x)])^2]}_{\text{variance}}$$

Bias: the expected error of g at a fixed $x$, i.e. how far the average prediction $E_X[g(x)]$ (averaged over the random samples $X$ on which the algorithm is run) is from $E[r \mid x]$.

Variance: how much the prediction $g(x)$ varies depending on which sample $X$ was used.
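
A minimal Monte Carlo sketch of these quantities at a fixed point (everything here is an illustrative assumption, not from the course: the true function $f(x)=\sin(2\pi x)$, Gaussian noise with $\sigma = 0.3$, 20 training points per sample, and degree-3 polynomial regression via `numpy.polyfit` as the learning algorithm). It repeatedly draws a random sample $X$, fits g, and records $g(x_0)$:

```python
# Sketch: estimate noise, bias^2, and variance of a regression algorithm at x0.
# Assumed setup (not from the notes): f(x) = sin(2*pi*x), sigma = 0.3,
# n = 20 points per sample, degree-3 polynomial fit via numpy.polyfit.
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return np.sin(2 * np.pi * x)            # assumed true function

sigma, n, degree, trials = 0.3, 20, 3, 2000
x0 = 0.25                                    # fixed query point x

preds = np.empty(trials)
for t in range(trials):
    x = rng.uniform(0.0, 1.0, n)             # draw a fresh random sample X
    r = f(x) + rng.normal(0.0, sigma, n)     # r = f(x) + eps
    g = np.polyfit(x, r, degree)             # run the regression algorithm -> g
    preds[t] = np.polyval(g, x0)             # g(x0) for this sample

noise    = sigma ** 2                        # E[(r - E[r|x])^2 | x]
bias_sq  = (f(x0) - preds.mean()) ** 2       # (E[r|x] - E_X[g(x)])^2
variance = preds.var()                       # E_X[(g(x) - E_X[g(x)])^2]

print(f"noise={noise:.4f}  bias^2={bias_sq:.4f}  variance={variance:.4f}")
print(f"expected error at x0 ~ {noise + bias_sq + variance:.4f}")
```

By the two decompositions above, the three printed terms should approximately sum to the expected error of the algorithm's prediction at $x_0$.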

Bias/Variance Dilemma

As discussed, bias measures the expected error (with respect to the random sample $X$ on which the algorithm is run) that the algorithm makes when predicting the value $r$ for an input $x$, and variance measures how much the prediction at $x$ varies depending on which random sample $X$ is used.

An ideal algorithm has low bias and low variance. However, this is usually not achievable.

Typically, as we increase the complexity of g, for example by using polynomial regression of degree 2 instead of linear regression, the bias decreases (g fits the data better) but the variance increases (the fit varies more with the data). This is related to the concept of overfitting: a more complex model fits the data better, but the fit varies more with the data.
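
Here is a rough numerical illustration of the dilemma under the same assumed setup as the previous sketch ($f(x)=\sin(2\pi x)$, $\sigma = 0.3$, 20 training points, `numpy.polyfit` as the algorithm): as the polynomial degree grows, the estimated bias at a fixed $x$ tends to shrink while the variance grows.

```python
# Sketch: bias/variance trend as model complexity (polynomial degree) increases.
# Same assumed setup as the previous snippet; none of it comes from the notes.
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)          # assumed true function
sigma, n, trials, x0 = 0.3, 20, 2000, 0.25

for degree in (1, 2, 3, 5, 7):
    preds = np.empty(trials)
    for t in range(trials):
        x = rng.uniform(0.0, 1.0, n)
        r = f(x) + rng.normal(0.0, sigma, n)
        preds[t] = np.polyval(np.polyfit(x, r, degree), x0)
    bias_sq, variance = (f(x0) - preds.mean()) ** 2, preds.var()
    print(f"degree {degree}: bias^2={bias_sq:.4f}  variance={variance:.4f}")
```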
