Bias and Variance of a Regression Algorithm

Here, we discuss the Bias and Variance of a Regression Algorithm that outputs a hypothesis g.

The expected error of a fixed g (w.r.t. the random noise $\epsilon$) at x is given by:

$E[(r-g(x))^2 | x]$ (at fixed x)

where $r = f(x) + \epsilon$

By linearity of expectation, i.e. $E[X+Y] = E[X] + E[Y]$, we can show that:

$$E[(r-g(x))^2 | x] = \underbrace{E[(r-E[r|x])^2 | x]}_{\text{noise}} + \underbrace{(E[r|x]-g(x))^2}_{\text{squared error}}$$
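As a quick sketch of the algebra behind this step: add and subtract $E[r|x]$ inside the square and expand; because g is fixed, the middle factor can be pulled out of the expectation, and the cross term vanishes since $E[r - E[r|x] \,|\, x] = 0$:

$$
\begin{aligned}
E[(r-g(x))^2 | x] &= E\big[(r - E[r|x] + E[r|x] - g(x))^2 \,\big|\, x\big] \\
&= E[(r-E[r|x])^2 | x] + 2\,(E[r|x]-g(x))\,E[r - E[r|x] \,|\, x] + (E[r|x]-g(x))^2
\end{aligned}
$$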

The expected error of g built from a random sample X (using our regression algorithm) is given by:

$$E_X[(E[r|x]-g(x))^2 | x] = \underbrace{(E[r|x]-E_X[g(x)])^2}_{\text{bias}} + \underbrace{E_X[(g(x)-E_X[g(x)])^2]}_{\text{variance}}$$
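To make the bias and variance terms concrete, here is a minimal Monte Carlo sketch in Python; the true function f, the Gaussian noise level, and the uniform sampling of inputs are all assumptions made for illustration. It repeatedly draws a random sample X, fits a linear g, and estimates the bias and variance of g at a fixed x:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical true function; r = f(x) + eps with Gaussian eps is an assumed model.
    return np.sin(2 * np.pi * x)

def fit_and_predict(x0, n_samples=20, degree=1, noise_std=0.3):
    """Draw one random sample X, fit a polynomial g of the given degree, return g(x0)."""
    xs = rng.uniform(0.0, 1.0, n_samples)
    rs = f(xs) + rng.normal(0.0, noise_std, n_samples)
    coeffs = np.polyfit(xs, rs, degree)
    return np.polyval(coeffs, x0)

# Estimate bias and variance at a fixed x0 by averaging over many random samples X.
x0, n_repeats = 0.25, 2000
preds = np.array([fit_and_predict(x0) for _ in range(n_repeats)])

bias_sq = (f(x0) - preds.mean()) ** 2   # (E[r|x0] - E_X[g(x0)])^2, using E[r|x0] = f(x0)
variance = preds.var()                  # E_X[(g(x0) - E_X[g(x0)])^2]
print(f"bias^2 ~ {bias_sq:.4f}, variance ~ {variance:.4f}")
```

The two printed quantities correspond directly to the bias and variance terms in the decomposition above, estimated by averaging over repeated runs of the algorithm on fresh samples X.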

Bias: the expected error at a fixed x of the g that our algorithm computes when it is run on a random sample X.

Variance: the amount by which g varies depending on the sample X.

Bias/Variance Dilemma

As discussed, bias measures the expected error (w.r.t. the random sample X on which the algorithm is run) that the algorithm makes when it predicts the value r for an input x, and variance measures how much the prediction at x varies depending on which random sample X is used.

An ideal algorithm has low bias and low variance. However, this is usually not achievable.

Typically, as we increase the complexity of g, for example by using Polynomial Regression of degree 2 instead of Linear Regression, the bias decreases (the model fits the data better) but the variance increases (the fit varies more with the data). This is related to the concept of overfitting: a more complex model fits the data better, but its fit varies more with the data.
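The trade-off can be observed by repeating the same kind of simulation at several polynomial degrees. The sketch below (same assumed f and noise model as in the earlier example) averages the estimated bias and variance over a grid of x values:

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    return np.sin(2 * np.pi * x)   # same hypothetical true function as above

def bias_variance(degree, n_samples=20, noise_std=0.3, n_repeats=500):
    """Average estimated bias^2 and variance of degree-`degree` polynomial regression
    over a grid of x values, using Monte Carlo over random samples X."""
    grid = np.linspace(0.0, 1.0, 50)
    preds = np.empty((n_repeats, grid.size))
    for i in range(n_repeats):
        xs = rng.uniform(0.0, 1.0, n_samples)
        rs = f(xs) + rng.normal(0.0, noise_std, n_samples)
        preds[i] = np.polyval(np.polyfit(xs, rs, degree), grid)
    bias_sq = ((preds.mean(axis=0) - f(grid)) ** 2).mean()
    variance = preds.var(axis=0).mean()
    return bias_sq, variance

# A higher-degree (more complex) g tends to show lower bias but higher variance.
for degree in (1, 2, 5):
    b, v = bias_variance(degree)
    print(f"degree {degree}: bias^2 ~ {b:.3f}, variance ~ {v:.3f}")
```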
