Bias and Variance

In general, fully grown decision trees have low bias and high variance: they fit the training data closely, but small changes in the training set can produce very different trees.

To reduce variance, perform bagging:

  • Take multiple versions of the training set (random subsets)

  • Build a separate tree using each subset

  • Given a new example to predict, compute the prediction from each tree. Then for classification, predict the majority class. For regression, compute the average of the predictions.
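The steps above can be sketched as follows. This is a minimal illustration, not a production implementation: for simplicity the "tree" is a one-level decision stump fit by exhaustive search, and only the classification (majority-vote) case is shown; `fit_stump`, `predict_stump`, and `bagged_predict` are names invented for this sketch.

```python
import numpy as np

def fit_stump(X, y):
    """Fit a one-level tree (stump): the (feature, threshold, labels)
    minimizing misclassifications on binary labels in {0, 1}."""
    best = None  # (error, feature, threshold, left_label, right_label)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            for ll in (0, 1):
                for rl in (0, 1):
                    err = np.sum(left != ll) + np.sum(right != rl)
                    if best is None or err < best[0]:
                        best = (err, j, t, ll, rl)
    return best[1:]

def predict_stump(stump, X):
    j, t, ll, rl = stump
    return np.where(X[:, j] <= t, ll, rl)

def bagged_predict(X_train, y_train, X_test, n_trees=25, seed=0):
    """Bagging: bootstrap the training set, fit one tree per sample,
    then take a majority vote over the trees' predictions."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    votes = np.zeros((n_trees, len(X_test)), dtype=int)
    for i in range(n_trees):
        idx = rng.integers(0, n, size=n)  # sample N times with replacement
        stump = fit_stump(X_train[idx], y_train[idx])
        votes[i] = predict_stump(stump, X_test)
    return (votes.mean(axis=0) > 0.5).astype(int)  # majority class
```

For regression, the last line would instead return the mean of the trees' numeric predictions.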

To get a random subset of approx. 2/3 of the total examples (N), sample N times with replacement, then eliminate duplicates; the remaining elements form the random subset. (Each example appears at least once with probability 1 − (1 − 1/N)^N ≈ 1 − 1/e ≈ 63%, hence the 2/3.)
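The ≈2/3 figure can be checked numerically: a given example is missed by one draw with probability 1 − 1/N, so it appears at least once in N draws with probability 1 − (1 − 1/N)^N, which approaches 1 − 1/e ≈ 0.632 as N grows.

```python
import math
import random

N = 1000
analytic = 1 - (1 - 1 / N) ** N        # close to 1 - 1/e ~= 0.632

random.seed(0)
sample = [random.randrange(N) for _ in range(N)]  # N draws with replacement
empirical = len(set(sample)) / N       # fraction of distinct examples, ~2/3
```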

(Also, at each node, consider only a random subset of the attributes as candidates for the split.)
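The per-node attribute subsampling might look like the sketch below. The subset size is an assumption here: √d attributes per split is a common default for classification, but the original notes do not specify one, and `candidate_features` is a name invented for this sketch.

```python
import math
import random

def candidate_features(n_features, rng):
    """At each node, restrict the split search to a random subset of
    attribute indices; sqrt(d) is a common default for classification."""
    k = max(1, round(math.sqrt(n_features)))
    return rng.sample(range(n_features), k)

rng = random.Random(0)
subset = candidate_features(16, rng)  # 4 of the 16 feature indices
```

Drawing a fresh subset at every node (rather than once per tree) further decorrelates the trees, which is what makes averaging them effective.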

The combination of bagging with random attribute subsets at each node is known as a Random Forest.
