Bias and Variance
In general, fully grown (unpruned) decision trees have low bias and high variance.
To reduce variance, perform bagging:
Create multiple versions of the training set (random subsets of the examples)
Build a separate tree using each subset
Given a new example to predict, compute the prediction from each tree. Then for classification, predict the majority class; for regression, average the predictions.
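The steps above can be sketched as follows. This is a minimal illustration, not a tuned implementation; the function names (`bagging_fit`, `bagging_predict`) are made up for this example, and scikit-learn's DecisionTreeClassifier stands in for whatever base tree learner is used.

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_trees=25, rng=None):
    """Train one decision tree per bootstrap sample of (X, y)."""
    rng = np.random.default_rng(rng)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        # Sample N indices with replacement to form one version of the training set
        idx = rng.integers(0, n, size=n)
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return trees

def bagging_predict(trees, X):
    """Classification: majority vote across the trees' predictions."""
    votes = np.array([t.predict(X) for t in trees])  # shape (n_trees, n_examples)
    return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])
```

For regression, the vote in `bagging_predict` would be replaced by `votes.mean(axis=0)`.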
To get a random subset of approximately 2/3 of the total examples (N), sample N times with replacement, then eliminate duplicates; the remaining elements form the random subset. This works because each example is drawn at least once with probability 1 − (1 − 1/N)^N ≈ 1 − 1/e ≈ 0.632, so roughly 2/3 of the examples survive.
(Also, at each node, consider only a random subset of the attributes as candidates for the decision.)
This model is known as a Random Forest.
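One off-the-shelf implementation of this model, shown here as an illustration (scikit-learn is an assumption, not something the notes prescribe):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# n_estimators = number of bagged trees;
# max_features controls the random subset of attributes tried at each node
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X, y)
```

`RandomForestRegressor` is the analogous class for regression, where the trees' predictions are averaged.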