Ensemble Learning

Ensemble learning is the concept of combining multiple learners so that their combined prediction is better than that of any single learner.

Different learners could use:

  • different algorithms

  • different parameters

  • different training sets

There is, however, little theoretical justification for just generating hypotheses using different learning algorithms. Instead, run the same algorithm on different training sets.

Two common ensemble learning techniques:

  • bagging

  • boosting

Boosting

Use multiple weak learners (i.e. learners whose accuracy is only slightly better than random guessing, say 51% on a binary task).

For example, use several Decision Stumps (i.e. a Decision Tree with just a root node and its leaves).
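To make the weak learner concrete, here is a minimal sketch of a decision stump for labels in {-1, +1}. It brute-forces every feature and every observed threshold to find the split with the lowest (weighted) training error. The class name and interface are made up for illustration; a library stump such as scikit-learn's DecisionTreeClassifier(max_depth=1) does the same job.

```python
import numpy as np

class DecisionStump:
    """A depth-1 tree: one feature, one threshold, two leaf predictions."""

    def fit(self, X, y, sample_weight=None):
        # X: (n, d) NumPy array of features, y: NumPy array of -1/+1 labels.
        n, d = X.shape
        w = np.full(n, 1.0 / n) if sample_weight is None else sample_weight
        best_err = np.inf
        for j in range(d):                        # try every feature
            for t in np.unique(X[:, j]):          # try every observed threshold
                for sign in (1, -1):              # predict +1 on one side, -1 on the other
                    pred = np.where(X[:, j] <= t, sign, -sign)
                    err = np.sum(w[pred != y])    # weighted training error of this split
                    if err < best_err:
                        best_err = err
                        self.feature, self.threshold, self.sign = j, t, sign
        return self

    def predict(self, X):
        return np.where(X[:, self.feature] <= self.threshold, self.sign, -self.sign)
```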

  • First, assign equal weights to all training examples.

  • Run a weak learner on the training set, and get a hypothesis h.

  • Next, increase the weights of the misclassified examples (using a certain formula). Intuitively, if an example now has weight 3, it behaves as if it were repeated 3 times in the updated dataset.

  • Now, train a new weak learner on the updated dataset.

  • Keep repeating this until good accuracy is achieved.

  • In effect, each new weak learner learns from the mistakes of the previous weak learner.

  • In the end (we decide when to stop), we will have some number of hypotheses, say K. Compute a weighted vote of these K hypotheses (using a certain formula) to determine the prediction for an example; a sketch of the full procedure is given below.

Theoretically, overfitting may be a concern, but it isn't usually a problem if the learners are weak.
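The steps above correspond to AdaBoost. Below is a minimal sketch, assuming binary labels in {-1, +1} stored in NumPy arrays and scikit-learn's DecisionTreeClassifier(max_depth=1) as the decision-stump weak learner. The weight-update and vote-weight formulas shown are the standard AdaBoost choices for the "certain formula" mentioned above; the function names are made up for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    """Train n_rounds decision stumps; y must be a NumPy array of -1/+1 labels."""
    n = len(y)
    w = np.full(n, 1.0 / n)                     # start with equal weights on all examples
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)        # the weak learner sees the weighted data
        pred = stump.predict(X)
        err = np.sum(w[pred != y]) / np.sum(w)  # weighted error of this hypothesis
        err = np.clip(err, 1e-10, 1 - 1e-10)    # avoid division by zero / log(0)
        alpha = 0.5 * np.log((1 - err) / err)   # vote weight: lower error => larger alpha
        w *= np.exp(-alpha * y * pred)          # raise weights of misclassified examples
        w /= np.sum(w)                          # renormalize so the weights sum to 1
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    """Weighted vote of all stumps; the sign of the total gives the final label."""
    votes = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.sign(votes)
```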

Bagging

Each learner is trained on a bootstrap sample of the training set (a sample of the same size, drawn with replacement), and the prediction is computed by a simple, unweighted majority vote of the learners (or by averaging, for regression).
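A minimal sketch of bagging, assuming NumPy arrays and scikit-learn decision trees as the base learners; the function names and parameters are illustrative only.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_learners=25, seed=0):
    """Train each learner on its own bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    learners = []
    for _ in range(n_learners):
        idx = rng.integers(0, n, size=n)   # sample n indices with replacement
        learners.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return learners

def bagging_predict(learners, X):
    """Unweighted majority vote: the most frequently predicted label wins."""
    preds = np.stack([m.predict(X) for m in learners])  # shape (n_learners, n_examples)
    majority = []
    for col in preds.T:                                  # one column per example
        labels, counts = np.unique(col, return_counts=True)
        majority.append(labels[counts.argmax()])
    return np.array(majority)
```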
