CS-GY 6923: Machine Learning
Bias and Variance

In general, decision trees have low bias and high variance.

To reduce variance, perform bagging (bootstrap aggregating); a code sketch follows the list below:

  • Take multiple versions of the training set (random subsets)

  • Build a separate tree using each subset

  • Given a new example, compute a prediction from each tree; for classification, output the majority class, and for regression, the average of the per-tree predictions.

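The steps above can be written down directly. The following is only a sketch, assuming scikit-learn's DecisionTreeClassifier as the base learner, NumPy arrays X (features) and y (integer class labels), and made-up function names:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_bagged_trees(X, y, n_trees=25, seed=0):
    """Fit one decision tree per bootstrap sample of the training set."""
    rng = np.random.default_rng(seed)
    N = len(X)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, N, size=N)  # sample N indices with replacement
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return trees

def predict_bagged_trees(trees, X_new):
    """Combine the per-tree predictions by majority vote."""
    votes = np.stack([tree.predict(X_new) for tree in trees])  # shape (n_trees, n_examples)
    # assumes integer class labels 0..K-1; pick the most frequent label per example
    return np.array([np.bincount(col).argmax() for col in votes.T])
```

For regression, the same skeleton works with DecisionTreeRegressor as the base learner and votes.mean(axis=0) in place of the majority vote.
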
To get a random subset of approximately 2/3 of the N training examples, sample N times with replacement and keep each distinct example drawn: on average about 1 − 1/e ≈ 63% of the examples appear at least once, and these form the random subset.

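A quick empirical check of that figure, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
sample = rng.integers(0, N, size=N)  # draw N indices with replacement
print(len(np.unique(sample)) / N)    # about 0.632, i.e. roughly 2/3 distinct
```
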
(Also, at each node, consider only a random subset of the attributes as candidates for the split.)

This combination of bagging with random attribute subsets is known as a Random Forest.

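Both ingredients (bagging and the per-node random attribute subset) are available off the shelf; a minimal example with scikit-learn's RandomForestClassifier, where the synthetic dataset is just an illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# n_estimators: number of bagged trees
# max_features: size of the random attribute subset considered at each split
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X, y)
print(forest.predict(X[:5]))  # majority-vote predictions for the first five examples
```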