Introduction to Random Forest


Teaching: 10 min
  • How is a Random Forest an ensemble method?

  • How is bootstrap aggregation applied to our decision trees?

  • How is feature bagging applied to decision tree modeling?

  • To understand how ensemble learning is used with Decision Trees to create a Random Forest.

Based on what was previously covered in decision trees and ensemble methods, it should come as little surprise where the random forest gets its name, or, at a high level, how one is constructed, but let's go over it anyway.

A random forest is comprised of a set of decision trees, each of which is trained on a random subset of the training data. These trees' predictions can then be aggregated to provide a single prediction from a series of predictions.

How do you build a random forest?

A random forest is built using the following procedure:

  • Choose the number of trees you'd like in your forest (M)
  • Choose the number of samples you'd like for each tree (n)
  • Choose the number of features you'd like in each tree (f)
  • For each tree in M:
    • Select n samples with replacement from all observations
    • Select f features at random
    • Train a decision tree using the data set of n samples with f features
    • Save the decision tree
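The steps above can be sketched in a few lines of Python. This is a minimal illustration on a toy dataset, not a production implementation: the names M, n, and f mirror the procedure, NumPy handles the random sampling, and scikit-learn's DecisionTreeClassifier trains each tree.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# A toy dataset standing in for "all observations"
X, y = make_classification(n_samples=100, n_features=6, random_state=0)

M, n, f = 10, 50, 3          # trees in the forest, samples per tree, features per tree
rng = np.random.default_rng(0)
forest = []                   # each saved entry: (feature indices, fitted tree)

for _ in range(M):
    rows = rng.choice(len(X), size=n, replace=True)       # n samples, with replacement
    cols = rng.choice(X.shape[1], size=f, replace=False)  # f features at random
    tree = DecisionTreeClassifier().fit(X[np.ix_(rows, cols)], y[rows])
    forest.append((cols, tree))                           # save the decision tree

print(len(forest))  # 10 trees, each trained on its own rows and columns
```

Saving the feature indices alongside each tree matters: at prediction time, every tree must be shown only the f features it was trained on.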

What does this look like?

First, let's remind ourselves what our data looks like as one entity:

Full Data



And now, here it is once we've applied the bagging methods to this data set:

Sub Samples of Observations


As you can see, the bootstrapping and feature bagging process produces decision trees that differ wildly from a single decision tree trained on all of the data.

These multiple classifiers give us a number of things:

  • A set of models that were trained without some features, meaning that in aggregate they’re able to make predictions even with missing data.

  • A set of models that viewed different subsets of data, meaning that they’ve all gotten slightly different ideas of how to make decisions based on different ideas of what the population looks like. This means that in aggregate they’re able to make predictions even when the training data doesn’t look exactly like what we’re trying to predict.
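In practice you rarely assemble a forest by hand; libraries such as scikit-learn perform the bootstrapping and feature sampling internally. A minimal sketch on a toy dataset (note one difference from the procedure above: scikit-learn samples max_features at each split within a tree, rather than fixing one feature subset per tree):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=100, n_features=6, random_state=0)

# n_estimators plays the role of M; max_features plays the role of f
model = RandomForestClassifier(n_estimators=10, max_features=3, random_state=0)
model.fit(X, y)
print(model.score(X, y))  # accuracy on the training data
```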

How does a Random Forest make a prediction?

  • Given an observation (o).
  • For each tree (t) in the model:
    • Predict the outcome (p) by applying t to o
    • Store p in a list P
  • If the model is a classifier:
    • Return the most common value in P (a majority vote)
  • If the model is a regressor:
    • Return the average of P

Key Points

  • A random forest uses an ensemble of decision trees trained on subsets of both observations and features in order to make its predictions.