Introduction to Random Forest

Overview

Teaching: 10 min
Questions
  • What is a random forest?

  • How are random forests used?

  • When might I want to use a random forest?

Objectives
  • To gain a high-level understanding of what a random forest is and the situations where one might be used.

A Brief Overview:

The random forest, first described by Breiman (2001), is an ensemble approach for building predictive models: it combines the predictions of many individual models rather than relying on a single one.

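The “forest” in this approach is a collection of decision trees that act as “weak” classifiers: individually they are poor predictors, but in aggregate they form a robust prediction. Their outputs are combined by majority vote for classification or by averaging for regression.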
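
The claim that individually weak trees combine into a strong predictor can be checked with a little arithmetic. The sketch below is an idealized illustration, not how a forest is actually trained: it assumes a two-class problem in which every tree is correct with the same probability and the trees err independently of one another. The function name `majority_vote_accuracy` and the value 0.6 are made up for this example.

```python
from math import comb


def majority_vote_accuracy(n_trees: int, p_correct: float) -> float:
    """Probability that a strict majority of the trees votes for the right class.

    Assumptions (for illustration only): two classes, an odd number of trees
    so that votes cannot tie, and trees that are each correct with probability
    p_correct independently of the others.
    """
    if n_trees % 2 == 0:
        raise ValueError("use an odd number of trees so that votes cannot tie")
    smallest_majority = n_trees // 2 + 1
    # Sum the binomial probabilities of every outcome where the majority is right.
    return sum(
        comb(n_trees, k) * p_correct**k * (1 - p_correct) ** (n_trees - k)
        for k in range(smallest_majority, n_trees + 1)
    )


# A single tree that is right 60% of the time versus larger forests of such trees.
for n in (1, 11, 101):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

Under these assumptions the accuracy of the vote climbs toward 1 as trees are added. Real trees trained on the same data are correlated, so the gain in practice is smaller; training each tree on a random sample of the rows and features is how a random forest tries to keep its trees from all making the same mistakes.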

Because they are simple, make few assumptions about the data, and generally perform well, random forests have been used in nearly every domain where machine learning has been applied.

When to choose:

Random forests don’t make any strong assumptions about the scale and normality of incoming data. They perform well with mixed numerical and categorical data, don’t require much tuning to get a reasonable first version of a predictive model, are fast to train, are intuitive to understand, and report feature importance as part of the fitted model. Many implementations can also handle missing data, and implementations exist for most widely used programming languages.

As such, random forests make a great starting point for any project where you’re building a predictive model or exploring the feasibility of applying machine learning to a new domain.
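
As a rough illustration of how little setup a first model needs, the sketch below fits a forest on unscaled features with near-default settings and then reads off the feature importances. It assumes Python with scikit-learn, which this lesson does not prescribe; the iris dataset, the 70/30 split, and the parameter values are arbitrary choices made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumption: a small example dataset bundled with scikit-learn (no download needed).
data = load_iris()
X, y = data.data, data.target

# Hold out 30% of the rows to estimate performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# No feature scaling or normalization is applied before fitting.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
importances = model.feature_importances_

print(f"test accuracy: {accuracy:.2f}")
for name, importance in zip(data.feature_names, importances):
    print(f"{name}: {importance:.3f}")
```

The importances are normalized to sum to 1, so they are best read as a relative ranking of the features rather than as absolute effect sizes.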

Key Points

  • A random forest is an ensemble model that combines many weakly predictive decision trees to make predictions.

  • Random forests make great starting points for any predictive modeling project.