Overview
Teaching: 10 min
Questions
What is a random forest?
How are random forests used?
When might I want to use a random forest?
Objectives
To gain a high level understanding of what a random forest is and situations where they might be used.
A Brief Overview
The random forest, first described by Breiman (2001), is an ensemble approach for building predictive models.
The “forest” in this approach is a collection of decision trees that act as “weak” classifiers: individually they are poor predictors, but in aggregate they form a robust prediction.
Because of their simplicity, lack of distributional assumptions, and generally high performance, random forests have been applied in nearly every domain where machine learning is used.
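To make the ensemble idea concrete, here is a minimal sketch of fitting a random forest with scikit-learn (assuming scikit-learn is installed; the dataset here is synthetic, not from the lesson):

```python
# Minimal sketch: a random forest aggregates many decision trees.
# Assumes scikit-learn is available; the data is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Create a small synthetic classification dataset.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 100 trees is a "weak" classifier; the forest's
# prediction is their aggregated (majority) vote.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))  # accuracy on held-out data
```

Note that no feature scaling or normalization step was needed before fitting, which illustrates the lack of assumptions discussed below.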
When to choose
Random forests don’t make strong assumptions about the scale or normality of incoming data, and they perform well with mixed numerical and categorical features. They don’t require much tuning to get a reasonable first version of a predictive model, are fast to train, and are intuitive to understand. They also provide feature importance scores as a built-in output of the model, can handle missing data in many implementations, and are available in most major languages and machine learning libraries.
As such, random forests make a great starting point for any project where you’re building a predictive model or exploring the feasibility of applying machine learning to a new domain.
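One of the built-in outputs mentioned above, feature importance, can be read directly off a fitted model. A small sketch with scikit-learn (synthetic data; the feature names are placeholders, not from the lesson):

```python
# Sketch: feature importances come "for free" from a fitted forest.
# Assumes scikit-learn; the dataset is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# 5 features, only 2 of which actually carry signal.
X, y = make_classification(n_samples=300, n_features=5,
                           n_informative=2, random_state=0)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# One importance score per feature; the scores sum to 1.
for i, imp in enumerate(forest.feature_importances_):
    print(f"feature {i}: {imp:.3f}")
```

This makes random forests useful for exploration as well as prediction: a quick fit can suggest which inputs are worth a closer look.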
Key Points
A random forest is an ensemble model: it combines many weakly predictive decision trees to make predictions.
Random forests make great starting points for any predictive modeling project.