Overview
Teaching: 10 min
Questions
What is an ensemble method?
What is bagging?
What is feature bagging?
Objectives
To gain a high level understanding of ensemble learning strategies.
Ensemble Methods
Ensemble methods are algorithms that combine multiple learning models into a single predictive model in order to decrease variance, decrease bias, or improve prediction accuracy.
Ensemble methods are usually broken into two categories:

parallel methods, where the models that make up the building blocks of the larger method are generated independently of each other (i.e. they can be trained as trivially parallel problems applied to the data set).

sequential methods, where the learners are generated in sequence and depend on each other (i.e. they can only be trained one at a time, as each model requires information from the training of the model upstream of it).
The random forest algorithm relies on a parallel ensemble method called bagging to generate its weak classifiers.
Bagging
Bagging is a colloquial term for bootstrap aggregation. Bootstrap aggregation is a method that allows us to decrease the variance of an estimate by averaging multiple estimates measured from random subsamples of a population.
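A minimal sketch of this variance-reduction idea, using a synthetic Gaussian population (the numbers here are illustrative assumptions, not from the lesson):

```python
import random
import statistics

random.seed(0)

# A synthetic population with a true mean of about 50.
population = [random.gauss(50, 10) for _ in range(1000)]

# A single noisy estimate: the mean of one small subsample.
single_estimate = statistics.mean(random.choices(population, k=30))

# The bagged estimate: average the means of many bootstrap subsamples.
subsample_means = [statistics.mean(random.choices(population, k=30))
                   for _ in range(200)]
bagged_estimate = statistics.mean(subsample_means)

print(single_estimate, bagged_estimate)
```

Any one subsample mean may land well away from the population mean, but the average over many bootstrap subsamples is a much steadier estimate.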
Bootstrap Sampling
The first part of bagging is the application of bootstrap sampling to obtain subsets of the data. Each subset is then fed into one of the models that make up the final ensemble. The process is straightforward: given a set of observation data, n observations are selected at random and with replacement to form the subsample. That subsample is then fed into the machine learning algorithm of choice to train one model.
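The sampling step described above can be sketched in a few lines of Python (the helper name and toy data are assumptions for illustration):

```python
import random

def bootstrap_sample(data, n=None):
    """Select n observations at random, with replacement."""
    n = len(data) if n is None else n
    return [random.choice(data) for _ in range(n)]

observations = list(range(100))
subsample = bootstrap_sample(observations)
print(len(subsample))  # same size as the original, with likely repeats
```

Because sampling is done with replacement, some observations appear more than once in a subsample while others are left out entirely.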
Aggregation
After all of the models have been built, their outputs must be aggregated into a single coherent prediction for the larger model. In the case of a classifier, this is usually a winner-take-all strategy: whichever category receives the most votes is the final predicted outcome. In the case of a regression problem, a simple average of the predicted values is used.
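Both aggregation strategies can be sketched directly (the function names here are hypothetical):

```python
from collections import Counter

def aggregate_votes(predictions):
    """Winner-take-all for classifiers: the most common label wins."""
    return Counter(predictions).most_common(1)[0][0]

def aggregate_average(predictions):
    """Simple average for regression models."""
    return sum(predictions) / len(predictions)

print(aggregate_votes(["cat", "dog", "cat", "cat", "dog"]))  # cat
print(aggregate_average([2.0, 4.0, 6.0]))                    # 4.0
```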
Feature bagging
Feature bagging (or the random subspace method) is a type of ensemble method that is applied to the features (columns) of a data set instead of to the observations (rows). It reduces the correlation between the base predictors by training each one on a random subset of the features instead of the complete feature space.
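Selecting a random feature subset can be sketched as below (the column names are hypothetical example data):

```python
import random

def random_feature_subset(feature_names, k):
    """Select k features at random, without replacement."""
    return random.sample(feature_names, k)

# Hypothetical column names for illustration.
columns = ["age", "height", "weight", "income", "score"]
subset = random_feature_subset(columns, 3)
print(subset)
```

Each base predictor in the ensemble would be handed a different subset like this, so no single dominant feature ends up driving every model.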
What does this look like?
A complete set of observations might be graphed as a joy plot and look like this :
The bagging method might then sample this observation repeatedly, creating a series of observations that look like these:
Each of the above distributions would then be fed into a distinct model for training.
The code to generate these plots is available here
Key Points
Ensemble learning is a framework in which the decisions of multiple models are combined to produce a single prediction.
Bootstrapping is a statistical technique in which a population metric is estimated by repeatedly sampling the data and measuring a statistic of interest.
Feature bagging is the random selection of features to be used in a model.