Backward feature selection starts with the full model including all features and iteratively removes the least significant feature based on adjusted R-squared until no further improvement can be made.
Another example: amount of nuts collected by squirrels
Let’s take a practical example of backward feature selection in a forest environment, where the task is to predict the amount of nuts collected by squirrels based on several environmental features.
Goal: Predict the number of nuts collected by squirrels (y) based on several environmental features such as:
- x₁: Number of trees in the area
- x₂: Temperature in the forest
- x₃: Amount of rainfall
- x₄: Distance to the nearest water source
- x₅: Number of competing squirrels
The initial linear regression model would be:

y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄ + β₅x₅ + ε

Where:
- y is the number of nuts collected by squirrels,
- x₁, …, x₅ are the features,
- β₀, β₁, …, β₅ are the regression coefficients,
- ε is the error term.
Step-by-Step Process of Backward Feature Selection
Recall that adjusted R-squared adjusts the R-squared value to account for the number of predictors in the model, making it more suitable for comparing models with different numbers of features. The Adjusted R-squared is given by:

Adjusted R² = 1 − (1 − R²)(n − 1) / (n − p − 1)

Where:
- R² is the R-squared value of the model,
- n is the number of observations,
- p is the number of predictors (features).
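As a quick sanity check on this formula, here is a small Python helper (the function name is my own) showing that the same raw R² scores lower after adjustment when the model uses more predictors:

```python
def adjusted_r_squared(r_squared: float, n: int, p: int) -> float:
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# Same raw R² = 0.85, but the model with more predictors is penalized.
print(adjusted_r_squared(0.85, n=100, p=5))
print(adjusted_r_squared(0.85, n=100, p=20))
```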
Procedure:
Step 1: Train the Model with All Features
Start with the model using all 5 features:

y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄ + β₅x₅ + ε

Calculate the Adjusted R-squared for this model. Assume it comes out to 0.80 (all numbers in this walkthrough are illustrative).
Step 2: Remove One Feature and Evaluate
Remove one feature at a time and recalculate the Adjusted R-squared for each model.
- Model without x₁: y = β₀ + β₂x₂ + β₃x₃ + β₄x₄ + β₅x₅ + ε. Assume the Adjusted R-squared is 0.79.
- Model without x₂: y = β₀ + β₁x₁ + β₃x₃ + β₄x₄ + β₅x₅ + ε. Assume the Adjusted R-squared is 0.74.
- Model without x₃: y = β₀ + β₁x₁ + β₂x₂ + β₄x₄ + β₅x₅ + ε. Assume the Adjusted R-squared is 0.65.
- Model without x₄: y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₅x₅ + ε. Assume the Adjusted R-squared is 0.78.
- Model without x₅: y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄ + ε. Assume the Adjusted R-squared is 0.82.
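A single elimination round like this one can be simulated in Python on synthetic data (the data, coefficients, and function names here are illustrative assumptions, not the article's actual numbers):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the squirrel data; column order mirrors x1..x5.
n = 200
X = rng.normal(size=(n, 5))
# By construction, only columns 1 (temperature) and 2 (rainfall) matter.
y = 2.0 + 1.5 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(scale=0.5, size=n)

def adj_r2(X, y):
    """Fit OLS via least squares and return the adjusted R-squared."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])            # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# One round: drop each feature in turn and score the reduced model.
for j in range(X.shape[1]):
    reduced = np.delete(X, j, axis=1)
    print(f"without feature {j + 1}: adjusted R-squared = {adj_r2(reduced, y):.4f}")
```

Dropping a truly informative column (temperature or rainfall here) visibly hurts the score, while dropping a noise column barely changes it.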
Step 3: Choose the Feature to Remove
Compare the Adjusted R-squared values:
- Removing x₁: 0.79
- Removing x₂: 0.74
- Removing x₃: 0.65 (significant decrease)
- Removing x₄: 0.78
- Removing x₅: 0.82 (highest Adjusted R-squared)

Since removing x₅ results in the highest Adjusted R-squared, we remove x₅ from the model.
Step 4: Repeat the Process
With the remaining features x₁, x₂, x₃, x₄, repeat the process:
- Model without x₁: y = β₀ + β₂x₂ + β₃x₃ + β₄x₄ + ε. Assume Adjusted R-squared = 0.81.
- Model without x₂: y = β₀ + β₁x₁ + β₃x₃ + β₄x₄ + ε. Assume Adjusted R-squared = 0.72.
- Model without x₃: y = β₀ + β₁x₁ + β₂x₂ + β₄x₄ + ε. Assume Adjusted R-squared = 0.66.
- Model without x₄: y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + ε. Assume Adjusted R-squared = 0.83.

Removing x₄ results in the highest Adjusted R-squared, so we remove x₄ next.
Step 5: Final Model
Continue until removing more features causes a significant drop in Adjusted R-squared.
Assume that after one more iteration (removing x₁ raises the Adjusted R-squared further, while removing x₂ or x₃ would lower it sharply), the final model with the best Adjusted R-squared value is:

y = β₀ + β₂x₂ + β₃x₃ + ε
Summary of Steps
- Train the model with all features and calculate Adjusted R-squared.
- Remove one feature at a time, recalculate Adjusted R-squared, and select for removal the candidate whose absence yields the highest Adjusted R-squared (the smallest decrease, or even an increase).
- Repeat the process with remaining features until no further improvement is possible.
- Finalize the model with the features that yield the highest Adjusted R-squared.
Final Model:

y = β₀ + β₂x₂ + β₃x₃ + ε

In this process, the features Temperature (x₂) and Amount of Rainfall (x₃) are selected as the most significant predictors of the amount of nuts collected by squirrels.
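The whole procedure can be sketched end to end in Python. This is a minimal greedy implementation on synthetic data in which, by construction, only temperature and rainfall drive the nut count; every name and value below is an illustrative assumption, not the article's actual data:

```python
import numpy as np

def adjusted_r2(X, y):
    """Fit OLS via least squares and return the adjusted R-squared."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])            # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def backward_selection(X, y, names):
    """Greedily drop the feature whose removal yields the best adjusted
    R-squared, stopping once no removal improves on the current model."""
    features = list(range(X.shape[1]))
    best = adjusted_r2(X[:, features], y)
    while len(features) > 1:
        scores = [(adjusted_r2(X[:, [f for f in features if f != j]], y), j)
                  for j in features]
        score, drop = max(scores)
        if score <= best:
            break                      # any removal would hurt: stop
        best = score
        features.remove(drop)
    return [names[f] for f in features], best

rng = np.random.default_rng(42)
n = 300
X = rng.normal(size=(n, 5))
# Only temperature and rainfall actually drive the synthetic nut count.
y = 5.0 + 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(scale=1.0, size=n)
names = ["trees", "temperature", "rainfall", "water_dist", "competitors"]
print(backward_selection(X, y, names))
```

On data like this, the selector keeps the two informative features and discards the noise columns, mirroring the walkthrough above.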