Backward feature selection starts with the full model containing all features and iteratively removes the feature whose removal yields the highest adjusted R-squared, stopping when no removal improves the score.

Another example: amount of nuts collected by squirrels
Let’s take a practical example of backward feature selection in a forest environment, where the task is to predict the amount of nuts collected by squirrels based on several environmental features.
Goal: Predict the number of nuts collected by squirrels ($Y$) based on several environmental features such as:
$X_1$: Number of trees in the area
$X_2$: Temperature in the forest
$X_3$: Amount of rainfall
$X_4$: Distance to the nearest water source
$X_5$: Number of competing squirrels
The initial linear regression model would be:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \epsilon$$
Where:
$Y$ is the number of nuts collected by squirrels,
$X_1, \dots, X_5$ are the features,
$\beta_0, \dots, \beta_5$ are the regression coefficients,
$\epsilon$ is the error term.
Step-by-Step Process of Backward Feature Selection
Step 1: Train the Model with All Features
The initial model is trained using all 5 features:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \epsilon$$
Recall that adjusted R-squared adjusts the R-squared value to account for the number of predictors in the model, making it more suitable for comparing models with different numbers of features. The adjusted R-squared is given by:
$$\bar{R}^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}$$
Where:
$R^2$ is the R-squared value of the model,
$n$ is the number of observations,
$p$ is the number of predictors (features).
Calculate the adjusted R-squared for this model. Assume it comes out to $\bar{R}^2 = 0.85$.
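Step 1 can be sketched in Python. The data, coefficients, and noise level below are illustrative assumptions (a synthetic stand-in for the squirrel dataset), not values from the example:

```python
import numpy as np

def adjusted_r2(X, y):
    """Fit OLS via least squares and return the adjusted R-squared."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])        # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Assumed synthetic forest data: columns are trees, temperature, rainfall,
# distance to water, and competing squirrels.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 5))
# Assumed ground truth: only temperature (X2) and rainfall (X3) matter.
y = 3.0 + 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(scale=0.5, size=n)

full_score = adjusted_r2(X, y)
print(f"Adjusted R^2 of the full model: {full_score:.3f}")
```

The helper implements the adjusted R-squared formula above directly; any OLS routine that reports residuals would work in its place.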
Step 2: Remove One Feature and Evaluate
Remove one feature at a time and recalculate the adjusted R-squared for each candidate model.
- Model without $X_1$ (number of trees): assume the adjusted R-squared is 0.84.
- Model without $X_2$ (temperature): assume the adjusted R-squared is 0.80.
- Model without $X_3$ (rainfall): assume the adjusted R-squared is 0.72.
- Model without $X_4$ (distance to water): assume the adjusted R-squared is 0.84.
- Model without $X_5$ (competing squirrels): assume the adjusted R-squared is 0.86.
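The leave-one-out comparison of Step 2 can be sketched as a loop. Everything here is an assumed setup for illustration (synthetic data in which only temperature and rainfall truly drive the response):

```python
import numpy as np

def adj_r2(X, y):
    # OLS fit, then adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Assumed synthetic data: only temperature (col 1) and rainfall (col 2) matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 + 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(scale=0.5, size=200)

names = ["trees", "temperature", "rainfall", "water_dist", "competitors"]
# Score each candidate model obtained by dropping one feature.
scores = {name: adj_r2(np.delete(X, j, axis=1), y)
          for j, name in enumerate(names)}
for name, s in scores.items():
    print(f"without {name:12s}: adjusted R^2 = {s:.3f}")
```

Dropping a feature the response truly depends on (temperature or rainfall) produces a sharp drop in adjusted R-squared, while dropping an irrelevant one barely changes it.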
Step 3: Choose the Feature to Remove
Compare the adjusted R-squared values:
- Removing $X_1$: 0.84
- Removing $X_2$: 0.80
- Removing $X_3$: 0.72 (significant decrease)
- Removing $X_4$: 0.84
- Removing $X_5$: 0.86 (highest adjusted R-squared)
Since removing $X_5$ results in the highest adjusted R-squared, we remove $X_5$ from the model.
Step 4: Repeat the Process
With the remaining features $X_1, X_2, X_3, X_4$, repeat the process:
- Model without $X_1$: assume the adjusted R-squared is 0.85.
- Model without $X_2$: assume the adjusted R-squared is 0.81.
- Model without $X_3$: assume the adjusted R-squared is 0.74.
- Model without $X_4$: assume the adjusted R-squared is 0.87.
Removing $X_4$ results in the highest adjusted R-squared, so we remove $X_4$ next.
Step 5: Final Model
Continue until removing more features causes a significant drop in adjusted R-squared.
Assuming that, after several iterations, the final model with the best adjusted R-squared value is:
$$Y = \beta_0 + \beta_2 X_2 + \beta_3 X_3 + \epsilon$$
Summary of Steps
- Train the model with all features and calculate Adjusted R-squared.
- Remove one feature at a time, recalculate the adjusted R-squared for each candidate model, and drop the feature whose removal yields the highest adjusted R-squared.
- Repeat the process with remaining features until no further improvement is possible.
- Finalize the model with the features that yield the highest Adjusted R-squared.
Final Model:
$$Y = \beta_0 + \beta_2 X_2 + \beta_3 X_3 + \epsilon$$
In this process, the features Temperature ($X_2$) and Amount of Rainfall ($X_3$) are selected as the most significant predictors of the amount of nuts collected by squirrels.
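The whole procedure can be sketched end to end. This is a minimal sketch on assumed synthetic data; the function names, feature labels, and coefficients are illustrative, not part of the original example:

```python
import numpy as np

def adjusted_r2(X, y):
    """OLS fit via least squares, then adjusted R-squared."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def backward_select(X, y, names):
    """Drop one feature per round while adjusted R-squared keeps improving."""
    kept = list(range(X.shape[1]))
    best = adjusted_r2(X[:, kept], y)
    while len(kept) > 1:
        # Score every candidate model with one remaining feature removed.
        trials = {j: adjusted_r2(X[:, [k for k in kept if k != j]], y)
                  for j in kept}
        j, score = max(trials.items(), key=lambda kv: kv[1])
        if score <= best:          # no removal improves the model: stop
            break
        kept.remove(j)
        best = score
    return [names[k] for k in kept], best

# Assumed synthetic forest data, mirroring the worked example.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = 3.0 + 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(scale=0.5, size=300)
names = ["trees", "temperature", "rainfall", "water_dist", "competitors"]

selected, score = backward_select(X, y, names)
print(f"selected: {selected}, adjusted R^2 = {score:.3f}")
```

Because adjusted R-squared penalizes extra predictors, the loop stops on its own once every remaining feature pulls its weight; no separate significance threshold is needed for this stopping rule.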