Stepwise feature selection is a systematic approach to identifying the most relevant features for a predictive model by combining forward and backward selection. The process usually begins with an empty model (some variants start from the full model instead). Features are then added or removed one at a time based on their statistical significance and their contribution to the model’s performance. At each step, every candidate change is scored with a metric such as adjusted R-squared to measure its impact. This iterative process continues until no addition or removal yields a significant improvement, leaving a refined model with only the most impactful predictors.
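The loop described above can be sketched in Python as follows. This is a minimal illustration, not the author’s implementation: it assumes NumPy, ordinary least squares with an intercept, and a hypothetical `min_gain` threshold standing in for “significant” improvement. The function names and the synthetic data (where wind is pure noise) are made up for illustration.

```python
import numpy as np


def adjusted_r2(X, y):
    """Fit OLS with an intercept on the p columns of X and return adjusted R^2."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ coef) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    # Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1), with 1 - R^2 = RSS / TSS
    return 1.0 - (rss / tss) * (n - 1) / (n - p - 1)


def stepwise_select(data, y, min_gain=1e-3, max_steps=100):
    """Bidirectional stepwise selection scored by adjusted R^2.

    data maps feature name -> 1-D array; min_gain is a hypothetical
    threshold standing in for a "significant" change.
    """
    selected, remaining = [], list(data)
    score = 0.0  # an intercept-only (empty) model has adjusted R^2 = 0
    for _ in range(max_steps):  # guard against cycling between add and remove
        # Forward step: find the remaining feature that raises adjusted R^2 most.
        best_gain, best_name = 0.0, None
        for name in remaining:
            X = np.column_stack([data[f] for f in selected + [name]])
            gain = adjusted_r2(X, y) - score
            if gain > best_gain:
                best_gain, best_name = gain, name
        if best_name is None or best_gain < min_gain:
            break  # no addition is a significant improvement: stop
        selected.append(best_name)
        remaining.remove(best_name)
        score += best_gain
        # Backward step: re-check earlier features; drop any whose removal
        # lowers adjusted R^2 by less than min_gain.
        for name in selected[:-1]:
            others = [f for f in selected if f != name]
            reduced = adjusted_r2(np.column_stack([data[f] for f in others]), y)
            if score - reduced < min_gain:
                selected.remove(name)
                remaining.append(name)
                score = reduced
    return selected, score


# Synthetic, made-up data: weight depends on food, temperature and water
# cleanliness; wind is pure noise.
rng = np.random.default_rng(0)
n = 200
data = {f: rng.normal(size=n) for f in ["temperature", "food", "water_cleanliness", "wind"]}
weight = (3 * data["food"] + 2 * data["temperature"]
          + data["water_cleanliness"] + 0.1 * rng.normal(size=n))

selected, score = stepwise_select(data, weight)
print(selected, round(score, 4))
```

Tracking the running score instead of refitting the current model on every pass keeps the sketch short; the `max_steps` cap is a simple safeguard, since threshold-based add/remove rules can in principle oscillate.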
Now, let’s step into an intuitive example of stepwise feature selection using a dataset where we aim to predict the weight of fish based on four potential input features: temperature, food, water cleanliness, and wind.
Step-by-Step Example of Stepwise Feature Selection using Adjusted R-Squared
Step 1: Start with No Features
We begin with an empty model and no features.
Step 2: Evaluate Each Feature Individually
We fit a separate simple linear regression model for each feature and evaluate their performance using adjusted R-squared. Let’s assume we have the following performance results:
- Temperature: adjusted R-squared
- Food: adjusted R-squared
- Water Cleanliness: adjusted R-squared
- Wind: adjusted R-squared
Since “Food” has the highest adjusted R-squared value, it is the most significant single predictor. We add “Food” to our model.
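Step 2 can be sketched in plain Python: fit one simple regression per feature, convert its R-squared to adjusted R-squared, and rank the features. This is an illustrative sketch only; the helper names and the tiny dataset below are hypothetical and are not the article’s data.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).

    n is the number of observations and p the number of features
    (the intercept is not counted in p).
    """
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def simple_r2(x, y):
    """R^2 of the simple linear regression y ~ a + b*x (squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy * sxy / (sxx * syy)


def rank_single_features(features, y):
    """Score each feature on its own (one simple regression each), best first."""
    n = len(y)
    scores = {name: adjusted_r2(simple_r2(x, y), n, 1) for name, x in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data for illustration only.
weight = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]
features = {
    "food": [1, 2, 3, 4, 5, 6],
    "wind": [5, 2, 6, 1, 4, 3],
}
ranking = rank_single_features(features, weight)
print(ranking[0][0])  # food
```

The feature at the top of the ranking is the one added first.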
Step 3: Add the Best Feature
Now our model includes the feature “Food”:
Weight = β₀ + β₁ · Food
Step 4: Evaluate Adding Each Remaining Feature
Next, we consider adding each of the remaining features to the current model one by one:
- Add Temperature: combined adjusted R-squared
- Add Water Cleanliness: combined adjusted R-squared
- Add Wind: combined adjusted R-squared
“Temperature” adds the most value to our model when combined with “Food” (highest increase in adjusted R-squared), so we add “Temperature” to the model.
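The forward step in Step 4 amounts to refitting the current model once per remaining feature and comparing the adjusted R-squared values. The sketch below assumes NumPy and OLS with an intercept; `forward_candidates` is a hypothetical helper, and the data are synthetic (wind is pure noise), not the article’s values.

```python
import numpy as np


def adjusted_r2(X, y):
    """OLS with an intercept on the p columns of X; returns adjusted R^2."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])  # intercept column + features
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ coef) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - (rss / tss) * (n - 1) / (n - p - 1)


def forward_candidates(data, y, selected):
    """Score the current model plus each remaining feature; best first."""
    scores = {}
    for name in data:
        if name not in selected:
            X = np.column_stack([data[f] for f in selected + [name]])
            scores[name] = adjusted_r2(X, y)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Synthetic, made-up data (wind is pure noise).
rng = np.random.default_rng(0)
n = 200
data = {f: rng.normal(size=n) for f in ["temperature", "food", "water_cleanliness", "wind"]}
weight = (3 * data["food"] + 2 * data["temperature"]
          + data["water_cleanliness"] + 0.1 * rng.normal(size=n))

candidates = forward_candidates(data, weight, ["food"])
for name, adj in candidates:
    print(f"food + {name}: adjusted R^2 = {adj:.3f}")
```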
Step 5: Add the Best Feature
Now our model includes “Food” and “Temperature”:
Weight = β₀ + β₁ · Food + β₂ · Temperature
Step 6: Evaluate Removing Features
After adding “Temperature,” we check the previously added features to see if any can be removed without significantly decreasing the adjusted R-squared:
- Remove Food: adjusted R-squared without “Food”
Since removing “Food” significantly decreases the adjusted R-squared, we keep “Food” in the model.
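The backward check in Step 6 can be sketched as measuring how much adjusted R-squared drops when each feature is removed, then keeping only features whose removal costs at least some threshold. NumPy, OLS with an intercept, the `removal_costs` helper, and the `min_gain` cut-off are all assumptions for illustration; the data are synthetic.

```python
import numpy as np


def adjusted_r2(X, y):
    """OLS with an intercept on the p columns of X; returns adjusted R^2."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])  # intercept column + features
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ coef) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - (rss / tss) * (n - 1) / (n - p - 1)


def removal_costs(data, y, selected):
    """Drop in adjusted R^2 caused by removing each feature from the model."""
    full = adjusted_r2(np.column_stack([data[f] for f in selected]), y)
    costs = {}
    for name in selected:
        others = [f for f in selected if f != name]
        # An empty feature list leaves the intercept-only model (p = 0).
        X = np.column_stack([data[f] for f in others]) if others else np.empty((len(y), 0))
        costs[name] = full - adjusted_r2(X, y)
    return costs


# Synthetic, made-up data (wind is pure noise).
rng = np.random.default_rng(0)
n = 200
data = {f: rng.normal(size=n) for f in ["temperature", "food", "water_cleanliness", "wind"]}
weight = (3 * data["food"] + 2 * data["temperature"]
          + data["water_cleanliness"] + 0.1 * rng.normal(size=n))

min_gain = 1e-3  # hypothetical cut-off for a "significant" decrease
costs = removal_costs(data, weight, ["food", "temperature"])
keep = [name for name, cost in costs.items() if cost >= min_gain]
print(costs, keep)
```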
Step 7: Evaluate Adding Each Remaining Feature
Next, we consider adding each of the remaining features to the current model one by one:
- Add Water Cleanliness: combined adjusted R-squared
- Add Wind: combined adjusted R-squared
“Water Cleanliness” adds the most value to our model, so we add “Water Cleanliness” to the model.
Step 8: Add the Best Feature
Now our model includes “Food,” “Temperature,” and “Water Cleanliness”:
Weight = β₀ + β₁ · Food + β₂ · Temperature + β₃ · Water Cleanliness
Step 9: Evaluate Removing Features
After adding “Water Cleanliness,” we check the previously added features to see if any can be removed without significantly decreasing the adjusted R-squared:
- Remove Food: adjusted R-squared without “Food”
- Remove Temperature: adjusted R-squared without “Temperature”
Since removing either “Food” or “Temperature” significantly decreases the adjusted R-squared, we keep both in the model.
Step 10: Evaluate Adding the Last Feature
Finally, we consider adding “Wind” to the model:
- Add Wind: combined adjusted R-squared
Adding “Wind” does not significantly improve the adjusted R-squared, so we stop here.
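The stopping rule can be made explicit by treating “significantly” as a minimum gain in adjusted R-squared. The sketch below is a hypothetical helper; both the `min_gain` threshold and the numbers in the usage lines are made up for illustration and are not the article’s results.

```python
def should_stop(current_score, candidate_scores, min_gain=1e-3):
    """Return True when no candidate addition raises adjusted R^2 by at least min_gain.

    candidate_scores maps each remaining feature to the adjusted R^2 of the
    model with that feature added.
    """
    if not candidate_scores:
        return True  # nothing left to add
    best_gain = max(candidate_scores.values()) - current_score
    return best_gain < min_gain


# Hypothetical numbers for illustration only.
print(should_stop(0.900, {"wind": 0.8995}))  # True: adding wind lowers adjusted R^2
print(should_stop(0.900, {"wind": 0.9500}))  # False: a clear improvement
```

Because adjusted R-squared penalizes each extra feature, an uninformative feature often lowers the score outright, so even a threshold of zero would reject it.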
Final Model
The final model includes the features “Food,” “Temperature,” and “Water Cleanliness”:
Weight = β₀ + β₁ · Food + β₂ · Temperature + β₃ · Water Cleanliness
Summary
In this stepwise feature selection process using adjusted R-squared, we started with no features and iteratively added the feature that provided the most significant improvement in adjusted R-squared while also checking and potentially removing previously added features. Our final model includes “Food,” “Temperature,” and “Water Cleanliness” as predictors for fish weight.