Stepwise Feature Selection + Example

Stepwise feature selection is a systematic approach to identifying the most relevant features for a predictive model by combining forward and backward selection. The process begins with either an empty model or a model containing all candidate features. Then, we add or remove features one at a time based on their contribution to the model’s performance. At each step, features are evaluated using a metric such as adjusted R-squared to determine their impact. This iterative method continues until adding or removing a feature no longer yields a meaningful improvement, resulting in a refined model with only the most impactful predictors.
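For reference, adjusted R-squared can be computed from the ordinary R-squared, the number of observations n, and the number of predictors p with the standard formula 1 - (1 - R^2)(n - 1)/(n - p - 1). Here is a minimal Python sketch; the function name and the example numbers are illustrative assumptions and do not come from the fish dataset.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need more samples than features + 1")
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Same raw R^2, but more predictors gives a lower adjusted value
# (illustrative numbers only).
print(adjusted_r2(0.50, 11, 2))  # 0.375
print(adjusted_r2(0.50, 11, 4))  # about 0.167
```

The penalty for extra predictors is what makes this metric usable for comparing models of different sizes.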

Now, let’s step into an intuitive example of stepwise feature selection using a dataset where we aim to predict the weight of fish based on four potential input features: temperature, food, water cleanliness, and wind.

Step-by-Step Example of Stepwise Feature Selection using Adjusted R-Squared

Step 1: Start with No Features

We begin with an empty model and no features.

Step 2: Evaluate Each Feature Individually

We fit a separate simple linear regression model for each feature and evaluate each model using adjusted R-squared. Let’s assume we obtain the following results:

  1. Temperature: Adjusted R^2 = 0.20
  2. Food: Adjusted R^2 = 0.55
  3. Water Cleanliness: Adjusted R^2 = 0.35
  4. Wind: Adjusted R^2 = 0.05

Since “Food” has the highest adjusted R-squared value, it is the strongest single predictor. We add “Food” to our model.
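To make the "one simple regression per feature" idea concrete, here is a small pure-Python sketch. The measurements are made-up toy numbers (not the data behind the scores above), only two features are shown, and the helper name simple_adjusted_r2 is an assumption for illustration.

```python
def simple_adjusted_r2(x, y):
    """Fit y = b0 + b1 * x by least squares and return adjusted R^2 (p = 1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    b1 = sxy / sxx                      # slope
    b0 = mean_y - b1 * mean_x           # intercept
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1 - ss_res / syy
    return 1 - (1 - r2) * (n - 1) / (n - 1 - 1)


# Made-up toy measurements for six fish (NOT the article's data).
weight = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
features = {
    "Food": [1, 2, 3, 4, 5, 6],
    "Wind": [3, 1, 4, 1, 5, 9],
}

# One model per feature; keep the feature with the highest adjusted R^2.
scores = {name: simple_adjusted_r2(x, weight) for name, x in features.items()}
best = max(scores, key=scores.get)
print(best)  # Food
```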

Step 3: Add the Best Feature

Now our model includes the feature “Food”:

\text{Weight} = \beta_0 + \beta_1 \cdot \text{Food}

Step 4: Evaluate Adding Each Remaining Feature

Next, we consider adding each of the remaining features to the current model one by one:

  1. Add Temperature:
    \text{Weight} = \beta_0 + \beta_1 \cdot \text{Food} + \beta_2 \cdot \text{Temperature}
  • Combined Adjusted R^2 = 0.68
  2. Add Water Cleanliness:
    \text{Weight} = \beta_0 + \beta_1 \cdot \text{Food} + \beta_2 \cdot \text{Water Cleanliness}
  • Combined Adjusted R^2 = 0.62
  3. Add Wind:
    \text{Weight} = \beta_0 + \beta_1 \cdot \text{Food} + \beta_2 \cdot \text{Wind}
  • Combined Adjusted R^2 = 0.60

“Temperature” adds the most value to our model when combined with “Food” (highest increase in adjusted R-squared), so we add “Temperature” to the model.
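The forward step above amounts to comparing each candidate's score with the current model's score and taking the largest gain. A minimal sketch using the numbers from Steps 3 and 4 (the variable names are illustrative assumptions):

```python
# Adjusted R^2 of the current model (Food only) and of each two-feature candidate.
current_score = 0.55
candidates = {
    "Temperature": 0.68,
    "Water Cleanliness": 0.62,
    "Wind": 0.60,
}

# Gain in adjusted R^2 from adding each feature to the Food-only model.
gains = {name: round(score - current_score, 2) for name, score in candidates.items()}
best = max(gains, key=gains.get)

print(gains)  # {'Temperature': 0.13, 'Water Cleanliness': 0.07, 'Wind': 0.05}
print(best)   # Temperature
```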

Step 5: Add the Best Feature

Now our model includes “Food” and “Temperature”:

\text{Weight} = \beta_0 + \beta_1 \cdot \text{Food} + \beta_2 \cdot \text{Temperature}

Step 6: Evaluate Removing Features

After adding “Temperature,” we check the previously added features to see if any can be removed without significantly decreasing the adjusted R-squared:

  1. Remove Food:
    \text{Weight} = \beta_0 + \beta_1 \cdot \text{Temperature}
  • Adjusted R^2 = 0.20 (the Temperature-only model from Step 2)

Since removing “Food” drops the adjusted R-squared from 0.68 to 0.20, we keep “Food” in the model.

Step 7: Evaluate Adding Each Remaining Feature

Next, we consider adding each of the remaining features to the current model one by one:

  1. Add Water Cleanliness:
    \text{Weight} = \beta_0 + \beta_1 \cdot \text{Food} + \beta_2 \cdot \text{Temperature} + \beta_3 \cdot \text{Water Cleanliness}
  • Combined Adjusted R^2 = 0.72
  2. Add Wind:
    \text{Weight} = \beta_0 + \beta_1 \cdot \text{Food} + \beta_2 \cdot \text{Temperature} + \beta_3 \cdot \text{Wind}
  • Combined Adjusted R^2 = 0.65

“Water Cleanliness” adds the most value to our model, so we add “Water Cleanliness” to the model.

Step 8: Add the Best Feature

Now our model includes “Food,” “Temperature,” and “Water Cleanliness”:

\text{Weight} = \beta_0 + \beta_1 \cdot \text{Food} + \beta_2 \cdot \text{Temperature} + \beta_3 \cdot \text{Water Cleanliness}

Step 9: Evaluate Removing Features

After adding “Water Cleanliness,” we check the previously added features to see if any can be removed without significantly decreasing the adjusted R-squared:

  1. Remove Food:
    \text{Weight} = \beta_0 + \beta_1 \cdot \text{Temperature} + \beta_2 \cdot \text{Water Cleanliness}
  • Adjusted R^2 = 0.60
  2. Remove Temperature:
    \text{Weight} = \beta_0 + \beta_1 \cdot \text{Food} + \beta_2 \cdot \text{Water Cleanliness}
  • Adjusted R^2 = 0.62 (the Food and Water Cleanliness model from Step 4)

Since removing either “Food” or “Temperature” substantially decreases the adjusted R-squared from 0.72, we keep both in the model.
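The removal check can be sketched as a small helper that flags a feature as droppable when removing it costs no more than a chosen tolerance in adjusted R-squared. The walkthrough does not state a numeric cutoff, so the tolerance of 0.02 and the function name are assumptions. The example uses the “Remove Food” case: the three-feature model scores 0.72 and the model without “Food” scores 0.60.

```python
def droppable(current_score, reduced_scores, tolerance=0.02):
    """Return the features whose removal lowers adjusted R^2 by at most `tolerance`.

    reduced_scores maps a feature name to the adjusted R^2 of the model
    refitted WITHOUT that feature. The tolerance is an assumed cutoff.
    """
    return [
        name
        for name, reduced in reduced_scores.items()
        if current_score - reduced <= tolerance
    ]


# Removing Food takes the model from 0.72 down to 0.60, far more than the tolerance.
print(droppable(0.72, {"Food": 0.60}))  # []
```

An empty list means nothing can be dropped, which matches the decision to keep “Food”.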

Step 10: Evaluate Adding the Last Feature

Finally, we consider adding “Wind” to the model:

\text{Weight} = \beta_0 + \beta_1 \cdot \text{Food} + \beta_2 \cdot \text{Temperature} + \beta_3 \cdot \text{Water Cleanliness} + \beta_4 \cdot \text{Wind}

  • Combined Adjusted R^2 = 0.73

Adding “Wind” raises the adjusted R-squared only from 0.72 to 0.73, a negligible gain, so we leave it out and stop here.

Final Model

The final model includes the features “Food,” “Temperature,” and “Water Cleanliness”:

\text{Weight} = \beta_0 + \beta_1 \cdot \text{Food} + \beta_2 \cdot \text{Temperature} + \beta_3 \cdot \text{Water Cleanliness}

Summary

In this stepwise feature selection process using adjusted R-squared, we started with no features and iteratively added the feature that provided the most significant improvement in adjusted R-squared while also checking and potentially removing previously added features. Our final model includes “Food,” “Temperature,” and “Water Cleanliness” as predictors for fish weight.

