

Polynomial regression is a form of regression analysis in which the relationship between the independent variable and the dependent variable is modeled as an nth degree polynomial. This approach allows for a more flexible fit compared to linear regression, as it can capture complex relationships and trends within the data. By increasing the degree of the polynomial, one can model curvature in the data, making it possible to accommodate various shapes that might not be represented by a straight line. This technique is particularly useful in scenarios where the linear assumption fails to reflect the underlying patterns, as it can provide insights into the behavior of the data across different ranges of the independent variable. However, with greater flexibility comes the risk of overfitting, where the model becomes too complex and captures noise rather than the actual signal, making it crucial to balance model complexity with generalization.
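To make that concrete before moving to the main example, here is a minimal sketch (a toy illustration, not part of this post's dataset) that fits polynomials of increasing degree to noisy quadratic data with numpy.polyfit; the sample size, noise level, and degrees are arbitrary choices.
import numpy as np

# Toy illustration (assumed setup): noisy samples from a quadratic relationship
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = 1.5 * x**2 - 2.0 * x + rng.normal(scale=2.0, size=x.shape)

# Fit polynomials of increasing degree and compare training error
for degree in (1, 2, 8):
    coeffs = np.polyfit(x, y, degree)   # least-squares polynomial fit
    y_hat = np.polyval(coeffs, x)       # evaluate the fitted polynomial
    mse = np.mean((y - y_hat) ** 2)
    print(f"degree {degree}: training MSE = {mse:.2f}")
# The degree-1 fit underfits the curvature; a high degree drives training error
# down but tends to fit the noise (overfitting).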
Therefore, in this post we perform feature engineering using polynomial features and L1 Lasso selection. The code below imports the necessary libraries, generates polynomial features from the input data, and applies Lasso to select features based on a regularization strength. The function returns the selected features along with their names. An example workflow with a regression dataset follows, covering model training, prediction, and the mean squared error calculation; the code also keeps the intercept term among the selected features and comments each step for clarity.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
def taylor_fs(X, y, max_order, alpha=0.1, threshold=0.001):
    """
    Feature engineering with polynomial features and L1 Lasso selection.

    Args:
        X (numpy.ndarray): Input features.
        y (numpy.ndarray): Target labels.
        max_order (int): Maximum order of polynomial features.
        alpha (float, optional): Regularization strength for Lasso. Defaults to 0.1.
        threshold (float, optional): Minimum absolute coefficient magnitude for a
            feature to be kept. Defaults to 0.001.

    Returns:
        tuple: Selected features (numpy.ndarray) and the names of the selected features.
    """
    poly = PolynomialFeatures(max_order)
    X_poly = poly.fit_transform(X)  # Generate polynomial features up to max_order
    lasso = Lasso(alpha=alpha)
    lasso.fit(X_poly, y)
    # Keep features whose Lasso coefficients exceed the threshold in magnitude
    selected_features_mask = np.abs(lasso.coef_) > threshold
    selected_features_mask[0] = True  # Always keep the intercept column
    selected_features = X_poly[:, selected_features_mask]
    return selected_features, poly.get_feature_names_out()[selected_features_mask]
# Example usage with a regression dataset
X, y = make_regression(n_samples=100, n_features=5, noise=10, random_state=42)
max_order = 2 # Example: using polynomial features up to order 2
alpha = 0.1  # Adjust alpha for more or less regularization
X_selected, selected_feature_names = taylor_fs(X, y, max_order, alpha)
X_train, X_test, y_train, y_test = train_test_split(X_selected, y, test_size=0.2, random_state=42)
model = Lasso(alpha=0.01)  # Train a model with the selected features
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print("Selected feature names:", selected_feature_names)
print("Shape of selected features:", X_selected.shape)
print("Mean Squared Error:", mse)
# Example showing how to get the original feature names
poly = PolynomialFeatures(max_order)
poly.fit(X)
original_feature_names = poly.get_feature_names_out()
print("Original Feature Names:\n", original_feature_names)
Explanations:
- Polynomial Feature Generation: Uses sklearn.preprocessing.PolynomialFeatures to efficiently create polynomial features. This handles the core “Taylor” part of the algorithm, generating the higher-order terms.
- L1 Lasso Selection: Applies sklearn.linear_model.Lasso for L1 regularization, which performs feature selection by shrinking coefficients to zero. The alpha parameter controls the regularization strength.
- Feature Selection Mask: A boolean mask (lasso.coef_ != 0) would identify the features with non-zero coefficients. To keep only features whose coefficients exceed a magnitude of threshold=0.001, for example, the mask becomes selected_features_mask = np.abs(lasso.coef_) > threshold. The intercept is always kept by setting the first element of the mask to True.
- Feature Names: poly.get_feature_names_out() returns the names of the generated polynomial features, and the result is filtered with the selection mask to give the names of the selected features.
- Example Usage: Demonstrates how to use the taylor_fs function with a make_regression dataset, including a train-test split and an MSE calculation to evaluate a model trained on the selected features. It also shows how to print the original feature names.
- Clearer Comments and Variable Names: Comments and variable names are written for readability.
- Handles Intercept: The intercept term is always included in the selected features.
- Alpha Parameter: The alpha parameter is exposed in the function signature, with an example of how to adjust it.
- Shape of Selected Features: A print statement reports the shape of the selected feature matrix.
- Original Feature Names: The original (full) set of polynomial feature names can be printed for a better understanding of the transformation.
Output:
Selected feature names: ['1' 'x0' 'x1' 'x2' 'x3' 'x4' 'x0^2' 'x0 x1' 'x0 x2' 'x0 x3' 'x0 x4'
'x1^2' 'x1 x2' 'x1 x3' 'x1 x4' 'x2^2' 'x2 x3' 'x2 x4' 'x3^2' 'x3 x4'
'x4^2']
Shape of selected features: (100, 21)
Mean Squared Error: 160.48103362528636
Original Feature Names:
['1' 'x0' 'x1' 'x2' 'x3' 'x4' 'x0^2' 'x0 x1' 'x0 x2' 'x0 x3' 'x0 x4'
'x1^2' 'x1 x2' 'x1 x3' 'x1 x4' 'x2^2' 'x2 x3' 'x2 x4' 'x3^2' 'x3 x4'
'x4^2']
Potential drawbacks of Lasso for polynomial regression
The Correlation Issue:
Terms like x, x², x³, and so on, are highly correlated. If ‘x’ increases, so do its higher-order powers. In the context of Lasso, this correlation can lead to the algorithm arbitrarily selecting one of the correlated terms and zeroing out the others. This can result in a model where, for example, x² might be included, but x³ is excluded, even if both contribute to the underlying relationship.
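To see how strong this correlation is in practice, here is a minimal sketch (with an assumed positive-valued feature, where the effect is most pronounced) that computes the correlation matrix of x, x², and x³:
import numpy as np

# Minimal sketch: correlation between a positive-valued feature and its powers
rng = np.random.default_rng(42)
x = rng.uniform(0.5, 3.0, size=1000)      # x > 0 makes the powers strongly correlated
powers = np.column_stack([x, x**2, x**3])

corr = np.corrcoef(powers, rowvar=False)  # 3x3 correlation matrix of x, x^2, x^3
print(np.round(corr, 3))
# The off-diagonal entries are typically well above 0.9 here, so Lasso may zero
# out one power while keeping another almost interchangeably.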
Mitigation Strategies:
- Ridge Regression or Elastic Net:
- Ridge regression (L2 regularization) shrinks coefficients but doesn’t force them to zero. This can help preserve the contributions of correlated terms.
- Elastic Net combines L1 and L2 regularization, offering a balance between feature selection and handling correlated variables (a sketch follows this list).
- Careful Feature Engineering:
- Consider using orthogonal polynomials, which are designed to minimize correlation between terms. However, this can reduce the interpretability of the raw polynomial terms.
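As a concrete variant of the first mitigation, here is a minimal sketch (not part of the original workflow) that swaps Lasso for sklearn's ElasticNet in the same selection step; the alpha, l1_ratio, and threshold values are illustrative assumptions:
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import PolynomialFeatures
from sklearn.datasets import make_regression

def elastic_net_fs(X, y, max_order, alpha=0.1, l1_ratio=0.5, threshold=0.001):
    """Polynomial feature selection with Elastic Net (L1 + L2) instead of pure Lasso."""
    poly = PolynomialFeatures(max_order)
    X_poly = poly.fit_transform(X)
    enet = ElasticNet(alpha=alpha, l1_ratio=l1_ratio)  # l1_ratio=1.0 recovers Lasso
    enet.fit(X_poly, y)
    mask = np.abs(enet.coef_) > threshold
    mask[0] = True  # keep the intercept column
    return X_poly[:, mask], poly.get_feature_names_out()[mask]

X, y = make_regression(n_samples=100, n_features=5, noise=10, random_state=42)
X_sel, names = elastic_net_fs(X, y, max_order=2)
print("Selected feature names:", names)
Setting l1_ratio closer to 0 behaves more like Ridge and tends to keep correlated powers together, while l1_ratio=1.0 reduces to plain Lasso.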
In summary:
Lasso can be a useful tool for regularizing polynomial regression, but its behavior with correlated polynomial terms requires careful consideration. Alternatives like Ridge regression and Elastic Net, or careful feature engineering, may be necessary to address the challenges.