Lasso (Least Absolute Shrinkage and Selection Operator):
- Penalty: Lasso uses the L1 penalty, which is the sum of the absolute values of the coefficients.
- Mathematical Form: The Lasso regression objective function is
$$\min_{\beta} \; \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 + \lambda \sum_{j=1}^{p} \left| \beta_j \right|$$
where $y_i$ are the actual values, $\hat{y}_i$ are the predicted values, $\beta_j$ are the coefficients, $n$ is the number of observations, $p$ is the number of features, and $\lambda$ is the regularization parameter.
- Effect: Lasso can shrink some coefficients to exactly zero, effectively performing feature selection by excluding some features from the model.
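To make the objective above concrete, here is a minimal sketch that evaluates the Lasso loss for a given coefficient vector. The function name lasso_objective and the numbers are illustrative choices of mine, and the 1/(2n) scaling follows the convention used by scikit-learn's Lasso:
import numpy as np

def lasso_objective(X, y, beta, lam):
    # Squared-error term with 1/(2n) scaling, plus the L1 penalty on the coefficients
    n = X.shape[0]
    residuals = y - X @ beta
    return (residuals @ residuals) / (2 * n) + lam * np.sum(np.abs(beta))

# Tiny made-up example: 3 observations, 2 features
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
print(lasso_objective(X, y, beta=np.array([0.5, 0.0]), lam=0.1))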
On the L1 Penalty of Lasso:
- The L1 penalty adds the sum of the absolute values of the coefficients as a penalty term to the loss function.
- It encourages sparsity, meaning it can reduce some coefficients to zero, leading to simpler models with fewer features.
- Useful for feature selection.
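As a quick sketch of this sparsity (the dataset parameters and alpha below are illustrative choices, separate from the worked example that follows), you can compare how many coefficients Lasso and Ridge set exactly to zero:
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic problem where only a few of the 20 features are informative
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=0.1, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("Lasso zero coefficients:", np.sum(lasso.coef_ == 0))  # typically many
print("Ridge zero coefficients:", np.sum(ridge.coef_ == 0))  # typically none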
Implementation:
In the following, we will:
- Import Libraries: Essential libraries are imported for machine learning functionalities.
- Load Dataset: A synthetic dataset is created using make_regression with 100 samples and 20 features.
- Split Dataset: The dataset is split into training and test sets using train_test_split.
- Train Lasso Model: A Lasso regression model is instantiated with a specified alpha value (regularization strength) and trained using the training data.
- Evaluate Model: The model’s performance is evaluated on both the training and test sets using mean squared error (MSE) and R² score.
- Visualize Results: The coefficients of the Lasso model are plotted to visualize the effect of L1 regularization on the feature coefficients.
You can adjust the alpha parameter to see how different levels of regularization affect the model’s performance and the coefficients. We will talk about how to choose alpha in a later post on hyperparameter tuning.
Python code:
# Step 1: Import the necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error, r2_score
# Step 2: Load the dataset (using a synthetic dataset for this example)
X, y = make_regression(n_samples=100, n_features=20, noise=0.1, random_state=42)
# Step 3: Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 4: Train the Lasso regression model
alpha = 0.1 # Regularization strength
lasso = Lasso(alpha=alpha)
lasso.fit(X_train, y_train)
# Step 5: Evaluate the model
y_train_pred = lasso.predict(X_train)
y_test_pred = lasso.predict(X_test)
print("Training set evaluation:")
print("Mean Squared Error:", mean_squared_error(y_train, y_train_pred))
print("R^2 Score:", r2_score(y_train, y_train_pred))
print("\nTest set evaluation:")
print("Mean Squared Error:", mean_squared_error(y_test, y_test_pred))
print("R^2 Score:", r2_score(y_test, y_test_pred))
# Step 6: Visualize the results (if applicable, here we'll plot the coefficients)
plt.figure(figsize=(10, 6))
plt.plot(lasso.coef_, marker='o', linestyle='none')
plt.title('Lasso Regression Coefficients')
plt.xlabel('Feature Index')
plt.ylabel('Coefficient Value')
plt.grid(True)
plt.show()
Analyzing the results: the output looks like this:
Training set evaluation:
Mean Squared Error: 0.11797011924966153
R^2 Score: 0.9999962881305299
Test set evaluation:
Mean Squared Error: 0.1186764119872078
R^2 Score: 0.9999927871282901
[Figure: plot of the Lasso regression coefficients by feature index]
So, you can see from the graph that most of the coefficients are shrunk to exactly 0. From there, we can identify the features that matter for predicting the output.
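As a rough sketch of the effect of adjusting alpha mentioned earlier (the alpha grid below is an arbitrary, illustrative choice), you can refit the model at several regularization strengths and count the surviving coefficients:
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso

# Same synthetic setup as the code above
X, y = make_regression(n_samples=100, n_features=20, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

for alpha in [0.01, 0.1, 1.0, 10.0]:  # illustrative grid of regularization strengths
    model = Lasso(alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha:<5} nonzero coefficients: {np.count_nonzero(model.coef_)}, "
          f"test R^2: {model.score(X_test, y_test):.4f}")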
R code
The corresponding code in R is:
# Step 1: Install and load the necessary packages
install.packages("glmnet")
install.packages("caret")
install.packages("e1071")
library(glmnet)
library(caret)
library(e1071)
# Step 2: Load the dataset (using a synthetic dataset for this example)
set.seed(42)
n <- 100
p <- 20
X <- matrix(rnorm(n * p), n, p)
beta <- rnorm(p)
y <- as.vector(X %*% beta + rnorm(n) * 0.1)  # keep y as a plain vector for createDataPartition
# Step 3: Split the dataset into training and test sets
trainIndex <- createDataPartition(y, p = 0.8, list = FALSE)
X_train <- X[trainIndex, ]
X_test <- X[-trainIndex, ]
y_train <- y[trainIndex]
y_test <- y[-trainIndex]
# Step 4: Train the Lasso regression model
alpha <- 1 # Lasso regression (L1 penalty)
lasso_model <- cv.glmnet(X_train, y_train, alpha = alpha)
# Step 5: Evaluate the model
y_train_pred <- predict(lasso_model, X_train, s = "lambda.min")
y_test_pred <- predict(lasso_model, X_test, s = "lambda.min")
train_mse <- mean((y_train - y_train_pred)^2)
test_mse <- mean((y_test - y_test_pred)^2)
train_r2 <- 1 - sum((y_train - y_train_pred)^2) / sum((y_train - mean(y_train))^2)
test_r2 <- 1 - sum((y_test - y_test_pred)^2) / sum((y_test - mean(y_test))^2)
cat("Training set evaluation:\n")
cat("Mean Squared Error:", train_mse, "\n")
cat("R^2 Score:", train_r2, "\n")
cat("\nTest set evaluation:\n")
cat("Mean Squared Error:", test_mse, "\n")
cat("R^2 Score:", test_r2, "\n")
# Step 6: Visualize the results (if applicable, here we'll plot the coefficients)
# Extract the coefficients at lambda.min, drop the intercept, and convert to a plain vector for plotting
lasso_coefficients <- as.vector(coef(lasso_model, s = "lambda.min"))[-1]
plot(lasso_coefficients, main = "Lasso Regression Coefficients", xlab = "Feature Index", ylab = "Coefficient Value", pch = 16, col = "blue")
Here, we:
- Install and Load Packages: Install and load the necessary packages (glmnet for Lasso regression, caret for data partitioning, and e1071 for additional functions).
- Load Dataset: A synthetic dataset is created with 100 samples and 20 features.
- Split Dataset: The dataset is split into training and test sets using createDataPartition from the caret package.
- Train Lasso Model: A Lasso regression model is trained using cv.glmnet with cross-validation to find the optimal lambda (regularization parameter).
- Evaluate Model: The model’s performance is evaluated on both the training and test sets using mean squared error (MSE) and R² score.
- Visualize Results: The coefficients of the Lasso model are plotted to visualize the effect of L1 regularization on the feature coefficients.
Drawbacks of Lasso regression
- Handling of Correlated Variables:
- When dealing with highly correlated variables, lasso tends to select only one of them and arbitrarily sets the coefficients of the others to zero (a small sketch after this list shows this with two nearly identical features). This can be problematic when all of the correlated variables are actually relevant.
- This arbitrary selection can make the model’s interpretation challenging, as it may not accurately reflect the true relationships between variables.
- Limitations When p > n:
- In situations where the number of predictors (p) is greater than the number of observations (n), lasso is limited in the number of variables it can select.
- This limitation can be a significant issue in fields like genomics, where datasets often have many more features than samples (see the sketch after this list, where lasso keeps at most as many features as there are samples).
- Bias:
- The L1 penalty used in lasso can introduce bias into the model by shrinking coefficients towards zero. While this helps with feature selection and reduces variance, it can also lead to underfitting.
- Sensitivity to Data Scaling:
- Lasso is sensitive to the scaling of the input features, so it is important to scale (for example, standardize) your data before fitting lasso; see the pipeline sketch after this list.
- Not always the best for prediction accuracy:
- While lasso is great for feature selection, ridge regression will sometimes produce a model with higher prediction accuracy, due to the bias that lasso introduces (the comparison sketch after this list tries both on held-out data).
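A minimal sketch of the correlated-variables issue (the data here are made up purely for illustration): two nearly identical features share the true effect, yet lasso typically keeps only one of them.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # x2 is almost a copy of x1
X = np.column_stack([x1, x2])
y = 3 * x1 + 3 * x2 + 0.1 * rng.normal(size=n)

coef = Lasso(alpha=0.1).fit(X, y).coef_
print(coef)  # typically one coefficient carries almost all the weight, the other is at or near 0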
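A sketch of the p > n limitation (the dimensions and alpha are illustrative): with 200 features but only 50 samples, the number of features lasso keeps cannot exceed the number of samples.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# More predictors than observations: p = 200, n = 50
X, y = make_regression(n_samples=50, n_features=200, n_informative=80,
                       noise=0.1, random_state=1)

model = Lasso(alpha=1.0, max_iter=100000).fit(X, y)
print("Number of samples:", X.shape[0])
print("Nonzero coefficients:", np.count_nonzero(model.coef_))  # at most 50, even though 80 features matter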
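For the scaling issue, a common pattern (sketched here with illustrative data; the pipeline below is one reasonable way to do it, not the only one) is to standardize the features inside a pipeline so the L1 penalty treats them comparably:
from sklearn.datasets import make_regression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=100, n_features=20, noise=0.1, random_state=42)
X[:, 0] *= 1000  # put one feature on a much larger scale than the rest

# StandardScaler is fit on the training data before Lasso sees it, so all features end up on comparable scales
model = make_pipeline(StandardScaler(), Lasso(alpha=0.1))
model.fit(X, y)
print(model.named_steps["lasso"].coef_[:5])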
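Finally, a comparison sketch touching on the shrinkage bias and on checking lasso against ridge on held-out data (the alpha values are illustrative, and which model predicts better depends on the dataset):
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso, Ridge, LinearRegression
from sklearn.metrics import mean_squared_error

# coef=True also returns the true coefficients used to generate y
X, y, true_coef = make_regression(n_samples=100, n_features=5, noise=5.0,
                                  coef=True, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

ols = LinearRegression().fit(X_train, y_train)
lasso = Lasso(alpha=10.0).fit(X_train, y_train)
ridge = Ridge(alpha=10.0).fit(X_train, y_train)

print("True coefficients: ", np.round(true_coef, 1))
print("OLS coefficients:  ", np.round(ols.coef_, 1))
print("Lasso coefficients:", np.round(lasso.coef_, 1))  # typically pulled toward zero relative to OLS

for name, model in [("OLS", ols), ("Lasso", lasso), ("Ridge", ridge)]:
    print(f"{name} test MSE: {mean_squared_error(y_test, model.predict(X_test)):.2f}")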
In summary, while lasso is valuable for feature selection and handling high-dimensional data, it’s essential to be aware of its limitations, particularly when dealing with correlated variables or when the number of predictors exceeds the number of observations.