Lasso Regression: methods & code

Lasso (Least Absolute Shrinkage and Selection Operator):

  • L_1 Penalty: Lasso uses the L_1 penalty, which is the sum of the absolute values of the coefficients.
  • Mathematical Form: The Lasso regression objective function is:
    \text{Minimize} \left( \frac{1}{2n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} |w_j| \right)
    where y_i are the actual values, \hat{y}_i are the predicted values, w_j are the coefficients, n is the number of observations, p is the number of features, and \lambda is the regularization parameter.
  • Effect: Lasso can shrink some coefficients to exactly zero, effectively performing feature selection by excluding some features from the model.
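
To make this objective concrete, here is a minimal NumPy sketch of the penalized loss. The function name lasso_objective and the variable names X, y, w, and lam are only illustrative; the quantity it computes is the same objective that scikit-learn's Lasso minimizes, with lam playing the role of alpha.

import numpy as np

def lasso_objective(X, y, w, lam):
    """Lasso loss: (1 / (2n)) * sum_i (y_i - y_hat_i)^2 + lam * sum_j |w_j|."""
    n = X.shape[0]
    residuals = y - X @ w               # y_i - y_hat_i for every observation
    squared_loss = np.sum(residuals ** 2) / (2 * n)
    l1_penalty = lam * np.sum(np.abs(w))
    return squared_loss + l1_penalty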

On the L_1 Penalty of Lasso:

  • The L_1 penalty adds the absolute value of the magnitude of the coefficients as a penalty term to the loss function.
  • It encourages sparsity, meaning it can reduce some coefficients to zero, leading to simpler models with fewer features.
  • Useful for feature selection.

Implementation:

In the following, we will:

  1. Import Libraries: Essential libraries are imported for machine learning functionalities.
  2. Load Dataset: A synthetic dataset is created using make_regression with 100 samples and 20 features.
  3. Split Dataset: The dataset is split into training and test sets using train_test_split.
  4. Train Lasso Model: A Lasso regression model is instantiated with a specified alpha value (regularization strength) and trained using the training data.
  5. Evaluate Model: The model’s performance is evaluated on both the training and test sets using mean squared error (MSE) and R² score.
  6. Visualize Results: The coefficients of the Lasso model are plotted to visualize the effect of L1 regularization on the feature coefficients.

You can adjust the alpha parameter to see how different levels of regularization affect the model’s performance and the coefficients. We will discuss how to choose alpha in a later post on hyperparameter tuning.

Python code:

# Step 1: Import the necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error, r2_score

# Step 2: Load the dataset (using a synthetic dataset for this example)
X, y = make_regression(n_samples=100, n_features=20, noise=0.1, random_state=42)

# Step 3: Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 4: Train the Lasso regression model
alpha = 0.1  # Regularization strength
lasso = Lasso(alpha=alpha)
lasso.fit(X_train, y_train)

# Step 5: Evaluate the model
y_train_pred = lasso.predict(X_train)
y_test_pred = lasso.predict(X_test)

print("Training set evaluation:")
print("Mean Squared Error:", mean_squared_error(y_train, y_train_pred))
print("R^2 Score:", r2_score(y_train, y_train_pred))

print("\nTest set evaluation:")
print("Mean Squared Error:", mean_squared_error(y_test, y_test_pred))
print("R^2 Score:", r2_score(y_test, y_test_pred))

# Step 6: Visualize the results (if applicable, here we'll plot the coefficients)
plt.figure(figsize=(10, 6))
plt.plot(lasso.coef_, marker='o', linestyle='none')
plt.title('Lasso Regression Coefficients')
plt.xlabel('Feature Index')
plt.ylabel('Coefficient Value')
plt.grid(True)
plt.show()

Analyzing the results: the output looks like this:

Training set evaluation:
Mean Squared Error: 0.11797011924966153
R^2 Score: 0.9999962881305299

Test set evaluation:
Mean Squared Error: 0.1186764119872078
R^2 Score: 0.9999927871282901

So, you can see from the plot that most of the coefficients are shrunk to exactly 0. The features whose coefficients remain non-zero are the ones the model treats as important for predicting the output.
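
To read the selected features off programmatically, and to see how alpha controls sparsity, here is a small follow-up sketch. It assumes it runs after the script above, reusing lasso, X_train, y_train, X_test, and y_test; the alpha values in the loop are only illustrative.

# Indices of the features Lasso kept (non-zero coefficients)
selected = np.flatnonzero(lasso.coef_)
print("Number of selected features:", selected.size)
print("Selected feature indices:", selected)

# Sweep alpha to see how regularization strength affects sparsity and test accuracy
for a in [0.01, 0.1, 1.0, 10.0]:
    model = Lasso(alpha=a).fit(X_train, y_train)
    n_nonzero = np.count_nonzero(model.coef_)
    print(f"alpha={a}: {n_nonzero} non-zero coefficients, "
          f"test R^2 = {r2_score(y_test, model.predict(X_test)):.4f}")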

R code

The corresponding code in R is:

# Step 1: Install and load the necessary packages
install.packages("glmnet")
install.packages("caret")
install.packages("e1071")
library(glmnet)
library(caret)
library(e1071)

# Step 2: Load the dataset (using a synthetic dataset for this example)
set.seed(42)
n <- 100
p <- 20
X <- matrix(rnorm(n * p), n, p)
beta <- rnorm(p)
y <- X %*% beta + rnorm(n) * 0.1

# Step 3: Split the dataset into training and test sets
trainIndex <- createDataPartition(y, p = 0.8, list = FALSE)
X_train <- X[trainIndex, ]
X_test <- X[-trainIndex, ]
y_train <- y[trainIndex]
y_test <- y[-trainIndex]

# Step 4: Train the Lasso regression model
alpha <- 1  # Lasso regression (L1 penalty)
lasso_model <- cv.glmnet(X_train, y_train, alpha = alpha)

# Step 5: Evaluate the model
y_train_pred <- predict(lasso_model, X_train, s = "lambda.min")
y_test_pred <- predict(lasso_model, X_test, s = "lambda.min")

train_mse <- mean((y_train - y_train_pred)^2)
test_mse <- mean((y_test - y_test_pred)^2)
train_r2 <- 1 - sum((y_train - y_train_pred)^2) / sum((y_train - mean(y_train))^2)
test_r2 <- 1 - sum((y_test - y_test_pred)^2) / sum((y_test - mean(y_test))^2)

cat("Training set evaluation:\n")
cat("Mean Squared Error:", train_mse, "\n")
cat("R^2 Score:", train_r2, "\n")

cat("\nTest set evaluation:\n")
cat("Mean Squared Error:", test_mse, "\n")
cat("R^2 Score:", test_r2, "\n")

# Step 6: Visualize the results (if applicable, here we'll plot the coefficients)
lasso_coefficients <- as.vector(coef(lasso_model, s = "lambda.min"))[-1]  # convert to a plain vector and drop the intercept

plot(lasso_coefficients, main = "Lasso Regression Coefficients", xlab = "Feature Index", ylab = "Coefficient Value", pch = 16, col = "blue")

Here, we:

  1. Install and Load Packages: Install and load the necessary packages (glmnet for Lasso regression, caret for data partitioning, and e1071 for additional functions).
  2. Load Dataset: A synthetic dataset is created with 100 samples and 20 features.
  3. Split Dataset: The dataset is split into training and test sets using createDataPartition from the caret package.
  4. Train Lasso Model: A Lasso regression model is trained using cv.glmnet with cross-validation to find the optimal lambda (regularization parameter).
  5. Evaluate Model: The model’s performance is evaluated on both the training and test sets using mean squared error (MSE) and R² score.
  6. Visualize Results: The coefficients of the Lasso model are plotted to visualize the effect of L1 regularization on the feature coefficients.

Drawbacks of Lasso regression

  • Handling of Correlated Variables:
    • When dealing with highly correlated variables, lasso tends to select only one of them and arbitrarily sets the coefficients of the others to zero. This can be problematic when all of the correlated variables are actually relevant.
    • This arbitrary selection can make the model’s interpretation challenging, as it may not accurately reflect the true relationships between variables.
  • Limitations When p > n:
    • In situations where the number of predictors (p) is greater than the number of observations (n), lasso can select at most n variables before it saturates.
    • This limitation can be a significant issue in fields like genomics, where datasets often have many more features than samples.
  • Bias:
    • The L1 penalty used in lasso can introduce bias into the model by shrinking coefficients towards zero. While this helps with feature selection and reduces variance, it can also lead to underfitting.
  • Sensitivity to Data Scaling:
    • Lasso is sensitive to the scaling of the input features, so it is important to standardize your data before fitting the model (see the pipeline sketch after this list).
  • Not always the best for prediction accuracy:
    • While lasso is great for feature selection, sometimes ridge regression will produce a model with higher prediction accuracy, due to the bias that lasso introduces.
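
On the scaling point, a minimal scikit-learn sketch of the habit to adopt: standardize the features and fit Lasso in a single Pipeline, so the L1 penalty treats every coefficient on the same scale. It reuses X_train, y_train, X_test, and y_test from the Python example above, and alpha = 0.1 is just the value used earlier. On this synthetic make_regression data the features are already on similar scales, so the effect is small, but with real data the difference can be substantial.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso

# Standardize each feature, then fit Lasso on the standardized data
scaled_lasso = make_pipeline(StandardScaler(), Lasso(alpha=0.1))
scaled_lasso.fit(X_train, y_train)
print("Test R^2 with scaling:", scaled_lasso.score(X_test, y_test))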

In summary, while lasso is valuable for feature selection and handling high-dimensional data, it’s essential to be aware of its limitations, particularly when dealing with correlated variables or when the number of predictors exceeds the number of observations.

