Ridge regression: method & R code

Motivation

Now, recall that LASSO adds an L_1 penalty, the sum of the absolute values of the coefficients, to the linear regression loss, which shrinks some coefficients exactly to zero. Ridge regression uses a different penalty with different consequences.

Ridge regression:

Ridge adds the L_2 penalty, which is the sum of the squares of the coefficients, to the loss function in linear regression. Ridge regression shrinks the coefficients but does not set any of them to zero, meaning it retains all features but reduces their impact. It is useful for dealing with multicollinearity (when features are highly correlated). Its objective function is:
\text{Minimize} \left( \frac{1}{2n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} w_j^2 \right)
where y_i are the actual values, \hat{y}_i are the predicted values, w_j are the coefficients, n is the number of observations, p is the number of features, and \lambda is the regularization parameter.
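
Because the squared penalty is differentiable, ridge regression also has a closed-form solution, \hat{w} = (X^\top X + \lambda I)^{-1} X^\top y. Below is a minimal R sketch of this normal-equation view, assuming centered X and y (no intercept) and the penalty written as \|y - Xw\|^2 + \lambda \|w\|^2; glmnet, used later in this post, applies a different scaling and standardizes features by default, so its coefficients will not match these exactly.

# Closed-form ridge estimate for centered X and y (no intercept).
# Penalty written as ||y - X w||^2 + lambda * ||w||^2; scaling conventions differ across software.
ridge_closed_form <- function(X, y, lambda) {
  p <- ncol(X)
  solve(crossprod(X) + lambda * diag(p), crossprod(X, y))
}

# Toy check on centered data (lambda = 1 is an arbitrary illustrative value)
set.seed(1)
Xc <- scale(matrix(rnorm(50 * 3), 50, 3), center = TRUE, scale = FALSE)
yc <- Xc %*% c(2, -1, 0.5) + rnorm(50)
yc <- yc - mean(yc)
ridge_closed_form(Xc, yc, lambda = 1)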

Choosing Between Lasso and Ridge:

L1 versus L2:

  • Lasso: Preferred when you expect only a few features to be important and you want to curb overfitting. It performs feature selection by shrinking the less important coefficients exactly to zero, yielding simpler, more interpretable models that can perform better on unseen data.
  • Ridge: Preferred when you expect many features to contribute to the outcome, especially when predictors are highly correlated. It handles multicollinearity while keeping every feature in the model, which often gives more stable, robust predictions. In glmnet both choices are controlled by the alpha argument, as sketched after this list.
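
Both penalties are available in glmnet through the alpha argument of its elastic-net objective: alpha = 1 gives the lasso, alpha = 0 gives ridge, and intermediate values mix the two. A small self-contained sketch (the data here are illustrative, not the dataset used in the implementation below):

library(glmnet)

# Small synthetic example: only 6 of 10 true coefficients are nonzero
set.seed(1)
X <- matrix(rnorm(200 * 10), 200, 10)
y <- X %*% c(3, -2, 1.5, 0, 0, 0, 0, 0.5, 0, -1) + rnorm(200)

lasso_fit <- cv.glmnet(X, y, alpha = 1)  # L1 penalty: some coefficients become exactly 0
ridge_fit <- cv.glmnet(X, y, alpha = 0)  # L2 penalty: coefficients shrink but stay nonzero

# Count coefficients (excluding the intercept) that are exactly zero
sum(as.numeric(coef(lasso_fit, s = "lambda.min"))[-1] == 0)
sum(as.numeric(coef(ridge_fit, s = "lambda.min"))[-1] == 0)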

Implementation:

In the following code, we will follow these steps:

Step 1: Install and Load Packages

This installs and loads the required libraries for Ridge regression.

install.packages("glmnet")
install.packages("caret")
install.packages("e1071")
library(glmnet)
library(caret)
library(e1071)
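
Optionally, if you rerun the script often, a small guard (not part of the original code) installs a package only when it is missing:

# Install each package only if it is not already available, then load it
for (pkg in c("glmnet", "caret", "e1071")) {
  if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)
  library(pkg, character.only = TRUE)
}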

Step 2: Generate Synthetic Data

Creates a synthetic dataset with 100 observations and 20 features.

set.seed(42)
n <- 100
p <- 20
X <- matrix(rnorm(n * p), n, p)  # Generate random features
beta <- rnorm(p)  # Random coefficients
y <- X %*% beta + rnorm(n) * 0.1  # Compute target variable with noise

Step 3: Split Data into Training and Test Sets

Divides the dataset into 80% training and 20% testing.

trainIndex <- createDataPartition(y, p = 0.8, list = FALSE)
X_train <- X[trainIndex, ]
X_test <- X[-trainIndex, ]
y_train <- y[trainIndex]
y_test <- y[-trainIndex]

Step 4: Train Ridge Regression Model

Applies Ridge regression (alpha = 0) with cross-validation.

alpha <- 0  # Ridge regression (L2 penalty)
ridge_model <- cv.glmnet(X_train, y_train, alpha = alpha)
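
Before evaluating the model, it can be helpful to inspect the cross-validation results stored in the cv.glmnet object; the object name below matches the code above:

plot(ridge_model)        # cross-validated error as a function of log(lambda)
ridge_model$lambda.min   # lambda with the lowest cross-validated error
ridge_model$lambda.1se   # largest lambda within one SE of the minimum (more shrinkage)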

Step 5: Evaluate the Model

Generates predictions and computes performance metrics.

y_train_pred <- predict(ridge_model, X_train, s = "lambda.min")
y_test_pred <- predict(ridge_model, X_test, s = "lambda.min")

train_mse <- mean((y_train - y_train_pred)^2)  # Mean Squared Error for training set
test_mse <- mean((y_test - y_test_pred)^2)  # Mean Squared Error for test set

train_r2 <- 1 - sum((y_train - y_train_pred)^2) / sum((y_train - mean(y_train))^2)  # R² for training set
test_r2 <- 1 - sum((y_test - y_test_pred)^2) / sum((y_test - mean(y_test))^2)  # R² for test set

cat("Training set evaluation:\n")
cat("Mean Squared Error:", train_mse, "\n")
cat("R^2 Score:", train_r2, "\n")

cat("\nTest set evaluation:\n")
cat("Mean Squared Error:", test_mse, "\n")
cat("R^2 Score:", test_r2, "\n")

Step 6: Visualize Ridge Regression Coefficients

Plots the model’s coefficients to assess feature importance.

ridge_coefficients <- as.numeric(coef(ridge_model, s = "lambda.min"))[-1]  # convert to a numeric vector and drop the intercept

plot(ridge_coefficients, main = "Ridge Regression Coefficients", 
     xlab = "Feature Index", ylab = "Coefficient Value", 
     pch = 16, col = "blue")

Complete code:

# Step 1: Install and load the necessary packages
install.packages("glmnet")
install.packages("caret")
install.packages("e1071")
library(glmnet)
library(caret)
library(e1071)

# Step 2: Generate a synthetic dataset for this example
set.seed(42)
n <- 100
p <- 20
X <- matrix(rnorm(n * p), n, p)
beta <- rnorm(p)
y <- X %*% beta + rnorm(n) * 0.1

# Step 3: Split the dataset into training and test sets
trainIndex <- createDataPartition(y, p = 0.8, list = FALSE)
X_train <- X[trainIndex, ]
X_test <- X[-trainIndex, ]
y_train <- y[trainIndex]
y_test <- y[-trainIndex]

# Step 4: Train the Ridge regression model
alpha <- 0  # Ridge regression (L2 penalty)
ridge_model <- cv.glmnet(X_train, y_train, alpha = alpha)

# Step 5: Evaluate the model
y_train_pred <- predict(ridge_model, X_train, s = "lambda.min")
y_test_pred <- predict(ridge_model, X_test, s = "lambda.min")

train_mse <- mean((y_train - y_train_pred)^2)
test_mse <- mean((y_test - y_test_pred)^2)
train_r2 <- 1 - sum((y_train - y_train_pred)^2) / sum((y_train - mean(y_train))^2)
test_r2 <- 1 - sum((y_test - y_test_pred)^2) / sum((y_test - mean(y_test))^2)

cat("Training set evaluation:\n")
cat("Mean Squared Error:", train_mse, "\n")
cat("R^2 Score:", train_r2, "\n")

cat("\nTest set evaluation:\n")
cat("Mean Squared Error:", test_mse, "\n")
cat("R^2 Score:", test_r2, "\n")

# Step 6: Visualize the results (if applicable, here we'll plot the coefficients)
ridge_coefficients <- as.numeric(coef(ridge_model, s = "lambda.min"))[-1]  # convert to a numeric vector and drop the intercept

plot(ridge_coefficients, main = "Ridge Regression Coefficients", xlab = "Feature Index", ylab = "Coefficient Value", pch = 16, col = "blue")


Output:

Training set evaluation:
Mean Squared Error: 0.04788766 
R^2 Score: 0.9979284 

Test set evaluation:
Mean Squared Error: 0.08317506 
R^2 Score: 0.9952078 
