Logistic regression with L1 or L2 penalty, with code in Python and R


Logistic regression with L1 or L2 penalty introduces regularization to prevent overfitting and improve model generalization. The L1 penalty (also known as Lasso) encourages sparsity in the model by driving some coefficients to zero, effectively performing feature selection. This makes it useful for datasets with many irrelevant features.

In contrast, the L2 penalty (Ridge) penalizes large coefficients by shrinking them toward zero, but does not produce sparse models; all features are retained with reduced importance. L1 regularization is particularly beneficial when only a few predictors are expected to have a strong influence, while L2 is preferable when all predictors contribute somewhat equally. Both methods help control model complexity and improve performance on unseen data.
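
As a quick illustration (a minimal sketch assuming scikit-learn, the same library used in the walkthrough below), the choice of penalty is just a constructor argument of LogisticRegression, and the solver must be one that supports the chosen penalty:

# Minimal sketch: choosing the penalty in scikit-learn's LogisticRegression
from sklearn.linear_model import LogisticRegression

# L1 (Lasso-style) penalty: supported by the 'liblinear' and 'saga' solvers
model_l1 = LogisticRegression(penalty='l1', solver='liblinear', C=1.0)

# L2 (Ridge-style) penalty: supported by the default 'lbfgs' solver
model_l2 = LogisticRegression(penalty='l2', solver='lbfgs', C=1.0)

# C is the inverse of the regularization strength: smaller C means a stronger penalty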

Python & R code

Python:

First, we import the necessary libraries for numerical operations, dataset loading, data splitting, data scaling, logistic regression, and performance metrics.

# Step 1: Import necessary libraries
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

We load the Iris dataset and extract the features (X) and target labels (y). Then we filter the dataset to include only two classes (0 and 1) for binary classification.

# Step 2: Load the Iris dataset
data = load_iris()
X, y = data.data, data.target

# Keep only two classes (0 and 1) for binary classification
X = X[y != 2]
y = y[y != 2]

We split the dataset into training and testing sets. Here, 70% of the data is used for training and 30% for testing.

# Step 3: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

We standardize the features so that they are all on a comparable scale. This matters for penalized logistic regression, because the L1/L2 penalty acts on all coefficients equally and would otherwise be dominated by features measured on larger scales. Note that the scaler is fit on the training data only and then applied to the test data.

# Step 4: Scale the features (important for logistic regression)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

We perform logistic regression using an L1 penalty. The liblinear solver is specified because it is compatible with L1 regularization.

# Step 5: Perform Logistic Regression with L1 penalty
# Set solver to 'liblinear' since it's compatible with L1 penalty
logreg_l1 = LogisticRegression(penalty='l1', solver='liblinear', C=1.0)
logreg_l1.fit(X_train_scaled, y_train)
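
For larger datasets, the 'saga' solver also supports the L1 penalty and generally scales better; it could be swapped in here (an optional alternative, not used in the rest of this walkthrough):

# Optional alternative: 'saga' also supports the L1 penalty and handles large datasets well
# logreg_l1 = LogisticRegression(penalty='l1', solver='saga', C=1.0, max_iter=5000)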

We make predictions on the test set and evaluate the model’s performance using accuracy and classification report metrics.

# Step 6: Make predictions and evaluate the model
y_pred = logreg_l1.predict(X_test_scaled)

# Evaluate performance
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

print(f"Accuracy: {accuracy * 100:.2f}%")
print("Classification Report:")
print(report)

We check and print the coefficients of the model to understand the importance of each feature.

# Step 7: Check the model coefficients
print("Model Coefficients (L1 Penalty):")
print(logreg_l1.coef_)
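
For comparison (an optional addition, not part of the combined code below), the L2-penalized counterpart can be fit on the same scaled data; its coefficients are shrunk toward zero but, unlike with L1, typically all remain nonzero:

# Optional: fit the L2-penalized counterpart for comparison
logreg_l2 = LogisticRegression(penalty='l2', solver='lbfgs', C=1.0)
logreg_l2.fit(X_train_scaled, y_train)

print("Model Coefficients (L2 Penalty):")
print(logreg_l2.coef_)  # shrunk toward zero, but usually none exactly zero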

Combined code (click to download):

# Step 1: Import necessary libraries
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Step 2: Load the Iris dataset
data = load_iris()
X, y = data.data, data.target

# Keep only two classes (0 and 1) for binary classification
X = X[y != 2]
y = y[y != 2]

# Step 3: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 4: Scale the features (important for logistic regression)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Step 5: Perform Logistic Regression with L1 penalty
# Set solver to 'liblinear' since it's compatible with L1 penalty
logreg_l1 = LogisticRegression(penalty='l1', solver='liblinear', C=1.0)
logreg_l1.fit(X_train_scaled, y_train)

# Step 6: Make predictions and evaluate the model
y_pred = logreg_l1.predict(X_test_scaled)

# Evaluate performance
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

print(f"Accuracy: {accuracy * 100:.2f}%")
print("Classification Report:")
print(report)

# Step 7: Check the model coefficients
print("Model Coefficients (L1 Penalty):")
print(logreg_l1.coef_)

R code

First, we load the necessary libraries. The glmnet library is used for fitting generalized linear models, and caret is used for creating data partitions and evaluating models.

# Step 1: Load necessary libraries
library(glmnet)
library(caret)

We load the Iris dataset and set a random seed for reproducibility. Then we prepare the dataset for binary classification by selecting only two classes (setosa and versicolor).

# Step 2: Load the Iris dataset and prepare it for binary classification
data(iris)
set.seed(42)

# Keep only two species (setosa and versicolor) and recode them as 0/1
iris_binary <- subset(iris, Species != "virginica")
X <- as.matrix(iris_binary[, -5])  # Features
y <- as.factor(ifelse(iris_binary$Species == "setosa", 0, 1))  # Binary target

We split the dataset into training and testing sets. Here, 70% of the data is used for training and 30% for testing.

# Step 3: Split the dataset into training and testing sets
train_index <- createDataPartition(y, p=0.7, list=FALSE)
X_train <- X[train_index, ]
X_test <- X[-train_index, ]
y_train <- y[train_index]
y_test <- y[-train_index]

We scale the features so that all values are on a comparable scale, which matters for penalized regression. To keep the training and test data consistent, the test set is scaled with the centering and scaling values computed from the training set.

# Step 4: Scale the features
# Scale the test set using the training set's centering and scaling values
X_train_scaled <- scale(X_train)
X_test_scaled <- scale(X_test, center=attr(X_train_scaled, "scaled:center"), scale=attr(X_train_scaled, "scaled:scale"))

We fit a logistic regression model with an L1 penalty (Lasso). The alpha = 1 parameter specifies Lasso regularization, and family = "binomial" indicates logistic regression.

# Step 5: Fit Logistic Regression with L1 penalty (Lasso)
# alpha = 1 for Lasso, family = "binomial" for logistic regression
cv_fit <- cv.glmnet(X_train_scaled, y_train, alpha=1, family="binomial")
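
If an L2 (Ridge) penalty were wanted instead, the same call can be used with alpha = 0 (shown here only for comparison; the rest of the walkthrough keeps the Lasso fit):

# Optional: L2 (Ridge) penalty with the same interface
# cv_fit_ridge <- cv.glmnet(X_train_scaled, y_train, alpha=0, family="binomial")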

We get the best lambda value (regularization strength) from the cross-validated model.

# Step 6: Get the best lambda (regularization strength)
best_lambda <- cv_fit$lambda.min
print(paste("Best Lambda:", best_lambda))

We make predictions on the test set using the best lambda value. Predictions are made as probabilities, and we convert them to binary class labels.

# Step 7: Make predictions on the test set
y_pred_prob <- predict(cv_fit, s=best_lambda, newx=X_test_scaled, type="response")
y_pred <- ifelse(y_pred_prob > 0.5, 1, 0)

We evaluate the model’s performance using a confusion matrix.

# Step 8: Evaluate the model
confusion_matrix <- confusionMatrix(as.factor(y_pred), y_test)
print(confusion_matrix)

Output:

Confusion Matrix and Statistics

          Reference
Prediction  0  1
         0 15  0
         1  0 15
                                     
               Accuracy : 1          
                 95% CI : (0.8843, 1)
    No Information Rate : 0.5        
    P-Value [Acc > NIR] : 9.313e-10  
                                     
                  Kappa : 1          
                                     
 Mcnemar's Test P-Value : NA         
                                     
            Sensitivity : 1.0        
            Specificity : 1.0        
         Pos Pred Value : 1.0        
         Neg Pred Value : 1.0        
             Prevalence : 0.5        
         Detection Rate : 0.5        
   Detection Prevalence : 0.5        
      Balanced Accuracy : 1.0        
                                     
       'Positive' Class : 0     

We check and print the coefficients of the model to understand the importance of each feature.

# Step 9: Check the model coefficients
coef(cv_fit, s=best_lambda)

The output looks something like this. Note that the dot (".") shown for Sepal.Length means its coefficient has been shrunk to exactly zero by the L1 penalty, which is the feature-selection effect described at the start of this post.

5 x 1 sparse Matrix of class "dgCMatrix"
                     s1
(Intercept)   0.1779808
Sepal.Length  .        
Sepal.Width  -1.4467258
Petal.Length  5.6937270
Petal.Width   1.5331430

Combined code:

# Step 1: Load necessary libraries
library(glmnet)
library(caret)

# Step 2: Load the Iris dataset and prepare it for binary classification
data(iris)
set.seed(42)

# Keep only two species (setosa and versicolor) and recode them as 0/1
iris_binary <- subset(iris, Species != "virginica")
X <- as.matrix(iris_binary[, -5])  # Features
y <- as.factor(ifelse(iris_binary$Species == "setosa", 0, 1))  # Binary target

# Step 3: Split the dataset into training and testing sets
train_index <- createDataPartition(y, p=0.7, list=FALSE)
X_train <- X[train_index, ]
X_test <- X[-train_index, ]
y_train <- y[train_index]
y_test <- y[-train_index]

# Step 4: Scale the features
# Scale the test set using the training set's centering and scaling values
X_train_scaled <- scale(X_train)
X_test_scaled <- scale(X_test, center=attr(X_train_scaled, "scaled:center"), scale=attr(X_train_scaled, "scaled:scale"))

# Step 5: Fit Logistic Regression with L1 penalty (Lasso)
# alpha = 1 for Lasso, family = "binomial" for logistic regression
cv_fit <- cv.glmnet(X_train_scaled, y_train, alpha=1, family="binomial")

# Step 6: Get the best lambda (regularization strength)
best_lambda <- cv_fit$lambda.min
print(paste("Best Lambda:", best_lambda))

# Step 7: Make predictions on the test set
y_pred_prob <- predict(cv_fit, s=best_lambda, newx=X_test_scaled, type="response")
y_pred <- ifelse(y_pred_prob > 0.5, 1, 0)

# Step 8: Evaluate the model
confusion_matrix <- confusionMatrix(as.factor(y_pred), y_test)
print(confusion_matrix)

# Step 9: Check the model coefficients
coef(cv_fit, s=best_lambda)

