Early stopping is a key regularization technique in deep learning that helps prevent overfitting during training. By monitoring the model’s performance on a validation dataset, training can be halted once that performance stops improving, which avoids unnecessary computation and preserves the model’s ability to generalize to unseen data. Beyond saving training time and compute, early stopping encourages robust models that hold up across different datasets and operating conditions, making it an indispensable tool in the deep learning practitioner’s toolkit.
Restoring the best weights is a crucial step when using early stopping to ensure that the model’s performance is maximized based on the validation set. This involves saving the model’s weights whenever the validation performance improves and then reloading these best weights once training stops.
Steps to Restore Best Weights
- Save the best model weights during training: Whenever the validation loss decreases, save the current model weights.
- Track the best validation loss: Keep track of the best validation loss encountered during training.
- Load the best model weights after early stopping: Once early stopping is triggered, reload the best model weights.
Implementation Example in PyTorch on the MNIST dataset:
First, we import the necessary libraries for neural networks, optimizers, data loading, and preprocessing, along with the standard-library copy module, which we use to snapshot the best weights.
import copy

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
We define data preprocessing steps and load the MNIST dataset. We normalize the data and create data loaders for training and validation.
# Data preprocessing and loading
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])
train_dataset = datasets.MNIST('.', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST('.', train=False, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)
We define a simple neural network model with input, hidden, and output layers. Let’s use a model that has an input layer with 784 units (since MNIST images are 28×28 pixels), a hidden layer with 128 units, and an output layer with 10 units (for the 10 digit classes).
# Define a simple model
model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 10)
)
We set up the loss function and optimizer for training.
# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
We initialize parameters for early stopping.
# Parameters for early stopping
patience = 5
best_loss = float('inf')
counter = 0
best_model_wts = copy.deepcopy(model.state_dict())  # Deep copy so later training steps do not overwrite the snapshot
We implement the training loop with early stopping. Here, we train the model for up to 50 epochs: in each epoch, the model trains on the training data and is then evaluated on the validation data. If the validation loss does not improve for a certain number of consecutive epochs (the patience), training stops early.
# Training loop with early stopping
for epoch in range(50):
    model.train()
    for inputs, targets in train_loader:
        inputs = inputs.view(inputs.size(0), -1)  # Flatten the inputs
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

    # Validation phase
    model.eval()
    val_loss = 0.0
    with torch.no_grad():
        for inputs, targets in val_loader:
            inputs = inputs.view(inputs.size(0), -1)  # Flatten the inputs
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            val_loss += loss.item()
    val_loss /= len(val_loader)
    print(f"Epoch {epoch+1}, Validation Loss: {val_loss:.4f}")

    # Check if validation loss improved
    if val_loss < best_loss:
        best_loss = val_loss
        counter = 0
        best_model_wts = copy.deepcopy(model.state_dict())  # Save a snapshot of the best weights
    else:
        counter += 1
        if counter >= patience:
            print("Early stopping triggered")
            break
Finally, we save the best model weights to disk and restore them in the model once training is complete.
# Save the best model to disk
torch.save(best_model_wts, 'best_model.pth')
print("Best model saved to best_model.pth")
# Restore the best model weights
model.load_state_dict(best_model_wts)
print("Best model weights restored")
If we later want to restore the saved weights from disk:
# Load the best model weights from disk
model.load_state_dict(torch.load('best_model.pth'))
print("Best model weights restored")
Combined code:
import copy

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Data preprocessing and loading
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])
train_dataset = datasets.MNIST('.', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST('.', train=False, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

# Define a simple model
model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 10)
)

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Parameters for early stopping
patience = 5
best_loss = float('inf')
counter = 0
best_model_wts = copy.deepcopy(model.state_dict())

# Training loop with early stopping
for epoch in range(50):
    model.train()
    for inputs, targets in train_loader:
        inputs = inputs.view(inputs.size(0), -1)  # Flatten the inputs
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

    # Validation phase
    model.eval()
    val_loss = 0.0
    with torch.no_grad():
        for inputs, targets in val_loader:
            inputs = inputs.view(inputs.size(0), -1)  # Flatten the inputs
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            val_loss += loss.item()
    val_loss /= len(val_loader)
    print(f"Epoch {epoch+1}, Validation Loss: {val_loss:.4f}")

    # Check if validation loss improved
    if val_loss < best_loss:
        best_loss = val_loss
        counter = 0
        best_model_wts = copy.deepcopy(model.state_dict())  # Save a snapshot of the best weights
    else:
        counter += 1
        if counter >= patience:
            print("Early stopping triggered")
            break

# Save the best model to disk
torch.save(best_model_wts, 'best_model.pth')
print("Best model saved to best_model.pth")

# Restore the best model weights
model.load_state_dict(best_model_wts)
print("Best model weights restored")

# Load the best model weights from disk
model.load_state_dict(torch.load('best_model.pth'))
print("Best model weights restored")
Summary Note:
- State Dictionary: model.state_dict() is used to save and load the model’s weights. This dictionary contains all the parameters of the model.
- Saving Best Weights: During training, whenever the validation loss improves, update best_model_wts with a copy of the model’s current state dictionary.
- Restoring Best Weights: After early stopping is triggered, restore the model’s weights from best_model_wts.
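The bookkeeping above can also be wrapped in a small reusable helper. The EarlyStopping class below is a minimal sketch of that idea; the class name and its step/restore interface are our own choices, not a built-in PyTorch API.
import copy

class EarlyStopping:
    """Tracks validation loss and keeps a copy of the best model weights."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best_loss = float('inf')
        self.counter = 0
        self.best_model_wts = None

    def step(self, val_loss, model):
        """Returns True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.counter = 0
            self.best_model_wts = copy.deepcopy(model.state_dict())  # Snapshot the best weights
            return False
        self.counter += 1
        return self.counter >= self.patience

    def restore(self, model):
        """Reloads the best weights into the model."""
        if self.best_model_wts is not None:
            model.load_state_dict(self.best_model_wts)
In the training loop, creating stopper = EarlyStopping(patience=5), calling stopper.step(val_loss, model) after each validation pass (breaking when it returns True), and calling stopper.restore(model) after the loop would replace the manual counter, best_loss, and best_model_wts bookkeeping.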
Practical Considerations
- Model Checkpointing: It’s often a good idea to save the best model weights to disk using torch.save for added safety, especially when training large models or on systems where interruptions might occur (see the checkpointing sketch after this list).
- Validation Metric: Ensure that the validation metric used for early stopping is appropriate for your task. While validation loss is common, other metrics like accuracy or F1-score might be more suitable depending on the application.
- Patience Parameter: The choice of patience can significantly affect the training outcome. It may require some experimentation to find the right value for your specific task and dataset.
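To illustrate the checkpointing point, the snippet below is a minimal sketch that reuses the variables from the MNIST example above (model, optimizer, best_model_wts, best_loss, epoch); the dictionary keys and the checkpoint.pth filename are arbitrary choices, not a fixed PyTorch convention.
# Save a fuller checkpoint: best weights plus training state (illustrative sketch)
checkpoint = {
    'epoch': epoch,                               # Last epoch reached before stopping
    'model_state_dict': best_model_wts,           # Best weights found during training
    'optimizer_state_dict': optimizer.state_dict(),
    'best_val_loss': best_loss,
}
torch.save(checkpoint, 'checkpoint.pth')

# Later, restore both the model and the optimizer to resume training or evaluate
checkpoint = torch.load('checkpoint.pth')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])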