Learning Rate Strategy & PyTorch Code

The learning rate is a hyperparameter that determines the size of the steps taken during the optimization process to update the model parameters. Think of it as riding a bike through a hilly valley:

  • If you bike slowly, you move with great precision, but it takes a long time to reach your destination. This is similar to a low learning rate in deep learning: your model learns slowly but carefully. It makes small adjustments to the weights, which helps avoid overshooting the optimal solution but can also make training very slow.
  • Biking fast, however, takes more effort and is harder to control. This is like a high learning rate in deep learning: your model makes larger adjustments to the weights, which can speed up learning but also risks overshooting the optimal solution, leading to instability or missing the best performance altogether. Pedal too fast and the bike starts wobbling dangerously on the mountain slopes.

Just as you need to find the right gear to balance speed and control while riding up the hill, in deep learning you need to find the right learning rate. Too fast or too slow causes problems. But how do you choose it?

Learning Rate Strategies

Use a fixed learning rate: keep the learning rate constant throughout the training process. This is the simplest option, but it requires careful tuning before training starts.
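
In PyTorch, a fixed learning rate is simply an optimizer created with a constant lr argument (a minimal sketch, assuming a model has already been defined; the value 0.01 is only illustrative and is the same setup used in the full example below):

import torch.optim as optim

# The learning rate stays at 0.01 for the entire training run.
optimizer = optim.SGD(model.parameters(), lr=0.01)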


Use a learning rate scheduler to adjust the learning rate during training based on certain conditions or epochs. For example (each option is sketched in code after the list):
• Step Decay: Reduces the learning rate by a factor at specific epochs.
• Exponential Decay: Decreases the learning rate exponentially over epochs.
• Plateau-Based Reduction: Reduces the learning rate when a metric (e.g., validation loss) stops improving.
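
Here is a minimal sketch of these three schedulers, assuming an optimizer has already been created; the step sizes, decay factors, and patience values are only illustrative:

from torch.optim import lr_scheduler

# Step decay: multiply the learning rate by 0.1 every 10 epochs
step_scheduler = lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

# Exponential decay: multiply the learning rate by 0.95 after every epoch
exp_scheduler = lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

# Plateau-based reduction: halve the learning rate when the monitored
# metric (e.g., validation loss) has not improved for 5 epochs.
# This scheduler is stepped with the metric: plateau_scheduler.step(val_loss)
plateau_scheduler = lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=5)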


Use an adaptive learning rate algorithm:
Algorithms like AdaGrad, RMSprop, and Adam adjust the learning rate for each parameter individually based on the history of its gradients, allowing more efficient and effective training.
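
A minimal sketch of switching to an adaptive optimizer (the model is assumed to be defined as before, and the learning rates shown are PyTorch's defaults, used here only for illustration):

import torch.optim as optim

# Adam keeps per-parameter running estimates of the gradients and their
# squared magnitudes and scales each parameter's update accordingly.
optimizer = optim.Adam(model.parameters(), lr=0.001)

# RMSprop and AdaGrad are configured the same way:
# optimizer = optim.RMSprop(model.parameters(), lr=0.01)
# optimizer = optim.Adagrad(model.parameters(), lr=0.01)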

Example in PyTorch

Here’s a simple example using PyTorch to demonstrate setting a learning rate and using a learning rate scheduler:

First, import the necessary modules from PyTorch:

import torch
import torch.optim as optim  # optimization module
import torch.nn as nn        # neural network building blocks

Then, define a simple feed-forward neural network model with two linear (fully connected) layers and a ReLU (Rectified Linear Unit) activation function in between. The input size is 784 (which is typical for flattened 28×28 grayscale images, like those in the MNIST dataset), the hidden layer size is 128, and the output size is 10 (which is typical for classification tasks with 10 classes):

model = nn.Sequential(
    nn.Linear(784, 128),  # input layer: 784 features -> 128 hidden units
    nn.ReLU(),            # non-linear activation
    nn.Linear(128, 10)    # hidden layer -> 10 output classes
)

Create a Stochastic Gradient Descent (SGD) optimizer with a learning rate of 0.01 to update the model weights:

learning_rate = 0.01
optimizer = optim.SGD(model.parameters(), lr=learning_rate)

Create a learning rate scheduler that multiplies the learning rate by gamma=0.1 every step_size=10 epochs, effectively reducing the learning rate over time:

scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

Now, let’s define the example data. We create a batch of 64 random input vectors of length 784 (standing in for flattened 28×28 images) and a batch of 64 integer labels between 0 and 9. We then wrap this data in a DataLoader:

# Example data
inputs = torch.randn(64, 784)  
targets = torch.randint(0, 10, (64,))

# Wrap in a DataLoader
from torch.utils.data import TensorDataset, DataLoader
dataset = TensorDataset(inputs, targets)
data_loader = DataLoader(dataset, batch_size=64)

Define a loss function. Here we use CrossEntropyLoss, which is suitable for classification tasks:

# Define loss function
criterion = nn.CrossEntropyLoss()

Finally, we define the main training loop. For each epoch (one full pass over the dataset), it iterates over the data_loader to get batches of inputs and targets, computes the model’s outputs and the loss, performs backpropagation with loss.backward() to compute the gradients, and updates the model parameters with optimizer.step(). After each epoch, it advances the learning rate schedule with scheduler.step() and prints the current epoch and learning rate:

for epoch in range(50):
    for inputs, targets in data_loader:
        optimizer.zero_grad()               # reset gradients from the previous step
        outputs = model(inputs)             # forward pass
        loss = criterion(outputs, targets)  # compute the loss
        loss.backward()                     # backpropagation: compute gradients
        optimizer.step()                    # update the model parameters

    scheduler.step()                        # advance the learning rate schedule

    print(f"Epoch {epoch+1}, Learning Rate: {scheduler.get_last_lr()[0]}")

Download the code on GitHub

