Batch Normalization & Code in PyTorch

Data normalization is a crucial preprocessing step for many machine learning algorithms. It puts features on a comparable scale, so that no feature dominates simply because of its measurement units.

Recall that Min-Max Scaling scales the data to a fixed range, typically [0, 1] or [-1, 1]:
x' = \frac{x - \text{min}(x)}{\text{max}(x) - \text{min}(x)}

Min-Max Scaling, however, depends on the minimum and maximum values of the feature, so it is sensitive to outliers: a single extreme value can compress all the other values into a narrow part of the range.
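For concreteness, here is a minimal sketch of min-max scaling applied to a toy tensor (the values are arbitrary):

import torch

x = torch.tensor([2.0, 4.0, 6.0, 10.0])

# Min-max scaling to [0, 1]
x_scaled = (x - x.min()) / (x.max() - x.min())
print(x_scaled)  # tensor([0.0000, 0.2500, 0.5000, 1.0000])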

Also, recall that Standardization (z-score normalization) scales the data to have zero mean and unit variance:
x' = \frac{x - \mu}{\sigma}
Where \mu is the mean and \sigma is the standard deviation of the feature. Because it uses these statistics rather than the extreme values, it is less sensitive to outliers than min-max scaling.
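And a matching sketch of standardization on the same toy tensor:

import torch

x = torch.tensor([2.0, 4.0, 6.0, 10.0])

# Standardization: zero mean, unit variance
# (torch.std uses the unbiased estimator by default)
x_std = (x - x.mean()) / x.std()
print(x_std.mean().item(), x_std.std().item())  # approximately 0.0 and 1.0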

Batch normalization standardizes the inputs to a layer for each mini-batch. Normalizing the inputs to each layer ensures that the distribution of inputs remains consistent throughout training, which can lead to faster convergence.

Batch normalization steps:

  1. Compute the mean (\mu_B) and variance (\sigma_B^2) of the mini-batch.
  2. Normalize the batch by subtracting the mean and dividing by the standard deviation.
  3. Scale and shift the result with learnable parameters (described below).

Given an input x from the mini-batch, batch normalization transforms it as:
\hat{x} = \frac{x - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}
Where \mu_B is the mini-batch mean, \sigma_B^2 is the mini-batch variance, and \epsilon is a small constant added for numerical stability.

Then, to restore the representation power of the model, a transformation step is taken:
y = \gamma \hat{x} + \beta
Where \gamma and \beta are learnable parameters.
The output y is then passed to the next layer, and the process continues.
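To connect the formulas to code, here is a minimal sketch (assuming a toy mini-batch of 20 samples with 5 features) that applies the normalization step by hand and checks it against PyTorch's nn.BatchNorm1d, whose \gamma and \beta are initialized to 1 and 0:

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(20, 5)  # mini-batch of 20 samples, 5 features

# Manual batch normalization (gamma = 1, beta = 0, as at initialization)
eps = 1e-5  # matches the default eps of nn.BatchNorm1d
mu = x.mean(dim=0)                  # per-feature mini-batch mean
var = x.var(dim=0, unbiased=False)  # per-feature (biased) mini-batch variance
x_hat = (x - mu) / torch.sqrt(var + eps)

bn = nn.BatchNorm1d(5)
bn.train()  # use batch statistics, as during training
print(torch.allclose(x_hat, bn(x), atol=1e-6))  # True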

Batch normalization layers are typically placed after the linear transformation and before the activation function. Small batch sizes can lead to noisy estimates of the mean and variance, which can destabilize training. If batch sizes are too small, consider alternatives like Layer Normalization or Group Normalization.
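As a rough sketch, a small block using nn.LayerNorm instead of batch normalization might look like this (the layer sizes are illustrative); LayerNorm normalizes across the features of each individual sample, so its statistics do not depend on the batch size:

import torch.nn as nn

# Hypothetical block: LayerNorm in place of BatchNorm1d
block = nn.Sequential(
    nn.Linear(10, 5),
    nn.LayerNorm(5),  # normalizes each sample's 5 features
    nn.ReLU(),
    nn.Linear(5, 1)
)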

In short, batch normalization stabilizes the learning process and can substantially reduce the number of training epochs required to train deep networks. Let's now look at how to implement it.

Implementation in PyTorch

Here’s how you can implement batch normalization in PyTorch:

First, import the necessary modules from PyTorch.

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader

Next, create some toy data. In this case, x is a tensor of 100 samples, where each sample is a 10-dimensional feature vector, and y is a tensor of 100 target values.

# Toy data
x = torch.randn(100, 10)  # 100 samples, 10 features each
y = torch.randn(100, 1)  # 100 targets

Now, wrap the data in a Dataset and DataLoader for minibatch training. The DataLoader lets us iterate over the dataset in minibatches of size 20, and shuffle=True reshuffles the samples at the start of each epoch, which is standard practice when training with SGD.

# Wrap in a Dataset and DataLoader for minibatch training
dataset = TensorDataset(x, y)
data_loader = DataLoader(dataset, batch_size=20, shuffle=True)  # Minibatches of size 20, shuffled each epoch

Define a simple model that includes a batch normalization layer. The nn.BatchNorm1d(5) line adds a batch normalization layer, which normalizes the output from the preceding layer. The number 5 in BatchNorm1d(5) is the size of the input feature dimension to the batch normalization layer, which should match the output size of the previous layer.

# Define a simple model with batch normalization
model = nn.Sequential(
    nn.Linear(10, 5),
    nn.BatchNorm1d(5),  # Batch normalization
    nn.ReLU(),
    nn.Linear(5, 1)
)

Define the loss function as mean squared error, which is a common choice for regression tasks, and an optimizer, in this case, Stochastic Gradient Descent (SGD).

# Define loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

Lastly, define the training loop. For each epoch (a full pass over the dataset), iterate over the DataLoader, which yields minibatches of inputs and targets. For each minibatch, perform the forward pass, compute the loss, perform backpropagation to compute gradients, and then update the model’s parameters. The loss for the last minibatch of each epoch is printed out.

# Training loop
for epoch in range(50):
    for batch_x, batch_y in data_loader:
        optimizer.zero_grad()
        outputs = model(batch_x)
        loss = criterion(outputs, batch_y)
        loss.backward()
        optimizer.step()

    print(f'Epoch {epoch+1}, Loss: {loss.item()}')
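One detail worth knowing: batch normalization behaves differently at training and inference time. During training it normalizes with the statistics of the current mini-batch and keeps running estimates of the mean and variance; at inference time it uses those running estimates instead. A minimal sketch of switching modes:

# Switch to evaluation mode so batch norm uses its running statistics
model.eval()
with torch.no_grad():
    predictions = model(x)  # running stats are used, so even a batch of one works

# Switch back to training mode before any further training
model.train()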
