While there are various methods for uncertainty modeling in neural networks, Monte Carlo (MC) methods are widely used due to their simplicity and ease of implementation, particularly when predicting probabilities or modeling distributions is computationally challenging. They rely on repeated random sampling to approximate the distribution of outcomes. A simple illustration is Bootstrap Resampling.

In bootstrap resampling, multiple datasets are created by sampling with replacement from the training set, and a separate model is trained on each dataset, yielding a distribution of models. Each model then makes a prediction, and the variance across these predictions indicates uncertainty. You can see more details with examples at Bootstrap: introductory comics & quizzes. While the procedure is simple, it has a high computational cost because many models must be trained, so it does not scale well to large models and datasets.
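As a rough, self-contained illustration (a minimal sketch using NumPy and a plain linear fit, not taken from any of the sources cited in this post), the bootstrap procedure can look like this:
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=100)
y = 2.0 * X + rng.normal(scale=1.0, size=100)  # noisy linear toy data

n_bootstrap = 200
x_query = 1.5  # point at which we want a prediction and its uncertainty
predictions = []
for _ in range(n_bootstrap):
    # Resample the training data with replacement and fit a separate model on each resample
    idx = rng.integers(0, len(X), size=len(X))
    slope, intercept = np.polyfit(X[idx], y[idx], deg=1)
    predictions.append(slope * x_query + intercept)

predictions = np.array(predictions)
print("Mean prediction:", predictions.mean())
print("Uncertainty (variance):", predictions.var())
The variance across the bootstrapped fits plays the same role that the variance across stochastic forward passes will play in MC Dropout below.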
Monte Carlo Dropout (MC Dropout)
This method comes from Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, a research paper by Yarin Gal and Zoubin Ghahramani that presents a theoretical framework casting dropout training in deep neural networks (NNs) as approximate Bayesian inference in deep Gaussian processes.
Key Points:
- Model Uncertainty: Traditional deep learning models do not capture model uncertainty, whereas Bayesian models provide a mathematically grounded framework for reasoning about uncertainty. However, Bayesian methods are often computationally expensive.
- Dropout as Bayesian Inference: The paper shows that dropout can be interpreted as performing approximate Bayesian inference. This means that dropout can be used to estimate model uncertainty without the prohibitive computational cost associated with traditional Bayesian methods.
- Improved Predictive Performance: By using dropout as a Bayesian approximation, the model can extract information that was previously discarded, leading to improvements in predictive log-likelihood and RMSE compared to state-of-the-art methods.
- Applications: The approach has been tested on various network architectures and non-linearities, showing considerable improvements in regression and classification tasks. It has also been applied to deep reinforcement learning.

To implement Monte Carlo (MC) Dropout, you need dropout layers that stay active during both training and inference. Normally, dropout is applied only during training to prevent overfitting, but in MC Dropout we also enable it at inference time. This lets us obtain different predictions for the same input, because different neurons are dropped on each forward pass, simulating an ensemble of models.
Here’s an example implementation of MC Dropout in PyTorch, using a toy feed-forward neural network. Of course, this approach can be applied to more complex models (e.g., CNNs, RNNs) by adding dropout layers and following the same steps for inference with multiple stochastic passes. Scroll down for the TensorFlow implementation.
Step 1: Import Libraries
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
Step 2: Define the Model with Dropout Layers
Define a model with dropout layers, ensuring you use nn.Dropout in key locations. Here’s an example using a simple feed-forward network.
class MC_Dropout_Model(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, dropout_rate=0.5):
        super(MC_Dropout_Model, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.dropout = nn.Dropout(dropout_rate)
        self.fc2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.dropout(x)  # Apply dropout (also kept active during inference)
        x = self.fc2(x)
        return x
Step 3: Enable MC Dropout During Inference
By default, model.eval() disables dropout, but for MC Dropout we want dropout enabled even at prediction time. The workaround is to call the model in model.train() mode during inference while avoiding any weight updates (the forward passes below are wrapped in torch.no_grad(), so no gradients are computed).
To obtain a prediction with MC Dropout, perform multiple forward passes and average the predictions.
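Note that model.train() also switches other layers, such as batch normalization, into training behavior. A common alternative, shown here as a sketch rather than as part of the original recipe, is to keep the model in eval() mode and re-enable only the dropout modules:
model.eval()
for module in model.modules():
    if isinstance(module, nn.Dropout):
        module.train()  # only the dropout layers behave stochastically at inference
For the toy network below this makes no difference, but it matters for architectures that contain normalization layers.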
Step 4: Define MC Dropout Prediction Function
This function performs n_samples forward passes, stacks the resulting predictions, and uses their mean as the final prediction and their variance as the uncertainty estimate.
def mc_dropout_predict(model, x, n_samples=100):
    model.train()  # Keep dropout layers active during inference
    predictions = []
    with torch.no_grad():
        for _ in range(n_samples):
            predictions.append(model(x))
    # Stack predictions for each pass along a new dimension and calculate mean and variance
    predictions = torch.stack(predictions)
    mean_prediction = predictions.mean(dim=0)
    uncertainty = predictions.var(dim=0)
    return mean_prediction, uncertainty
Step 5: Example Usage
Here’s an example usage with dummy data:
# Initialize model, dummy data, and parameters
input_size = 10
hidden_size = 20
output_size = 1
dropout_rate = 0.5
n_samples = 100
model = MC_Dropout_Model(input_size, hidden_size, output_size, dropout_rate)
# Dummy input tensor
x = torch.randn((1, input_size))
# Get predictions and uncertainty
mean_prediction, uncertainty = mc_dropout_predict(model, x, n_samples=n_samples)
print("Mean Prediction:", mean_prediction)
print("Uncertainty (Variance):", uncertainty)
Explanation:
- Model Definition: We define a simple neural network with a dropout layer.
- Inference with Dropout: Instead of using model.eval(), we keep the model in train() mode during inference to ensure dropout layers remain active.
- MC Dropout Prediction: By performing multiple forward passes, we get a set of predictions for the input. Averaging these predictions gives the mean prediction, and calculating the variance provides an estimate of uncertainty.
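If you want a more interpretable number than the raw variance, one option (a sketch that assumes the MC samples are roughly Gaussian, which is not guaranteed) is to convert it to a standard deviation and form an approximate interval:
std = uncertainty.sqrt()
lower = mean_prediction - 1.96 * std  # approximate 95% band
upper = mean_prediction + 1.96 * std
print("Approximate 95% interval:", lower, "to", upper)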
Combined code:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

# Define the model with Dropout layers
class MC_Dropout_Model(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, dropout_rate=0.5):
        super(MC_Dropout_Model, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.dropout = nn.Dropout(dropout_rate)
        self.fc2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.dropout(x)  # Apply dropout (also kept active during inference)
        x = self.fc2(x)
        return x

# Function to perform MC Dropout inference
def mc_dropout_predict(model, x, n_samples=100):
    model.train()  # Keep dropout layers active during inference
    predictions = []
    with torch.no_grad():
        for _ in range(n_samples):
            predictions.append(model(x))
    # Stack predictions for each pass along a new dimension and calculate mean and variance
    predictions = torch.stack(predictions)
    mean_prediction = predictions.mean(dim=0)
    uncertainty = predictions.var(dim=0)
    return mean_prediction, uncertainty

# Example usage
if __name__ == "__main__":
    # Initialize model, dummy data, and parameters
    input_size = 10
    hidden_size = 20
    output_size = 1
    dropout_rate = 0.5
    n_samples = 100

    model = MC_Dropout_Model(input_size, hidden_size, output_size, dropout_rate)

    # Dummy input tensor
    x = torch.randn((1, input_size))

    # Get predictions and uncertainty
    mean_prediction, uncertainty = mc_dropout_predict(model, x, n_samples=n_samples)
    print("Mean Prediction:", mean_prediction)
    print("Uncertainty (Variance):", uncertainty)
Which dropout rate should I use?
The dropout rate in neural networks is typically chosen through experimentation, as it can vary depending on the specific dataset and model architecture. A common starting point is a dropout rate of 0.5 (50%) for hidden layers. This means that during training, each neuron has a 50% chance of being “dropped out” or ignored, which helps prevent overfitting by ensuring that the network does not become too reliant on any single neuron. In the original paper, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting” by Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov, the authors also typically used a rate of 0.5 for hidden layers.
It’s important to note that the optimal dropout rate can differ for different layers and different problems. You might need to experiment with different rates (e.g., 0.2, 0.3, 0.4, etc.) to find the best value for your specific use case.
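One simple way to experiment is to train the same architecture with several dropout rates and compare validation error. The sketch below assumes you have training and validation tensors (X_train, y_train, X_val, y_val, which are placeholders, not defined elsewhere in this post) and uses a deliberately minimal training loop:
for rate in [0.2, 0.3, 0.4, 0.5]:
    model = MC_Dropout_Model(input_size, hidden_size, output_size, dropout_rate=rate)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    model.train()
    for _ in range(200):  # a short, plain MSE training loop for illustration
        optimizer.zero_grad()
        loss = F.mse_loss(model(X_train), y_train)
        loss.backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():
        val_mse = F.mse_loss(model(X_val), y_val).item()
    print(f"dropout_rate={rate}: validation MSE={val_mse:.4f}")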
More complex example
An example with a Bayesian DenseNet-169 can be seen in the paper Risk-aware machine learning classifier for skin lesion diagnosis by A. Mobiny, A. Singh, and H. Van Nguyen (Journal of Clinical Medicine, 2019), and in the corresponding implementation on GitHub.
More methods with implementations:
A Deep Gaussian Processes implementation is available in the GPyTorch package, which includes Doubly Stochastic Variational Inference for Deep Gaussian Processes among many other algorithms.
TensorFlow implementation
Let’s break down the code step by step, with an explanation followed by each corresponding code chunk:
Import Libraries
We start by importing the TensorFlow and NumPy libraries.
import tensorflow as tf
import numpy as np
Define the Model with Dropout Layers
We define a neural network model with dropout layers. The model consists of an input layer, a hidden layer with ReLU activation, a dropout layer, and an output layer.
# Define the model with Dropout layers
class MC_Dropout_Model(tf.keras.Model):
    def __init__(self, input_shape, hidden_units, output_units, dropout_rate=0.5):
        super(MC_Dropout_Model, self).__init__()
        self.fc1 = tf.keras.layers.Dense(hidden_units, activation='relu', input_shape=input_shape)
        self.dropout = tf.keras.layers.Dropout(dropout_rate)
        self.fc2 = tf.keras.layers.Dense(output_units)

    def call(self, inputs, training=False):
        x = self.fc1(inputs)
        x = self.dropout(x, training=training)  # Apply dropout even during inference
        return self.fc2(x)
Function to Perform Monte Carlo Dropout Inference
We define a function to perform Monte Carlo Dropout inference. It makes multiple forward passes through the model with dropout activated to estimate the mean and variance of predictions.
# Function to perform MC Dropout inference
def mc_dropout_predict(model, x, n_samples=100):
    predictions = []
    for _ in range(n_samples):
        # Forward pass with dropout activated
        predictions.append(model(x, training=True))  # Keep dropout active
    # Stack predictions across samples and calculate mean and variance
    predictions = tf.stack(predictions)
    mean_prediction = tf.reduce_mean(predictions, axis=0)
    uncertainty = tf.math.reduce_variance(predictions, axis=0)
    return mean_prediction, uncertainty
Example Usage
We provide an example of how to use the model and the function to perform Monte Carlo Dropout inference.
# Example usage
if __name__ == "__main__":
    # Initialize model parameters
    input_shape = (10,)
    hidden_units = 20
    output_units = 1
    dropout_rate = 0.5
    n_samples = 100

    # Create the model instance
    model = MC_Dropout_Model(input_shape=input_shape, hidden_units=hidden_units, output_units=output_units, dropout_rate=dropout_rate)

    # Dummy input tensor
    x = tf.random.normal((1, 10))  # Batch size of 1, input dimension of 10

    # Run MC Dropout to get predictions and uncertainty
    mean_prediction, uncertainty = mc_dropout_predict(model, x, n_samples=n_samples)
    print("Mean Prediction:", mean_prediction.numpy())
    print("Uncertainty (Variance):", uncertainty.numpy())
Full code:
import tensorflow as tf
import numpy as np

# Define the model with Dropout layers
class MC_Dropout_Model(tf.keras.Model):
    def __init__(self, input_shape, hidden_units, output_units, dropout_rate=0.5):
        super(MC_Dropout_Model, self).__init__()
        self.fc1 = tf.keras.layers.Dense(hidden_units, activation='relu', input_shape=input_shape)
        self.dropout = tf.keras.layers.Dropout(dropout_rate)
        self.fc2 = tf.keras.layers.Dense(output_units)

    def call(self, inputs, training=False):
        x = self.fc1(inputs)
        x = self.dropout(x, training=training)  # Apply dropout even during inference
        return self.fc2(x)

# Function to perform MC Dropout inference
def mc_dropout_predict(model, x, n_samples=100):
    predictions = []
    for _ in range(n_samples):
        # Forward pass with dropout activated
        predictions.append(model(x, training=True))  # Keep dropout active
    # Stack predictions across samples and calculate mean and variance
    predictions = tf.stack(predictions)
    mean_prediction = tf.reduce_mean(predictions, axis=0)
    uncertainty = tf.math.reduce_variance(predictions, axis=0)
    return mean_prediction, uncertainty

# Example usage
if __name__ == "__main__":
    # Initialize model parameters
    input_shape = (10,)
    hidden_units = 20
    output_units = 1
    dropout_rate = 0.5
    n_samples = 100

    # Create the model instance
    model = MC_Dropout_Model(input_shape=input_shape, hidden_units=hidden_units, output_units=output_units, dropout_rate=dropout_rate)

    # Dummy input tensor
    x = tf.random.normal((1, 10))  # Batch size of 1, input dimension of 10

    # Run MC Dropout to get predictions and uncertainty
    mean_prediction, uncertainty = mc_dropout_predict(model, x, n_samples=n_samples)
    print("Mean Prediction:", mean_prediction.numpy())
    print("Uncertainty (Variance):", uncertainty.numpy())