The Expectation-Maximization (EM) algorithm is an iterative approach to estimate the parameters of probabilistic models, such as a Gaussian (normal) distribution, when the data is incomplete or has missing values. It alternates between two steps:
- E-step (Expectation): Estimate the hidden/missing data based on the current parameter estimates.
- M-step (Maximization): Update the parameters to maximize the likelihood function given the estimated hidden data.
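In standard notation, with observed data X, hidden (or missing) data Z, and parameters θ, one EM iteration computes the expected complete-data log-likelihood and then maximizes it:

Q(θ | θ⁽ᵗ⁾) = E[ log p(X, Z | θ) | X, θ⁽ᵗ⁾ ]   (E-step)
θ⁽ᵗ⁺¹⁾ = argmax over θ of Q(θ | θ⁽ᵗ⁾)   (M-step)

Repeating these two steps never decreases the observed-data likelihood, which is what makes the iteration converge.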
Let's walk through an example using the EM algorithm to estimate the parameters (mean and variance) of a normal distribution from generated data. I'll:
- Generate sample data from a Gaussian distribution.
- Introduce some missing data.
- Apply the EM algorithm to estimate the parameters.
Step 1: Generate data
Let's create data from a Gaussian distribution with true mean μ = 5 and variance σ² = 2.

After removing 20% of the 100 generated points as missing, we are left with 80 observed data points. Their observed mean and variance are approximately:
- Mean: 4.91
- Variance: 1.44
Now, let's apply the EM algorithm to estimate the true parameters (mean and variance).
Step 2: EM Algorithm for Normal Distribution
We want to estimate:
- Mean (μ)
- Variance (σ²)
The EM algorithm works as follows:
- Initialization: Start with initial guesses for μ and σ².
- E-step: Estimate the expected values of the missing data.
- M-step: Update the estimates of μ and σ².
- Repeat until convergence.
Here's the full Python code to implement the EM algorithm for estimating the parameters of a normal distribution, including data generation and a convergence check.
Step 1: Data Generation
import numpy as np
import matplotlib.pyplot as plt
# Generate data from a normal distribution (mean=5, variance=2)
np.random.seed(42)
true_mean = 5
true_variance = 2
data = np.random.normal(true_mean, np.sqrt(true_variance), 100)
# Introduce 20% missing values
missing_indices = np.random.choice(len(data), size=int(0.2 * len(data)), replace=False)
observed_data = np.delete(data, missing_indices)
# Plot the observed data
plt.hist(observed_data, bins=15, density=True, alpha=0.6, color='g')
plt.title("Observed Data (with Missing Values)")
plt.xlabel("Value")
plt.ylabel("Density")
plt.show()
Step 2: EM Algorithm for Normal Distribution
# EM Algorithm for estimating mean and variance of a normal distribution
def em_algorithm(observed_data, max_iter=100, tol=1e-6):
    # Step 1: Initialize parameters (mean and variance)
    mu = np.mean(observed_data)
    sigma_squared = np.var(observed_data)
    for iteration in range(max_iter):
        # E-step: No hidden variables in this simple example; we skip to the M-step.
        # M-step: Update estimates for mean and variance
        new_mu = np.mean(observed_data)
        new_sigma_squared = np.var(observed_data)
        # Check for convergence
        if abs(new_mu - mu) < tol and abs(new_sigma_squared - sigma_squared) < tol:
            break
        mu = new_mu
        sigma_squared = new_sigma_squared
    return mu, sigma_squared, iteration
# Run the EM algorithm
estimated_mu, estimated_sigma_squared, iterations = em_algorithm(observed_data)
# Print results
print(f"Estimated Mean: {estimated_mu:.4f}")
print(f"Estimated Variance: {estimated_sigma_squared:.4f}")
print(f"Converged in {iterations} iterations")
EM Algorithm When There Is No Missing Data
The Expectation-Maximization (EM) algorithm can be useful even when there is no missing data, particularly in situations where latent variables or hidden structure are present in the model. Here are some common scenarios:
1. Mixture Models (e.g., Gaussian Mixture Models)
- When you suspect that the data is generated from a combination of multiple distributions (e.g., multiple Gaussians with different means and variances).
- EM helps estimate the parameters (means, variances, and mixing coefficients) for each component of the mixture.
- Example: Clustering data into different groups using Gaussian Mixture Models (GMMs).
2. Hidden Markov Models (HMMs)
- EM is used in HMMs to estimate the transition probabilities, emission probabilities, and initial state probabilities.
- Example: Speech recognition, part-of-speech tagging, or time-series prediction.
3. Parameter Estimation for Complex Likelihood Functions
- When the likelihood function is difficult to maximize directly (e.g., high-dimensional data, or complex dependency structures).
- EM simplifies the optimization problem by introducing latent variables and iteratively maximizing the expected complete-data likelihood.
4. Probabilistic Principal Component Analysis (PPCA)
- EM is used to estimate parameters in probabilistic PCA, a probabilistic interpretation of dimensionality reduction.
5. Clustering Problems with Soft Assignments
- When you want to assign probabilities for each data point belonging to a cluster (soft clustering), rather than hard assignments.
- Example: Soft k-means clustering using EM.
Why Not Use Standard Methods?
- EM is preferred when standard methods (e.g., analytical solutions) are impractical due to the complexity of the likelihood function or when you need probabilistic inference with hidden variables.
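For the mixture-model and soft-clustering scenarios above (items 1 and 5), a library implementation is often the practical choice. The next section builds EM for a GMM from scratch, but a minimal sketch using scikit-learn (assuming it is installed; GaussianMixture.fit runs EM internally) looks like this:

# Sketch: soft clustering with a Gaussian Mixture Model via scikit-learn's EM implementation
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic 1-D data from two Gaussians, reshaped to the (n_samples, n_features) layout sklearn expects
X = np.concatenate([rng.normal(2, 1, 400), rng.normal(8, 1.2, 600)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print("Means:", gmm.means_.ravel())
print("Variances:", gmm.covariances_.ravel())
print("Mixing coefficients:", gmm.weights_)
print("Soft assignments for the first 3 points:", gmm.predict_proba(X[:3]))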
Expectation-Maximization (EM) algorithm for a Gaussian Mixture Model (GMM)
Let's go through an example of using the Expectation-Maximization (EM) algorithm for a Gaussian Mixture Model (GMM) to estimate the parameters of a dataset generated from a mixture of two Gaussian distributions.
We'll:
- Generate synthetic data from two normal distributions.
- Implement the EM algorithm to estimate the parameters (means, variances, and mixing coefficients).

Step 1: Generate Data from a Mixture of Two Gaussians
We'll create a dataset that is a mixture of two normal distributions with different means and variances.
We generated 1,000 data points from a mixture of two Gaussian distributions. The approximate statistics of the data are:
- Mean: 5.62
- Variance: 9.89
Now, let's implement the EM algorithm to estimate the parameters of the mixture model. We'll estimate:
- Means
- Variances
- Mixing coefficients
Step 2: EM Algorithm for Gaussian Mixture Model
Here's the full Python code to estimate parameters for a Gaussian Mixture Model (GMM) using the Expectation-Maximization (EM) algorithm.
Step 1: Data Generation (Mixture of Two Gaussians)
import numpy as np
import matplotlib.pyplot as plt
# Generate synthetic data from a mixture of two Gaussians
np.random.seed(42)
# Parameters for the two Gaussians
mean1, variance1, weight1 = 2, 1, 0.4 # Mean = 2, Variance = 1, Weight = 0.4
mean2, variance2, weight2 = 8, 1.5, 0.6 # Mean = 8, Variance = 1.5, Weight = 0.6
# Generate 1000 samples from the mixture
n_samples = 1000
n_samples_1 = int(weight1 * n_samples)
n_samples_2 = n_samples - n_samples_1
data1 = np.random.normal(mean1, np.sqrt(variance1), n_samples_1)
data2 = np.random.normal(mean2, np.sqrt(variance2), n_samples_2)
data = np.concatenate([data1, data2])
# Plot the generated data
plt.hist(data, bins=30, density=True, alpha=0.6, color='b')
plt.title("Histogram of Generated Data (Mixture of Two Gaussians)")
plt.xlabel("Value")
plt.ylabel("Density")
plt.show()
Step 2: EM Algorithm for Gaussian Mixture Model
# EM Algorithm for Gaussian Mixture Model (GMM)
def em_gmm(data, n_components=2, max_iter=100, tol=1e-6):
    np.random.seed(42)
    n = len(data)
    # Step 1: Initialize parameters
    means = np.random.choice(data, n_components)
    variances = np.random.random(n_components) + 1  # Avoid zero variance
    mixing_coeffs = np.ones(n_components) / n_components  # Equal mixing coefficients
    log_likelihoods = []
    for iteration in range(max_iter):
        # E-step: Compute responsibilities (posterior probabilities)
        responsibilities = np.zeros((n, n_components))
        for k in range(n_components):
            normal_pdf = (1 / np.sqrt(2 * np.pi * variances[k])) * np.exp(-0.5 * ((data - means[k]) ** 2) / variances[k])
            responsibilities[:, k] = mixing_coeffs[k] * normal_pdf
        # Per-point mixture density, kept for the log-likelihood before normalizing
        totals = responsibilities.sum(axis=1, keepdims=True)
        # Normalize to get probabilities
        responsibilities /= totals
        # M-step: Update parameters
        N_k = responsibilities.sum(axis=0)  # Effective number of points assigned to each component
        for k in range(n_components):
            means[k] = (responsibilities[:, k] @ data) / N_k[k]
            variances[k] = (responsibilities[:, k] @ ((data - means[k]) ** 2)) / N_k[k]
            mixing_coeffs[k] = N_k[k] / n
        # Compute log-likelihood for convergence check (sum of log mixture densities)
        log_likelihood = np.sum(np.log(totals))
        log_likelihoods.append(log_likelihood)
        # Check for convergence
        if iteration > 0 and abs(log_likelihoods[-1] - log_likelihoods[-2]) < tol:
            break
    return means, variances, mixing_coeffs, iteration
# Run the EM algorithm
estimated_means, estimated_variances, estimated_mixing_coeffs, iterations = em_gmm(data)
# Print results
print(f"Estimated Means: {estimated_means}")
print(f"Estimated Variances: {estimated_variances}")
print(f"Estimated Mixing Coefficients: {estimated_mixing_coeffs}")
print(f"Converged in {iterations} iterations")
Explanation
- Initialization: The means, variances, and mixing coefficients are initialized randomly.
- E-step: Compute the responsibilities for each component using the Gaussian probability density function.
- M-step: Update the parameters based on the computed responsibilities.
- Log-likelihood: Calculate and monitor the log-likelihood to check for convergence.
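As a sanity check, a short snippet reusing data and the estimates returned above can overlay the fitted mixture density on the histogram of the data; the two estimated means should sit near the true values of 2 and 8.

# Overlay the fitted mixture density on the data histogram (uses variables from the code above)
xs = np.linspace(data.min(), data.max(), 500)
mixture_density = np.zeros_like(xs)
for w, m, v in zip(estimated_mixing_coeffs, estimated_means, estimated_variances):
    mixture_density += w * (1 / np.sqrt(2 * np.pi * v)) * np.exp(-0.5 * (xs - m) ** 2 / v)

plt.hist(data, bins=30, density=True, alpha=0.6, color='b', label="Data")
plt.plot(xs, mixture_density, 'r', label="Fitted GMM density")
plt.title("Fitted Mixture Density vs. Generated Data")
plt.xlabel("Value")
plt.ylabel("Density")
plt.legend()
plt.show()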