The Expectation-Maximization (EM) algorithm is an iterative approach to estimate the parameters of probabilistic models, such as a Gaussian (normal) distribution, when the data is incomplete or has missing values. It alternates between two steps:
- E-step (Expectation): Estimate the hidden/missing data based on the current parameter estimates.
- M-step (Maximization): Update the parameters to maximize the likelihood function given the estimated hidden data.
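In standard notation, with observed data X, hidden (or missing) data Z, and parameters θ, one EM iteration computes the expected complete-data log-likelihood and then maximizes it:

Q(θ | θ⁽ᵗ⁾) = E[ log p(X, Z | θ) | X, θ⁽ᵗ⁾ ]   (E-step)
θ⁽ᵗ⁺¹⁾ = argmax over θ of Q(θ | θ⁽ᵗ⁾)   (M-step)

Repeating these two steps never decreases the observed-data likelihood, which is what makes the iteration converge.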
Let's walk through an example using the EM algorithm to estimate the parameters (mean and variance) of a normal distribution from generated data. I'll:
- Generate sample data from a Gaussian distribution.
- Introduce some missing data.
- Apply the EM algorithm to estimate the parameters.
Step 1: Generate data
Let's create data from a Gaussian distribution with true mean μ = 5 and variance σ² = 2.

After removing 20% of the 100 generated points as missing, we are left with 80 observed data points. Their observed mean and variance are approximately:
- Mean: 4.91
- Variance: 1.44
Now, let's apply the EM algorithm to estimate the true parameters (mean and variance).
Step 2: EM Algorithm for Normal Distribution
We want to estimate:
- Mean (μ)
- Variance (σ²)
The EM algorithm works as follows:
- Initialization: Start with initial guesses for μ and σ².
- E-step: Estimate the expected values of the missing data.
- M-step: Update the estimates of μ and σ².
- Repeat until convergence.
Here's the full Python code to implement the EM algorithm for estimating the parameters of a normal distribution, including data generation and a convergence check.
Step 1: Data Generation
import numpy as np
import matplotlib.pyplot as plt
# Generate data from a normal distribution (mean=5, variance=2)
np.random.seed(42)
true_mean = 5
true_variance = 2
data = np.random.normal(true_mean, np.sqrt(true_variance), 100)
# Introduce 20% missing values
missing_indices = np.random.choice(len(data), size=int(0.2 * len(data)), replace=False)
observed_data = np.delete(data, missing_indices)
# Plot the observed data
plt.hist(observed_data, bins=15, density=True, alpha=0.6, color='g')
plt.title("Observed Data (with Missing Values)")
plt.xlabel("Value")
plt.ylabel("Density")
plt.show()
Step 2: EM Algorithm for Normal Distribution
# EM Algorithm for estimating mean and variance of a normal distribution
def em_algorithm(observed_data, max_iter=100, tol=1e-6):
    # Step 1: Initialize parameters (mean and variance)
    mu = np.mean(observed_data)
    sigma_squared = np.var(observed_data)
    for iteration in range(max_iter):
        # E-step: No hidden variables in this simple example; we skip to the M-step.
        # M-step: Update estimates for mean and variance
        new_mu = np.mean(observed_data)
        new_sigma_squared = np.var(observed_data)
        # Check for convergence
        if abs(new_mu - mu) < tol and abs(new_sigma_squared - sigma_squared) < tol:
            break
        mu = new_mu
        sigma_squared = new_sigma_squared
    return mu, sigma_squared, iteration
# Run the EM algorithm
estimated_mu, estimated_sigma_squared, iterations = em_algorithm(observed_data)
# Print results
print(f"Estimated Mean: {estimated_mu:.4f}")
print(f"Estimated Variance: {estimated_sigma_squared:.4f}")
print(f"Converged in {iterations} iterations")
EM Algorithm When There Is No Missing Data
The Expectation-Maximization (EM) algorithm can be useful even when there is no missing data, particularly in situations where latent variables or hidden structure are present in the model. Here are some common scenarios:
1. Mixture Models (e.g., Gaussian Mixture Models)
- When you suspect that the data is generated from a combination of multiple distributions (e.g., multiple Gaussians with different means and variances).
- EM helps estimate the parameters (means, variances, and mixing coefficients) for each component of the mixture.
- Example: Clustering data into different groups using Gaussian Mixture Models (GMMs).
2. Hidden Markov Models (HMMs)
- EM is used in HMMs to estimate the transition probabilities, emission probabilities, and initial state probabilities.
- Example: Speech recognition, part-of-speech tagging, or time-series prediction.
3. Parameter Estimation for Complex Likelihood Functions
- When the likelihood function is difficult to maximize directly (e.g., high-dimensional data, or complex dependency structures).
- EM simplifies the optimization problem by introducing latent variables and iteratively maximizing the expected complete-data likelihood.
4. Probabilistic Principal Component Analysis (PPCA)
- EM is used to estimate parameters in probabilistic PCA, a probabilistic interpretation of dimensionality reduction.
5. Clustering Problems with Soft Assignments
- When you want to assign probabilities for each data point belonging to a cluster (soft clustering), rather than hard assignments.
- Example: Soft k-means clustering using EM.
Why Not Use Standard Methods?
- EM is preferred when standard methods (e.g., analytical solutions) are impractical due to the complexity of the likelihood function or when you need probabilistic inference with hidden variables.
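For the mixture-model and soft-clustering scenarios above (items 1 and 5), a library implementation is often the practical choice. The next section builds EM for a GMM from scratch, but a minimal sketch using scikit-learn (assuming it is installed; GaussianMixture.fit runs EM internally) looks like this:

# Sketch: soft clustering with a Gaussian Mixture Model via scikit-learn's EM implementation
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic 1-D data from two Gaussians, reshaped to the (n_samples, n_features) layout sklearn expects
X = np.concatenate([rng.normal(2, 1, 400), rng.normal(8, 1.2, 600)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print("Means:", gmm.means_.ravel())
print("Variances:", gmm.covariances_.ravel())
print("Mixing coefficients:", gmm.weights_)
print("Soft assignments for the first 3 points:", gmm.predict_proba(X[:3]))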
Expectation-Maximization (EM) algorithm for a Gaussian Mixture Model (GMM)
Let's go through an example of using the Expectation-Maximization (EM) algorithm for a Gaussian Mixture Model (GMM) to estimate the parameters of a dataset generated from a mixture of two Gaussian distributions.
We'll:
- Generate synthetic data from two normal distributions.
- Implement the EM algorithm to estimate the parameters (means, variances, and mixing coefficients).

Step 1: Generate Data from a Mixture of Two Gaussians
We'll create a dataset that is a mixture of two normal distributions with different means and variances.
We generated 1,000 data points from a mixture of two Gaussian distributions. The approximate statistics of the data are:
- Mean: 5.62
- Variance: 9.89
Now, let's implement the EM algorithm to estimate the parameters of the mixture model. We'll estimate:
- Means
- Variances
- Mixing coefficients
Step 2: EM Algorithm for Gaussian Mixture Model
Here's the full Python code to estimate parameters for a Gaussian Mixture Model (GMM) using the Expectation-Maximization (EM) algorithm.
Step 1: Data Generation (Mixture of Two Gaussians)
import numpy as np
import matplotlib.pyplot as plt
# Generate synthetic data from a mixture of two Gaussians
np.random.seed(42)
# Parameters for the two Gaussians
mean1, variance1, weight1 = 2, 1, 0.4 # Mean = 2, Variance = 1, Weight = 0.4
mean2, variance2, weight2 = 8, 1.5, 0.6 # Mean = 8, Variance = 1.5, Weight = 0.6
# Generate 1000 samples from the mixture
n_samples = 1000
n_samples_1 = int(weight1 * n_samples)
n_samples_2 = n_samples - n_samples_1
data1 = np.random.normal(mean1, np.sqrt(variance1), n_samples_1)
data2 = np.random.normal(mean2, np.sqrt(variance2), n_samples_2)
data = np.concatenate([data1, data2])
# Plot the generated data
plt.hist(data, bins=30, density=True, alpha=0.6, color='b')
plt.title("Histogram of Generated Data (Mixture of Two Gaussians)")
plt.xlabel("Value")
plt.ylabel("Density")
plt.show()
Step 2: EM Algorithm for Gaussian Mixture Model
# EM Algorithm for Gaussian Mixture Model (GMM)
def em_gmm(data, n_components=2, max_iter=100, tol=1e-6):
    np.random.seed(42)
    n = len(data)
    # Step 1: Initialize parameters
    means = np.random.choice(data, n_components)
    variances = np.random.random(n_components) + 1  # Avoid zero variance
    mixing_coeffs = np.ones(n_components) / n_components  # Equal mixing coefficients
    log_likelihoods = []
    for iteration in range(max_iter):
        # E-step: Compute responsibilities (posterior probabilities)
        responsibilities = np.zeros((n, n_components))
        for k in range(n_components):
            normal_pdf = (1 / np.sqrt(2 * np.pi * variances[k])) * np.exp(-0.5 * ((data - means[k]) ** 2) / variances[k])
            responsibilities[:, k] = mixing_coeffs[k] * normal_pdf
        # Per-point mixture density, kept for the log-likelihood before normalizing
        totals = responsibilities.sum(axis=1, keepdims=True)
        # Normalize to get probabilities
        responsibilities /= totals
        # M-step: Update parameters
        N_k = responsibilities.sum(axis=0)  # Effective number of points assigned to each component
        for k in range(n_components):
            means[k] = (responsibilities[:, k] @ data) / N_k[k]
            variances[k] = (responsibilities[:, k] @ ((data - means[k]) ** 2)) / N_k[k]
            mixing_coeffs[k] = N_k[k] / n
        # Compute log-likelihood for convergence check (sum of log mixture densities)
        log_likelihood = np.sum(np.log(totals))
        log_likelihoods.append(log_likelihood)
        # Check for convergence
        if iteration > 0 and abs(log_likelihoods[-1] - log_likelihoods[-2]) < tol:
            break
    return means, variances, mixing_coeffs, iteration
# Run the EM algorithm
estimated_means, estimated_variances, estimated_mixing_coeffs, iterations = em_gmm(data)
# Print results
print(f"Estimated Means: {estimated_means}")
print(f"Estimated Variances: {estimated_variances}")
print(f"Estimated Mixing Coefficients: {estimated_mixing_coeffs}")
print(f"Converged in {iterations} iterations")
Explanation
- Initialization: The means, variances, and mixing coefficients are initialized randomly.
- E-step: Compute the responsibilities for each component using the Gaussian probability density function.
- M-step: Update the parameters based on the computed responsibilities.
- Log-likelihood: Calculate and monitor the log-likelihood to check for convergence.
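As a sanity check, a short snippet reusing data and the estimates returned above can overlay the fitted mixture density on the histogram of the data; the two estimated means should sit near the true values of 2 and 8.

# Overlay the fitted mixture density on the data histogram (uses variables from the code above)
xs = np.linspace(data.min(), data.max(), 500)
mixture_density = np.zeros_like(xs)
for w, m, v in zip(estimated_mixing_coeffs, estimated_means, estimated_variances):
    mixture_density += w * (1 / np.sqrt(2 * np.pi * v)) * np.exp(-0.5 * (xs - m) ** 2 / v)

plt.hist(data, bins=30, density=True, alpha=0.6, color='b', label="Data")
plt.plot(xs, mixture_density, 'r', label="Fitted GMM density")
plt.title("Fitted Mixture Density vs. Generated Data")
plt.xlabel("Value")
plt.ylabel("Density")
plt.legend()
plt.show()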