Maximum Likelihood Estimation: A Comprehensive Guide with Examples

Maximum Likelihood Estimation (MLE)

Maximum Likelihood Estimation (MLE) is a statistical method used to estimate the parameters of a probabilistic model such that the observed data is most probable under the model. This approach works by maximizing a likelihood function, which quantifies how likely it is to observe the given data for various parameter values. MLE is widely applicable across different fields, from economics and biology to machine learning and engineering, as it provides a framework for making inferences about populations based on sample data. By seeking the parameter values that make the observed data most likely, researchers can infer characteristics of the underlying process that generated the data, thus enabling more accurate predictions and better decision-making in uncertain environments.


Principle of MLE

Suppose we have a dataset \mathbf{X} = \{x_1, x_2, ..., x_n\} generated from a probability distribution with density function f(x \mid \theta), where \theta is the parameter to be estimated.

Assuming the observations are independent and identically distributed (i.i.d.), MLE finds the value of \theta that maximizes the likelihood function:

L(\theta) = P(\mathbf{X} \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta)

Because the logarithm is strictly increasing, maximizing the log-likelihood gives the same estimate, and it is usually easier to work with since the product becomes a sum:

\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log f(x_i \mid \theta)
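Working on the log scale is also numerically safer: a product of many small densities can underflow to zero in floating point, while the sum of log-densities stays finite. The snippet below is a minimal sketch in Python; the simulated sample and the parameter values (a normal model with \mu = 2, \sigma = 1.5) are purely illustrative assumptions, not part of the derivation.

```python
import numpy as np
from scipy.stats import norm

# Illustrative sample: 1,000 draws from N(mu=2, sigma=1.5) (arbitrary choices)
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)

mu, sigma = 2.0, 1.5  # evaluate the (log-)likelihood at the true parameters

# Raw likelihood: the product of 1,000 small densities underflows to 0.0
L = np.prod(norm.pdf(x, loc=mu, scale=sigma))

# Log-likelihood: the sum of log-densities stays comfortably finite
ell = np.sum(norm.logpdf(x, loc=mu, scale=sigma))

print(L)    # 0.0 (floating-point underflow)
print(ell)  # a finite negative number
```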


Steps to Find MLE

  1. Write the likelihood function L(\theta)
  2. Take the log to get \ell(\theta)
  3. Compute the derivative \frac{d\ell(\theta)}{d\theta} and solve \frac{d\ell(\theta)}{d\theta} = 0 to find \hat{\theta}
  4. Check second-order condition:
  • \frac{d^2\ell}{d\theta^2} < 0: maximum
  • \frac{d^2\ell}{d\theta^2} > 0: minimum
  • \frac{d^2\ell}{d\theta^2} = 0: check higher-order derivatives
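The same four steps can be carried out numerically when no closed form is convenient. The sketch below assumes an exponential model with rate \lambda (a model chosen only for illustration, not covered in the examples that follow) and minimizes the negative log-likelihood with scipy's scalar optimizer; the simulated data and search bounds are likewise illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical data: 500 draws from an Exponential distribution with rate 2.0
rng = np.random.default_rng(1)
x = rng.exponential(scale=1 / 2.0, size=500)
n = x.size

# Steps 1-2: log-likelihood of Exponential(lambda) is n*log(lambda) - lambda*sum(x)
def neg_log_likelihood(lam):
    return -(n * np.log(lam) - lam * np.sum(x))

# Step 3: maximize the log-likelihood (minimize its negative) numerically
res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100.0), method="bounded")

# Analytic solution of d(ell)/d(lambda) = n/lambda - sum(x) = 0, for comparison
lam_closed_form = n / np.sum(x)
print(res.x, lam_closed_form)  # the two estimates should agree closely

# Step 4: the second derivative, -n / lambda**2, is negative, so this is a maximum
```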

Example 1: Gaussian Distribution \mathcal{N}(\mu, \sigma^2)

Let X_1, ..., X_n \sim \mathcal{N}(\mu, \sigma^2). The PDF is:

f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)

Log-likelihood:

\ell(\mu, \sigma^2) = -\frac{n}{2} \log (2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2

Setting \frac{\partial \ell}{\partial \mu} = 0 and \frac{\partial \ell}{\partial \sigma^2} = 0 and solving gives:

\hat{\mu} = \frac{1}{n} \sum x_i, \quad \hat{\sigma}^2 = \frac{1}{n} \sum (x_i - \hat{\mu})^2
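A quick sanity check of these closed-form estimators on a simulated sample (the true \mu = 5 and \sigma = 2 below are arbitrary illustrative values):

```python
import numpy as np

# Illustrative sample; the true mu=5 and sigma=2 are arbitrary choices
rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=1000)

mu_hat = np.mean(x)              # MLE of the mean: the sample average
sigma2_hat = np.var(x, ddof=0)   # MLE of the variance: divides by n, not n - 1

print(mu_hat, sigma2_hat)
```

Note that the MLE of the variance divides by n and is slightly biased downward; the familiar unbiased sample variance divides by n - 1 instead.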


Example 2: Bernoulli Distribution

Suppose x_i \sim \text{Bernoulli}(p). PMF:

f(x_i \mid p) = p^{x_i}(1 - p)^{1 - x_i}

Likelihood:

L(p) = \prod p^{x_i}(1 - p)^{1 - x_i} = p^{\sum x_i} (1 - p)^{n - \sum x_i}

Let S = \sum x_i, then:

\ell(p) = S \log p + (n - S) \log (1 - p)

Derivative:

\frac{d\ell}{dp} = \frac{S}{p} - \frac{n - S}{1 - p}

Setting this to zero and solving:

\hat{p} = \frac{S}{n} = \frac{\sum x_i}{n}
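In code, the estimator is just the observed proportion of successes. The experiment below (200 coin flips with a true success probability of 0.3) is a hypothetical example used only to illustrate the formula.

```python
import numpy as np

# Hypothetical experiment: 200 coin flips with true success probability 0.3
rng = np.random.default_rng(3)
x = rng.binomial(n=1, p=0.3, size=200)

S = x.sum()
p_hat = S / x.size  # MLE: the observed proportion of successes

print(S, p_hat)
```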


Example 3: Poisson Distribution

Assume x_i \sim \text{Poisson}(\lambda). PMF:

P(X_i = k) = \frac{\lambda^k e^{-\lambda}}{k!}

Likelihood:

L(\lambda) = \lambda^{\sum x_i} e^{-n\lambda} \prod \frac{1}{x_i!}

Log-likelihood (ignoring constants):

\ell(\lambda) = \sum x_i \log \lambda - n\lambda

Derivative:

\frac{d\ell}{d\lambda} = \frac{\sum x_i}{\lambda} - n

Setting this to zero and solving:

\hat{\lambda} = \frac{1}{n} \sum x_i
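So the MLE is simply the sample mean. The sketch below checks this on simulated count data (the true \lambda = 4 is an arbitrary illustrative value) by also maximizing the log-likelihood numerically.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical count data: 300 draws from Poisson(lambda=4.0)
rng = np.random.default_rng(4)
x = rng.poisson(lam=4.0, size=300)
n = x.size

lam_hat = x.mean()  # closed-form MLE: the sample mean

# Cross-check by maximizing the log-likelihood (constants dropped) numerically
def neg_ell(lam):
    return -(x.sum() * np.log(lam) - n * lam)

res = minimize_scalar(neg_ell, bounds=(1e-6, 50.0), method="bounded")
print(lam_hat, res.x)  # both should match (up to optimizer tolerance)
```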


Summary Table of MLEs

Distribution | PMF/PDF | MLE
Normal \mathcal{N}(\mu, \sigma^2) | f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) | \hat{\mu} = \bar{x}, \quad \hat{\sigma}^2 = \frac{1}{n} \sum (x_i - \bar{x})^2
Bernoulli | f(x) = p^x(1 - p)^{1 - x} | \hat{p} = \frac{\sum x_i}{n}
Poisson | f(x) = \frac{\lambda^x e^{-\lambda}}{x!} | \hat{\lambda} = \frac{1}{n} \sum x_i
