
Relationship Between MAE, MSE, RMSE

📐 Definitions (for clarity)

Let errors be e_i = y_i - \hat{y}_i.

Mean Absolute Error (MAE)
\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} |e_i|

Mean Squared Error (MSE)
\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} e_i^2

Root Mean Squared Error (RMSE)
\text{RMSE} = \sqrt{\text{MSE}}
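
The three definitions translate directly into code. The following is a minimal sketch in plain Python; the function names `mae`, `mse`, and `rmse` and the sample values are illustrative assumptions, not taken from any particular library.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average of |y_i - yhat_i|."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    return sum(abs(e) for e in errors) / len(errors)

def mse(y_true, y_pred):
    """Mean squared error: average of (y_i - yhat_i)^2."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    return sum(e * e for e in errors) / len(errors)

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))

# Made-up data for illustration; the errors are 1, 0, -2, 3.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 4.0]

print(mae(y_true, y_pred))   # 1.5   (= 6 / 4)
print(mse(y_true, y_pred))   # 3.5   (= 14 / 4)
print(rmse(y_true, y_pred))  # sqrt(3.5), roughly 1.87
```

Note that RMSE is in the same units as the target, like MAE, while MSE is in squared units.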

📐 Relationship Between (\text{MAE})^2 and \text{MSE}

Key Relationship

Jensen’s inequality gives:

(\text{MAE})^2 \le \text{MSE}

Why?

Because the square function is convex, the square of an average is at most the average of the squares:
\left( \frac{1}{n}\sum |e_i| \right)^2 \le \frac{1}{n}\sum |e_i|^2 = \text{MSE}

This means: MSE is always ≥ (MAE)²
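
The same result can be checked by direct expansion, without appealing to Jensen. Writing a_i = |e_i| (a shorthand introduced here for the derivation), the gap between MSE and (MAE)² is exactly the variance of the absolute errors:

```latex
\text{MSE} - (\text{MAE})^2
  = \frac{1}{n}\sum_{i=1}^{n} a_i^2 - \left( \frac{1}{n}\sum_{i=1}^{n} a_i \right)^2
  = \frac{1}{n}\sum_{i=1}^{n} \left( a_i - \text{MAE} \right)^2
  \ge 0,
  \qquad a_i = |e_i|
```

The right-hand side is an average of squares, so it is never negative, and it is zero only if every a_i equals MAE.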

When are they equal?

(\text{MAE})^2 = \text{MSE}

only when all errors have the same magnitude, i.e.:

|e_1| = |e_2| = \dots = |e_n|

This almost never happens in real data.
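
A quick numerical check of the equality condition, as a sketch with made-up error vectors (the helper name `mae_squared_and_mse` is an assumption for this example):

```python
def mae_squared_and_mse(errors):
    """Return the pair (MAE^2, MSE) for a list of errors."""
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    return mae ** 2, mse

# Same magnitude everywhere (signs may differ): equality holds.
equal_magnitude = [2.0, -2.0, 2.0, -2.0]
# Same MAE, but unequal magnitudes: MSE is strictly larger.
mixed_magnitude = [1.0, -3.0, 2.0, -2.0]

print(mae_squared_and_mse(equal_magnitude))  # (4.0, 4.0)
print(mae_squared_and_mse(mixed_magnitude))  # (4.0, 4.5)
```

Both vectors have MAE = 2, yet only the first reaches MSE = (MAE)². The signs of the errors do not matter, only their magnitudes.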

Intuition

MSE penalizes large errors more because of the square.
MAE treats all errors linearly.

So:

  • If your error distribution has outliers,
    \text{MSE} \gg (\text{MAE})^2
  • If your errors all have roughly the same magnitude (whether large or small),
    \text{MSE} \approx (\text{MAE})^2
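
To make the outlier effect concrete, here is a small sketch with made-up errors: ten errors of magnitude 1, and the same set with one error replaced by 20. The helper name `mae_and_mse` is an assumption for this example.

```python
def mae_and_mse(errors):
    """Return (MAE, MSE) for a list of errors."""
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    return mae, mse

# Ten errors of magnitude 1, then the same list with the last one set to 20.
clean = [1.0, -1.0] * 5
with_outlier = clean[:-1] + [20.0]

for name, errs in [("clean", clean), ("with outlier", with_outlier)]:
    mae, mse = mae_and_mse(errs)
    print(f"{name}: MAE^2 = {mae ** 2:.2f}, MSE = {mse:.2f}")
# clean: MAE^2 = 1.00, MSE = 1.00
# with outlier: MAE^2 = 8.41, MSE = 40.90
```

A single large error moves MSE far more than it moves (MAE)², because that error enters MSE as 20² = 400 but enters MAE only as 20.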

A useful ratio

A simple diagnostic is the ratio:
\frac{\text{MSE}}{(\text{MAE})^2}

  • ≈ 1 → errors have nearly equal magnitudes
  • ≫ 1 → heavy-tailed errors or outliers
  • < 1 → impossible (would violate Jensen’s inequality)
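
A minimal sketch of this ratio as a helper function. The name `mse_to_mae_sq_ratio` and the choice to return `None` when every error is zero (where the ratio is undefined) are assumptions made for this example.

```python
def mse_to_mae_sq_ratio(errors):
    """Return MSE / MAE^2, or None if all errors are zero."""
    n = len(errors)
    if n == 0:
        raise ValueError("need at least one error")
    mae = sum(abs(e) for e in errors) / n
    if mae == 0:
        return None  # perfect predictions: the ratio is 0/0
    mse = sum(e * e for e in errors) / n
    return mse / mae ** 2

print(mse_to_mae_sq_ratio([0.5, -0.5, 0.5]))       # 1.0: equal magnitudes
print(mse_to_mae_sq_ratio([0.1, -0.2, 0.1, 5.0]))  # about 3.44: one outlier
print(mse_to_mae_sq_ratio([0.0, 0.0]))             # None: undefined
```

In floating-point arithmetic the ratio may come out a hair below 1 for equal-magnitude errors, so compare against 1 with a small tolerance rather than exactly.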

Relationship Between MAE² and MSE (recap)

From Jensen’s inequality:

(\text{MAE})^2 \le \text{MSE}

Equality holds only when all errors have the same magnitude.

Now Add RMSE Into the Picture

Since:

\text{RMSE} = \sqrt{\text{MSE}}

we immediately get:

\text{RMSE} \ge \text{MAE}

because:

\sqrt{\text{MSE}} \ge \sqrt{(\text{MAE})^2} = \text{MAE}

The inequality is strict unless all errors have the same magnitude.
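
The inequality can also be spot-checked on random data. This is a sketch, not a proof: it draws made-up error vectors from a seeded generator and confirms that RMSE minus MAE never goes negative, up to a small floating-point tolerance. The helper name `rmse_minus_mae` is an assumption.

```python
import math
import random

def rmse_minus_mae(errors):
    """Return RMSE - MAE for a list of errors."""
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return rmse - mae

rng = random.Random(42)  # fixed seed so the run is repeatable
gaps = []
for _ in range(1000):
    n = rng.randint(1, 50)
    errors = [rng.uniform(-10.0, 10.0) for _ in range(n)]
    gaps.append(rmse_minus_mae(errors))

# A tiny tolerance absorbs rounding; n = 1 gives a gap of essentially zero.
assert min(gaps) >= -1e-12
print("smallest RMSE - MAE gap:", min(gaps))
```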

Summary of All Relationships

1. MSE vs MAE²
\text{MSE} \ge (\text{MAE})^2

2. RMSE vs MAE
\text{RMSE} \ge \text{MAE}

3. RMSE² vs MAE²
\text{RMSE}^2 = \text{MSE} \ge (\text{MAE})^2

4. Combined chain
\text{RMSE} \ge \text{MAE} \ge 0

Intuition

RMSE penalizes large errors more strongly than MAE because of the square.

  • If your error distribution has outliers, RMSE will be much larger than MAE.
  • If your errors all have similar magnitude, RMSE ≈ MAE.

A useful diagnostic ratio

\frac{\text{RMSE}}{\text{MAE}}

Interpretation:

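  • ≈ 1 → errors have nearly equal magnitudes, no large outliers
  • > 1.2 → moderate spread in error magnitudes
  • > 2 → heavy-tailed errors or significant outliers

For reference, zero-mean Gaussian errors give a ratio of √(π/2) ≈ 1.25, so a value in that range is typical rather than alarming. The thresholds above are rules of thumb, not hard cutoffs.

This ratio gives a quick read on the shape of the error distribution when evaluating a model.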
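
The sketch below computes the ratio for three synthetic error sets: equal magnitudes, standard Gaussian noise, and the same Gaussian noise with about 1% of the values scaled up by 50. The data, the seed, the contamination scheme, and the helper name `rmse_mae_ratio` are all assumptions chosen for illustration.

```python
import math
import random

def rmse_mae_ratio(errors):
    """Return RMSE / MAE; assumes at least one nonzero error."""
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return rmse / mae

rng = random.Random(0)  # fixed seed so the run is repeatable

equal = [1.0, -1.0] * 5000
gaussian = [rng.gauss(0.0, 1.0) for _ in range(50000)]
# Scale every 100th error by 50 to mimic occasional large outliers.
contaminated = [e * 50.0 if i % 100 == 0 else e
                for i, e in enumerate(gaussian)]

print("equal magnitudes:", rmse_mae_ratio(equal))        # 1.0
print("gaussian:        ", rmse_mae_ratio(gaussian))     # near sqrt(pi/2)
print("sqrt(pi/2):      ", math.sqrt(math.pi / 2))
print("contaminated:    ", rmse_mae_ratio(contaminated)) # well above 2
```

The Gaussian case lands near 1.25 and the contaminated case lands well above 2, which matches the interpretation given above.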

