📐 Definitions (for clarity)

Let the errors be $e_i = y_i - \hat{y}_i$ for $i = 1, \dots, n$.

MAE:

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} |e_i|$$

MSE:

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} e_i^2$$

RMSE:

$$\text{RMSE} = \sqrt{\text{MSE}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} e_i^2}$$
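These definitions translate directly into code. A minimal sketch (the sample `errors` values are made up for illustration):

```python
import math

def mae(errors):
    """Mean Absolute Error: average of |e_i|."""
    return sum(abs(e) for e in errors) / len(errors)

def mse(errors):
    """Mean Squared Error: average of e_i^2."""
    return sum(e * e for e in errors) / len(errors)

def rmse(errors):
    """Root Mean Squared Error: square root of MSE."""
    return math.sqrt(mse(errors))

errors = [1.0, -2.0, 0.5, 3.0]  # hypothetical residuals y_i - yhat_i
print(mae(errors))   # 1.625
print(mse(errors))   # 3.5625
print(rmse(errors))  # ≈ 1.887
```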
📐 Relationship Between MAE and MSE

Let the errors be $e_i = y_i - \hat{y}_i$. Then:

Mean Absolute Error (MAE):

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} |e_i|$$

Mean Squared Error (MSE):

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} e_i^2$$
Key Relationship

1. Jensen’s inequality gives:

$$\text{MSE} \ge (\text{MAE})^2$$

Why?

Because the square function is convex, so:

$$\left(\frac{1}{n}\sum_{i=1}^{n} |e_i|\right)^2 \le \frac{1}{n}\sum_{i=1}^{n} e_i^2$$

This means: MSE is always ≥ (MAE)².
When are they equal?

$\text{MSE} = (\text{MAE})^2$ holds only when all errors have the same magnitude, i.e.:

$$|e_1| = |e_2| = \dots = |e_n|$$

This almost never happens in real data.
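The equality condition is easy to check numerically. A small sketch with made-up error vectors:

```python
# All errors share the same magnitude -> MSE == MAE^2 exactly.
uniform = [2.0, -2.0, 2.0, -2.0]
mae_u = sum(abs(e) for e in uniform) / len(uniform)   # 2.0
mse_u = sum(e * e for e in uniform) / len(uniform)    # 4.0
print(mse_u == mae_u ** 2)  # True

# Mixed magnitudes -> strict inequality MSE > MAE^2.
mixed = [1.0, -3.0]
mae_m = sum(abs(e) for e in mixed) / len(mixed)       # 2.0
mse_m = sum(e * e for e in mixed) / len(mixed)        # 5.0
print(mse_m > mae_m ** 2)   # True (5.0 > 4.0)
```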
Intuition

MSE penalizes large errors more because of the square.

MAE treats all errors linearly.

So:

- If your error distribution has outliers, MSE will be much larger than (MAE)².
- If your errors are uniform and small, MSE will be close to (MAE)².
A useful ratio

A common diagnostic is:

$$\frac{\text{MSE}}{(\text{MAE})^2}$$

- ≈ 1 → errors are uniform
- ≫ 1 → heavy-tailed errors or outliers
- < 1 → impossible (violates Jensen)
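The ratio behaves as described on simple synthetic data. A sketch (the error vectors are invented for illustration):

```python
def mse_over_mae_sq(errors):
    """Diagnostic ratio MSE / MAE^2; always >= 1 by Jensen's inequality."""
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return mse / mae ** 2

uniform = [1.0, -1.0, 1.0, -1.0]        # identical magnitudes
with_outlier = [0.5, -0.5, 0.5, 10.0]   # one large outlier
print(mse_over_mae_sq(uniform))         # 1.0 (equality case)
print(mse_over_mae_sq(with_outlier))    # ≈ 3.05, well above 1
```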
Relationship Between MAE² and MSE (recap)

From Jensen’s inequality:

$$\text{MSE} \ge (\text{MAE})^2$$

Equality holds only when all errors have the same magnitude.
Now Add RMSE Into the Picture

Since:

$$\text{RMSE} = \sqrt{\text{MSE}}$$

we immediately get:

$$\text{RMSE} \ge \text{MAE}$$

because:

$$\text{RMSE} = \sqrt{\text{MSE}} \ge \sqrt{(\text{MAE})^2} = \text{MAE}$$

This is a strict inequality unless all errors have identical magnitude.
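The inequality RMSE ≥ MAE can be spot-checked on random residuals. A sketch (the Gaussian errors are synthetic, generated only to exercise the inequality):

```python
import math
import random

random.seed(0)
errors = [random.gauss(0.0, 1.0) for _ in range(1000)]  # synthetic residuals

mae = sum(abs(e) for e in errors) / len(errors)
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

# Holds for any error vector, with equality only for uniform magnitudes.
print(rmse >= mae)  # True
```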
Summary of All Relationships

1. MSE vs MAE²: $\text{MSE} \ge (\text{MAE})^2$
2. RMSE vs MAE: $\text{RMSE} \ge \text{MAE}$
3. RMSE vs MAE²: $\text{RMSE}^2 = \text{MSE} \ge (\text{MAE})^2$
4. Combined chain: $(\text{MAE})^2 \le \text{MSE} = \text{RMSE}^2$, equivalently $\text{MAE} \le \text{RMSE}$
Intuition
RMSE penalizes large errors more strongly than MAE because of the square.
- If your error distribution has outliers, RMSE will be much larger than MAE.
- If your errors are uniform, RMSE ≈ MAE.
A useful diagnostic ratio

$$\frac{\text{RMSE}}{\text{MAE}}$$

Interpretation:

- ≈ 1 → errors are uniform, no large outliers
- > 1.2 → moderate spread in error magnitudes
- > 2 → heavy-tailed errors or significant outliers
This ratio is widely used in ML model evaluation to understand error distribution shape.
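The diagnostic can be packaged as a small helper. A sketch: the 1.2 and 2.0 cutoffs are the heuristic thresholds quoted above, not hard rules, and the sample error vectors are invented:

```python
import math

def rmse_mae_ratio(errors):
    """RMSE / MAE; always >= 1, larger when error magnitudes vary more."""
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return rmse / mae

def describe(ratio):
    # Heuristic interpretation of the RMSE/MAE ratio from the text above.
    if ratio > 2.0:
        return "heavy-tailed errors or significant outliers"
    if ratio > 1.2:
        return "moderate spread in error magnitudes"
    return "errors are roughly uniform"

print(describe(rmse_mae_ratio([1.0, -1.0, 1.0])))                 # roughly uniform
print(describe(rmse_mae_ratio([0.1] * 9 + [10.0])))               # outliers
```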