📐 Definitions (for clarity)

Let the errors be $e_i = y_i - \hat{y}_i$ for $i = 1, \dots, n$.

MAE:

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} |e_i|$$

MSE:

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} e_i^2$$

RMSE:

$$\text{RMSE} = \sqrt{\text{MSE}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} e_i^2}$$
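These definitions translate directly into code. A minimal sketch (the sample `errors` values are made up for illustration):

```python
import math

def mae(errors):
    """Mean Absolute Error: average of |e_i|."""
    return sum(abs(e) for e in errors) / len(errors)

def mse(errors):
    """Mean Squared Error: average of e_i^2."""
    return sum(e * e for e in errors) / len(errors)

def rmse(errors):
    """Root Mean Squared Error: square root of MSE."""
    return math.sqrt(mse(errors))

errors = [1.0, -2.0, 0.5, 3.0]  # hypothetical residuals y_i - yhat_i
print(mae(errors))   # 1.625
print(mse(errors))   # 3.5625
print(rmse(errors))  # ≈ 1.887
```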
📐 Relationship Between MAE and MSE

Let the errors be $e_i = y_i - \hat{y}_i$. Then:

Mean Absolute Error (MAE):

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} |e_i|$$

Mean Squared Error (MSE):

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} e_i^2$$
Key Relationship

1. Jensen’s inequality gives:

$$\text{MSE} \ge (\text{MAE})^2$$

Why?

Because the square function is convex, so:

$$\left(\frac{1}{n}\sum_{i=1}^{n} |e_i|\right)^2 \le \frac{1}{n}\sum_{i=1}^{n} e_i^2$$

This means: MSE is always ≥ (MAE)².
When are they equal?

$\text{MSE} = (\text{MAE})^2$ holds only when all errors have the same magnitude, i.e.:

$$|e_1| = |e_2| = \dots = |e_n|$$

This almost never happens in real data.
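The equality condition is easy to check numerically. A small sketch with made-up error vectors:

```python
# All errors share the same magnitude -> MSE == MAE^2 exactly.
uniform = [2.0, -2.0, 2.0, -2.0]
mae_u = sum(abs(e) for e in uniform) / len(uniform)   # 2.0
mse_u = sum(e * e for e in uniform) / len(uniform)    # 4.0
print(mse_u == mae_u ** 2)  # True

# Mixed magnitudes -> strict inequality MSE > MAE^2.
mixed = [1.0, -3.0]
mae_m = sum(abs(e) for e in mixed) / len(mixed)       # 2.0
mse_m = sum(e * e for e in mixed) / len(mixed)        # 5.0
print(mse_m > mae_m ** 2)   # True (5.0 > 4.0)
```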
Intuition

MSE penalizes large errors more because of the square.

MAE treats all errors linearly.

So:

- If your error distribution has outliers, MSE will be much larger than (MAE)².
- If your errors are uniform and small, MSE will be close to (MAE)².
A useful ratio

A common diagnostic is:

$$\frac{\text{MSE}}{(\text{MAE})^2}$$

- ≈ 1 → errors are uniform
- ≫ 1 → heavy-tailed errors or outliers
- < 1 → impossible (violates Jensen)
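The ratio behaves as described on simple synthetic data. A sketch (the error vectors are invented for illustration):

```python
def mse_over_mae_sq(errors):
    """Diagnostic ratio MSE / MAE^2; always >= 1 by Jensen's inequality."""
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return mse / mae ** 2

uniform = [1.0, -1.0, 1.0, -1.0]        # identical magnitudes
with_outlier = [0.5, -0.5, 0.5, 10.0]   # one large outlier
print(mse_over_mae_sq(uniform))         # 1.0 (equality case)
print(mse_over_mae_sq(with_outlier))    # ≈ 3.05, well above 1
```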
Relationship Between MAE² and MSE (recap)

From Jensen’s inequality:

$$\text{MSE} \ge (\text{MAE})^2$$

Equality holds only when all errors have the same magnitude.
Now Add RMSE Into the Picture

Since:

$$\text{RMSE} = \sqrt{\text{MSE}}$$

we immediately get:

$$\text{RMSE} \ge \text{MAE}$$

because:

$$\text{RMSE} = \sqrt{\text{MSE}} \ge \sqrt{(\text{MAE})^2} = \text{MAE}$$

This is a strict inequality unless all errors have identical magnitude.
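The inequality RMSE ≥ MAE can be spot-checked on random residuals. A sketch (the Gaussian errors are synthetic, generated only to exercise the inequality):

```python
import math
import random

random.seed(0)
errors = [random.gauss(0.0, 1.0) for _ in range(1000)]  # synthetic residuals

mae = sum(abs(e) for e in errors) / len(errors)
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

# Holds for any error vector, with equality only for uniform magnitudes.
print(rmse >= mae)  # True
```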
Summary of All Relationships

1. MSE vs MAE²: $\text{MSE} \ge (\text{MAE})^2$
2. RMSE vs MAE: $\text{RMSE} \ge \text{MAE}$
3. RMSE vs MAE²: $\text{RMSE}^2 = \text{MSE} \ge (\text{MAE})^2$
4. Combined chain: $(\text{MAE})^2 \le \text{MSE} = \text{RMSE}^2$, equivalently $\text{MAE} \le \text{RMSE}$
Intuition
RMSE penalizes large errors more strongly than MAE because of the square.
- If your error distribution has outliers, RMSE will be much larger than MAE.
- If your errors are uniform, RMSE ≈ MAE.
A useful diagnostic ratio

$$\frac{\text{RMSE}}{\text{MAE}}$$

Interpretation:

- ≈ 1 → errors are uniform, no large outliers
- > 1.2 → moderate spread in error magnitudes
- > 2 → heavy-tailed errors or significant outliers
This ratio is widely used in ML model evaluation to understand error distribution shape.
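The diagnostic can be packaged as a small helper. A sketch: the 1.2 and 2.0 cutoffs are the heuristic thresholds quoted above, not hard rules, and the sample error vectors are invented:

```python
import math

def rmse_mae_ratio(errors):
    """RMSE / MAE; always >= 1, larger when error magnitudes vary more."""
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return rmse / mae

def describe(ratio):
    # Heuristic interpretation of the RMSE/MAE ratio from the text above.
    if ratio > 2.0:
        return "heavy-tailed errors or significant outliers"
    if ratio > 1.2:
        return "moderate spread in error magnitudes"
    return "errors are roughly uniform"

print(describe(rmse_mae_ratio([1.0, -1.0, 1.0])))                 # roughly uniform
print(describe(rmse_mae_ratio([0.1] * 9 + [10.0])))               # outliers
```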