📐 Definitions (for clarity)
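Let the errors be $e_1, e_2, \dots, e_n$.
MAE
$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} |e_i|$$
MSE
$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} e_i^2$$
RMSE
$$\text{RMSE} = \sqrt{\text{MSE}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} e_i^2}$$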
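A minimal sketch of the three metrics in plain Python, assuming the usual definitions (mean of the absolute errors, mean of the squared errors, and the square root of the latter). The function names and the sample errors are illustrative, not taken from any particular library.

```python
import math

def mae(errors):
    """Mean absolute error: the average of |e_i|."""
    return sum(abs(e) for e in errors) / len(errors)

def mse(errors):
    """Mean squared error: the average of e_i squared."""
    return sum(e * e for e in errors) / len(errors)

def rmse(errors):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(errors))

# Hypothetical errors, chosen only for illustration.
errors = [1.0, -2.0, 3.0, -4.0]
print(mae(errors))   # 2.5
print(mse(errors))   # 7.5
print(rmse(errors))  # roughly 2.74
```

Note that here MSE (7.5) exceeds MAE² (6.25), and RMSE exceeds MAE.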
📐 Relationship Between MAE² and MSE
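Let the errors be $e_1, e_2, \dots, e_n$.
Then:
Mean Absolute Error (MAE)
$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} |e_i|$$
Mean Squared Error (MSE)
$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} e_i^2$$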
Key Relationship
1. Jensen’s Inequality gives:
$$\text{MSE} \ge \text{MAE}^2$$
Why?
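The square function is convex, so the square of the mean is at most the mean of the squares:
$$\left(\frac{1}{n}\sum_{i=1}^{n} |e_i|\right)^2 \le \frac{1}{n}\sum_{i=1}^{n} |e_i|^2 = \frac{1}{n}\sum_{i=1}^{n} e_i^2$$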
This means: MSE is always ≥ (MAE)²
When are they equal?
Only when all errors have the same magnitude, i.e.:
$$|e_1| = |e_2| = \dots = |e_n|$$
This almost never happens in real data.
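One way to see both the inequality and its equality condition at once is the variance identity, sketched here with $a_i = |e_i|$ (a symbol introduced only for this derivation):

$$\text{MSE} - \text{MAE}^2 = \frac{1}{n}\sum_{i=1}^{n} a_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n} a_i\right)^2 = \frac{1}{n}\sum_{i=1}^{n} \left(a_i - \text{MAE}\right)^2 \ge 0$$

The gap between MSE and MAE² is the variance of the absolute errors, so it is zero exactly when every $|e_i|$ equals MAE.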
Intuition
MSE penalizes large errors more because of the square.
MAE treats all errors linearly.
So:
- If your error distribution has outliers, MSE will be much larger than MAE².
- If your errors are uniform and small, MSE ≈ MAE².
A useful ratio
A common diagnostic is:
$$\frac{\text{MSE}}{\text{MAE}^2}$$
- ≈ 1 → errors are uniform
- ≫ 1 → heavy-tailed errors or outliers
- < 1 → impossible (violates Jensen)
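As a rough illustration of how this ratio behaves, the sketch below computes MSE / MAE² for two hypothetical error samples: one where every error has the same magnitude, and one with a single large error. The function name and the data are assumptions made for the example.

```python
def mse_over_mae_squared(errors):
    """Ratio MSE / MAE^2. Undefined when every error is zero (MAE = 0)."""
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    return mse / (mae * mae)

# Hypothetical error samples, chosen only for illustration.
same_magnitude = [2.0, -2.0, 2.0, -2.0]  # every |e_i| is 2
with_outlier = [1.0, -1.0, 1.0, 13.0]    # one large error

print(mse_over_mae_squared(same_magnitude))  # 1.0
print(mse_over_mae_squared(with_outlier))    # 2.6875
```

The single large error raises MAE from 1 to 4 but raises MSE from 1 to 43, which is what pushes the ratio well above 1.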
Relationship Between MAE² and MSE (recap)
From Jensen’s inequality:
$$\text{MSE} \ge \text{MAE}^2$$
Equality only when all errors have the same magnitude.
Now Add RMSE Into the Picture
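Since:
$$\text{RMSE} = \sqrt{\text{MSE}}$$
we immediately get:
$$\text{RMSE} \ge \text{MAE}$$
because the square root is increasing and MAE is non-negative:
$$\text{MSE} \ge \text{MAE}^2 \;\Longrightarrow\; \sqrt{\text{MSE}} \ge \text{MAE}$$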
This is a strict inequality unless all errors have identical magnitude.
Summary of All Relationships
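1. MSE vs MAE²
$$\text{MSE} \ge \text{MAE}^2$$
2. RMSE vs MAE
$$\text{RMSE} \ge \text{MAE}$$
3. RMSE vs MAE²
$$\text{RMSE}^2 = \text{MSE} \ge \text{MAE}^2$$
RMSE and MAE² are in different units, so the comparison goes through RMSE².
4. Combined chain
$$\text{MAE}^2 \le \text{MSE} = \text{RMSE}^2 \quad\Longleftrightarrow\quad \text{MAE} \le \text{RMSE}$$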
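The relationships in this summary can be spot-checked numerically. The sketch below draws random error samples (Gaussian with an arbitrary scale, an assumption made only for the test) and counts how often MAE² ≤ MSE or MAE ≤ RMSE fails, allowing a small tolerance for floating-point rounding.

```python
import math
import random

def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)

def mse(errors):
    return sum(e * e for e in errors) / len(errors)

random.seed(0)  # fixed seed so the run is repeatable
violations = 0
for _ in range(1000):
    n = random.randint(1, 20)
    errors = [random.gauss(0.0, 5.0) for _ in range(n)]
    a, m = mae(errors), mse(errors)
    r = math.sqrt(m)  # RMSE
    tol = 1e-9 * max(1.0, m)  # slack for floating-point rounding
    if a * a > m + tol or a > r + tol:
        violations += 1

print(violations)  # 0
```

A passing run is not a proof, but it is a cheap sanity check that the inequalities point the right way.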
Intuition
RMSE penalizes large errors more strongly than MAE because of the square.
- If your error distribution has outliers, RMSE will be much larger than MAE.
- If your errors are uniform, RMSE ≈ MAE.
A useful diagnostic ratio
$$\frac{\text{RMSE}}{\text{MAE}}$$
Interpretation:
- ≈ 1 → errors are uniform, no large outliers
- > 1.2 → moderate spread in error magnitudes
- > 2 → heavy-tailed errors or significant outliers
This ratio is a quick way to gauge the shape of the error distribution during ML model evaluation. As a reference point, zero-mean Gaussian errors give RMSE/MAE = √(π/2) ≈ 1.25.
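The bands above can be wrapped in a small helper. This is only a sketch: the function names and sample data are hypothetical, the cut-offs 1.2 and 2 are the rough ones from the list, and treating everything at or below 1.2 as "close to uniform" is an assumption, since the list does not say where that band ends.

```python
import math

def rmse_over_mae(errors):
    """Ratio RMSE / MAE. Undefined when every error is zero (MAE = 0)."""
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return rmse / mae

def describe(ratio):
    """Map the ratio to a rough band (assumed cut-offs: 1.2 and 2)."""
    if ratio > 2.0:
        return "heavy-tailed errors or significant outliers"
    if ratio > 1.2:
        return "moderate spread in error magnitudes"
    return "errors are close to uniform in magnitude"

# Hypothetical error samples, chosen only for illustration.
uniform = [3.0, -3.0, 3.0, -3.0]
moderate = [1.0, 1.0, 1.0, 5.0]
outlier = [1.0] * 9 + [50.0]

for sample in (uniform, moderate, outlier):
    print(describe(rmse_over_mae(sample)))
```

The three samples land in the uniform, moderate, and heavy-tailed bands respectively.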