Proof:
Let’s define a general convex quadratic function:

$$f(x) = \tfrac{1}{2} x^\top A x + b^\top x + c$$

where $x \in \mathbb{R}^n$, $A \in \mathbb{R}^{n \times n}$ is a symmetric positive semi-definite matrix (to ensure convexity), $b \in \mathbb{R}^n$, and $c \in \mathbb{R}$.
The gradient of this function is:

$$\nabla f(x) = A x + b$$
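As a minimal NumPy sketch of this setup, a quadratic of this form and its gradient might be written as follows; the matrix $A$ here is just a randomly generated symmetric positive semi-definite example, not any particular problem instance.

```python
import numpy as np

# Convex quadratic f(x) = 0.5 * x^T A x + b^T x + c with a symmetric PSD matrix A.
rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
A = M @ M.T                      # symmetric positive semi-definite by construction
b = rng.standard_normal(3)
c = 1.0

def f(x):
    """Value of the quadratic: 0.5 * x^T A x + b^T x + c."""
    return 0.5 * x @ A @ x + b @ x + c

def grad_f(x):
    """Gradient of the quadratic: A x + b."""
    return A @ x + b
```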
Lipschitz Continuity
A function $g$ is Lipschitz continuous if there exists a constant $L \ge 0$ such that for all $x, y \in \mathbb{R}^n$:

$$\lVert g(x) - g(y) \rVert \le L \, \lVert x - y \rVert$$

where $\lVert \cdot \rVert$ denotes a vector norm (e.g., the Euclidean norm).
We want to show that for the gradient of our quadratic function, the constant $L$ is equal to the largest eigenvalue of $A$.
Let’s compute the difference of the gradients at two points $x$ and $y$:

$$\nabla f(x) - \nabla f(y) = (A x + b) - (A y + b) = A(x - y)$$
Now, take the norm:

$$\lVert \nabla f(x) - \nabla f(y) \rVert = \lVert A(x - y) \rVert$$
We need to find a bound for $\lVert A(x - y) \rVert$ in terms of $\lVert x - y \rVert$.
From linear algebra, we know that for any vector $v$ and symmetric matrix $A$:

$$\lVert A v \rVert \le \lVert A \rVert_2 \, \lVert v \rVert$$

where $\lVert A \rVert_2$ is the operator norm (or spectral norm) of the matrix $A$.
For a symmetric matrix, the operator norm is equal to its largest eigenvalue in absolute value. Since $A$ is positive semi-definite, all its eigenvalues are non-negative, so the operator norm is simply the largest eigenvalue. Let’s denote the largest eigenvalue of $A$ as $\lambda_{\max}(A)$.
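This fact is easy to sanity-check numerically; a small NumPy sketch (with a randomly generated symmetric PSD matrix) compares the spectral norm to the largest eigenvalue:

```python
import numpy as np

# For a symmetric PSD matrix, the spectral norm equals the largest eigenvalue.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T                             # symmetric PSD by construction
print(np.linalg.norm(A, 2))             # operator (spectral) norm
print(np.linalg.eigvalsh(A).max())      # largest eigenvalue -- same value up to rounding
```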
Therefore:

$$\lVert A(x - y) \rVert \le \lambda_{\max}(A) \, \lVert x - y \rVert$$
Substituting back into our Lipschitz continuity equation:

$$\lVert \nabla f(x) - \nabla f(y) \rVert \le \lambda_{\max}(A) \, \lVert x - y \rVert$$
This shows that the gradient of the quadratic function is Lipschitz continuous with Lipschitz constant $L = \lambda_{\max}(A)$, where $\lambda_{\max}(A)$ is the largest eigenvalue of the Hessian matrix $A$.
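The inequality can also be verified numerically; here is a sketch in which a randomly generated PSD matrix stands in for the Hessian and random point pairs are tested against the bound:

```python
import numpy as np

# Check ||grad f(x) - grad f(y)|| <= lambda_max(A) * ||x - y|| on random point pairs.
rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3))
A = M @ M.T                                   # symmetric PSD Hessian
b = rng.standard_normal(3)
lam_max = np.linalg.eigvalsh(A).max()         # Lipschitz constant from the proof

def grad_f(x):
    """Gradient of 0.5 * x^T A x + b^T x + c."""
    return A @ x + b

for _ in range(1000):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    lhs = np.linalg.norm(grad_f(x) - grad_f(y))
    assert lhs <= lam_max * np.linalg.norm(x - y) + 1e-9   # bound holds (float tolerance)
```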
Since the MSE loss function in linear regression is a convex quadratic function, this result applies directly. For the loss $\frac{1}{n} \lVert X w - y \rVert^2$ with design matrix $X$ and $n$ samples, the Hessian is $\frac{2}{n} X^\top X$, and its largest eigenvalue is the Lipschitz constant of the gradient of the MSE loss.
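As a sketch of how this is typically used (the data and shapes below are synthetic, made up purely for illustration), the Lipschitz constant can be read off the eigenvalues of $\frac{2}{n} X^\top X$, and the classic step size $1/L$ then works for gradient descent on the MSE loss:

```python
import numpy as np

# Lipschitz constant of the MSE gradient for the loss (1/n) * ||X w - y||^2.
rng = np.random.default_rng(2)
n, d = 200, 5
X = rng.standard_normal((n, d))               # synthetic design matrix
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

H = (2.0 / n) * X.T @ X                       # Hessian of the MSE loss
L = np.linalg.eigvalsh(H).max()               # Lipschitz constant of the gradient

# Gradient descent with step size 1/L.
w = np.zeros(d)
for _ in range(500):
    grad = (2.0 / n) * X.T @ (X @ w - y)      # gradient of the MSE loss
    w -= grad / L
```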