For a convex quadratic function (like the MSE loss in linear regression), the smallest Lipschitz constant L of the gradient equals the largest eigenvalue of the Hessian.

Proof:

Let’s define a general convex quadratic function:

\qquad f(x) = \frac{1}{2} x^T A x - b^T x + c

where x \in \mathbb{R}^n, A \in \mathbb{R}^{n \times n} is a symmetric positive semi-definite matrix (to ensure convexity), b \in \mathbb{R}^n, and c \in \mathbb{R}.

The gradient of this function is:

\qquad \nabla f(x) = Ax - b
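As a quick sanity check, here is a minimal NumPy sketch (the matrix A, the vector b, and all names are illustrative choices, not from the original) that evaluates f and compares the closed-form gradient with a central finite-difference approximation:

import numpy as np

rng = np.random.default_rng(0)

# Illustrative symmetric positive semi-definite A: M^T M is always PSD.
M = rng.standard_normal((5, 5))
A = M.T @ M
b = rng.standard_normal(5)
c = 1.0

def f(x):
    # f(x) = 1/2 x^T A x - b^T x + c
    return 0.5 * x @ A @ x - b @ x + c

def grad_f(x):
    # Closed-form gradient: A x - b
    return A @ x - b

# Central finite differences should match the closed form.
x = rng.standard_normal(5)
eps = 1e-6
fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
               for e in np.eye(5)])
assert np.allclose(fd, grad_f(x), atol=1e-4)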

Lipschitz Continuity

The gradient \nabla f is Lipschitz continuous if there exists a constant L > 0 such that for all x, y:

\qquad ||\nabla f(x) - \nabla f(y)|| \leq L ||x - y||

where ||\cdot|| denotes a vector norm (e.g., the Euclidean norm).
We want to show that for our quadratic function, L is equal to the largest eigenvalue of A.

Let’s compute the difference of the gradients at two points x and y:

\qquad \nabla f(x) - \nabla f(y) = (Ax - b) - (Ay - b) = A(x - y)
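Continuing the sketch above, this identity is easy to confirm numerically: the offsets b cancel, leaving only A(x - y).

x, y = rng.standard_normal(5), rng.standard_normal(5)
assert np.allclose(grad_f(x) - grad_f(y), A @ (x - y))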

Now, take the norm:

\qquad ||\nabla f(x) - \nabla f(y)|| = ||A(x - y)||

We need to find a bound for ||A(x - y)|| in terms of ||x - y||. From linear algebra, we know that for any vector v and any matrix A:

\qquad ||Av|| \leq ||A|| \cdot ||v||

where ||A|| is the operator norm (or spectral norm) of the matrix A. For a symmetric matrix, the operator norm is equal to its largest eigenvalue in absolute value. Since A is positive semi-definite, all its eigenvalues are non-negative, so the operator norm is simply the largest eigenvalue. Let’s denote the largest eigenvalue of A as \lambda_{max}(A).
Therefore:

\qquad ||A(x - y)|| \leq \lambda_{max}(A) \cdot ||x - y||

Substituting back into the Lipschitz continuity condition:

\qquad ||\nabla f(x) - \nabla f(y)|| \leq \lambda_{max}(A) \cdot ||x - y||
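Numerically (still reusing the illustrative A from the earlier sketch), both facts check out: the spectral norm of the symmetric PSD matrix matches its largest eigenvalue, and the resulting bound holds for an arbitrary vector.

# For symmetric A, the 2-norm is the largest |eigenvalue|; PSD makes it lambda_max.
lam_max = np.linalg.eigvalsh(A).max()
assert np.isclose(np.linalg.norm(A, 2), lam_max)

# ||A v|| <= lambda_max * ||v|| for an arbitrary v.
v = rng.standard_normal(5)
assert np.linalg.norm(A @ v) <= lam_max * np.linalg.norm(v) + 1e-12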

This shows that the gradient of the quadratic function \nabla f(x) = Ax - b is Lipschitz continuous with Lipschitz constant L = \lambda_{max}(A), where \lambda_{max}(A) is the largest eigenvalue of the Hessian matrix A. The bound is tight: choosing x - y along an eigenvector associated with \lambda_{max}(A) attains equality, so \lambda_{max}(A) is the smallest constant that works.
Since the MSE loss in linear regression is a convex quadratic function, this result applies directly. With the convention f(w) = \frac{1}{2} ||Xw - y||^2, the Hessian is X^T X, and its largest eigenvalue is the Lipschitz constant of the gradient. (If the loss is averaged over n samples, both the Hessian and the constant pick up a factor of 1/n.)
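As a final sketch (with synthetic data, and assuming the unaveraged \frac{1}{2} ||Xw - y||^2 convention above), the constant can be computed directly and used to set the classic 1/L gradient-descent step size:

# Synthetic regression problem.
X = rng.standard_normal((100, 5))
y_data = rng.standard_normal(100)

H = X.T @ X                       # Hessian of f(w) = 1/2 ||X w - y||^2
L = np.linalg.eigvalsh(H).max()   # Lipschitz constant of the gradient

# Gradient descent with step size 1/L; the gradient is X^T (X w - y).
w = np.zeros(5)
for _ in range(500):
    w -= (1.0 / L) * (X.T @ (X @ w - y_data))

# Should agree with the least-squares solution.
w_star, *_ = np.linalg.lstsq(X, y_data, rcond=None)
assert np.allclose(w, w_star, atol=1e-6)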

