
For a convex quadratic function (like the MSE loss in linear regression), the Lipschitz constant L of the gradient is equal to the largest eigenvalue of the Hessian.

Proof:

Let’s define a general convex quadratic function:

\qquad f(x) = \frac{1}{2} x^T A x - b^T x + c

where x \in \mathbb{R}^n, A \in \mathbb{R}^{n \times n} is a symmetric positive semi-definite matrix (to ensure convexity), b \in \mathbb{R}^n, and c \in \mathbb{R}.

The gradient of this function is:

\qquad \nabla f(x) = Ax - b
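As a quick sanity check, here is a minimal NumPy sketch (the matrix A, vector b, constant c, and the test point are arbitrary placeholders) comparing the closed-form gradient Ax - b against a central finite-difference approximation of f:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Arbitrary symmetric positive semi-definite A, plus placeholder b and c.
M = rng.standard_normal((n, n))
A = M.T @ M          # A = M^T M is symmetric PSD
b = rng.standard_normal(n)
c = 1.0

def f(x):
    """Convex quadratic: f(x) = 1/2 x^T A x - b^T x + c."""
    return 0.5 * x @ A @ x - b @ x + c

def grad_f(x):
    """Closed-form gradient: A x - b."""
    return A @ x - b

# Compare against central finite differences at a random point.
x = rng.standard_normal(n)
eps = 1e-6
fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
               for e in np.eye(n)])

print(np.allclose(grad_f(x), fd, atol=1e-5))  # expected: True
```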

Lipschitz Continuity

A function \nabla f(x) is Lipschitz continuous if there exists a constant L > 0 such that for all x, y:

\qquad ||\nabla f(x) - \nabla f(y)|| \leq L ||x - y||

where ||\cdot|| denotes a vector norm (e.g., the Euclidean norm).
We want to show that for our quadratic function, L is equal to the largest eigenvalue of A.
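To make the definition concrete, here is a small NumPy sketch (A, b, and the sampled points are arbitrary placeholders) that measures the ratio ||\nabla f(x) - \nabla f(y)|| / ||x - y|| over random pairs; any valid Lipschitz constant L must be at least as large as every such ratio:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# Arbitrary symmetric PSD A and vector b for f(x) = 1/2 x^T A x - b^T x.
M = rng.standard_normal((n, n))
A = M.T @ M
b = rng.standard_normal(n)

def grad_f(x):
    return A @ x - b

# The Lipschitz condition says this ratio is bounded by L for every pair x != y.
def grad_ratio(x, y):
    return np.linalg.norm(grad_f(x) - grad_f(y)) / np.linalg.norm(x - y)

ratios = [grad_ratio(rng.standard_normal(n), rng.standard_normal(n))
          for _ in range(1000)]
print(max(ratios))  # every observed ratio is a lower bound on any valid L
```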

Let’s compute the difference of the gradients at two points x and y:

\qquad \nabla f(x) - \nabla f(y) = (Ax - b) - (Ay - b) = A(x - y)

Now, take the norm:

\qquad ||\nabla f(x) - \nabla f(y)|| = ||A(x - y)||

We need to find a bound for ||A(x - y)|| in terms of ||x - y||.

From linear algebra, we know that for any vector v and symmetric matrix A:

\qquad ||Av|| \leq ||A|| \cdot ||v||

where ||A|| is the operator norm (or spectral norm) of the matrix A. For a symmetric matrix, the operator norm is equal to its largest eigenvalue (in absolute value). Since A is positive semi-definite, all its eigenvalues are non-negative, so the operator norm is simply the largest eigenvalue. Let's denote the largest eigenvalue of A as \lambda_{max}(A).
Therefore:

\qquad ||A(x - y)|| \leq \lambda_{max}(A) \cdot ||x - y||

Substituting this bound into the expression for the gradient difference, we obtain:

\qquad ||\nabla f(x) - \nabla f(y)|| \leq \lambda_{max}(A) \cdot ||x - y||

This shows that the gradient of the quadratic function \nabla f(x) = Ax - b is Lipschitz continuous with Lipschitz constant L = \lambda_{max}(A), where \lambda_{max}(A) is the largest eigenvalue of the Hessian matrix A. The bound is also tight: if x - y is an eigenvector of A corresponding to \lambda_{max}(A), then ||A(x - y)|| = \lambda_{max}(A) \cdot ||x - y||, so no smaller constant works and the smallest Lipschitz constant is exactly \lambda_{max}(A).
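Both the bound and its tightness can be checked numerically. The sketch below (random A, b, and sample points are placeholders) compares the observed gradient-difference ratios against \lambda_{max}(A) and verifies that equality is attained along the top eigenvector:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6

# Random symmetric PSD Hessian A; b cancels in the gradient difference anyway.
M = rng.standard_normal((n, n))
A = M.T @ M
b = rng.standard_normal(n)

grad_f = lambda x: A @ x - b

# Largest eigenvalue of the symmetric Hessian A.
lam_max = np.linalg.eigvalsh(A)[-1]

# For symmetric PSD A the spectral norm equals the largest eigenvalue.
print(np.isclose(np.linalg.norm(A, 2), lam_max))   # expected: True

# 1) The bound: ||grad f(x) - grad f(y)|| <= lam_max * ||x - y|| on random pairs.
for _ in range(1000):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    assert (np.linalg.norm(grad_f(x) - grad_f(y))
            <= lam_max * np.linalg.norm(x - y) + 1e-9)

# 2) Tightness: along the top eigenvector the bound holds with equality.
v = np.linalg.eigh(A)[1][:, -1]     # unit eigenvector for lam_max
x = rng.standard_normal(n)
y = x - v                           # so that x - y = v
ratio = np.linalg.norm(grad_f(x) - grad_f(y)) / np.linalg.norm(x - y)
print(np.isclose(ratio, lam_max))   # expected: True
```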
Since the MSE loss in linear regression is a convex quadratic function of the weights, this result applies directly. With the convention f(w) = \frac{1}{2} ||Xw - y||^2, the Hessian of the MSE loss is X^T X (other conventions, such as averaging over the samples, only rescale it by a constant), and its largest eigenvalue is the Lipschitz constant of the gradient of the MSE loss.
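For concreteness, here is a small sketch for linear regression under the convention f(w) = \frac{1}{2}||Xw - y||^2 (the data X, y are random placeholders), checking that \lambda_{max}(X^T X) bounds the gradient differences of the MSE loss:

```python
import numpy as np

rng = np.random.default_rng(3)
n_samples, n_features = 50, 5

# Placeholder regression data.
X = rng.standard_normal((n_samples, n_features))
y = rng.standard_normal(n_samples)

# MSE loss with the convention f(w) = 1/2 ||Xw - y||^2, so the Hessian is exactly X^T X.
grad_mse = lambda w: X.T @ (X @ w - y)

# Lipschitz constant of the gradient: largest eigenvalue of the Hessian X^T X.
L = np.linalg.eigvalsh(X.T @ X)[-1]

# Check the Lipschitz bound on random pairs of weight vectors.
for _ in range(1000):
    w1, w2 = rng.standard_normal(n_features), rng.standard_normal(n_features)
    assert (np.linalg.norm(grad_mse(w1) - grad_mse(w2))
            <= L * np.linalg.norm(w1 - w2) + 1e-9)

print("Lipschitz constant L =", L)
```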
