
For a convex quadratic function (like the MSE loss in linear regression), the Lipschitz constant L of the gradient is equal to the largest eigenvalue of the Hessian.

Proof:

Let’s define a general convex quadratic function:

\qquad f(x) = \frac{1}{2} x^T A x - b^T x + c

where x \in \mathbb{R}^n, A \in \mathbb{R}^{n \times n} is a symmetric positive semi-definite matrix (to ensure convexity), b \in \mathbb{R}^n, and c \in \mathbb{R}.

The gradient of this function is:

\qquad \nabla f(x) = Ax - b
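As a quick sanity check, here is a minimal NumPy sketch (the matrix A, vector b, constant c, and the test point are arbitrary placeholders) comparing the closed-form gradient Ax - b against a central finite-difference approximation of f:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Arbitrary symmetric positive semi-definite A, plus placeholder b and c.
M = rng.standard_normal((n, n))
A = M.T @ M          # A = M^T M is symmetric PSD
b = rng.standard_normal(n)
c = 1.0

def f(x):
    """Convex quadratic: f(x) = 1/2 x^T A x - b^T x + c."""
    return 0.5 * x @ A @ x - b @ x + c

def grad_f(x):
    """Closed-form gradient: A x - b."""
    return A @ x - b

# Compare against central finite differences at a random point.
x = rng.standard_normal(n)
eps = 1e-6
fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
               for e in np.eye(n)])

print(np.allclose(grad_f(x), fd, atol=1e-5))  # expected: True
```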

Lipschitz Continuity

A function \nabla f(x) is Lipschitz continuous if there exists a constant L > 0 such that for all x, y:

\qquad ||\nabla f(x) - \nabla f(y)|| \leq L ||x - y||

where ||\cdot|| denotes a vector norm (e.g., the Euclidean norm).
We want to show that for our quadratic function, L is equal to the largest eigenvalue of A.
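To make the definition concrete, here is a small NumPy sketch (A, b, and the sampled points are arbitrary placeholders) that measures the ratio ||\nabla f(x) - \nabla f(y)|| / ||x - y|| over random pairs; any valid Lipschitz constant L must be at least as large as every such ratio:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# Arbitrary symmetric PSD A and vector b for f(x) = 1/2 x^T A x - b^T x.
M = rng.standard_normal((n, n))
A = M.T @ M
b = rng.standard_normal(n)

def grad_f(x):
    return A @ x - b

# The Lipschitz condition says this ratio is bounded by L for every pair x != y.
def grad_ratio(x, y):
    return np.linalg.norm(grad_f(x) - grad_f(y)) / np.linalg.norm(x - y)

ratios = [grad_ratio(rng.standard_normal(n), rng.standard_normal(n))
          for _ in range(1000)]
print(max(ratios))  # every observed ratio is a lower bound on any valid L
```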

Let’s compute the difference of the gradients at two points x and y:

\qquad \nabla f(x) - \nabla f(y) = (Ax - b) - (Ay - b) = A(x - y)

Now, take the norm:

\qquad ||\nabla f(x) - \nabla f(y)|| = ||A(x - y)||

We need to find a bound for ||A(x - y)|| in terms of ||x - y||.

From linear algebra, we know that for any vector v and symmetric matrix A:

\qquad ||Av|| \leq ||A|| \cdot ||v||

where ||A|| is the operator norm (or spectral norm) of the matrix A. For a symmetric matrix, the operator norm is equal to its largest eigenvalue (in absolute value). Since A is positive semi-definite, all its eigenvalues are non-negative, so the operator norm is simply the largest eigenvalue. Let's denote the largest eigenvalue of A as \lambda_{max}(A).
Therefore:

\qquad ||A(x - y)|| \leq \lambda_{max}(A) \cdot ||x - y||

Substituting this bound into the expression for the gradient difference, we obtain:

\qquad ||\nabla f(x) - \nabla f(y)|| \leq \lambda_{max}(A) \cdot ||x - y||

This shows that the gradient of the quadratic function \nabla f(x) = Ax - b is Lipschitz continuous with Lipschitz constant L = \lambda_{max}(A), where \lambda_{max}(A) is the largest eigenvalue of the Hessian matrix A. The bound is also tight: if x - y is an eigenvector of A corresponding to \lambda_{max}(A), then ||A(x - y)|| = \lambda_{max}(A) \cdot ||x - y||, so no smaller constant works and the smallest Lipschitz constant is exactly \lambda_{max}(A).
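Both the bound and its tightness can be checked numerically. The sketch below (random A, b, and sample points are placeholders) compares the observed gradient-difference ratios against \lambda_{max}(A) and verifies that equality is attained along the top eigenvector:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6

# Random symmetric PSD Hessian A; b cancels in the gradient difference anyway.
M = rng.standard_normal((n, n))
A = M.T @ M
b = rng.standard_normal(n)

grad_f = lambda x: A @ x - b

# Largest eigenvalue of the symmetric Hessian A.
lam_max = np.linalg.eigvalsh(A)[-1]

# For symmetric PSD A the spectral norm equals the largest eigenvalue.
print(np.isclose(np.linalg.norm(A, 2), lam_max))   # expected: True

# 1) The bound: ||grad f(x) - grad f(y)|| <= lam_max * ||x - y|| on random pairs.
for _ in range(1000):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    assert (np.linalg.norm(grad_f(x) - grad_f(y))
            <= lam_max * np.linalg.norm(x - y) + 1e-9)

# 2) Tightness: along the top eigenvector the bound holds with equality.
v = np.linalg.eigh(A)[1][:, -1]     # unit eigenvector for lam_max
x = rng.standard_normal(n)
y = x - v                           # so that x - y = v
ratio = np.linalg.norm(grad_f(x) - grad_f(y)) / np.linalg.norm(x - y)
print(np.isclose(ratio, lam_max))   # expected: True
```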
Since the MSE loss in linear regression is a convex quadratic function of the weights, this result applies directly. With the convention f(w) = \frac{1}{2} ||Xw - y||^2, the Hessian of the MSE loss is X^T X (other conventions, such as averaging over the samples, only rescale it by a constant), and its largest eigenvalue is the Lipschitz constant of the gradient of the MSE loss.
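For concreteness, here is a small sketch for linear regression under the convention f(w) = \frac{1}{2}||Xw - y||^2 (the data X, y are random placeholders), checking that \lambda_{max}(X^T X) bounds the gradient differences of the MSE loss:

```python
import numpy as np

rng = np.random.default_rng(3)
n_samples, n_features = 50, 5

# Placeholder regression data.
X = rng.standard_normal((n_samples, n_features))
y = rng.standard_normal(n_samples)

# MSE loss with the convention f(w) = 1/2 ||Xw - y||^2, so the Hessian is exactly X^T X.
grad_mse = lambda w: X.T @ (X @ w - y)

# Lipschitz constant of the gradient: largest eigenvalue of the Hessian X^T X.
L = np.linalg.eigvalsh(X.T @ X)[-1]

# Check the Lipschitz bound on random pairs of weight vectors.
for _ in range(1000):
    w1, w2 = rng.standard_normal(n_features), rng.standard_normal(n_features)
    assert (np.linalg.norm(grad_mse(w1) - grad_mse(w2))
            <= L * np.linalg.norm(w1 - w2) + 1e-9)

print("Lipschitz constant L =", L)
```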
