Inverse function theorem for Lipschitz functions

Recently, while wandering the corridors of the Mathematics Department, I overheard one of the graduate students explaining the Inverse Function Theorem to some first-year undergraduates. On a non-rigorous level, the Inverse Function Theorem is one of the most accessible (or “obvious”) results in elementary calculus: if the graph of a function y = f(x) has a well-defined and non-zero slope (derivative) s at some point x_0, then

  1. we ought to be able to write x as a function of y, i.e. x = f^{-1}(y) for y near f(x_0),
  2. and, moreover, the slope of the inverse function f^{-1} at f(x_0) should be 1/s.

The “visual proof” of this statement amounts to sketching the graph of f, observing that the graph of f^{-1} (if the inverse function exists at all) is the graph of f with the x and y axes interchanged, and hence that if the slope of f is approximately Δy/Δx then the slope of f^{-1} is approximately Δx/Δy, i.e. the reciprocal of that of f.
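The reciprocal rule in item 2 can also be recovered formally from the chain rule. This is only a sketch: it assumes that f^{-1} exists near f(x_0) and is differentiable there, which the picture does not establish by itself.

\displaystyle f^{-1}(f(x)) = x \text{ for } x \text{ near } x_{0} \quad \Longrightarrow \quad (f^{-1})'(f(x_{0})) \, f'(x_{0}) = 1 \quad \Longrightarrow \quad (f^{-1})'(f(x_{0})) = \frac{1}{f'(x_{0})} = \frac{1}{s}.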

Recall that the derivative of a function f: ℝ^n → ℝ^m is the rectangular m × n matrix of partial derivatives

\displaystyle \mathrm{D} f(x) = \begin{bmatrix} \dfrac{\partial f_{1}}{\partial x_{1}} & \cdots & \dfrac{\partial f_{1}}{\partial x_{n}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_{m}}{\partial x_{1}} & \cdots & \dfrac{\partial f_{m}}{\partial x_{n}} \end{bmatrix}

whenever all these partial derivatives exist. With this notation, a more careful statement of the Inverse Function Theorem is that if f: ℝ^n → ℝ^n is continuously differentiable in a neighbourhood of x_0 and the square n × n matrix of partial derivatives Df(x_0) is invertible, then there exist neighbourhoods U of x_0 and V of f(x_0) and a continuously differentiable function g: V → ℝ^n (called a local inverse for f) such that

  • for every u ∈ U, g(f(u)) = u, and
  • for every v ∈ V, f(g(v)) = v.
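As a rough numerical illustration of the two identities above, the following sketch approximates a local inverse g by Newton's method. The map `f`, its hand-written `jacobian`, and the helper `local_inverse` are hypothetical choices made only for this example; the theorem itself guarantees existence of g and says nothing about this particular way of computing it.

```python
import math

def f(x, y):
    # Hypothetical smooth map R^2 -> R^2, chosen only for illustration.
    return (x + 0.5 * math.sin(y), y + 0.5 * math.sin(x))

def jacobian(x, y):
    # Matrix of partial derivatives Df(x, y), written row by row.
    # Its determinant is 1 - 0.25*cos(x)*cos(y) >= 0.75, so it is invertible.
    return ((1.0, 0.5 * math.cos(y)),
            (0.5 * math.cos(x), 1.0))

def local_inverse(v, guess, steps=50, tol=1e-13):
    """Approximate g(v) by Newton's method started at `guess`."""
    x, y = guess
    for _ in range(steps):
        fx, fy = f(x, y)
        rx, ry = fx - v[0], fy - v[1]  # residual f(x, y) - v
        if abs(rx) + abs(ry) < tol:
            break
        (a, b), (c, d) = jacobian(x, y)
        det = a * d - b * c
        # Solve Df(x, y) * (dx, dy) = (rx, ry) by Cramer's rule.
        dx = (d * rx - b * ry) / det
        dy = (a * ry - c * rx) / det
        x, y = x - dx, y - dy
    return (x, y)

u = (0.3, -0.2)
v = f(*u)
recovered = local_inverse(v, guess=(0.0, 0.0))  # should be close to u
```

Here `recovered` should agree with `u` up to rounding, which is the identity g(f(u)) = u for this one point; the starting guess has to lie close enough to u for Newton's method to converge.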

An interesting question to ask is whether one really needs continuous differentiability of f. For example, Rademacher’s theorem says that whenever f satisfies a Lipschitz condition of the form

\displaystyle | f(x) - f(y) | \leq C | x - y | \text{ for all } x, y \in \mathbb{R}^{n}

for some constant C ≥ 0, it follows that f is differentiable almost everywhere in ℝ^n, with derivative having norm at most C. Is almost-everywhere differentiability sufficient for an inverse function theorem? It turns out, courtesy of a theorem of F. H. Clarke, that the Inverse Function Theorem does hold true for Lipschitz functions, provided that one adopts the right generalized interpretation of the derivative of f.

The (set-valued) generalized derivative Df(x_0) of f: ℝ^n → ℝ^m at x_0 is defined to be the convex hull of the set of all matrices M ∈ ℝ^{m×n} that arise as a limit

\displaystyle M = \lim_{k \to \infty} \mathrm{D} f(x_{k})

for some sequence (x_k) in ℝ^n of differentiability points of f that converges to x_0. One can show that, when f satisfies a Lipschitz condition in a neighbourhood of x_0, Df(x_0) is a non-empty, compact, convex subset of ℝ^{m×n}. The generalized derivative Df(x_0) is said to be of maximal rank if every M ∈ Df(x_0) has maximal rank (i.e. has rank(M) = min(m, n)).
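In one dimension the definition becomes concrete: the derivatives are 1 × 1 matrices, and the convex hull of their limits is a closed interval. The sketch below estimates that interval by sampling difference quotients at points close to x_0. The helper names `derivative` and `generalized_derivative_1d`, and the `radius`, `samples` and `h` parameters, are assumptions made for this illustration. It is a heuristic, not a computation of the true limit: it samples at a small but fixed distance from x_0 and assumes every sample point is a differentiability point.

```python
def derivative(f, x, h=1e-7):
    # Central difference quotient; exact (up to rounding) for a
    # piecewise-linear f when x - h and x + h lie on the same piece.
    return (f(x + h) - f(x - h)) / (2 * h)

def generalized_derivative_1d(f, x0, radius=1e-3, samples=200):
    """Estimate the interval [lo, hi] spanned by derivatives of f near x0."""
    slopes = []
    for i in range(1, samples + 1):
        offset = radius * i / samples  # smallest offset is well above h
        for x in (x0 - offset, x0 + offset):
            slopes.append(derivative(f, x))
    # In one dimension, the convex hull of a set of slopes is [min, max].
    return min(slopes), max(slopes)

# Absolute value function at 0: slopes -1 on the left, +1 on the right.
abs_interval = generalized_derivative_1d(abs, 0.0)

# A kinked but strictly increasing function: slopes 1 on the left, 3 on the right.
kink_interval = generalized_derivative_1d(lambda x: 2 * x + abs(x), 0.0)
```

For these two piecewise-linear inputs the estimates should be close to (-1, 1) and (1, 3) respectively. The first interval contains 0, so maximal rank fails there; the second does not.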

Theorem. (Clarke, 1976) If f: ℝ^n → ℝ^n satisfies a Lipschitz condition in some neighbourhood of x_0 and Df(x_0) ⊆ ℝ^{n×n} is of maximal rank, then there exist neighbourhoods U of x_0 and V of f(x_0) and a Lipschitz function g: V → ℝ^n such that

  • for every u ∈ U, g(f(u)) = u, and
  • for every v ∈ V, f(g(v)) = v.

It’s very important to note the maximal rank condition in Clarke’s Inverse Function Theorem: we need every matrix M in the generalized derivative to be non-singular. So, for example, the absolute value function on the real line ℝ does not satisfy the hypotheses of Clarke’s theorem at x = 0, even though it is Lipschitz with Lipschitz constant 1, since its generalized derivative at 0 is

\displaystyle \mathrm{D} |0| = \bigl\{ [\ell] \in \mathbb{R}^{1 \times 1} \big| -1 \leq \ell \leq 1 \bigr\},

which contains the non-invertible derivative matrix [0]. It is hardly surprising that the Inverse Function Theorem cannot be applied here since the absolute value function is non-injective in any neighbourhood of 0: both +δ and −δ map to +δ. On the other hand, the function f defined by

\displaystyle f(x) := 2 x + |x| = \begin{cases} x, & \text{if } x \leq 0 \\ 3x, & \text{if } x \geq 0 \end{cases}

has generalized derivative at 0 given by

\displaystyle \mathrm{D} f(0) = \bigl\{ [\ell] \in \mathbb{R}^{1 \times 1} \big| 1 \leq \ell \leq 3 \bigr\},

which is of maximal rank. The local (in fact, global) Lipschitz inverse of this function f is, unsurprisingly,

\displaystyle f^{-1}(y) := \begin{cases} y, & \text{if } y \leq 0 \\ y/3, & \text{if } y \geq 0. \end{cases}
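A quick numerical sanity check of this pair of formulas, as a sketch: the sample points and the variable names below are arbitrary choices for illustration, and a finite sample can only support the claims, not prove them.

```python
def f(x):
    return 2 * x + abs(x)

def f_inv(y):
    # Piecewise inverse: slope 1 for y <= 0, slope 1/3 for y >= 0.
    return y if y <= 0 else y / 3

points = [-2.0, -0.5, 0.0, 0.25, 1.0, 3.0]

# Largest deviation from g(f(u)) = u and from f(g(v)) = v over the samples.
left_error = max(abs(f_inv(f(x)) - x) for x in points)
right_error = max(abs(f(f_inv(y)) - y) for y in points)

# Largest difference quotient of f_inv over all distinct sample pairs,
# i.e. a lower bound on its Lipschitz constant.
images = [f(x) for x in points]
max_quotient = max(
    abs(f_inv(a) - f_inv(b)) / abs(a - b)
    for a in images for b in images if a != b
)
```

Both errors should vanish up to rounding, and `max_quotient` should not exceed 1, consistent with the inverse having slopes 1 and 1/3 on its two pieces.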