Inverse function theorem for Lipschitz functions

Recently, while wandering the corridors of the Mathematics Department, I overheard one of the graduate students explaining the Inverse Function Theorem to some first-year undergraduates. On a non-rigorous level, the Inverse Function Theorem is one of the most accessible (or “obvious”) results in elementary calculus: if the graph of a function y = f(x) has a well-defined and non-zero slope (derivative) s at some point x0, then

  1. we ought to be able to write x as a function of y, i.e. x = f−1(y) for y near f(x0),
  2. and, moreover, the slope of the inverse function f−1 at f(x0) should be 1s.

The “visual proof” of this statement amounts to sketching the graph of f, observing that the graph of f−1 (if the inverse function exists at all) is the graph of f with the x and y axes interchanged, and hence that if the slope of f is approximately ΔyΔx then the slope of f−1 is approximately ΔxΔy, i.e. the reciprocal of that of f.

Recall that the derivative of a function f: ℝn → ℝm is the rectangular m × n matrix of partial derivatives

\displaystyle \mathrm{D} f(x) = \begin{bmatrix} \dfrac{\partial f_{1}}{\partial x_{1}} & \cdots & \dfrac{\partial f_{1}}{\partial x_{n}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_{m}}{\partial x_{1}} & \cdots & \dfrac{\partial f_{m}}{\partial x_{n}} \end{bmatrix}

whenever all these partial derivatives exist. With this notation, a more careful statement of the Inverse Function Theorem is that if f: ℝn → ℝn is continuously differentiable in a neighbourhood of x0 and the square n × n matrix of partial derivatives Df(x0) is invertible, then there exist neighbourhoods U of x0 and V of f(x0) and a continuously differentiable function gV → ℝn (called a local inverse for f) such that

  • for every u ∈ U, g(f(u)) = u, and
  • for every v ∈ V, f(g(v)) = v.

An interesting question to ask is whether one really needs continuous differentiability of f. For example, Rademacher’s theorem says that whenever f satisfies a Lipschitz condition of the form

\displaystyle | f(x) - f(y) | \leq C | x - y | \text{ for all } x, y \in \mathbb{R}^{n}

for some constant C ≥ 0 it follows that f is differentiable almost everywhere in ℝn with derivative having norm at most C. Is this sufficient? It turns out, courtesy of a theorem of F. H. Clarke, that the Inverse Function Theorem does hold true for Lipschitz functions provided that one adopts the right generalized interpretation of the derivative of f.

The (set-valued) generalized derivative Df(x0) of f: ℝn → ℝm at x0 is defined to be the convex hull of the set of all matrices M ∈ ℝm×n that arise as a limit

\displaystyle M = \lim_{k \to \infty} \mathrm{D} f(x_{k})

for some sequence (xk) in ℝn of differentiability points of f that converges to x0. One can show that, when f satisfies a Lipschitz condition in a neighbourhood of x0, Df(x0) is a non-empty, compact, convex subset of ℝm×n. The generalized derivative Df(x0) is said to be of maximal rank if every M ∈ Df(x0) has maximal rank (i.e. has rank(M) = min(mn)).

Theorem. (Clarke, 1976) If f: ℝn → ℝn satisfies a Lipschitz condition in some neighbourhood of x0 and Df(x) ⊆ ℝn is of maximal rank, then there exist neighbourhoods U of x0 and V of f(x0) and a Lipschitz function gV → ℝn such that

  • for every u ∈ U, g(f(u)) = u, and
  • for every v ∈ V, f(g(v)) = v.

It’s very important to note the maximal rank condition in Clarke’s Inverse Function Theorem: we need every matrix M in the generalized derivative to be non-singular. So, for example, the absolute value function on the real line ℝ does not satisfy the hypotheses of Clarke’s theorem at x = 0, even though it is Lipschitz with Lipschitz constant 1, since its generalized derivative at 0 is

\displaystyle \mathrm{D} |0| = \bigl\{ [\ell] \in \mathbb{R}^{1 \times 1} \big| -1 \leq \ell \leq 1 \bigr\},

which contains the non-invertible derivative matrix [0]. It is hardly surprising that the Inverse Function Theorem cannot be applied here since the absolute value function is non-injective in any neighbourhood of 0: both +δ and −δ map to +δ. On the other hand, the function f defined by

\displaystyle f(x) := 2 x + |x| = \begin{cases} x, & \text{if } x \leq 0 \\ 3x, & \text{if } x \geq 0. \end{cases}

has generalized derivative at 0 given by

\displaystyle \mathrm{D} f(0) = \bigl\{ [\ell] \in \mathbb{R}^{1 \times 1} \big| 1 \leq \ell \leq 3 \bigr\},

which is of maximal rank. The local (in fact, global) Lipschitz inverse of this function f is, unsurprisingly,

\displaystyle f^{-1}(y) := \begin{cases} y, & \text{if } y \leq 0 \\ y/3, & \text{if } y \geq 0. \end{cases}


Article: Optimal Uncertainty Quantification

Almost three years on from the initial submission, the article “Optimal Uncertainty Quantification”, jointly written with Houman Owhadi, Clint Scovel, Mike McKerns and Michael Ortiz, is now in print. It will appear in this year’s second-quarter issue of SIAM Review, and is already accessible online for those with SIAM subscriptions; the preprint version can be found at arXiv:1009.0679.

This paper was a real team effort, with everyone bringing different strengths to the table. Given the length of the review process, I think that our corresponding author Houman Owhadi deserves a medal for his patience (as does Ilse Ipsen, the article’s editor at SIAM Review), but, really, congratulations and thanks to all. 🙂

We propose a rigorous framework for Uncertainty Quantification (UQ) in which the UQ objectives and the assumptions/information set are brought to the forefront. This framework, which we call Optimal Uncertainty Quantification (OUQ), is based on the observation that, given a set of assumptions and information about the problem, there exist optimal bounds on uncertainties: these are obtained as values of well-defined optimization problems corresponding to extremizing probabilities of failure, or of deviations, subject to the constraints imposed by the scenarios compatible with the assumptions and information. In particular, this framework does not implicitly impose inappropriate assumptions, nor does it repudiate relevant information. Although OUQ optimization problems are extremely large, we show that under general conditions they have finite-dimensional reductions. As an application, we develop Optimal Concentration Inequalities (OCI) of Hoeffding and McDiarmid type. Surprisingly, these results show that uncertainties in input parameters, which propagate to output uncertainties in the classical sensitivity analysis paradigm, may fail to do so if the transfer functions (or probability distributions) are imperfectly known. We show how, for hierarchical structures, this phenomenon may lead to the non-propagation of uncertainties or information across scales. In addition, a general algorithmic framework is developed for OUQ and is tested on the Caltech surrogate model for hypervelocity impact and on the seismic safety assessment of truss structures, suggesting the feasibility of the framework for important complex systems. The introduction of this paper provides both an overview of the paper and a self-contained mini-tutorial about basic concepts and issues of UQ.