Automatic differentiation

Just a quick post to point people in the direction of a nice blog post by Damon McDougall on automatic differentiation, which can be very useful in performing sensitivity analysis and hence uncertainty quantification for numerical models.


Article: Optimal uncertainty quantification for legacy data observations of Lipschitz functions

I’m happy to report that the article “Optimal uncertainty quantification for legacy data observations of Lipschitz functions”, jointly written with Mike McKerns, Dominik Meyer, Florian Theil, Houman Owhadi and Michael Ortiz has now appeared in ESAIM: Mathematical Modelling and Numerical Analysis, vol. 47, no. 6. The preprint version can be found at arXiv:1202.1928.

Continue reading “Article: Optimal uncertainty quantification for legacy data observations of Lipschitz functions”

Angular momentum

Apologies for being so quiet lately. I thought that I’d share this little blog post examining sayabiki (the pulling back of the scabbard with the left hand to draw the Japanese katana with the right hand) from the physicist’s perspective of conservation of angular momentum:

Sayabiki and Angular Momentum

This is an interesting point, although I think that it is more relevant to, say, the dynamics of a throw in Aikidō than drawing the sword in Iaidō. While I don’t doubt that using sayabiki to exploit the conservation of angular momentum does allow a more powerful draw, I think that the primary reason for sayabiki is simple mechanics: if one does not retract the scabbard, then the sword will not come cleanly out of the scabbard mouth — if the sword comes out at all, then its cutting tip will damage the scabbard and indeed the Iaidōka’s hand!

Inverse function theorem for Lipschitz functions

Recently, while wandering the corridors of the Mathematics Department, I overheard one of the graduate students explaining the Inverse Function Theorem to some first-year undergraduates. On a non-rigorous level, the Inverse Function Theorem is one of the most accessible (or “obvious”) results in elementary calculus: if the graph of a function y = f(x) has a well-defined and non-zero slope (derivative) s at some point x0, then

  1. we ought to be able to write x as a function of y, i.e. x = f−1(y) for y near f(x0),
  2. and, moreover, the slope of the inverse function f−1 at f(x0) should be 1s.

The “visual proof” of this statement amounts to sketching the graph of f, observing that the graph of f−1 (if the inverse function exists at all) is the graph of f with the x and y axes interchanged, and hence that if the slope of f is approximately ΔyΔx then the slope of f−1 is approximately ΔxΔy, i.e. the reciprocal of that of f.

Recall that the derivative of a function f: ℝn → ℝm is the rectangular m × n matrix of partial derivatives

\displaystyle \mathrm{D} f(x) = \begin{bmatrix} \dfrac{\partial f_{1}}{\partial x_{1}} & \cdots & \dfrac{\partial f_{1}}{\partial x_{n}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_{m}}{\partial x_{1}} & \cdots & \dfrac{\partial f_{m}}{\partial x_{n}} \end{bmatrix}

whenever all these partial derivatives exist. With this notation, a more careful statement of the Inverse Function Theorem is that if f: ℝn → ℝn is continuously differentiable in a neighbourhood of x0 and the square n × n matrix of partial derivatives Df(x0) is invertible, then there exist neighbourhoods U of x0 and V of f(x0) and a continuously differentiable function gV → ℝn (called a local inverse for f) such that

  • for every u ∈ U, g(f(u)) = u, and
  • for every v ∈ V, f(g(v)) = v.

An interesting question to ask is whether one really needs continuous differentiability of f. For example, Rademacher’s theorem says that whenever f satisfies a Lipschitz condition of the form

\displaystyle | f(x) - f(y) | \leq C | x - y | \text{ for all } x, y \in \mathbb{R}^{n}

for some constant C ≥ 0 it follows that f is differentiable almost everywhere in ℝn with derivative having norm at most C. Is this sufficient? It turns out, courtesy of a theorem of F. H. Clarke, that the Inverse Function Theorem does hold true for Lipschitz functions provided that one adopts the right generalized interpretation of the derivative of f.

The (set-valued) generalized derivative Df(x0) of f: ℝn → ℝm at x0 is defined to be the convex hull of the set of all matrices M ∈ ℝm×n that arise as a limit

\displaystyle M = \lim_{k \to \infty} \mathrm{D} f(x_{k})

for some sequence (xk) in ℝn of differentiability points of f that converges to x0. One can show that, when f satisfies a Lipschitz condition in a neighbourhood of x0, Df(x0) is a non-empty, compact, convex subset of ℝm×n. The generalized derivative Df(x0) is said to be of maximal rank if every M ∈ Df(x0) has maximal rank (i.e. has rank(M) = min(mn)).

Theorem. (Clarke, 1976) If f: ℝn → ℝn satisfies a Lipschitz condition in some neighbourhood of x0 and Df(x) ⊆ ℝn is of maximal rank, then there exist neighbourhoods U of x0 and V of f(x0) and a Lipschitz function gV → ℝn such that

  • for every u ∈ U, g(f(u)) = u, and
  • for every v ∈ V, f(g(v)) = v.

It’s very important to note the maximal rank condition in Clarke’s Inverse Function Theorem: we need every matrix M in the generalized derivative to be non-singular. So, for example, the absolute value function on the real line ℝ does not satisfy the hypotheses of Clarke’s theorem at x = 0, even though it is Lipschitz with Lipschitz constant 1, since its generalized derivative at 0 is

\displaystyle \mathrm{D} |0| = \bigl\{ [\ell] \in \mathbb{R}^{1 \times 1} \big| -1 \leq \ell \leq 1 \bigr\},

which contains the non-invertible derivative matrix [0]. It is hardly surprising that the Inverse Function Theorem cannot be applied here since the absolute value function is non-injective in any neighbourhood of 0: both +δ and −δ map to +δ. On the other hand, the function f defined by

\displaystyle f(x) := 2 x + |x| = \begin{cases} x, & \text{if } x \leq 0 \\ 3x, & \text{if } x \geq 0. \end{cases}

has generalized derivative at 0 given by

\displaystyle \mathrm{D} f(0) = \bigl\{ [\ell] \in \mathbb{R}^{1 \times 1} \big| 1 \leq \ell \leq 3 \bigr\},

which is of maximal rank. The local (in fact, global) Lipschitz inverse of this function f is, unsurprisingly,

\displaystyle f^{-1}(y) := \begin{cases} y, & \text{if } y \leq 0 \\ y/3, & \text{if } y \geq 0. \end{cases}

Article: Optimal Uncertainty Quantification

Almost three years on from the initial submission, the article “Optimal Uncertainty Quantification”, jointly written with Houman Owhadi, Clint Scovel, Mike McKerns and Michael Ortiz, is now in print. It will appear in this year’s second-quarter issue of SIAM Review, and is already accessible online for those with SIAM subscriptions; the preprint version can be found at arXiv:1009.0679.

This paper was a real team effort, with everyone bringing different strengths to the table. Given the length of the review process, I think that our corresponding author Houman Owhadi deserves a medal for his patience (as does Ilse Ipsen, the article’s editor at SIAM Review), but, really, congratulations and thanks to all. 🙂

We propose a rigorous framework for Uncertainty Quantification (UQ) in which the UQ objectives and the assumptions/information set are brought to the forefront. This framework, which we call Optimal Uncertainty Quantification (OUQ), is based on the observation that, given a set of assumptions and information about the problem, there exist optimal bounds on uncertainties: these are obtained as values of well-defined optimization problems corresponding to extremizing probabilities of failure, or of deviations, subject to the constraints imposed by the scenarios compatible with the assumptions and information. In particular, this framework does not implicitly impose inappropriate assumptions, nor does it repudiate relevant information. Although OUQ optimization problems are extremely large, we show that under general conditions they have finite-dimensional reductions. As an application, we develop Optimal Concentration Inequalities (OCI) of Hoeffding and McDiarmid type. Surprisingly, these results show that uncertainties in input parameters, which propagate to output uncertainties in the classical sensitivity analysis paradigm, may fail to do so if the transfer functions (or probability distributions) are imperfectly known. We show how, for hierarchical structures, this phenomenon may lead to the non-propagation of uncertainties or information across scales. In addition, a general algorithmic framework is developed for OUQ and is tested on the Caltech surrogate model for hypervelocity impact and on the seismic safety assessment of truss structures, suggesting the feasibility of the framework for important complex systems. The introduction of this paper provides both an overview of the paper and a self-contained mini-tutorial about basic concepts and issues of UQ.


In view of the week’s events in Boston, and various pronouncements about good and evil at war on our streets, I can do little better than to quote J. Michael Straczynski, writing after 9/11:

What DO we tell the children?
Do we tell them evil is a foreign face?
No. The evil is the thought behind the face, and it can look just like yours.
Do we tell them evil is tangible, with defined borders and names and geometries and destinies?
No. They will have nightmares enough.

Perhaps we tell them that we are sorry.
Sorry that we were not able to deliver unto them the world we wished them to have.
That our eagerness to shout is not the equal of our willingness to listen.
That the burdens of distant people are the responsibility of all men and women of conscience, or their burdens will one day become our tragedy.

Or perhaps we simply tell them that we love them, and that we will protect them. That we would give our lives for theirs and do it gladly, so great is the burden of our love.
In a universe of Gameboys and VCRs, it is, perhaps, an insubstantial gift. But it is the only one that will wash away the tears and knit the wounds and make the world a sane place to live in.