Just a quick post to point people in the direction of a nice blog post by Damon McDougall on automatic differentiation, which can be very useful in performing sensitivity analysis and hence uncertainty quantification for numerical models.

# Tag: mathematics

## Article: Optimal uncertainty quantification for legacy data observations of Lipschitz functions

I’m happy to report that the article “Optimal uncertainty quantification for legacy data observations of Lipschitz functions”, jointly written with Mike McKerns, Dominik Meyer, Florian Theil, Houman Owhadi and Michael Ortiz has now appeared in *ESAIM: Mathematical Modelling and Numerical Analysis*, vol. 47, no. 6. The preprint version can be found at arXiv:1202.1928.

## Inverse function theorem for Lipschitz functions

Recently, while wandering the corridors of the Mathematics Department, I overheard one of the graduate students explaining the **Inverse Function Theorem** to some first-year undergraduates. On a non-rigorous level, the Inverse Function Theorem is one of the most accessible (or “obvious”) results in elementary calculus: if the graph of a function *y* = *f*(*x*) has a well-defined and non-zero slope (derivative) *s* at some point *x*_{0}, then

- we ought to be able to write
*x*as a function of*y*, i.e.*x*=*f*^{−1}(*y*) for*y*near*f*(*x*_{0}), - and, moreover, the slope of the inverse function
*f*^{−1}at*f*(*x*_{0}) should be^{1}⁄_{s}.

The “visual proof” of this statement amounts to sketching the graph of *f*, observing that the graph of *f*^{−1} (if the inverse function exists at all) is the graph of *f* with the *x* and *y* axes interchanged, and hence that if the slope of *f* is approximately ^{Δy}⁄_{Δx} then the slope of *f*^{−1} is approximately ^{Δx}⁄_{Δy}, i.e. the reciprocal of that of *f*.

Recall that the derivative of a function *f*: ℝ^{n} → ℝ^{m} is the rectangular *m* × *n* matrix of partial derivatives

whenever all these partial derivatives exist. With this notation, a more careful statement of the Inverse Function Theorem is that if *f*: ℝ^{n} → ℝ^{n} is continuously differentiable in a neighbourhood of *x*_{0} and the square *n* × *n* matrix of partial derivatives D*f*(*x*_{0}) is invertible, then there exist neighbourhoods *U* of *x*_{0} and *V* of *f*(*x*_{0}) and a continuously differentiable function *g*: *V* → ℝ^{n} (called a **local inverse** for *f*) such that

- for every
*u*∈*U*,*g*(*f*(*u*)) =*u*, and - for every
*v*∈*V*,*f*(*g*(*v*)) =*v*.

An interesting question to ask is whether one really needs continuous differentiability of *f*. For example, Rademacher’s theorem says that whenever *f* satisfies a **Lipschitz condition** of the form

for some constant *C* ≥ 0 it follows that *f* is differentiable almost everywhere in ℝ^{n} with derivative having norm at most *C*. Is this sufficient? It turns out, courtesy of a theorem of F. H. Clarke, that the Inverse Function Theorem does hold true for Lipschitz functions provided that one adopts the right generalized interpretation of the derivative of *f*.

The (set-valued) **generalized derivative** D*f*(*x*_{0}) of *f*: ℝ^{n} → ℝ^{m} at *x*_{0} is defined to be the convex hull of the set of all matrices *M* ∈ ℝ^{m×n} that arise as a limit

for some sequence (*x*_{k}) in ℝ^{n} of differentiability points of *f* that converges to *x*_{0}. One can show that, when *f* satisfies a Lipschitz condition in a neighbourhood of *x*_{0}, D*f*(*x*_{0}) is a non-empty, compact, convex subset of ℝ^{m×n}. The generalized derivative D*f*(*x*_{0}) is said to be of **maximal rank** if every *M* ∈ D*f*(*x*_{0}) has maximal rank (i.e. has rank(*M*) = min(*m*, *n*)).

**Theorem.**(Clarke, 1976) If

*f*: ℝ

^{n}→ ℝ

^{n}satisfies a Lipschitz condition in some neighbourhood of

*x*

_{0}and D

*f*(

*x*) ⊆ ℝ

^{n}is of maximal rank, then there exist neighbourhoods

*U*of

*x*

_{0}and

*V*of

*f*(

*x*

_{0}) and a Lipschitz function

*g*:

*V*→ ℝ

^{n}such that

- for every
*u*∈*U*,*g*(*f*(*u*)) =*u*, and - for every
*v*∈*V*,*f*(*g*(*v*)) =*v*.

It’s very important to note the maximal rank condition in Clarke’s Inverse Function Theorem: we need *every* matrix *M* in the generalized derivative to be non-singular. So, for example, the absolute value function on the real line ℝ does *not* satisfy the hypotheses of Clarke’s theorem at *x* = 0, even though it is Lipschitz with Lipschitz constant 1, since its generalized derivative at 0 is

which contains the non-invertible derivative matrix [0]. It is hardly surprising that the Inverse Function Theorem cannot be applied here since the absolute value function is non-injective in any neighbourhood of 0: both +*δ* and −*δ* map to +*δ*. On the other hand, the function *f* defined by

has generalized derivative at 0 given by

which is of maximal rank. The local (in fact, global) Lipschitz inverse of this function *f* is, unsurprisingly,

## Article: Optimal Uncertainty Quantification

Almost three years on from the initial submission, the article “Optimal Uncertainty Quantification”, jointly written with Houman Owhadi, Clint Scovel, Mike McKerns and Michael Ortiz, is now in print. It will appear in this year’s second-quarter issue of *SIAM Review*, and is already accessible online for those with SIAM subscriptions; the preprint version can be found at arXiv:1009.0679.

This paper was a real team effort, with everyone bringing different strengths to the table. Given the length of the review process, I think that our corresponding author Houman Owhadi deserves a medal for his patience (as does Ilse Ipsen, the article’s editor at *SIAM Review*), but, really, congratulations and thanks to all. 🙂

We propose a rigorous framework for Uncertainty Quantification (UQ) in which the UQ objectives and the assumptions/information set are brought to the forefront. This framework, which we call

Optimal Uncertainty Quantification(OUQ), is based on the observation that, given a set of assumptions and information about the problem, there exist optimal bounds on uncertainties: these are obtained as values of well-defined optimization problems corresponding to extremizing probabilities of failure, or of deviations, subject to the constraints imposed by the scenarios compatible with the assumptions and information. In particular, this framework does not implicitly impose inappropriate assumptions, nor does it repudiate relevant information. Although OUQ optimization problems are extremely large, we show that under general conditions they have finite-dimensional reductions. As an application, we developOptimal Concentration Inequalities(OCI) of Hoeffding and McDiarmid type. Surprisingly, these results show that uncertainties in input parameters, which propagate to output uncertainties in the classical sensitivity analysis paradigm, may fail to do so if the transfer functions (or probability distributions) are imperfectly known. We show how, for hierarchical structures, this phenomenon may lead to the non-propagation of uncertainties or information across scales. In addition, a general algorithmic framework is developed for OUQ and is tested on the Caltech surrogate model for hypervelocity impact and on the seismic safety assessment of truss structures, suggesting the feasibility of the framework for important complex systems. The introduction of this paper provides both an overview of the paper and a self-contained mini-tutorial about basic concepts and issues of UQ.

## Radon and non-Radon spaces

One of the theorems that I make frequent use of in my uncertainty quantification (UQ) research concerns probabilities measures on **Radon spaces**. Without going into details, the UQ methods that I like to work with will work fine if the spaces where your uncertain parameters / functions / models / other gubbins live are Radon spaces, and might fail to work otherwise; therefore, it’s important to know what a Radon space is, and if it’s a serious restriction. (It’s also very useful to know some pithy examples use in response to questions about Radon spaces in talks and poster presentations!)

So… the definition. Consider a topological space (*X*, *T*) and a probability measure *μ*: ℬ(*T*) → [0, 1] defined on the Borel σ-algebra ℬ(*T*) (i.e. the smallest σ-algebra on *X* that contains all the open sets, i.e. those sets that are listed in the topology *T*). The measure *μ* is said to be **inner regular** if, for every ℬ(*T*)-measurable set *E* ⊆ *X*,

This is often informally read as saying that the measure of an arbitrary measurable set can be approximated from within by compact sets. The space (*X*, *T*) is called a **pseudo-Radon space** (my terminology) if every probability measure *μ* on ℬ(*T*) is inner regular, and if the space is also separable and metrizable then it is called a **Radon space** (more standard in the literature, e.g. the book of Ambrosio, Gigli & Savaré on gradient flows in metric spaces).

So, what spaces are (pseudo-)Radon spaces? It turns out that most of the “nice” spaces that one might want to consider are Radon:

- any compact subset of
*n*-dimensional Euclidean space ℝ^{n}is Radon, - indeed, Euclidean space itself is Radon,
- as is any Polish space (i.e. a separable and completely metrizable space),
- as indeed is any Suslin space (i.e. a continuous Hausdorff image of a Polish space).

This all seems to suggest that non-Radon spaces must be very weird beasts indeed, perhaps spaces that are topologically very large, so much so that they cannot be the image of a separable space. However, there are, in fact, “small” examples of non-Radon spaces. Since just one counterexample will suffice, it’s enough to find a single example of a non-inner-regular measure on space to show that it is not (pseudo-)Radon.

Continue reading “Radon and non-Radon spaces”

## Three Bernstein inequalities

This post concerns three unrelated inequalities, or families of inequalities, that all go by the name of **Bernstein’s inequality**, after Sergei Natanovich Bernstein (Сергей Натанович Бернштейн). In no particular order, they are:

- a family of inequalities in probability theory that bound the deviations of independent or weakly dependent random variables,
- an inequality bounding the
*L*^{p}norms of band-limited functions, and - an inequality bounding the derivatives of complex polynomials.

1. Deviations of Random Variables

There are quite a few Bernstein inequalities for families of independent or weakly correlated random variables *X*_{1}, …, *X*_{n}. The following inequality is one of the simplest, but gives the general idea. Let *X*_{1}, …, *X*_{n} be independent, but not necessarily identically distributed, real-valued random variables taking values in [−*R*, +*R*]. Then for any *t* > 0,

For those readers who prefer to think in terms of the vector-valued random variable **X** = (*X*_{1}, …, *X*_{n}), this inequality reads

A similar inequality for negative *t* is easily obtained by replacing each random variable *X*_{j} by its negative. There are also many generalizations, some also known as Bernstein inequalities, and others falling under the general heading of concentration of measure.

2. *L*^{p} Norms of Band-Limited Functions

Suppose that *f*: ℝ^{n} → ℂ is band-limited in the sense that its Fourier transform ℱ[*f*]: ℝ^{n} → ℂ is identically zero outside the ball *B*(0, *R*) of radius *R* centred at the origin 0 of ℝ^{n}. Suppose also that 1 ≤ *p* < *q* < ∞. Then there is a constant *C*, independent of *f*, such that

In fact, with a possibly different constant, the same inequality holds with a weak *L*^{p} quasi-norm on the right-hand side:

This inequality is not too difficult to prove using Young’s inequality for convolutions and standard interpolation between *L*^{1,∞} and *L*^{2q,∞}.

3. Derivatives of Polynomials

Suppose that *p*: ℂ → ℂ is a polynomial of degree *n*. Then Bernstein’s inequality for the polynomial *p* states that, on the unit disc

the degree *n* of *p*, multiplied by the maximum absolute volue of *p*, bounds the maximum absolute value of the derivative of *p*:

Similarly, for the *k*^{th} derivative of *p*, it holds that

This Bernstein inequality is useful in the theory of polynomial approximation, since it gives some measure of approximation of derivatives “for free” once the function itself has been approximated.

## A very quick Sobolev embedding

A little while ago, while discussing scaling results in analysis, I had a conversation with a colleague who pointed out to me an elegantly brief proof of the embedding of the **critical Sobolev space** *W*^{1,n}(ℝ^{n}) into the space BMO(ℝ^{n}) of functions of **bounded mean oscillation** in spatial dimension *n*. By definition, BMO consists of those *L*^{1} functions *u* such that the semi-norm

is finite, where the supremum runs over all balls *B* in ℝ^{n} of finite radius, |*B*| denotes the volume of the ball *B*, and *u*_{B} denotes the average of *u* over *B*:

The usual norm on BMO is the sum of the *L*^{1} norm and the BMO semi-norm. For *p* < *n*, the Sobolev space *W*^{1,p}(ℝ^{n}) embeds into a suitable *L*^{q} space; when *p* > *n*, the Sobolev space *W*^{1,p}(ℝ^{n}) embeds into a space of Hölder-continuous functions. The space BMO provides a neat (although not complete — see e.g. this paper) description of what happens in the critical case *p* = *n*, namely that *W*^{1,n}(ℝ^{n}) embeds into BMO(ℝ^{n}); that is, there is a constant *c* depending only on *n* such that

Continue reading “A very quick Sobolev embedding”