Automatic differentiation

Just a quick post to point people in the direction of a nice blog post by Damon McDougall on automatic differentiation, which can be very useful in performing sensitivity analysis and hence uncertainty quantification for numerical models.
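
For readers who have not met the idea before, here is a minimal sketch of one flavour of automatic differentiation: forward mode via dual numbers. It is not taken from the linked post. The names `Dual`, `derivative` and `model` are hypothetical, chosen only for illustration, and `model` is a made-up toy rather than a real numerical model.

```python
import math


class Dual:
    """A number value + deriv * eps with eps**2 = 0; deriv carries the derivative."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (derivative zero)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def sin(x):
    """Sine with the chain rule applied to the derivative part."""
    x = _lift(x)
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)


def derivative(f, x):
    """Sensitivity of f at x: seed the input with derivative 1 and read it off."""
    return f(Dual(x, 1.0)).deriv


def model(k):
    """Hypothetical toy model output as a function of one parameter k."""
    return 3.0 * k * k + sin(k)
```

Calling `derivative(model, 2.0)` propagates the derivative through every arithmetic operation in `model`, so the sensitivity comes out exact to rounding error rather than being approximated by finite differences.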


Article: Optimal uncertainty quantification for legacy data observations of Lipschitz functions

I’m happy to report that the article “Optimal uncertainty quantification for legacy data observations of Lipschitz functions”, jointly written with Mike McKerns, Dominik Meyer, Florian Theil, Houman Owhadi and Michael Ortiz, has now appeared in ESAIM: Mathematical Modelling and Numerical Analysis, vol. 47, no. 6. The preprint version can be found at arXiv:1202.1928.


Article: Optimal Uncertainty Quantification

Almost three years on from the initial submission, the article “Optimal Uncertainty Quantification”, jointly written with Houman Owhadi, Clint Scovel, Mike McKerns and Michael Ortiz, is now in print. It will appear in this year’s second-quarter issue of SIAM Review, and is already accessible online for those with SIAM subscriptions; the preprint version can be found at arXiv:1009.0679.

This paper was a real team effort, with everyone bringing different strengths to the table. Given the length of the review process, I think that our corresponding author Houman Owhadi deserves a medal for his patience (as does Ilse Ipsen, the article’s editor at SIAM Review), but, really, congratulations and thanks to all. 🙂

We propose a rigorous framework for Uncertainty Quantification (UQ) in which the UQ objectives and the assumptions/information set are brought to the forefront. This framework, which we call Optimal Uncertainty Quantification (OUQ), is based on the observation that, given a set of assumptions and information about the problem, there exist optimal bounds on uncertainties: these are obtained as values of well-defined optimization problems corresponding to extremizing probabilities of failure, or of deviations, subject to the constraints imposed by the scenarios compatible with the assumptions and information. In particular, this framework does not implicitly impose inappropriate assumptions, nor does it repudiate relevant information. Although OUQ optimization problems are extremely large, we show that under general conditions they have finite-dimensional reductions. As an application, we develop Optimal Concentration Inequalities (OCI) of Hoeffding and McDiarmid type. Surprisingly, these results show that uncertainties in input parameters, which propagate to output uncertainties in the classical sensitivity analysis paradigm, may fail to do so if the transfer functions (or probability distributions) are imperfectly known. We show how, for hierarchical structures, this phenomenon may lead to the non-propagation of uncertainties or information across scales. In addition, a general algorithmic framework is developed for OUQ and is tested on the Caltech surrogate model for hypervelocity impact and on the seismic safety assessment of truss structures, suggesting the feasibility of the framework for important complex systems. The introduction of this paper provides both an overview of the paper and a self-contained mini-tutorial about basic concepts and issues of UQ.

Radon and non-Radon spaces

One of the theorems that I make frequent use of in my uncertainty quantification (UQ) research concerns probability measures on Radon spaces. Without going into details, the UQ methods that I like to work with will work fine if the spaces where your uncertain parameters / functions / models / other gubbins live are Radon spaces, and might fail to work otherwise; therefore, it’s important to know what a Radon space is, and whether it’s a serious restriction. (It’s also very useful to know some pithy examples to use in response to questions about Radon spaces in talks and poster presentations!)

So… the definition. Consider a topological space (X, T) and a probability measure μ: ℬ(T) → [0, 1] defined on the Borel σ-algebra ℬ(T) (i.e. the smallest σ-algebra on X that contains all the open sets, which are exactly the sets listed in the topology T). The measure μ is said to be inner regular if, for every ℬ(T)-measurable set E ⊆ X,

\mu(E) = \sup \bigl\{ \mu(K) \big| K \subseteq E \mbox{ and } K \mbox{ is compact} \bigr\}.

This is often informally read as saying that the measure of an arbitrary measurable set can be approximated from within by compact sets. The space (X, T) is called a pseudo-Radon space (my terminology) if every probability measure μ on ℬ(T) is inner regular; if the space is also separable and metrizable, then it is called a Radon space (the more standard term in the literature, e.g. in the book of Ambrosio, Gigli & Savaré on gradient flows in metric spaces).
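
A standard illustration of inner regularity, assuming Lebesgue (uniform) measure λ on X = [0, 1] with its usual topology (this example is not drawn from the references above): take the open interval E = (a, b) with 0 ≤ a < b ≤ 1. E itself is not compact, but for every n > 2 ⁄ (b − a) the closed interval

K_n = \bigl[ a + \tfrac{1}{n}, \, b - \tfrac{1}{n} \bigr] \subseteq E

is compact, and

\lambda(K_n) = b - a - \frac{2}{n} \to b - a = \lambda(E) \quad \mbox{as } n \to \infty.

So the supremum over compact subsets equals λ(E), even though no single compact subset of E attains it.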

So, what spaces are (pseudo-)Radon spaces? It turns out that most of the “nice” spaces that one might want to consider are Radon:

  • any compact subset of n-dimensional Euclidean space ℝⁿ is Radon,
  • indeed, Euclidean space itself is Radon,
  • as is any Polish space (i.e. a separable and completely metrizable space),
  • as indeed is any Suslin space (i.e. a continuous Hausdorff image of a Polish space).

This all seems to suggest that non-Radon spaces must be very weird beasts indeed, perhaps spaces that are topologically very large, so much so that they cannot be the image of a separable space. However, there are, in fact, “small” examples of non-Radon spaces. Since just one counterexample will suffice, it’s enough to find a single example of a non-inner-regular measure on a space to show that it is not (pseudo-)Radon.


Bayesian probability banned?

This post on Understanding Uncertainty bears the amusing, alarming and somewhat over-stated title “Court of Appeal bans Bayesian probability (and Sherlock Holmes)”. It’s not unusual for people to experience a little intellectual indigestion when first faced with the Bayesian probabilistic paradigm; it is particularly prevalent among people whose point of view is, roughly speaking, “frequentist”, even though they may have had no formal education in probability in their lives. Personally, I think that the judge’s criticisms, and Understanding Uncertainty’s criticisms of those criticisms, are somewhat overblown.

However, I will advance one criticism of Bayesian probability as applied to practical situations. The basic axiom of the Bayesian paradigm is that one’s state of knowledge (or uncertainty) can be encapsulated in a unique, well-defined probability measure ℙ (the “prior”) on some sample space. Having done this, the only sensible way to update your probability measure (to produce a “posterior”) in light of new evidence is to condition it using Bayes’ rule — and I have no bone of contention with that theorem. My issue is with specifying a unique prior. If I believe that a coin is perfectly balanced, then I might be willing to commit to the prior ℙ for which

ℙ[heads] = ℙ[tails] = 1/2.

But can I really know that the coin is perfectly fair? Can I reasonably be expected to tell the difference between a perfectly fair coin and one for which

| ℙ[heads] − ℙ[tails] | ≤ 10⁻¹⁰⁰?

(By Hoeffding’s inequality, to be satisfied with confidence level 1 − ε of the truth of this inequality would take of the order of 10²⁰⁰ (− log ε) ⁄ 2 (i.e. lots!) independent flips of the coin.) If not, then any prior distribution ℙ that satisfies this inequality should be a reasonable prior, and all the resulting posteriors are similarly reasonable conclusions. This kind of extended Bayesian point of view goes by the name of the robust Bayesian paradigm. It may seem that the difference between 10⁻¹⁰⁰ and 0 is negligible… but it is not! The results of statistical tests can depend very sensitively on the assumptions made, especially when there is little data available to filter through those assumptions (and, scarily, sometimes even in the limit of infinite data!).
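
To make the robust Bayesian idea concrete, here is a toy sketch, under assumptions that are mine rather than part of the argument above. The uncertain quantity is the coin’s heads probability, the admissible priors are Beta distributions whose mean lies in [0.4, 0.6] and whose “strength” (pseudo-count) lies in [2, 20], and the reported conclusion is the range of posterior means over that whole class. The function names and the particular class of priors are hypothetical.

```python
def posterior_mean(prior_mean, prior_strength, heads, flips):
    """Posterior mean of the heads probability under a Beta prior.

    The prior is Beta(a, b) with a = prior_mean * prior_strength and
    b = (1 - prior_mean) * prior_strength. By conjugacy the posterior is
    Beta(a + heads, b + tails), whose mean is (a + heads) / (a + b + flips).
    """
    a = prior_mean * prior_strength
    return (a + heads) / (prior_strength + flips)


def posterior_mean_range(heads, flips, means=(0.4, 0.6),
                         strengths=(2.0, 20.0), grid=21):
    """Smallest and largest posterior mean over a grid of admissible priors."""
    values = []
    for i in range(grid):
        m = means[0] + (means[1] - means[0]) * i / (grid - 1)
        for j in range(grid):
            s = strengths[0] + (strengths[1] - strengths[0]) * j / (grid - 1)
            values.append(posterior_mean(m, s, heads, flips))
    return min(values), max(values)


if __name__ == "__main__":
    # Same observed frequency of heads (75%), increasing amounts of data.
    for heads, flips in [(3, 4), (75, 100), (750, 1000)]:
        lo, hi = posterior_mean_range(heads, flips)
        print(f"{heads}/{flips} heads: posterior mean in [{lo:.4f}, {hi:.4f}]")
```

With only four flips the interval of posterior means is wide, so the conclusion depends heavily on which admissible prior was chosen. With a thousand flips the data dominate and the interval is narrow. A single-prior analysis would report one number in both cases and hide that difference.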

So, yes, I agree that (classical) Bayesian statistics shouldn’t be let near life-or-death cases in a courtroom. But robust Bayesian statistics? I could support that…

Questions and Answers

Two quotations:

“In re mathematica ars proponendi pluris facienda est quam solvendi.”
(In mathematics the art of asking [questions] is more valuable than solving [them].)
— Georg Cantor (1845–1918), Doctoral Thesis, 1867


“The uncreative mind can spot wrong answers, but it takes a very creative mind to spot wrong questions.”
— Antony Jay (1930–)

It’s a common but still dispiriting experience for me to see an excellent tool being applied to the wrong problem. Conclusions are only as sound as (a) the logical reasoning used and, crucially, (b) the validity of the premises, which includes the applicability of the method itself. Both “the right answer to the wrong question” and “the wrong answer to the right question” are wrong, but the former is more devastating because it carries an aura of (false) respectability that can lead one into making bad decisions with great confidence.

Time Evolution of Quantum States

My previous posts on quantum mechanics, and specifically uncertainty principles, were essentially about the quantum state ψ of the system at a fixed time. This post concerns the time evolution of quantum systems.

The key evolution equation here is a Hilbert-space-valued ordinary differential equation that consists of two key ingredients: the differential operator i ℏ ∂ₜ (the energy operator) and a Hamiltonian operator H that describes the energetics of the system. (Sorry, “H” has changed from denoting a Hilbert space to denoting the Hamiltonian operator. So many concepts, so few letters…) The time-dependent Schrödinger equation with Hamiltonian H is

i ℏ ∂ₜψ = Hψ.

Often, the Hamiltonian is itself a differential operator: a good example is the Schrödinger equation for a single non-relativistic particle of mass m moving in a scalar potential V: ℝⁿ → ℝ:

i ℏ ∂ₜψ = − (ℏ² ⁄ 2m) Δψ + V ψ.

The Hamiltonian in this case is the familiar “kinetic energy + potential energy” one. V is fairly obviously the potential energy term. The kinetic energy term is the usual “½ × mass × velocity²” but in an interesting form: it is 1⁄(2m) times the dot product of the momentum operator P := −iℏ∇ with itself, hence the “− (ℏ² ⁄ 2m) Δ”, where Δ denotes the spatial Laplacian.

Simply put, the time-dependent Schrödinger equation is an absolute pain to solve in all but the simplest settings. Life gets slightly easier if we search for so-called “stationary states”, which, despite the name, are not actually solutions ψ that are constant in time, but rather are eigenstates of the Hamiltonian operator H. (In a sense to be made precise in a moment, these stationary states are constant from the point of view of any observation operator, even though they are themselves non-constant.)

A stationary state is a solution ψ to the time-independent Schrödinger equation, i.e. to the eigenvalue problem

Hψ = Eψ.

Here the eigenvalue E ∈ ℝ is the energy of the quantum state ψ. If H is compact and self-adjoint, then the usual remarks about there being at most countably many eigenvalues apply. In any case, the state ψ with least energy E is called the ground state of the system, and E is called the ground state energy or zero-point energy of the system; the other eigenstates are called excited states.
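
As a sanity check on the eigenvalue problem, here is a small sketch for the textbook “particle in a box”, which is an assumption introduced purely for illustration: one space dimension, V = 0 on the interval (0, L), and ψ = 0 at the walls. Units with ℏ = m = L = 1 are also an assumption. The code checks numerically, using a central finite difference for the Laplacian, that sin(nπx⁄L) is an eigenstate of H with energy n²π²ℏ² ⁄ (2mL²). All function names are hypothetical.

```python
import math

HBAR = 1.0    # assumption: units in which hbar = 1
MASS = 1.0    # assumption: unit mass
LENGTH = 1.0  # assumption: box is the interval (0, 1)


def box_energy(n):
    """Energy of the n-th stationary state of the box, n = 1, 2, 3, ..."""
    return (n * math.pi * HBAR / LENGTH) ** 2 / (2.0 * MASS)


def box_state(n, x):
    """Normalised n-th stationary state evaluated at position x."""
    return math.sqrt(2.0 / LENGTH) * math.sin(n * math.pi * x / LENGTH)


def apply_hamiltonian(psi, x, h=1e-4):
    """Approximate (H psi)(x) = -(hbar^2 / 2m) psi''(x) by a central difference."""
    second = (psi(x + h) - 2.0 * psi(x) + psi(x - h)) / (h * h)
    return -(HBAR ** 2) / (2.0 * MASS) * second


def max_residual(n, points=50):
    """Largest value of |H psi - E psi| over interior sample points of the box."""
    def psi(x):
        return box_state(n, x)

    worst = 0.0
    for j in range(1, points):
        x = j * LENGTH / points
        residual = apply_hamiltonian(psi, x) - box_energy(n) * psi(x)
        worst = max(worst, abs(residual))
    return worst


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, box_energy(n), max_residual(n))
```

The residuals are at the level of the finite-difference error, and the energies increase with n, so n = 1 is the ground state of this particular system and the higher n are excited states.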

Note that stationary states are not actually constant in time: substituting the definition into the time-dependent Schrödinger equation reveals that a stationary state ψ satisfies the (complex) ordinary differential equation

i ℏ ∂ₜψ = Eψ,

to which the solution, given ψ at some initial time t₀, is

\psi(t) = e^{-iE(t - t_0)/\hbar} \psi(t_0).

So, stationary states actually evolve by “rotation”, with “angular velocity” E⁄ℏ; but note that the probability density |ψ(t)|² is independent of time t. Indeed, if A is any linear observation operator, then

\langle A \rangle_{\psi(t)} = \langle A\psi(t), \psi(t) \rangle = e^{-iE(t - t_0)/\hbar} e^{+iE(t - t_0)/\hbar} \langle A\psi(t_0), \psi(t_0) \rangle = \langle A \rangle_{\psi(t_0)}.
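
The cancellation of the two phase factors is easy to check numerically. The sketch below assumes a toy two-component state vector, an arbitrary 2 × 2 matrix standing in for the observation operator A, and units with ℏ = 1. None of these choices come from the discussion above, and the function names are hypothetical.

```python
import cmath

HBAR = 1.0  # assumption: units in which hbar = 1


def evolve(psi0, energy, t, t0=0.0):
    """Stationary-state evolution: multiply psi(t0) by exp(-i E (t - t0) / hbar)."""
    phase = cmath.exp(-1j * energy * (t - t0) / HBAR)
    return [phase * c for c in psi0]


def inner(u, v):
    """Inner product <u, v>, linear in u and conjugate-linear in v."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


def apply(matrix, vec):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * c for m, c in zip(row, vec)) for row in matrix]


def expectation(matrix, psi):
    """The quantity <A psi, psi> for the operator A given by matrix."""
    return inner(apply(matrix, psi), psi)


if __name__ == "__main__":
    psi0 = [0.6, 0.8j]                         # toy normalised state
    A = [[1.0, 2 - 1j], [2 + 1j, -3.0]]        # toy observation operator
    for t in (0.0, 0.7, 5.0):
        psi_t = evolve(psi0, energy=2.5, t=t)
        print(t, psi_t, inner(psi_t, psi_t), expectation(A, psi_t))
```

The components of ψ(t) change with t, but both the norm and 〈Aψ(t), ψ(t)〉 stay fixed up to rounding error, which is exactly the sense in which a stationary state is “stationary”.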
