Inverse function theorem for Lipschitz functions

Recently, while wandering the corridors of the Mathematics Department, I overheard one of the graduate students explaining the Inverse Function Theorem to some first-year undergraduates. On a non-rigorous level, the Inverse Function Theorem is one of the most accessible (or “obvious”) results in elementary calculus: if the graph of a function y = f(x) has a well-defined and non-zero slope (derivative) s at some point x_0, then

  1. we ought to be able to write x as a function of y, i.e. x = f^{-1}(y) for y near f(x_0),
  2. and, moreover, the slope of the inverse function f^{-1} at f(x_0) should be 1/s.

The “visual proof” of this statement amounts to sketching the graph of f, observing that the graph of f^{-1} (if the inverse function exists at all) is the graph of f with the x and y axes interchanged, and hence that if the slope of f is approximately Δy/Δx then the slope of f^{-1} is approximately Δx/Δy, i.e. the reciprocal of that of f.
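
The reciprocal-slope heuristic is easy to check numerically. The following Python sketch is only an illustration: the test function f(x) = x^3 + x is a hypothetical choice (picked because it is strictly increasing, so a global inverse exists), and the helper names f_inverse and central_slope are made up for this example. The inverse is computed by bisection rather than by any closed formula.

```python
def f(x):
    # Hypothetical test function: strictly increasing, so it has an inverse.
    return x**3 + x


def f_inverse(y, lo=-10.0, hi=10.0, tol=1e-13):
    # Invert f by bisection on the bracket [lo, hi]; this relies on f being increasing.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def central_slope(g, t, h=1e-5):
    # Symmetric finite-difference approximation to the slope of g at t.
    return (g(t + h) - g(t - h)) / (2 * h)


x0 = 1.0
s = central_slope(f, x0)                 # slope of f at x0
s_inv = central_slope(f_inverse, f(x0))  # slope of the inverse at f(x0)
print(s, s_inv, s * s_inv)
```

Under these assumptions the product of the two printed slopes should be very close to 1, as the heuristic predicts.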

Recall that the derivative of a function f: ℝ^n → ℝ^m is the rectangular m × n matrix of partial derivatives

\displaystyle \mathrm{D} f(x) = \begin{bmatrix} \dfrac{\partial f_{1}}{\partial x_{1}} & \cdots & \dfrac{\partial f_{1}}{\partial x_{n}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_{m}}{\partial x_{1}} & \cdots & \dfrac{\partial f_{m}}{\partial x_{n}} \end{bmatrix}

whenever all these partial derivatives exist. With this notation, a more careful statement of the Inverse Function Theorem is that if f: ℝ^n → ℝ^n is continuously differentiable in a neighbourhood of x_0 and the square n × n matrix of partial derivatives Df(x_0) is invertible, then there exist neighbourhoods U of x_0 and V of f(x_0) and a continuously differentiable function g: V → ℝ^n (called a local inverse for f) such that

  • for every u ∈ U, g(f(u)) = u, and
  • for every v ∈ V, f(g(v)) = v.

An interesting question to ask is whether one really needs continuous differentiability of f. For example, Rademacher’s theorem says that whenever f satisfies a Lipschitz condition of the form

\displaystyle | f(x) - f(y) | \leq C | x - y | \text{ for all } x, y \in \mathbb{R}^{n}

for some constant C ≥ 0 it follows that f is differentiable almost everywhere in ℝ^n with derivative having norm at most C. Is this sufficient? It turns out, courtesy of a theorem of F. H. Clarke, that the Inverse Function Theorem does hold true for Lipschitz functions provided that one adopts the right generalized interpretation of the derivative of f.

The (set-valued) generalized derivative Df(x_0) of f: ℝ^n → ℝ^m at x_0 is defined to be the convex hull of the set of all matrices M ∈ ℝ^{m×n} that arise as a limit

\displaystyle M = \lim_{k \to \infty} \mathrm{D} f(x_{k})

for some sequence (x_k) in ℝ^n of differentiability points of f that converges to x_0. One can show that, when f satisfies a Lipschitz condition in a neighbourhood of x_0, Df(x_0) is a non-empty, compact, convex subset of ℝ^{m×n}. The generalized derivative Df(x_0) is said to be of maximal rank if every M ∈ Df(x_0) has maximal rank (i.e. has rank(M) = min(m, n)).

Theorem. (Clarke, 1976) If f: ℝ^n → ℝ^n satisfies a Lipschitz condition in some neighbourhood of x_0 and Df(x_0) ⊆ ℝ^{n×n} is of maximal rank, then there exist neighbourhoods U of x_0 and V of f(x_0) and a Lipschitz function g: V → ℝ^n such that

  • for every u ∈ U, g(f(u)) = u, and
  • for every v ∈ V, f(g(v)) = v.

It’s very important to note the maximal rank condition in Clarke’s Inverse Function Theorem: we need every matrix M in the generalized derivative to be non-singular. So, for example, the absolute value function on the real line ℝ does not satisfy the hypotheses of Clarke’s theorem at x = 0, even though it is Lipschitz with Lipschitz constant 1, since its generalized derivative at 0 is

\displaystyle \mathrm{D} |0| = \bigl\{ [\ell] \in \mathbb{R}^{1 \times 1} \big| -1 \leq \ell \leq 1 \bigr\},

which contains the non-invertible derivative matrix [0]. It is hardly surprising that the Inverse Function Theorem cannot be applied here since the absolute value function is non-injective in any neighbourhood of 0: both +δ and −δ map to +δ. On the other hand, the function f defined by

\displaystyle f(x) := 2 x + |x| = \begin{cases} x, & \text{if } x \leq 0 \\ 3x, & \text{if } x \geq 0. \end{cases}

has generalized derivative at 0 given by

\displaystyle \mathrm{D} f(0) = \bigl\{ [\ell] \in \mathbb{R}^{1 \times 1} \big| 1 \leq \ell \leq 3 \bigr\},

which is of maximal rank. The local (in fact, global) Lipschitz inverse of this function f is, unsurprisingly,

\displaystyle f^{-1}(y) := \begin{cases} y, & \text{if } y \leq 0 \\ y/3, & \text{if } y \geq 0. \end{cases}
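
The two one-dimensional examples above can be explored with a short Python sketch. It is a heuristic, not a rigorous computation: the hypothetical helper derivative_hull samples finite-difference slopes at points near x_0 (all of which happen to be differentiability points for these two functions) and reports their minimum and maximum, which in one dimension approximates the interval forming the generalized derivative. The names derivative_hull and is_maximal_rank_1d are assumptions made for this illustration.

```python
def absolute_value(x):
    return abs(x)


def f(x):
    # The example from the text: f(x) = 2x + |x|.
    return 2 * x + abs(x)


def f_inverse(y):
    # The piecewise-linear inverse given in the text.
    return y if y <= 0 else y / 3


def derivative_hull(g, x0, radius=1e-3, samples=200, h=1e-9):
    # Heuristic: sample central-difference slopes on both sides of x0 and
    # return (min, max). The step h is smaller than the distance of every
    # sample point from x0, so no difference quotient straddles the kink.
    slopes = []
    for k in range(1, samples + 1):
        for sign in (-1.0, 1.0):
            x = x0 + sign * radius * k / samples
            slopes.append((g(x + h) - g(x - h)) / (2 * h))
    return min(slopes), max(slopes)


def is_maximal_rank_1d(hull):
    # In one dimension, maximal rank means the interval does not contain 0.
    lo, hi = hull
    return not (lo <= 0.0 <= hi)


hull_abs = derivative_hull(absolute_value, 0.0)
hull_f = derivative_hull(f, 0.0)
round_trip_error = max(abs(f_inverse(f(x)) - x) for x in [-2.0, -0.5, 0.0, 0.5, 2.0])
print(hull_abs, is_maximal_rank_1d(hull_abs))
print(hull_f, is_maximal_rank_1d(hull_f))
print(round_trip_error)
```

The sampled intervals should come out close to [-1, 1] for the absolute value function (which contains 0, so the maximal rank condition fails) and close to [1, 3] for f (which does not), in line with the discussion above.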


Null vectors and spinors

My recent reading on the topic of spin and angular momentum in quantum mechanics led me to the concept of a spinor. It turns out that spinors are fearsomely nasty objects to wrap one’s head around in full generality, requiring Clifford algebras and other ingredients, although the three-dimensional case is quite accessible and is described below. The (slightly unhelpful) heuristic is that spinors behave like vectors except that they change sign under rotation through an angle of 2π — a somewhat confusing property that will be made clearer later.

First, some general notions: let V be a vector space over a field K, equipped with a bilinear map b: V × V → K and hence a quadratic form q: V → K given by q(v) ≔ b(v, v) for all v ∈ V. A vector v ∈ V is called a null vector or isotropic if q(v) = 0. Recall that the standard Euclidean bilinear form on ℝ^n is

\displaystyle b \bigl( (x_{1}, \dots, x_{n}) , (x'_{1}, \dots, x'_{n}) \bigr) \equiv x \cdot x' := \sum_{j = 1}^{n} x_{j} x'_{j} .

This bilinear form has no non-trivial null vectors (i.e. the only null vector is the zero vector), but two close relatives of the Euclidean bilinear form do have interesting null vectors.

[Somehow, WordPress deleted large chunks of this post. Apologies! If the text below differs from what you saw earlier, then blame the post-deletion restoration effort.]


Annihilation, creation, and ladder operators

These are some notes, mostly for my own benefit, on annihilation, creation, and ladder operators in quantum mechanics, with a few remarks towards the end on angular momentum, spin and Clebsch–Gordan coefficients.

First, the abstract definition: if T, L: V → V are linear operators on a vector space V over a field K, then L is said to be a ladder operator for T if there is a scalar c ∈ K such that the commutator of T and L satisfies

\displaystyle [T, L] := TL - LT = cL.

The operator L is called a raising operator for T if c is real and positive, and a lowering operator for T if c is real and negative.

The motivation behind this definition is that if (λ, v) ∈ K × V is an eigenpair for T (i.e. Tv = λv), then a quick calculation reveals that (λ + c, Lv) is also an eigenpair for T, provided that Lv ≠ 0:

\displaystyle T(Lv) = (TL)v = (LT + [T,L])v = LTv + cLv = (\lambda + c) (L v).
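
A small finite-dimensional example makes the eigenvalue shift concrete. In the Python sketch below, T = diag(0, 1, 2, 3) and L is the shift matrix sending the basis vector e_i to e_{i+1}; both matrices, and the helper names mat_mul, mat_vec and commutator, are illustrative choices for this note rather than anything specific to quantum mechanics. For this pair the commutator relation holds with c = 1.

```python
def mat_mul(A, B):
    # Product of two square matrices stored as lists of rows.
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_vec(A, v):
    # Matrix-vector product.
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def commutator(T, L):
    # [T, L] := TL - LT
    TL, LT = mat_mul(T, L), mat_mul(L, T)
    n = len(T)
    return [[TL[i][j] - LT[i][j] for j in range(n)] for i in range(n)]


n = 4
T = [[i if i == j else 0 for j in range(n)] for i in range(n)]      # diag(0, 1, 2, 3)
L = [[1 if i == j + 1 else 0 for j in range(n)] for i in range(n)]  # shift e_j -> e_{j+1}
c = 1

# L is a raising operator for T: [T, L] = c L with c = 1.
is_ladder = commutator(T, L) == [[c * entry for entry in row] for row in L]

lam = 1
v = [0, 1, 0, 0]       # eigenvector of T with eigenvalue lam = 1
Lv = mat_vec(L, v)     # the shifted vector e_2
TLv = mat_vec(T, Lv)   # should equal (lam + c) * Lv
print(is_ladder, Lv, TLv)
```

Note that applying L to the last basis vector gives zero, which is exactly the situation excluded by requiring Lv ≠ 0: the ladder stops there.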

Ladder operators come up in quantum mechanics because many of the elementary operations on quantum systems act as ladder operators and increase or decrease the eigenvalues of other operators. Those eigenvalues often encode important information about the system, and the increments and decrements provided by the ladder operators often come in discrete, rather than continuous, values. Annihilation and creation operators are a prime example of this phenomenon.


Interpolation and fractional differentiability revisited

In this earlier post on interpolation spaces, part of the motivation for studying interpolation spaces was the search for a reasonable space of functions with a non-integer order of differentiability 0 < α < 1. In the case of strong (classical) derivatives, a suitable such space was the vector space C^α(K) of α-Hölder functions on a compact set K ⊆ ℝ^n with non-empty interior, i.e. the set of functions u: K → ℝ for which the norm

\displaystyle \| u \|_{C^{\alpha}(K)} := \| u \|_{\infty} + \sup_{\substack{ x, y \in K \\ x \neq y }}  \frac{| u(x) - u(y) |}{| x - y |^{\alpha}}

is finite. In the case of weak derivatives, a suitable such space was the vector space W^{α,p}(K) of functions u for which the norm

\displaystyle \| u \|_{W^{\alpha, p}(K)} := \left( \| u \|_{L^{p}(K)}^{p} + \iint_{K} \frac{| u(x) - u(y) |^{p}}{| x - y |^{\alpha p + n}} \, \mathrm{d}x \mathrm{d}y \right)^{1/p}

is finite. These spaces are all Banach spaces, and interpolate in the sense of real K-interpolation between the spaces C^0(K) of continuous functions and C^1(K) of continuously differentiable functions (respectively the Lebesgue space L^p(K) and the Sobolev space W^{1,p}(K)). This post grew out of my noticing one simple omission in the previous post, now corrected: for p = 2, the spaces W^{α,2} are Hilbert spaces under the inner product

\displaystyle (u, v)_{W^{\alpha, 2}(K)} := \int_{K} u(x) v(x) \, \mathrm{d} x + \iint_{K} \frac{( u(x) - u(y)) (v(x) - v(y))}{| x - y |^{2 \alpha + n}} \, \mathrm{d}x \mathrm{d}y .
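
As a quick consistency check (a sketch using only the two formulas above), setting v = u in this inner product and comparing with the norm for p = 2, where the exponent αp + n becomes 2α + n, gives

\displaystyle (u, u)_{W^{\alpha, 2}(K)} = \| u \|_{L^{2}(K)}^{2} + \iint_{K} \frac{| u(x) - u(y) |^{2}}{| x - y |^{2 \alpha + n}} \, \mathrm{d}x \mathrm{d}y = \| u \|_{W^{\alpha, 2}(K)}^{2} ,

so the inner product does induce the norm defined earlier.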

On realizing this omission, I started to think more deeply about other notions of fractional differentiability. In particular, I wondered how the above W^{α,p} spaces are related to other fractional-order Sobolev spaces defined using Fourier transforms. So, the rest of this post is devoted to surveying the two main methods of constructing fractional-order Sobolev spaces and the relationships between them.

For neatly self-contained proofs of the assertions in this post, I recommend this set of notes by Eleonora Di Nezza, Giampiero Palatucci and Enrico Valdinoci.


A floating (and open) problem

There are so many interesting books out there in the world, but among the most interesting are those contributed to by dozens, hundreds or even thousands of people bound by a common interest. A particular example of the species is a book of mathematical problems, a usually hefty tome left in a place liable to be frequented by mathematicians who are wise enough to know that what they don’t know dwarfs what they do know, and are inclined to inscribe in it for posterity some of the open problems that are vexing them — and offering for their solution the occasional alcoholic or even more exotic reward.

I’ve personally leafed through the book of this type maintained by the Mathematisches Forschungsinstitut Oberwolfach, but even this venerable institution is following in mightier footsteps: the Scottish Café of Lwów — frequented in the 1930s and 1940s by titans such as Stefan Banach, Stanisław Mazur, Hugo Steinhaus, Stanisław Ulam and many others — and its Scottish Book. Problem 19, posed by Ulam, is a perfect example of the kind of simply-posed yet very thorny question that takes one quite by surprise:

Is every solid of uniform density that will float in water in every position a sphere?


Supervising (postgraduate) research

I recently attended a workshop on supervising research students, focussing on PhD students in the sciences. As with all such workshops, a lot of what was said was “obvious” (for some value of “obvious”), but there were also a few interesting points.

The first “obvious” point was one that is always worth re-stating, namely selection bias: the workshop necessarily concentrates on the supervision of problematic students. Also, as tempting as it is to spend time talking with the students who are working well and producing interesting results, it’s important to “force” oneself to spend time on the less pleasant task of working with the problematic students who ipso facto don’t have such fascinating results to report.


Interpolation inequalities, interpolation spaces and fractional differentiability

I have recently developed an interest in interpolation inequalities and the formal structure of interpolation spaces. Interpolation inequalities arise in mathematics when one controls one norm of a function u by the product of two other norms of the same function (or closely related functions like the derivatives of u). A classic example is the following interpolation inequality for u: ℝ^n → ℝ:

\displaystyle \| u \|_{L^{r}} \leq \| u \|_{L^{p}}^{\alpha} \| u \|_{L^{q}}^{1 - \alpha},

where the L^r norm is, as usual,

\displaystyle \| u \|_{L^{r}} := \left( \int_{\mathbb{R}^{n}} |u(x)|^{r} \, \mathrm{d} x \right)^{1/r}

for finite r, and

\displaystyle \| u \|_{L^{\infty}} := \inf \bigl\{ B > 0 \,\big|\, \mu \{ x \mid | u(x) | > B \} = 0 \bigr\}

and the exponent α satisfies

\displaystyle \frac{1}{r} = \frac{\alpha}{p} + \frac{1 - \alpha}{q}.
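
One way to see where this inequality comes from, sketched here under the assumption that 0 < α < 1 and that p and q are finite, is to split |u|^r = |u|^{αr} |u|^{(1−α)r} and apply Hölder's inequality with the exponents p/(αr) and q/((1−α)r), which are conjugate precisely because of the relation above:

\displaystyle \int_{\mathbb{R}^{n}} |u|^{r} \, \mathrm{d} x \leq \left( \int_{\mathbb{R}^{n}} |u|^{p} \, \mathrm{d} x \right)^{\alpha r / p} \left( \int_{\mathbb{R}^{n}} |u|^{q} \, \mathrm{d} x \right)^{(1 - \alpha) r / q} = \| u \|_{L^{p}}^{\alpha r} \| u \|_{L^{q}}^{(1 - \alpha) r} .

Taking r-th roots of both sides gives the stated estimate.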

So, for example, the space L^2 is “between” the spaces L^1 and L^3, with the “betweenness” quantified by the exponent α = ¼ and the estimate

\displaystyle \| u \|_{L^{2}} \leq \| u \|_{L^{1}}^{1/4} \| u \|_{L^{3}}^{3/4}.
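
The exponent and the estimate can be sanity-checked numerically. The Python sketch below makes a simplifying assumption: instead of Lebesgue measure on ℝ^n it uses counting measure on a finite set, so that u is just a list of numbers and the integrals become sums (the inequality rests on Hölder's inequality, which holds on any measure space). The function names lp_norm and interpolation_exponent and the sample vector u are hypothetical.

```python
def lp_norm(u, r):
    # l^r norm of a finite sequence: the L^r norm with respect to counting measure.
    return sum(abs(x) ** r for x in u) ** (1.0 / r)


def interpolation_exponent(p, q, r):
    # Solve 1/r = alpha/p + (1 - alpha)/q for alpha (requires p != q).
    return (1.0 / r - 1.0 / q) / (1.0 / p - 1.0 / q)


p, q, r = 1, 3, 2
alpha = interpolation_exponent(p, q, r)

u = [3.0, -1.0, 0.5, 2.0, 0.0]  # arbitrary sample data
lhs = lp_norm(u, r)
rhs = lp_norm(u, p) ** alpha * lp_norm(u, q) ** (1 - alpha)
print(alpha, lhs, rhs, lhs <= rhs)
```

A single check like this proves nothing, of course, but it is a cheap way to catch a mis-remembered exponent.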

A natural question to ask is whether we could in fact characterize L^2 as being precisely the space of all u that are “between” the spaces L^1 and L^3 in the sense that

\displaystyle \| u \|_{L^{1}}^{1/4} \| u \|_{L^{3}}^{3/4} < \infty,

or at least remains bounded along some approximating sequence u_n → u. More exotically, could we construct a space of functions that are “half-differentiable” by finding a space somewhere between the space C^0 of continuous functions and the space C^1 of continuously differentiable functions? A natural candidate for such a space is the space of Hölder continuous functions, but this is little more than an educated guess. Can we put it on a sounder footing?

The brief notes that follow owe a lot to some more detailed notes by Alessandra Lunardi.
