The topic of this third part of **MA398 Matrix Analysis and Algorithms** is the study of the (usually inevitable) differences between computed and mathematically exact results, particularly in our three prototypical problems of simultaneous linear equations (SLE), least squares (LSQ), and eigenvalue/eigenvector problems (EVP). The first key point is that we split the error analysis into two parts: **conditioning**, which is a property of the problem alone, and **stability**, which is a property of the algorithm used to (approximately) solve the problem.

A useful point of view is that of **backward error analysis**: the computed solution to the problem is viewed as the *exact* solution to a *perturbed* problem. If that perturbation in “problem space” is small then the algorithm is called **backward stable**, and **unstable** otherwise. Once the original and perturbed problems have been identified, we wish to understand the difference between their respective exact solutions — this is the issue of **conditioning**. The three prototypical problems can all be viewed as solving an equation of the form *G*(*y*, *w*) = 0, where *w* denotes the data that define the problem and *y* the solution.

| Problem | (SLE) | (LSQ) | (EVP) |
|---|---|---|---|
| Solution, *y* | *x* | *x* | (*x*, *λ*) |
| Data, *w* | (*A*, *b*) | (*A*, *b*) | *A* |
| *G*(*y*, *w*) | *A**x* − *b* | *x* − (*A*^{∗}*A*)^{−1}*A*^{∗}*b* | *A**x* − *λ**x* |

In the backward error analysis point of view, the computed solution *ŷ* solves *G*(*ŷ*, *ŵ*) = 0. Conditioning concerns the estimation of Δ*y* ≔ *y*−*ŷ* in terms of Δ*w* ≔ *w*−*ŵ*.

## Conditioning of (SLE)

Recall that the (SLE) problem is, given a matrix *A* ∈ ℂ^{n×n} and a vector *b* ∈ ℂ^{n}, to find *x* ∈ ℂ^{n} such that *A**x* = *b*.

The **condition number** in a norm ‖·‖ of a matrix *A* ∈ ℂ^{n×n} is *κ*(*A*) ≔ ‖*A*‖ ‖*A*^{−1}‖ if *A* is invertible, and ∞ otherwise. From now on, we fix a vector norm and its induced matrix norm, in which setting we have *κ*(*A*) ≥ 1 for all *A* ∈ ℂ^{n×n}. As an example, if *A* ∈ ℝ^{n×n} is symmetric and positive definite, then its condition number in the induced 2-norm is the ratio of its largest and smallest eigenvalues. The relevance of the condition number to (SLE) is hopefully made clear by the following result on perturbations of (SLE) with respect to *b* and *A*:
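As a quick sanity check of the symmetric positive definite example above, the following sketch (the matrix is an ad-hoc choice, not from the notes) compares the eigenvalue ratio with the condition number computed directly by NumPy:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])           # symmetric positive definite

eigvals = np.linalg.eigvalsh(A)      # real eigenvalues, ascending order
ratio = eigvals[-1] / eigvals[0]     # lambda_max / lambda_min

kappa = np.linalg.cond(A, 2)         # ||A||_2 ||A^{-1}||_2
```

Both quantities come out as (7 + √5)⁄(7 − √5) ≈ 1.94, since for a symmetric positive definite matrix the singular values coincide with the eigenvalues.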

**Conditioning Theorem for (SLE).** (1) Suppose that *A**x* = *b* and *A*(*x*+Δ*x*) = *b*+Δ*b*, with *b* ≠ 0. Then

‖Δ*x*‖ ⁄ ‖*x*‖ ≤ *κ*(*A*) ‖Δ*b*‖ ⁄ ‖*b*‖.

(2) Suppose that *A**x* = *b* and (*A*+Δ*A*)(*x*+Δ*x*) = *b*. If *κ*(*A*) ‖Δ*A*‖ < ‖*A*‖, then

‖Δ*x*‖ ⁄ ‖*x*‖ ≤ ( *κ*(*A*) ‖*A*‖ ⁄ ( ‖*A*‖ − *κ*(*A*) ‖Δ*A*‖ ) ) ‖Δ*A*‖ ⁄ ‖*A*‖.
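Part (1) of the theorem can be checked numerically. The sketch below (with an ad-hoc, mildly ill-conditioned matrix and a small random perturbation of *b*) verifies that the relative forward error is indeed dominated by *κ*(*A*) times the relative backward error:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])        # mildly ill-conditioned: kappa ~ 4e4
b = np.array([2.0, 2.0001])

x = np.linalg.solve(A, b)            # unperturbed solution
db = 1e-8 * rng.standard_normal(2)   # small perturbation of the data b
dx = np.linalg.solve(A, b + db) - x

lhs = np.linalg.norm(dx) / np.linalg.norm(x)
rhs = np.linalg.cond(A) * np.linalg.norm(db) / np.linalg.norm(b)
# the theorem guarantees lhs <= rhs (up to rounding in the two solves)
```

Because *κ*(*A*) is large here, a tiny perturbation of *b* can produce a much larger relative change in *x*, but never more than the bound allows.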

## Conditioning of (LSQ)

Recall that the (LSQ) problem is, given a matrix *A* ∈ ℂ^{m×n} with *m* ≥ *n* and a vector *b* ∈ ℂ^{m}, to find *x* ∈ ℂ^{n} that minimizes ‖ *A**x* − *b* ‖_{2}.

Furthermore, assume that rank(*A*) = *n*, so that *A* has singular values *σ*_{1} ≥ … ≥ *σ*_{n} > 0. Under this assumption *A*^{∗}*A* is invertible and (LSQ) has a unique solution given by the **normal equations**

*x* = (*A*^{∗}*A*)^{−1}*A*^{∗}*b*.

The matrix *A*^{†} ≔ (*A*^{∗}*A*)^{−1}*A*^{∗} ∈ ℂ^{n×m} is called the **Moore–Penrose pseudo-inverse** of *A*.
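For a full-column-rank matrix, the explicit formula (*A*^{∗}*A*)^{−1}*A*^{∗} agrees with the SVD-based pseudo-inverse that NumPy computes. A small check (the matrix and right-hand side are ad-hoc choices):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])           # 3x2, rank 2

pinv_formula = np.linalg.inv(A.T @ A) @ A.T   # (A* A)^{-1} A*
pinv_numpy = np.linalg.pinv(A)                # Moore-Penrose via SVD

# both give the unique least-squares solution of min ||Ax - b||_2
b = np.array([0.0, 1.0, 1.0])
x = pinv_formula @ b
```

In practice one should not form *A*^{∗}*A* explicitly (it squares the condition number); the formula is used here only to confirm the definition.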

Thus, for *A* ∈ ℂ^{m×n} with *m* ≥ *n*, the **condition number** of *A* in a norm ‖·‖ is *κ*(*A*) ≔ ‖*A*‖ ‖*A*^{†}‖ if rank(*A*) = *n*, and ∞ otherwise. Since a square, invertible matrix has *A*^{†} = *A*^{−1}, this definition of condition number extends the previous one. In the induced 2-norm, the condition number is the ratio of the largest and smallest singular values.
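The 2-norm characterization *κ*₂(*A*) = *σ*₁ ⁄ *σ*_{n} also holds for rectangular full-rank matrices; a minimal sketch (the matrix is an ad-hoc choice with singular values 2 and 1):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [0.0, 0.0]])           # singular values 2 and 1

s = np.linalg.svd(A, compute_uv=False)   # descending singular values
kappa = s[0] / s[-1]                     # sigma_1 / sigma_n = 2
```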

**Conditioning Theorem for LSQ.** Suppose that *x* solves (LSQ) for (*A*, *b*) and that *x*+Δ*x* solves (LSQ) for (*A*, *b*+Δ*b*). Let *η* ≔ ‖*A*‖_{2} ‖*x*‖_{2} ⁄ ‖*A**x*‖_{2} ≥ 1, and let *θ* ∈ [0, *π*⁄2] be defined by cos(*θ*) = ‖*A**x*‖_{2} ⁄ ‖*b*‖_{2}. Then

‖Δ*x*‖_{2} ⁄ ‖*x*‖_{2} ≤ ( *κ*(*A*) ⁄ ( *η* cos(*θ*) ) ) ‖Δ*b*‖_{2} ⁄ ‖*b*‖_{2}.

The corresponding estimate for perturbations of *A* instead of *b* is

‖Δ*x*‖_{2} ⁄ ‖*x*‖_{2} ≤ ( *κ*(*A*) + *κ*(*A*)^{2} tan(*θ*) ⁄ *η* ) ‖Δ*A*‖_{2} ⁄ ‖*A*‖_{2}.
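The first (perturbation-of-*b*) bound can be tested directly. In the sketch below (the matrix, *b*, and the perturbation are ad-hoc choices with a genuinely nonzero residual), *η*, cos(*θ*), and both sides of the inequality are computed:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 0.0, 2.0])        # not in range(A): nonzero residual

x = np.linalg.lstsq(A, b, rcond=None)[0]
db = np.array([1e-6, -1e-6, 1e-6])
dx = np.linalg.lstsq(A, b + db, rcond=None)[0] - x

kappa = np.linalg.cond(A, 2)
eta = np.linalg.norm(A, 2) * np.linalg.norm(x) / np.linalg.norm(A @ x)
cos_theta = np.linalg.norm(A @ x) / np.linalg.norm(b)

lhs = np.linalg.norm(dx) / np.linalg.norm(x)
rhs = kappa / (eta * cos_theta) * np.linalg.norm(db) / np.linalg.norm(b)
# the theorem guarantees lhs <= rhs (up to rounding)
```

Note the roles of the two geometric quantities: a residual nearly as large as *b* (θ close to π⁄2) makes cos(*θ*) small and the bound correspondingly weak.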

## Conditioning of (EVP)

Recall that the (EVP) problem is, given a matrix *A* ∈ ℂ^{n×n}, to find (*x*, *λ*) ∈ ℂ^{n}×ℂ such that *A**x* = *λ**x* and ‖*x*‖_{2} = 1.

A first step towards understanding the conditioning of (EVP) is the following result on the continuity of eigenvalues:

**Theorem.** Let *λ*: ℂ^{n×n} → ℂ^{n} be the function that assigns to each square matrix the vector of its eigenvalues, ordered by decreasing absolute value and repeated according to algebraic multiplicity. Then *λ* is a continuous function.
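Continuity does not mean Lipschitz continuity: for a defective matrix the eigenvalues can move like the *square root* of the perturbation size. A small illustration (the Jordan block and perturbation are ad-hoc choices):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 2.0]])           # Jordan block: double eigenvalue 2

for eps in [1e-4, 1e-8, 1e-12]:
    dA = np.array([[0.0, 0.0],
                   [eps, 0.0]])
    lam = np.linalg.eigvals(A + dA)
    # eigenvalues of A + dA are 2 +/- sqrt(eps): they converge to 2
    # as eps -> 0, but only at rate sqrt(eps), not eps
```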

Like the previous two problems, (EVP) has an associated notion of condition number. The **eigenvalue condition number** for a matrix *A* ∈ ℂ^{n×n} and an eigenvalue *λ* ∈ ℂ of *A* is *κ*_{A}(*λ*) ≔ |〈*y*, *x*〉|^{−1} if *x* and *y* are non-orthogonal, and ∞ otherwise, where *x* and *y* are normalized right and left eigenvectors for the eigenvalue *λ*.

**Conditioning Theorem for (EVP).** Let *λ* be a simple eigenvalue of *A* with right and left normalized eigenvectors *x* and *y*. If 〈*y*, *x*〉 ≠ 0, then, for all small enough Δ*A* ∈ ℂ^{n×n}, the matrix *A*+Δ*A* has an eigenvalue *λ*+Δ*λ* with

Δ*λ* = 〈*y*, *x*〉^{−1} ( 〈*y*, Δ*A* *x*〉 + O(‖Δ*A*‖_{2}^{2}) ),

and hence

|Δ*λ*| ≤ *κ*_{A}(*λ*) ( ‖Δ*A*‖_{2} + O(‖Δ*A*‖_{2}^{2}) ).
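The bound can be observed numerically on a nonnormal matrix, where the right and left eigenvectors are far from parallel and *κ*_{A}(*λ*) is large. In this sketch (matrix and perturbation are ad-hoc choices) the eigenvectors are found by hand:

```python
import numpy as np

A = np.array([[1.0, 100.0],
              [0.0, 2.0]])           # nonnormal; eigenvalues 1 and 2

lam = 2.0
x = np.array([100.0, 1.0])           # right eigenvector: (A - 2I)x = 0
x = x / np.linalg.norm(x)
y = np.array([0.0, 1.0])             # left eigenvector: A.T y = 2 y
y = y / np.linalg.norm(y)

kappa = 1.0 / abs(np.vdot(y, x))     # eigenvalue condition number ~ 100

dA = 1e-8 * np.array([[1.0, -1.0],
                      [1.0, 1.0]])
lam_pert = np.linalg.eigvals(A + dA)
dlam = min(abs(lam_pert - lam))      # how far the eigenvalue near 2 moved
# first-order bound: dlam <= kappa * ||dA||_2 + O(||dA||^2)
```

Even though ‖Δ*A*‖₂ ≈ 10⁻⁸, the eigenvalue moves by roughly 10⁻⁶, consistent with *κ*_{A}(2) ≈ 100.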

## Stability of Algorithms

Returning now to the general “*G*(*y*, *w*) = 0” framework, suppose that we have a locally unique family of solutions *y* = *g*(*w*) as a function of the input data *w*, and that the algorithm returns *ŷ* ≠ *y* that is the exact solution to a perturbed problem, *ŷ* = *g*(*ŵ*). The difference Δ*y* ≔ *y*−*ŷ* is called the **forward error**, and Δ*w* ≔ *w*−*ŵ* is called the **backward error**; the **relative forward error** is ‖Δ*y*‖ ⁄ ‖*y*‖ and the **relative backward error** is ‖Δ*w*‖ ⁄ ‖*w*‖. So, for example, the Conditioning Theorem for (SLE) showed that the relative forward error is bounded by the condition number times the relative backward error.

At this stage, we introduce a (very simplified) model for computer arithmetic. The most significant simplifying assumption is to ignore the problems of numbers that are not representable because they are either too large (overflow) or too close to zero (underflow). In this simple model, every compact interval [*a*, *b*] of ℝ is approximated by a finite set of numbers **F**, with rounding function fl: [*a*, *b*] → **F**. We assume that there exists *ε*_{m} > 0, called **machine epsilon**, such that

- for all *x* ∈ [*a*, *b*], there exists *ε* ∈ (−*ε*_{m}, +*ε*_{m}) such that fl(*x*) = *x*(1 + *ε*); and
- for each operation ∗ ∈ {+, −, ×, ⁄} and each *x*, *y* ∈ **F**, there exists *δ* ∈ (−*ε*_{m}, +*ε*_{m}) such that *x* ⊛ *y* = (*x* ∗ *y*)(1 + *δ*), where ⊛ denotes the computed version of ∗.
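IEEE double precision satisfies both axioms. A pure-Python illustration (the factor of 4 in the final comparison is a generous ad-hoc slack, not a sharp bound):

```python
import sys

# unit roundoff for round-to-nearest double precision, about 1.1e-16
eps_m = sys.float_info.epsilon / 2

# fl(x) = x(1 + eps): the literal 0.1 is already rounded, since 1/10
# has no finite binary expansion
x = 0.1

# x (+) y = (x + y)(1 + delta): each operation commits its own rounding
s = 0.1 + 0.2                        # famously not equal to 0.3
err = abs(s - 0.3)                   # but off by only a few ulps
```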

An algorithm is called **backward stable** if the relative backward error is O(*ε*_{m}). For example, floating-point addition, subtraction, multiplication and division are backward stable. In practice, however, we want to know not just that the backward error is O(*ε*_{m}), but that it is at most *C*_{m,n}*ε*_{m}, where the constant *C*_{m,n} does not depend badly (e.g. exponentially) on the problem dimensions *m* and *n*.