The topic of this part of **MA398 Matrix Analysis and Algorithms** is the computational cost of solving numerical problems, as measured by the number of algebraic operations — and, in particular, how that depends upon the size of the problem. For simplicity, we will neglect issues like memory usage, parallel communications costs and so on, and define the **cost** of an algorithm to be the total number of additions, subtractions, multiplications, divisions and square roots performed during one run of the algorithm.

To describe asymptotic costs as the problem size becomes large, we will use the following notation:

*g* = O(*f*) if limsup_{x→∞} g(x)⁄f(x) < ∞,

*g* = Ω(*f*) if liminf_{x→∞} g(x)⁄f(x) > 0,

*g* = Θ(*f*) if *g* = O(*f*) and *g* = Ω(*f*),

*g* ∼ *f* if lim_{x→∞} g(x)⁄f(x) = 1.

These notes will *not* attempt to formally define what constitutes “an algorithm”. This can be a tricky point, and is especially important if one wants to rigorously establish non-trivial lower bounds on computational costs: if simply guessing the right answer and returning it in one operation is a valid algorithm, then the lower bounds are trivially 1 in all cases.

Matrix-Vector and Matrix-Matrix Multiplication

The standard algorithm for the inner product of *x*, *y* ∈ ℂ^{n} is as follows:

1) Set *s* = *x*̅_{1}*y*_{1}.

2) For *i* = 2, …, *n*: replace *s* by *s* + *x*̅_{i}*y*_{i}.

3) Return *s*.
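As a concrete illustration, the three steps above translate directly into a minimal Python sketch (the function name is my own, not from the notes); the comments record where the 2*n* − 1 operations come from.

```python
def inner_product(x, y):
    # Standard inner-product algorithm on C^n, with the convention
    # <x, y> = sum_i conj(x_i) * y_i used in these notes.
    # Cost: n multiplications and n - 1 additions, i.e. 2n - 1
    # operations in total, which is Theta(n).
    s = x[0].conjugate() * y[0]           # step 1
    for xi, yi in zip(x[1:], y[1:]):      # step 2: i = 2, ..., n
        s = s + xi.conjugate() * yi
    return s
```

For example, `inner_product([1j, 2], [1j, 3])` returns `(7+0j)`, since the first argument is conjugated.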

**Theorem.** The standard inner-product algorithm on ℂ^{n} has cost Θ(*n*); any algorithm for the inner product has cost Ω(*n*).

The standard algorithm for multiplication of a matrix *A* ∈ ℂ^{n×n} with a vector *x* ∈ ℂ^{n} is simply *n* copies of the standard inner product algorithm, one for each of the *n* rows of *A*.
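The row-by-row description can be sketched in Python as follows (again an illustration, not the notes' own implementation; here the rows are used without conjugation, which does not affect the operation count).

```python
def matvec(A, x):
    # Matrix-vector product y = A x, computed as one inner product per
    # row of A: n rows at 2n - 1 operations each gives
    # n(2n - 1) = 2n^2 - n operations, which is Theta(n^2).
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]
```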

**Theorem.** The standard algorithm for matrix-vector multiplication has cost Θ(*n*^{2}); any algorithm for it has cost Ω(*n*).

Similarly, the standard algorithm for the multiplication of two matrices *A*, *B* ∈ ℂ^{n×n} is *n*^{2} copies of the standard inner product algorithm: the (*i*, *j*)th element of *A**B* is the inner product of (the transpose of) the *i*th row of *A* with the *j*th column of *B*.

**Theorem.** The standard algorithm for matrix-matrix multiplication has cost Θ(*n*^{3}); any algorithm for it has cost Ω(*n*^{2}).
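A Python sketch of the standard *n*²-inner-products algorithm (illustrative only; matrices are nested lists):

```python
def matmul(A, B):
    # Standard matrix-matrix multiplication: entry (i, j) of AB is the
    # inner product of row i of A with column j of B.  There are n^2
    # entries at 2n - 1 operations each: n^2 (2n - 1) = Theta(n^3).
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```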

There are algorithms for matrix-matrix multiplication that have better computational cost than Θ(*n*^{3}). Perhaps the simplest is the **Strassen algorithm**. Suppose that *n* is even and that we wish to multiply the *n*×*n* matrices *A* and *B*; let *D* ≔ *A**B*. Write each of the matrices *X* ∈ {*A*, *B*, *D*} in terms of its (*n*⁄2)×(*n*⁄2) blocks *X*_{11}, *X*_{12}, *X*_{21} and *X*_{22}. Let

*P*_{1} ≔ (*A*_{11}+*A*_{22})(*B*_{11}+*B*_{22}),

*P*_{2} ≔ (*A*_{21}+*A*_{22})*B*_{11},

*P*_{3} ≔ *A*_{11}(*B*_{12}−*B*_{22}),

*P*_{4} ≔ *A*_{22}(*B*_{21}−*B*_{11}),

*P*_{5} ≔ (*A*_{11}+*A*_{12})*B*_{22},

*P*_{6} ≔ (*A*_{21}−*A*_{11})(*B*_{11}+*B*_{12}),

*P*_{7} ≔ (*A*_{12}−*A*_{22})(*B*_{21}+*B*_{22}).

The above definitions may seem overly complicated, but note that there are only *seven* multiplications of (*n*⁄2)×(*n*⁄2) matrices involved. In general, this kind of reduction from a larger problem (*n*×*n* matrix multiplication) to multiple instances of a smaller one ((*n*⁄2)×(*n*⁄2) matrix multiplication) is called a **divide and conquer** strategy. The crucial point is that the Strassen algorithm reduces the larger problem to 7 smaller ones, whereas the naïve approach would use 8. The blocks of *D* ≔ *A**B* can be calculated from the *P* matrices using only sums and differences:

*D*_{11} = *P*_{1}+*P*_{4}−*P*_{5}+*P*_{7},

*D*_{12} = *P*_{3}+*P*_{5},

*D*_{21} = *P*_{2}+*P*_{4},

*D*_{22} = *P*_{1}+*P*_{3}−*P*_{2}+*P*_{6}.

By applying this idea recursively, and if necessary padding matrices whose dimension is not a power of 2 with zeros, we find that the Strassen algorithm for matrix-matrix multiplication has cost Θ(*n*^{log_{2}7}): the cost *C*(*n*) satisfies the recurrence *C*(*n*) = 7*C*(*n*⁄2) + Θ(*n*^{2}), whose solution is Θ(*n*^{log_{2}7}) ≈ Θ(*n*^{2.81}).
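The recursion can be sketched in Python with NumPy as follows. The `cutoff` parameter is my addition (standard in practice): it switches to ordinary multiplication once the blocks are small, since Strassen's savings only pay off asymptotically.

```python
import numpy as np

def strassen(A, B, cutoff=32):
    # Strassen multiplication of n-by-n matrices, n a power of 2.
    # Each level replaces one size-n product by 7 size-n/2 products
    # plus O(n^2) additions and subtractions.
    n = A.shape[0]
    if n <= cutoff:
        return A @ B  # fall back to the ordinary algorithm
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    P1 = strassen(A11 + A22, B11 + B22, cutoff)
    P2 = strassen(A21 + A22, B11, cutoff)
    P3 = strassen(A11, B12 - B22, cutoff)
    P4 = strassen(A22, B21 - B11, cutoff)
    P5 = strassen(A11 + A12, B22, cutoff)
    P6 = strassen(A21 - A11, B11 + B12, cutoff)
    P7 = strassen(A12 - A22, B21 + B22, cutoff)
    return np.block([[P1 + P4 - P5 + P7, P3 + P5],
                     [P2 + P4, P1 + P3 - P2 + P6]])
```

In practice the crossover point at which Strassen beats the standard algorithm is machine-dependent, hence the tunable cutoff.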

There are also more efficient methods of matrix inversion than the naïve method of Cramer’s rule:

**Theorem.** If there is a method to multiply *n*×*n* matrices with cost O(*n*^{β}) for some *β* ≥ 2, then it is also possible to invert *n*×*n* matrices with cost O(*n*^{β}).

The idea of the proof is quite simple. Let *A* ∈ ℂ^{n×n} be invertible. By appropriate padding with zero and identity matrices if necessary, we can assume that *n* is a power of 2. Since *A* is invertible, *A*^{−1} exists and equals (*A*^{∗}*A*)^{−1}*A*^{∗}, so *A* can be inverted with cost equal to that of two matrix multiplications plus one inversion of the Hermitian positive-definite matrix *A*^{∗}*A*. At this point, we resort to a divide-and-conquer approach using Schur complements to show that *n*×*n* Hermitian positive-definite matrices can be inverted with cost O(*n*^{β}).
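A sketch of that proof strategy in Python/NumPy, using plain `@` in place of a fast multiplication routine (so this code as written costs O(*n*³); substituting a Strassen-type multiply would give the claimed bound). The block-inverse formula via the Schur complement *S* = *D* − *B*^{∗}*A*^{−1}*B* is standard; the function names and details are my own.

```python
import numpy as np

def invert_hpd(M):
    # Divide-and-conquer inversion of a Hermitian positive-definite
    # matrix of power-of-2 size.  Writing M = [[A, B], [B*, D]], both A
    # and the Schur complement S := D - B* A^{-1} B are again Hermitian
    # positive definite, and M^{-1} is assembled from A^{-1}, S^{-1}
    # and a constant number of half-size multiplications.
    n = M.shape[0]
    if n == 1:
        return np.array([[1.0 / M[0, 0]]])
    h = n // 2
    A, B, D = M[:h, :h], M[:h, h:], M[h:, h:]
    Ainv = invert_hpd(A)
    Sinv = invert_hpd(D - B.conj().T @ Ainv @ B)
    T = Ainv @ B  # reused product (note Ainv is Hermitian)
    return np.block([[Ainv + T @ Sinv @ T.conj().T, -T @ Sinv],
                     [-Sinv @ T.conj().T, Sinv]])

def invert(A):
    # General invertible A: A^{-1} = (A* A)^{-1} A*, and A* A is
    # Hermitian positive definite.
    return invert_hpd(A.conj().T @ A) @ A.conj().T
```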

Fast Fourier Transform

The previous section considered the costs of general matrix multiplication. If the matrix has some additional structure, then further reductions in cost can be achieved. A good example is furnished by the **discrete Fourier transform**.

For a natural number *n*, let

*ω*_{n} ≔ *e*^{2πi⁄n},

*φ*_{k} ≔ *n*^{−1⁄2} (1, *ω*_{n}^{−k}, *ω*_{n}^{−2k}, …, *ω*_{n}^{−(n−1)k}).

The vectors *φ*_{k} form an orthonormal basis of ℂ^{n}, i.e. 〈*φ*_{k}, *φ*_{m}〉 = *δ*_{km}. Thus, any *u* ∈ ℂ^{n} can be expanded in this basis as

*u* = *a*_{0}*φ*_{0} + *a*_{1}*φ*_{1} + … + *a*_{n−1}*φ*_{n−1},

where *a*_{k} = 〈*φ*_{k}, *u*〉. Indeed, all the coefficients {*a*_{k}} of *u* in this Fourier basis can be found at once by multiplying the vector *u* in the usual basis by the symmetric matrix *A*_{n} whose (*i*, *j*)th entry is *n*^{−1⁄2}*ω*_{n}^{ij} (here the indices of *A*_{n} run from 0 to *n*−1 instead of 1 to *n* as usual).

**Theorem.** For *n* a power of 2, matrix-vector multiplication by *A*_{n} can be accomplished with cost O(*n* log *n*).
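The idea behind the theorem is the radix-2 fast Fourier transform: splitting the sum for *a*_{k} into even- and odd-indexed terms reduces one size-*n* transform to two size-(*n*⁄2) transforms plus O(*n*) extra work, so the cost satisfies *C*(*n*) = 2*C*(*n*⁄2) + O(*n*) = O(*n* log *n*). A Python sketch under the sign convention of these notes, *a*_{k} = *n*^{−1⁄2} Σ_{j} *ω*_{n}^{kj}*u*_{j} (function names mine):

```python
import cmath

def fft(u):
    # Radix-2 FFT computing X_k = sum_j u_j exp(2*pi*i*k*j/n) for n a
    # power of 2.  Splitting into even/odd indices gives
    # X_k = E_k + w^k O_k and X_{k + n/2} = E_k - w^k O_k,
    # where w = exp(2*pi*i/n).
    n = len(u)
    if n == 1:
        return list(u)
    even, odd = fft(u[0::2]), fft(u[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def dft_coefficients(u):
    # Fourier coefficients a_k = <phi_k, u> = n^{-1/2} sum_j omega^{kj} u_j.
    return [x / len(u) ** 0.5 for x in fft(u)]
```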

Bidiagonal and Hessenberg Forms

The emphasis of the following two matrix factorization results is on their O(*n*^{3}) cost. Recall that *A* is **upper bidiagonal** if its only non-zero entries *a*_{ij} have either *i* = *j* or *i* = *j*−1; *A* is **upper Hessenberg** if its only non-zero entries *a*_{ij} have *i* ≤ *j*+1.

**Theorem.** For every *A* ∈ ℂ^{n×n} there exist unitary *U*, *V* ∈ ℂ^{n×n} and an upper bidiagonal *B* ∈ ℂ^{n×n} such that *A* = *U**B**V*^{∗}, and this factorization can be achieved with cost O(*n*^{3}).

**Theorem.** For every *A* ∈ ℂ^{n×n} there exists a unitary *U* ∈ ℂ^{n×n} and an upper Hessenberg *T* ∈ ℂ^{n×n} such that *A* = *U**T**U*^{∗}, and this factorization can be achieved with cost O(*n*^{3}).
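The Hessenberg reduction is typically achieved with Householder reflectors, each zeroing one column below the subdiagonal and applied on both sides so that the similarity *A* = *U**T**U*^{∗} is preserved; *n* − 2 reflections at O(*n*^{2}) work each gives O(*n*^{3}). A NumPy sketch of this standard construction (the notes do not spell it out, so the details here are my own):

```python
import numpy as np

def hessenberg(A):
    # Reduce A to upper Hessenberg form T = U* A U with U unitary, by
    # n - 2 two-sided Householder reflections; each reflection costs
    # O(n^2) operations, for O(n^3) in total.
    n = A.shape[0]
    T = A.astype(complex)
    U = np.eye(n, dtype=complex)
    for k in range(n - 2):
        x = T[k + 1:, k]
        v = x.copy()
        # Choose the sign to avoid cancellation:
        # v = x + e^{i arg(x_0)} ||x|| e_1, then normalize.
        v[0] += np.exp(1j * np.angle(x[0])) * np.linalg.norm(x)
        nv = np.linalg.norm(v)
        if nv == 0:
            continue  # column is already in Hessenberg form
        v /= nv
        # Apply H = I - 2 v v* from the left (rows k+1:) and right
        # (columns k+1:), and accumulate U <- U H.
        T[k + 1:, :] -= 2 * np.outer(v, v.conj() @ T[k + 1:, :])
        T[:, k + 1:] -= 2 * np.outer(T[:, k + 1:] @ v, v.conj())
        U[:, k + 1:] -= 2 * np.outer(U[:, k + 1:] @ v, v.conj())
    return U, T
```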