The singular value decomposition (SVD) of an *m*×*n* complex matrix *A* is a very useful decomposition of *A* into three factors. In some ways, it can be thought of as an extension of the ideas of eigenvalues and eigenvectors to non-square matrices. Recall that for *A* ∈ ℂ^{n×n}, *λ* ∈ ℂ and 0 ≠ *x* ∈ ℂ^{n} are an **eigenvalue-eigenvector pair** for *A* if *A**x* = *λ**x*. If, in addition, *A* is a normal matrix (i.e. *A**A*^{∗} = *A*^{∗}*A*), then there is a unitary matrix *V* ∈ ℂ^{n×n} and a diagonal matrix Λ ∈ ℂ^{n×n} such that *A* = *V*Λ*V*^{∗}; perhaps unsurprisingly, the (normalized) eigenvectors of *A* are the columns of *V*, and the corresponding eigenvalues are the diagonal entries of Λ.

Can we do anything similar in the case that *A* is a non-normal square matrix, or even a non-square matrix? Remember that a matrix *A* ∈ ℂ^{m×n} represents a linear map from ℂ^{n} to ℂ^{m} with respect to fixed bases of either space. The idea behind the SVD is to separate the roles of the two spaces: we seek scalars *σ* ∈ ℂ (although they will later turn out to be non-negative reals) and non-zero vectors *v* ∈ ℂ^{n} and *u* ∈ ℂ^{m} such that

*A**v* = *σ**u* and *A*^{∗}*u* = *σ**v*

The scalar *σ* is called a **singular value** and *u* and *v* are called **singular vectors**; in some applications, these are more useful than eigenvalues and eigenvectors; note that the above equation is the eigenvalue problem if we insist that *u* = *v* ∈ ℂ^{n}. The idea now is to collect together all the singular values and vectors into appropriate matrices, just as in the eigenvalue/eigenvector case.

A **singular value decomposition** of *A* ∈ ℂ^{m×n} is a factorization

*A* = *U*Σ*V*^{∗}

in which *U* ∈ ℂ^{m×m} and *V* ∈ ℂ^{n×n} are unitary, Σ ∈ ℝ^{m×n} is diagonal, and the diagonal entries of Σ are *σ*_{1} ≥ *σ*_{2} ≥ … ≥ *σ*_{p} ≥ 0 with *p* ≔ min(*m*, *n*). The *σ*_{i} are called the **singular values** of *A*, the columns *u*_{i} of *U* its **left singular vectors**, and the columns *v*_{i} of *V* its **right singular vectors**.

**Theorem.** Every matrix *A* ∈ ℂ^{m×n} has an SVD *A* = *U*Σ*V*^{∗} and the singular values (although not the singular vectors) are uniquely determined. If *A* is real, then the matrices *U* and *V* are also real, and hence orthogonal.

Further properties of the SVD include

- If
*m*>*n*, then the factorization*A*=*U*Σ*V*^{∗}does not depend upon the last*m*−*n*columns of*U*. Therefore, we can instead work with the**reduced SVD***A*=*U*′Σ′*V*^{∗}, where*U*′ consists of the first*n*columns of*U*and Σ′ the first*n*rows of Σ. - The eigenvalues of
*A*^{∗}*A*are*σ*_{i}^{2}and the corresponding (right) eigenvectors are the right singular vectors*v*_{i}. The eigenvalues of*A**A*^{∗}are*σ*_{i}^{2}and*m*−*n*zeros and the corresponding (right) eigenvectors are the left singular vectors*u*_{i}. - If
*A*is real, square and symmetric and*A*=*Q*Λ*Q*^{∗}with Λ = diag(*λ*_{i}) and*q*_{i}the columns of*Q*, then an SVD of*A*is arrived at by setting*U*=*Q*, Σ = |Λ|, and*v*_{i}= sgn(*λ*_{i})*q*_{i}.

The SVD reveals the rank, image and kernel of *A*: if we write the singular values as *σ*_{1} ≥ *σ*_{2} ≥ … ≥ *σ*_{r} > 0, then

rank(*A*) = *r*,

ran(*A*) = span { *u*_{1}, …, *u*_{r} } ⊆ ℂ^{m},

ker(*A*) = span { *v*_{r+1}, …, *v*_{n} } ⊆ ℂ^{n}.

Inversion of non-singular square matrices written in SVD form is easy. If *A* ∈ ℂ^{n×n} is invertible with SVD *A* = *U*Σ*V*^{∗}, then an SVD for *A*^{−1} is *V*Σ^{−1}*U*^{∗}, and the matrix inverse of Σ is easily calculated — it’s just the diagonal matrix whose elements are the reciprocals of the diagonal elements of Σ. It gets better, though: *V*(Σ^{−1})^{T}*U*^{∗} is an SVD for the Moore–Penrose pseudo-inverse *A*^{†} ∈ ℂ^{n×m} of *A* ∈ ℂ^{m×n}. The pseudo-inverse *A*^{†} is of interest because the solution to the **least squares problem**

find *x* ∈ ℂ^{n} to minimize ‖ *A**x* − *b* ‖

is given by *x* = *A*^{†}*b*. In the context of least squares, the pseudo-inverse is usually arrived at by considering the so-called normal equations *A*^{∗}*A**x* = *A*^{∗}*b*, and so *A*^{†} = (*A*^{∗}*A*)^{−1}*A*^{∗} when *A*^{∗}*A* is invertible — which, by the second of the “further properties” above, happens if all the singular values of *A* are non-zero.

The following result, the Minimax Theorem, relates the singular values of *A* to the operator norm of *A* when restricted to subspaces *E* of ℂ^{n}. In particular, the Minimax Theorem enables us to characterize the singular values of *A* without direct reference to the singular vectors. Recall that, if *E* ⊆ ℂ^{n} is a linear subspace,

‖ *A*|_{E} ‖ = max { ‖ *A**x* ‖ | *x* ∈ *E* with ‖*x*‖ = 1 },

and the codimension of *E* is *n* − dim(*E*). (Here the norms on ℂ^{n} and ℂ^{m} are the Euclidean ones.)

**Minimax Theorem.** Let *A* ∈ ℂ^{m×n}. For each *i*,

*σ*_{i} = min { ‖ *A*|_{E} ‖ | *E* ⊆ ℂ^{n} with codim(*E*) = *i* − 1 }

= min { ‖ *A*|_{E} ‖ | *E* ⊆ ℂ^{n} with codim(*E*) ≤ *i* − 1 }.

In particular, ‖*A*‖ = *σ*_{1}(*A*), the largest of its singular values; if *A* ∈ ℂ^{n×n} is invertible, then ‖*A*^{−1}‖ = 1⁄*σ*_{n}(*A*).

Not only is the *σ*_{1}(·) (i.e. the operator norm) a norm on the ℂ^{m×n}, but so is *σ*_{1}(·) + … + *σ*_{k}(·) for every *k*. This norm, the sum of the *k* largest singular values, is known as the **Ky Fan k-norm**. The last of the Ky Fan norms, the sum of all the singular values, is another well-known norm: it is the

**trace norm**or

**nuclear norm**:

‖*A*‖_{nuc} = trace((*A*^{∗}*A*)^{1⁄2}).

(Remember, the eigenvalues of *A*^{∗}*A* are the squares of the singular values of *A*, and are non-negative, so the matrix square root (*A*^{∗}*A*)^{1⁄2} makes sense!)

In fact, the SVD of *A* encodes the singularity structure of *A* in a very detailed way, as the following theorem makes precise. In rough terms, the first *k* components of the SVD of *A* form the best rank-*k* approximation of *A*, and the singular value *σ*_{k+1} measures the quality of that rank-*k* approximation.

**Theorem.** Let *A* ∈ ℂ^{m×n}. Let *A*_{k} denote the product of the first *k* columns of *U*, the upper-left *k*×*k* block of Σ, and the first *k* columns of *V*, i.e.

*A*_{k} = *σ*_{1}*u*_{1}*v*_{1}^{∗} + … + *σ*_{k}*u*_{k}*v*_{k}^{∗}.

Let *M*_{k} denote the set of all *m*×*n* matrices of rank at most *k*. Then

*σ*_{k+1} = ‖ *A* − *A*_{k} ‖ = min_{X∈Mk} ‖ *A* − *X* ‖.

In particular, if *A* ∈ ℂ^{n×n} is non-singular, then *σ*_{n} is the norm distance from *A* to the set *M*_{n−1} of singular matrices, and *A*_{n−1} = *A* − *σ*_{n}*u*_{n}*v*_{n}^{∗} is singular.

The SVD is a very powerful decomposition, and many methods (for, e.g., simultaneous linear equations or least squares problems) that behave poorly when *A* is nearly singular can be improved upon if one uses the SVD, which “understands” how nearly-singular *A* is as being the smallest singular value *σ*_{n}. However, since the SVD reveals eigenvalues — and obtaining eigenvalues of arbitrary matrices is equivalent to finding roots of arbitrary polynomials — it cannot in general be found exactly with a *finite* number of arithmetical operations. Without getting too deep into questions of how to compute an SVD of *A* — something I’ll return to in the notes for MA398 Matrix Analysis and Algorithms — it is reassuring to know that the singular values of a matrix do not change by too much under perturbations of the matrix:

**Perturbation Theorem.** Let *A*, Δ ∈ ℂ^{m×n} and let *σ*_{i}(*A*) and *σ*_{i}(*A*+Δ) be the singular values of *A* and *A*+Δ respectively. Then

| *σ*_{i}(*A*) − *σ*_{i}(*A*+Δ) | ≤ ‖Δ‖.

As a footnote, an SVD for *A* can be calculated via QR factorizations and bidiagonalizations using 2*m**n*^{2} + 11*n*^{3} + Θ(*m**n* + *n*^{2}) operations; details to follow in MA398.

## 1 thought on “Singular Value Decomposition”