Singular Value Decomposition

The singular value decomposition (SVD) of an m×n complex matrix A is a very useful decomposition of A into three factors. In some ways, it can be thought of as an extension of the ideas of eigenvalues and eigenvectors to non-square matrices. Recall that for A ∈ ℂ^(n×n), λ ∈ ℂ and 0 ≠ x ∈ ℂ^n are an eigenvalue–eigenvector pair for A if Ax = λx. If, in addition, A is a normal matrix (i.e. A*A = AA*), then there is a unitary matrix V ∈ ℂ^(n×n) and a diagonal matrix Λ ∈ ℂ^(n×n) such that A = VΛV*; perhaps unsurprisingly, the (normalized) eigenvectors of A are the columns of V, and the corresponding eigenvalues are the diagonal entries of Λ.

Can we do anything similar when A is a non-normal square matrix, or even a non-square matrix? Remember that a matrix A ∈ ℂ^(m×n) represents a linear map from ℂ^n to ℂ^m with respect to fixed bases of the two spaces. The idea behind the SVD is to separate the roles of the two spaces: we seek scalars σ ∈ ℂ (although they will later turn out to be non-negative reals) and non-zero vectors v ∈ ℂ^n and u ∈ ℂ^m such that

Av = σu and A*u = σv

The scalar σ is called a singular value and u and v are called singular vectors; in some applications, these are more useful than eigenvalues and eigenvectors. Note that the first of the above equations becomes the eigenvalue problem if we insist that u = v ∈ ℂ^n. The idea now is to collect all the singular values and vectors together into appropriate matrices, just as in the eigenvalue/eigenvector case.

A singular value decomposition of A ∈ ℂm×n is a factorization

A = UΣV*

in which U ∈ ℂ^(m×m) and V ∈ ℂ^(n×n) are unitary, Σ ∈ ℝ^(m×n) is diagonal, and the diagonal entries of Σ are σ₁ ≥ σ₂ ≥ … ≥ σₚ ≥ 0 with p ≔ min(m, n). The σᵢ are called the singular values of A, the columns uᵢ of U its left singular vectors, and the columns vᵢ of V its right singular vectors.

Theorem. Every matrix A ∈ ℂ^(m×n) has an SVD A = UΣV* and the singular values (although not the singular vectors) are uniquely determined. If A is real, then the matrices U and V can be taken to be real, and hence orthogonal.
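The theorem is easy to experiment with numerically. Here is a quick sketch using Python's numpy (not part of the development; the test matrix is arbitrary, and note that the routine numpy.linalg.svd returns the factor V*, not V):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))

# numpy returns U (m×m), the singular values as a vector, and V* (n×n);
# full_matrices=True is the default and gives the full (not reduced) SVD.
U, s, Vh = np.linalg.svd(A, full_matrices=True)

# Reassemble the m×n diagonal matrix Σ from the vector of singular values.
Sigma = np.zeros(A.shape)
np.fill_diagonal(Sigma, s)

assert np.allclose(U @ Sigma @ Vh, A)              # A = UΣV*
assert np.allclose(U.conj().T @ U, np.eye(5))      # U is unitary
assert np.allclose(Vh @ Vh.conj().T, np.eye(3))    # V is unitary
assert np.all(np.diff(s) <= 0) and np.all(s >= 0)  # σ1 ≥ … ≥ σp ≥ 0
```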

Further properties of the SVD include

1. If m > n, then the factorization A = UΣV* does not depend upon the last m − n columns of U. Therefore, we can instead work with the reduced SVD A = U′Σ′V*, where U′ consists of the first n columns of U and Σ′ of the first n rows of Σ.
2. The eigenvalues of A*A are the σᵢ² and the corresponding eigenvectors are the right singular vectors vᵢ. The eigenvalues of AA* are the σᵢ² together with m − n zeros (when m > n), and the corresponding eigenvectors are the left singular vectors uᵢ.
3. If A is real, square and symmetric and A = QΛQᵀ with Λ = diag(λᵢ) and qᵢ the columns of Q, then an SVD of A is arrived at by setting U = Q, Σ = |Λ|, and vᵢ = sgn(λᵢ)qᵢ.
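Property 2 in particular can be checked in a few lines of Python/numpy (an illustration only; the test matrix is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))   # m = 5 > n = 3

s = np.linalg.svd(A, compute_uv=False)

# Eigenvalues of A*A (n×n) are exactly the σi².
eig_small = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
assert np.allclose(eig_small, s**2)

# Eigenvalues of AA* (m×m) are the σi² together with m − n = 2 zeros.
eig_big = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]
assert np.allclose(eig_big, np.concatenate([s**2, np.zeros(2)]))
```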

The SVD reveals the rank, image and kernel of A: if we write the singular values as σ₁ ≥ σ₂ ≥ … ≥ σᵣ > 0 (with σᵣ₊₁ = … = σₚ = 0), then

rank(A) = r,
ran(A) = span { u₁, …, uᵣ } ⊆ ℂ^m,
ker(A) = span { vᵣ₊₁, …, vₙ } ⊆ ℂ^n.
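A short numpy sketch (again, illustration only) confirms this description of the rank, image and kernel on a matrix whose rank we know in advance:

```python
import numpy as np

rng = np.random.default_rng(2)
# A 5×4 matrix of rank 2, built as a product of thin factors.
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))

U, s, Vh = np.linalg.svd(A)

# Count the singular values that are non-zero up to a simple tolerance.
r = int(np.sum(s > s[0] * 1e-10))
assert r == 2

# u1, …, ur span ran(A): projecting onto their span leaves A unchanged.
P = U[:, :r] @ U[:, :r].T
assert np.allclose(P @ A, A)

# v_{r+1}, …, v_n span ker(A): A maps each of them to zero.
assert np.allclose(A @ Vh[r:].T, 0)
```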

Inversion of non-singular square matrices written in SVD form is easy. If A ∈ ℂ^(n×n) is invertible with SVD A = UΣV*, then an SVD for A⁻¹ is VΣ⁻¹U*, and the matrix inverse of Σ is easily calculated: it is just the diagonal matrix whose entries are the reciprocals of the diagonal entries of Σ. It gets better, though: VΣ⁺U* is a factorization of the Moore–Penrose pseudo-inverse A⁺ ∈ ℂ^(n×m) of A ∈ ℂ^(m×n), where Σ⁺ ∈ ℝ^(n×m) denotes the transpose of Σ with each non-zero diagonal entry replaced by its reciprocal. The pseudo-inverse A⁺ is of interest because the solution to the least squares problem

find x ∈ ℂ^n to minimize ‖ Ax − b ‖

is given by x = A⁺b (when the minimizer is not unique, this formula picks out the one of minimal norm). In the context of least squares, the pseudo-inverse is usually arrived at by considering the so-called normal equations A*Ax = A*b, and so A⁺ = (A*A)⁻¹A* when A*A is invertible; by the second of the "further properties" above, this happens exactly when all the singular values of A are non-zero.
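The agreement between the pseudo-inverse and the normal equations can be verified numerically; in this Python/numpy sketch (the test data is arbitrary), numpy's pinv computes A⁺ via the SVD:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 3))   # full column rank with probability 1
b = rng.standard_normal(6)

# x = A⁺b via numpy's pseudo-inverse (itself computed from the SVD).
x_pinv = np.linalg.pinv(A) @ b

# The same x from the normal equations, valid because A*A is invertible.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# ... and from the library's own least squares routine.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

assert np.allclose(x_pinv, x_normal)
assert np.allclose(x_pinv, x_lstsq)
```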

The following result, the Minimax Theorem, relates the singular values of A to the operator norm of A when restricted to subspaces E of ℂ^n. In particular, the Minimax Theorem enables us to characterize the singular values of A without direct reference to the singular vectors. Recall that, if E ⊆ ℂ^n is a linear subspace,

‖ A|E ‖ = max { ‖ Ax ‖ | x ∈ E with ‖x‖ = 1 },

and the codimension of E is n − dim(E). (Here the norms on ℂ^n and ℂ^m are the Euclidean ones.)

Minimax Theorem. Let A ∈ ℂ^(m×n). For each i = 1, …, p,

σᵢ = min { ‖ A|E ‖ | E ⊆ ℂ^n with codim(E) = i − 1 }
= min { ‖ A|E ‖ | E ⊆ ℂ^n with codim(E) ≤ i − 1 }.

In particular, ‖A‖ = σ₁(A), the largest of its singular values; if A ∈ ℂ^(n×n) is invertible, then ‖A⁻¹‖ = 1⁄σₙ(A).
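These two identities are easy to check with numpy, whose matrix 2-norm is precisely the operator norm above (the test matrix is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))   # invertible with probability 1

s = np.linalg.svd(A, compute_uv=False)

# The operator (spectral) norm is the largest singular value ...
assert np.isclose(np.linalg.norm(A, 2), s[0])

# ... and the norm of the inverse is the reciprocal of the smallest.
assert np.isclose(np.linalg.norm(np.linalg.inv(A), 2), 1.0 / s[-1])
```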

Not only is σ₁(·) (i.e. the operator norm) a norm on ℂ^(m×n), but so is σ₁(·) + … + σₖ(·) for every k ≤ p. This norm, the sum of the k largest singular values, is known as the Ky Fan k-norm. The last of the Ky Fan norms, the sum of all the singular values, is another well-known norm: it is the trace norm or nuclear norm:

‖A‖_nuc = trace((A*A)^(1/2)).

(Remember, the eigenvalues of A*A are the squares of the singular values of A, and are non-negative, so the matrix square root (A*A)^(1/2) makes sense!)
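A quick numpy check (illustration only) that the two descriptions of the nuclear norm agree; the matrix square root is taken through the eigenvalues of A*A:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 3))

s = np.linalg.svd(A, compute_uv=False)

# numpy's nuclear norm is the sum of the singular values ...
nuc = np.linalg.norm(A, 'nuc')
assert np.isclose(nuc, s.sum())

# ... which agrees with trace((A*A)^(1/2)), computed here by taking the
# square roots of the (non-negative) eigenvalues of A*A.
evals = np.linalg.eigvalsh(A.T @ A)
assert np.isclose(nuc, np.sqrt(np.clip(evals, 0.0, None)).sum())
```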

In fact, the SVD of A encodes the singularity structure of A in a very detailed way, as the following theorem makes precise. In rough terms, the first k components of the SVD of A form the best rank-k approximation of A, and the singular value σₖ₊₁ measures the quality of that rank-k approximation.

Theorem. Let A ∈ ℂ^(m×n). Let Aₖ denote the product of the first k columns of U, the upper-left k×k block of Σ, and the conjugate transpose of the first k columns of V, i.e.

Aₖ = σ₁u₁v₁* + … + σₖuₖvₖ*.

Let Mₖ denote the set of all m×n matrices of rank at most k. Then

σₖ₊₁ = ‖ A − Aₖ ‖ = min { ‖ A − X ‖ | X ∈ Mₖ }.

In particular, if A ∈ ℂ^(n×n) is non-singular, then σₙ is the distance (in the operator norm) from A to the set Mₙ₋₁ of singular matrices, and Aₙ₋₁ = A − σₙuₙvₙ* is singular.
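The theorem can be illustrated with numpy (the matrix and the choice k = 2 are arbitrary): truncating the SVD after k terms gives a rank-k matrix whose distance from A, in the operator norm, is exactly σₖ₊₁.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((6, 4))

U, s, Vh = np.linalg.svd(A)

# Truncate the SVD after k terms: A_k = σ1 u1 v1* + … + σk uk vk*.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k]

assert np.linalg.matrix_rank(A_k) == k
# The operator-norm error of the best rank-k approximation is σ_{k+1}.
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])
```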

The SVD is a very powerful decomposition, and many methods (for, e.g., simultaneous linear equations or least squares problems) that behave poorly when A is nearly singular can be improved if one uses the SVD, which quantifies how nearly singular A is via the smallest singular value. However, since the SVD reveals eigenvalues (and obtaining eigenvalues of arbitrary matrices is equivalent to finding roots of arbitrary polynomials), it cannot in general be found exactly with a finite number of arithmetical operations. Without getting too deep into questions of how to compute an SVD of A (something I'll return to in the notes for MA398 Matrix Analysis and Algorithms), it is reassuring to know that the singular values of a matrix do not change by too much under perturbations of the matrix:

Perturbation Theorem. Let A, Δ ∈ ℂ^(m×n) and let σᵢ(A) and σᵢ(A+Δ) be the singular values of A and A+Δ respectively. Then

| σᵢ(A) − σᵢ(A+Δ) | ≤ ‖Δ‖.
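A numerical illustration of the perturbation bound (the choice of A and Δ is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 4))
Delta = 1e-3 * rng.standard_normal((5, 4))

s_A = np.linalg.svd(A, compute_uv=False)
s_AD = np.linalg.svd(A + Delta, compute_uv=False)

# Every singular value moves by at most the operator norm of the
# perturbation (a tiny slack absorbs floating-point error).
assert np.all(np.abs(s_A - s_AD) <= np.linalg.norm(Delta, 2) + 1e-12)
```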

As a footnote, an SVD for A can be calculated via QR factorizations and bidiagonalizations using 2mn² + 11n³ + Θ(mn + n²) operations; details to follow in MA398.