# MA398.1: Vector and Matrix Norms

This first part of MA398 Matrix Analysis and Algorithms is devoted to recalling the key basic facts about vector and matrix norms, dual spaces, eigenvalue problems, and various special types of matrices. A lot of this introductory stuff can be (or even should be) done in the context of Banach and Hilbert spaces, and operators between them. The focus of MA398, though, is on matrix analysis and algorithms, which means that infinite-dimensional issues, while interesting, are basically a distraction. For all intents and purposes, then, the only vector spaces that we care about are

• n-dimensional complex space ℂn (or occasionally n-dimensional real space ℝn), which will usually be equipped with the Euclidean inner product and norm — but sometimes with other ones, or even norms that do not arise from an inner product; and
• the mn-dimensional space ℂm×n of complex matrices with m rows and n columns, representing linear maps from ℂn to ℂm with respect to the usual bases of these spaces, or the corresponding space ℝm×n of real matrices; there are plenty of interesting norms that can be put on these spaces.

A norm (or, more fully, vector norm) on a vector space V over the field K = ℝ or ℂ is a function ‖·‖: V → [0, ∞) such that ‖x‖ = 0 if and only if x = 0 ∈ V; ‖αx‖ = |α|‖x‖ for all α ∈ K and x ∈ V; and ‖x+y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ V. We call this a “vector” norm because later on we will call attention to particular norms on spaces of matrices, and call such things matrix norms.

An important fact is that if V is a finite-dimensional vector space over K, such as ℂn, then any two norms ‖·‖ and ‖·‖′ on V are equivalent in the sense that there exist constants 0 < cC < ∞ such that

cx‖ ≤ ‖x‖′ ≤ Cx‖ for all x ∈ V.

An inner product on V is a function 〈·, ·〉: V × VK such that 〈x, x〉 ≥ 0, with equality if and only if x = 0; 〈x, y〉 is the complex conjugate of 〈y, x〉; and 〈x, αy+βz〉 = αx, y〉 + βx, z〉. Every inner product induces a norm by ‖x‖ ≔ 〈x, x1⁄2. Any inner product and its induced norm satisfy the Cauchy–Schwarz inequality

|〈x, y〉| ≤ ‖x‖ ‖y‖ for all x, y ∈ V,

with equality if and only if x and y are linearly dependent.

On V = ℂn, the usual Euclidean inner product is

x, y〉 ≔ x̅1y1 + … + x̅nyn,

and the norm that it induces is the Euclidean norm ‖·‖2, i.e. the square root of the sum of the squares of the absolute values of the components of x; in the real case, of course, the complex conjugate signs (over-bars) in the above definition are redundant. Similarly, for p ≥ 1, the p-normxp of x is the pth root of the sum of the pth powers of the absolute values of the components of x; the ∞-normx of x is the maximum of the absolute values of the components of x.

If V and W are inner product spaces and T: VW is a linear map, then the adjoint of T is the linear map T: WV defined by the relation

Tx, y〉 = 〈x, Ty〉 for all x ∈ V and y ∈ W.

It’s an easy exercise to check that if T has matrix A ∈ ℂm×n, then the matrix of T is the conjugate transpose of A, the n×m matrix whose (i, j)th element is the complex conjugate of the (j, i)th element of A. Therefore, we call the conjugate transpose of A the adjoint of A, and denote it by A ∈ ℂn×m.

A matrix A ∈ ℂm×n is said to be unitary if AA = I; in the real case, A is called orthogonal. A matrix is unitary if and only if its columns are orthonormal with respect to the usual inner product, and hence cannot have more columns than rows. A square unitary matrix A has A−1 = A. Also, for a square matrix, A, A is unitary if and only if A is unitary.

An important property of unitary matrices is that they preserve the Euclidean inner product (and hence the Euclidean norm as well); indeed, many authors use this as the definition of unitarity. A matrix A ∈ ℂm×n is unitary if and only if, for all x, y ∈ ℂn,

Ax, Ay〉 = 〈x, y〉.

A matrix A ∈ ℂn×n is called Hermitian (or self-adjoint) if A = A; in the real case, A is called symmetric.

A (Hermitian) matrix A ∈ ℂn×n is called positive definite (resp. positive semi-definite) if 〈Ax, x〉 > 0 (resp. ≥ 0) whenever 0 ≠ x ∈ ℂn. A positive-definite Hermitian matrix A can be used to define a new inner product and norm by

x, yA ≔ 〈x, Ay
xA ≔ √〈x, xA.

Eigenvalues and Eigenvectors

For a matrix A ∈ ℂn×n, a non-zero x ∈ ℂn and λ ∈ ℂ are called a (right) eigenvector and eigenvalue for A if Ax = λx. Similarly, a non-zero y ∈ ℂn and λ ∈ ℂ are called a (left) eigenvector and eigenvalue for A if yA = λy. As it happens, the right and left eigenvalues of A are always the same, but a given eigenvalue may have different right and left eigenvectors — which are in any case only unique up to multiplication by non-zero scalars, so we often search for unit-norm eigenvectors.

The characteristic polynomial of A is

ρA(z) ≔ det(AzI),

which is a polynomial of degree n in z. A complex number λ is an eigenvalue of A if and only if it is a root of ρA; since, by the Fundamental Theorem of Algebra, ρA has n complex roots (possibly with multiplicity), A has n eigenvalues (possibly with multiplicity).

Actually, there are two notions of multiplicity for eigenvalues. The algebraic multiplicity of λ is the largest integer q such that (zλ)q is a factor of ρA(z). The geometric multiplicity of λ is the dimension of ker(AλI). An eigenvalue is simple if both its algebraic and geometric multiplicities are 1. The algebraic and geometric multiplicities q and r satisfy 1 ≤ qr.

Matrices A, B ∈ ℂn×n are said to be similar if there is an invertible S ∈ ℂn×n (a change of basis matrix) such that B = S−1AS.

If A ∈ ℂn×n has n linearly independent eigenvectors (and not all matrices do!), then eigenvalue equation gives us the neat similarity relation

A = VΛV−1,

where V is the matrix with the (say, unit-norm) eigenvectors as its columns, and Λ is the diagonal matrix that has the corresponding eigenvalues along the diagonal and zeros elsewhere. (For ideas about what to do when A doesn’t have n linearly independent eigenvectors, see the next part, and in particular Schur Diagonalization and Jordan Canonical Form.) Regardless of linear independence issues, similar matrices always have the same eigenvalues with the same algebraic and geometric multiplicities, although obviously not the same eigenvectors.

If A ∈ ℂn×n has n orthogonal eigenvectors, then A is called normal. Normality is equivalent to the condition that AA = AA.

If A ∈ ℂn×n is Hermitian and positive definite, then the eigenvalues of A are all strictly positive, and give a nice estimate for the A-norm in terms of the original one: if the smallest and largest eigenvalues of A are respectively, λmin and λmax, then

λminx‖ ≤ ‖xA ≤ √λmaxx‖ for all x ∈ ℂn.

The spectral radius of A ∈ ℂn×n is

ρ(A) ≔ max { |λ| | λ is an eigenvalue of A }.

Dual Spaces

In general, the dual space of a normed vector space (V, ‖·‖) is the space V′ of all continuous linear functions from V to its field, be that ℝ or ℂ — continuous, that is, with respect to the norm ‖·‖ on V and the absolute value on the field. The dual space is equipped with the operator norm

y‖′ ≔ max { | yx | | x ∈ V, ‖x‖ = 1 }

As a set, the dual of ℂn is always ℂn, but the norm on the dual space depends upon the norm on the primal space. (The dual pairing between the two copies of ℂn is just the standard inner product.)

Theorem. The spaces (ℂn, ‖·‖1) and (ℂn, ‖·‖) are duals of one another. For p, q ∈ (1, ∞) with 1⁄p + 1⁄q = 1, the spaces (ℂn, ‖·‖p) and (ℂn, ‖·‖q) are duals of one another. In particular, (ℂn, ‖·‖2) is self-dual.

Matrix Norms

On the vector space ℂn×n of complex matrices with n rows and columns, we are sometimes interested in norms that interact nicely with matrix multiplication. A norm ‖·‖ on ℂn×n is called a matrix norm if, in addition to the usual norm axioms, it satisfies

AB‖ ≤ ‖A‖ ‖B‖ for all A, B ∈ ℂn×n.

(Note that multiplying a matrix norm by a constant at least 1 yields a new matrix norm, whereas multiplying it by a positive constant less than 1 yields a new norm that is not a matrix norm.)

Given a (vector) norm ‖·‖v on ℂn, the induced matrix norm (also known as the operator norm) is the matrix norm ‖·‖m on ℂn×n defined by

Am ≔ sup { ‖Axv | x ∈ ℂn with ‖xv = 1 }.

Notably, the induced matrix norm always gives the identity matrix norm 1, a property that is not always true for arbitrary matrix norms. Usually, the same symbol is used for an induced matrix norm as for the vector norm that induces it. The matrix norm induced by the infinity norm is the maximum row sum, i.e. the maximum over the rows of A of the 1-norm of each row; the matrix norm induced by the 1-norm is the maximum column sum, i.e. the maximum over the rows of A of the 1-norm of each column.

Matrix norms interact nicely with the spectral radius (the maximum modulus of the eigenvalues):

Theorem. For any matrix norm, any square matrix A ∈ ℂn×n, and any natural k,

ρ(A)kρ(Ak) ≤ ‖Ak‖ ≤ ‖Ak.

Furthermore, if A is normal, then in the induced 2-norm, equality holds throughout the above expression.

For non-square matrices A ∈ ℂm×n, the matrix norm induced by the Euclidean norm satisfies ‖A2 = √ρ(AA). Also, just as the vector 2-norm is preserved by the action of unitary matrices, so too is the matrix 2-norm: for all A ∈ ℂm×n and all unitary U ∈ ℂm×m and V ∈ ℂn×n,

UA2 = ‖A2 = ‖AV2.