Linear Transformations and Matrices


 

1. Linear transformations in vector spaces

A linear transformation $A$ in a vector space $\mathfrak{V}$ is a correspondence which assigns to every vector $x$ in $\mathfrak{V}$ another vector $Ax$ in $\mathfrak{V}$ in such a way that $A(\alpha x + \beta y) = \alpha Ax + \beta Ay$, for any two vectors $x$ and $y$ and any two complex numbers $\alpha$ and $\beta$.

If $A$ is any linear transformation and $\alpha$ any complex number, we define a linear transformation $\alpha A$ by $(\alpha A)x = \alpha(Ax)$. For any two linear transformations $A$ and $B$ we define their sum $A + B$ by $(A + B)x = Ax + Bx$ and their product $AB$ by $(AB)x = A(Bx)$. Two special linear transformations of interest are $O$, defined by $Ox = 0$ for all $x$, and $I$, defined by $Ix = x$ for all $x$. If the correspondence between $x$ and $Ax$ happens to be one to one, in other words if every vector $y$ can be written in the form $y = Ax$ in one and only one way, we may define a transformation $A^{-1}$ by $A^{-1}y = x$ whenever $y = Ax$. It is easy to verify that $A^{-1}$ is a linear transformation.

(The transformations we call linear are sometimes called homogeneous linear transformations: i.e., they have the property that $A0 = 0$.)

We observe, without proof, that the following formulas are valid for linear transformations:

$$A + B = B + A, \qquad A + (B + C) = (A + B) + C,$$

$$\alpha(A + B) = \alpha A + \alpha B, \qquad (\alpha + \beta)A = \alpha A + \beta A, \qquad \alpha(\beta A) = (\alpha\beta)A,$$

$$A(BC) = (AB)C, \qquad A(B + C) = AB + AC, \qquad (A + B)C = AC + BC,$$

$$(\alpha A)B = A(\alpha B) = \alpha(AB), \qquad IA = AI = A, \qquad OA = AO = O, \qquad A + O = A;$$

the proofs of all of these facts are immediate from the definitions.

The associative law enables us to define for every positive integer $n$ the transformation $A^n$ by the recursive definition $A^1 = A$, $A^{n+1} = A \cdot A^n$. Although in general the multiplication of two transformations is not commutative, and in fact much of the difficulty and interest in the theory is due to this fact, for powers of one transformation we do have $A^nA^m = A^mA^n = A^{n+m}$. If we make the convention that $A^0 = I$ and if we define, in case $A^{-1}$ exists, $A^{-n}$, for any positive integer $n$, by $A^{-n} = (A^{-1})^n$, then the calculus of powers of a single linear transformation is exactly the same as in ordinary arithmetic. In accordance with this comment and the properties of addition and scalar multiplication, for every polynomial $p(\lambda) = \alpha_0 + \alpha_1\lambda + \cdots + \alpha_k\lambda^k$ we may write $p(A)$ as an abbreviation for the linear transformation $\alpha_0 I + \alpha_1 A + \cdots + \alpha_k A^k$. These ideas will be very useful to us later.
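
As a concrete check of this calculus, the following sketch (a NumPy illustration, with an arbitrarily chosen matrix standing in for a transformation) evaluates the polynomial $p(\lambda) = 2 + 3\lambda + \lambda^2$ at $A$ in two ways and verifies that powers of a single transformation commute.

```python
import numpy as np

# Arbitrary illustrative "transformation" in matrix form.
A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
I = np.eye(2)

# p(t) = 2 + 3t + t^2 applied to A, directly and by Horner's rule.
p_direct = 2 * I + 3 * A + A @ A
p_horner = (A + 3 * I) @ A + 2 * I
assert np.allclose(p_direct, p_horner)

# Powers of a single transformation commute: A^2 A^3 = A^3 A^2 = A^5.
A2 = np.linalg.matrix_power(A, 2)
A3 = np.linalg.matrix_power(A, 3)
assert np.allclose(A2 @ A3, A3 @ A2)
assert np.allclose(A2 @ A3, np.linalg.matrix_power(A, 5))
```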

We make also some more comments about $A^{-1}$, the inverse of $A$. We observe first that a necessary and sufficient condition that $A$ have an inverse (in other words that $A$ be a one to one transformation) is that $Ax = 0$ imply $x = 0$. The necessity of this condition is obvious: we proceed to the proof of sufficiency. Since $Ax_1 = Ax_2$ implies $A(x_1 - x_2) = 0$, it follows that $x_1 = x_2$, so that $A$ is one to one as far as it goes. It remains to prove only that every vector $y$ has the form $Ax$. For this purpose let $x_1, \ldots, x_n$ be any $n$ linearly independent vectors in the $n$-dimensional space $\mathfrak{V}$: we claim that $Ax_1, \ldots, Ax_n$ are also linearly independent. For if we had $\alpha_1Ax_1 + \cdots + \alpha_nAx_n = 0$, then $A(\alpha_1x_1 + \cdots + \alpha_nx_n) = 0$, whence $\alpha_1x_1 + \cdots + \alpha_nx_n = 0$ and the linear independence of the $x_i$ implies that $\alpha_i = 0$ for all $i$. Hence the set of all vectors of the form $Ax$, which is clearly a linear subspace, contains $n$ linearly independent vectors and is therefore $n$-dimensional. It follows that it must coincide with $\mathfrak{V}$.

We have already stated that $A^{-1}A = AA^{-1} = I$. We claim now that these equations are characteristic of $A^{-1}$: in other words if any linear transformation $B$ exists for which $BA = I$ then $B = A^{-1}$. For

$$B = BI = B(AA^{-1}) = (BA)A^{-1} = IA^{-1} = A^{-1}.$$

Similarly we could prove that if there is a $B$ for which $AB = I$ then $B = A^{-1}$. It is an immediate consequence of this result that $(A^{-1})^{-1} = A$ and $(AB)^{-1} = B^{-1}A^{-1}$, and, since $(\alpha A)\big(\tfrac{1}{\alpha}A^{-1}\big) = I$ for $\alpha \neq 0$, that $(\alpha A)^{-1} = \tfrac{1}{\alpha}A^{-1}$.

2. The dual of a transformation

Let $A$ be a linear transformation in an inner product space and consider the expression $(Ax, y)$. Since $(Ax, y)$ is, for each fixed $y$, a linear function of $x$, there exists, by the theorem of (I.15), a uniquely determined vector, say $y'$, for which $(Ax, y) = (x, y')$ for all $x$. We denote the correspondence which assigns to every vector $y$ the vector $y'$ just defined by $A^*$: $y' = A^*y$. We prove that $A^*$ is a linear transformation. For if $y'_1 = A^*y_1$ and $y'_2 = A^*y_2$, where $(Ax, y_1) = (x, y'_1)$ and $(Ax, y_2) = (x, y'_2)$, then by multiplying the two equations involving inner products by $\bar\alpha$ and $\bar\beta$ respectively and adding, we obtain

$$(Ax, \alpha y_1 + \beta y_2) = (x, \alpha y'_1 + \beta y'_2),$$

so that (using the uniqueness statement in (I.15)) $A^*(\alpha y_1 + \beta y_2) = \alpha A^*y_1 + \beta A^*y_2$, as was to be proved.

The process just described associates with every linear transformation $A$ another linear transformation which we shall denote by $A^*$ and call the dual of $A$. (This terminology is justified by the well known geometric language of duality: roughly speaking when $A$ is applied in the space of vectors, $A^*$ is applied in the dual space of hyperplanes.) The dual of $A$ is uniquely characterized by the fundamental relation

$$(Ax, y) = (x, A^*y).$$

If we denote the dual of the dual of $A$ by $A^{**}$ then the relation

$$(A^*y, x) = \overline{(x, A^*y)} = \overline{(Ax, y)} = (y, Ax)$$

implies (interchanging the roles of $x$ and $y$ and removing the conjugates) that for all $x$ and $y$ we have $(A^*x, y) = (x, Ay)$; since $A^{**}$ is characterized by $(A^*x, y) = (x, A^{**}y)$, it follows that $A^{**} = A$.

The relation of the dual to the previously introduced operations (among linear transformations) of addition, multiplication, and scalar multiplication is completely described by the following identities:

$$(A + B)^* = A^* + B^*, \qquad (\alpha A)^* = \bar\alpha A^*, \qquad (AB)^* = B^*A^*.$$

The proofs of these identities will be found in the equations:

$$((A + B)x, y) = (Ax, y) + (Bx, y) = (x, A^*y) + (x, B^*y) = (x, (A^* + B^*)y),$$

$$((\alpha A)x, y) = \alpha(Ax, y) = \alpha(x, A^*y) = (x, \bar\alpha A^*y),$$

$$((AB)x, y) = (A(Bx), y) = (Bx, A^*y) = (x, B^*A^*y).$$

We observe that $O^* = O$ and $I^* = I$, and that in case $A$ has an inverse then $A^*$ does also and $(A^*)^{-1} = (A^{-1})^*$. The latter fact follows from the identity

$$(A^{-1})^*A^* = (AA^{-1})^* = I^* = I.$$
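
Anticipating the matricial interpretation worked out in the next two sections, the dual corresponds numerically to the conjugate transpose. A minimal NumPy sketch, under the convention that $(u, v)$ is linear in $u$ and conjugate-linear in $v$ (all matrices and vectors below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

def inner(u, v):
    # (u, v): linear in u, conjugate-linear in v.
    return np.sum(u * np.conj(v))

A_star = A.conj().T   # conjugate transpose, playing the role of the dual A*

# the fundamental relation (Ax, y) = (x, A*y), and A** = A
assert np.isclose(inner(A @ x, y), inner(x, A_star @ y))
assert np.allclose(A_star.conj().T, A)
```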

3. Matrices associated with linear transformations

Let $x_1, \ldots, x_n$ be an orthonormal basis in the inner product space $\mathfrak{V}$ of dimension $n$, and let $A$ be a linear transformation on $\mathfrak{V}$. Since for every $j$, $j = 1, \ldots, n$, $Ax_j$ is a vector in $\mathfrak{V}$, it may, in virtue of (I.11.4), be written (uniquely) as a linear combination of the $x_i$, say

$$Ax_j = \sum_i a_{ij}x_i.$$

The set of $n^2$ indexed complex numbers $a_{ij}$ is a matrix; a matrix is usually written in the form of a square array,

$$[A] = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}.$$

We shall consistently use the following notation. Capital Latin letters such as $A$ denote (as before) linear transformations, the corresponding lower case letters with double subscripts, such as $a_{ij}$, will be the elements of the corresponding matrix, and the capital letters in square brackets, such as $[A]$, will stand for the matrix itself. When a linear transformation is distinguished by symbols, as $A^*$ or $A^{-1}$, the corresponding matrix elements and matrices will be denoted by $a^*_{ij}$ or $a^{-1}_{ij}$ and $[A^*]$ or $[A^{-1}]$ respectively. Two matrices $[A]$ and $[B]$ are equal if $a_{ij} = b_{ij}$ for every $i$ and $j$.

With the aid of a fixed orthonormal basis $x_1, \ldots, x_n$ we have made correspond a matrix $[A]$ to every linear transformation $A$: the correspondence is described by the relations $Ax_j = \sum_i a_{ij}x_i$. We assert now that this correspondence is one to one. For let $A$ be a linear transformation and let $x$ be any vector. Then $x$ is a linear combination of the vectors of the basis, say $x = \sum_j \xi_jx_j$, and the linearity of $A$ implies that

$$Ax = \sum_j \xi_jAx_j = \sum_j \xi_j\sum_i a_{ij}x_i = \sum_i \Big(\sum_j a_{ij}\xi_j\Big)x_i,$$

so that the vector $x$ whose $i$-th coordinate in the coordinate system $x_1, \ldots, x_n$ is $\xi_i$ becomes, upon application of the transformation $A$, the vector whose $i$-th coordinate in the same coordinate system is $\sum_j a_{ij}\xi_j$. Conversely if $(a_{ij})$ is any matrix we may define a transformation $A$ by the formula

$$A\Big(\sum_j \xi_jx_j\Big) = \sum_i \Big(\sum_j a_{ij}\xi_j\Big)x_i;$$

it is easy to verify that $A$ is a linear transformation whose corresponding matrix is precisely $(a_{ij})$. We emphasize the fundamental fact that this one to one correspondence was set up by means of a particular coordinate system and that as we pass from one coordinate system to another the same linear transformation may correspond to several matrices and one matrix may be the correspondent of several linear transformations. In fact the relation between the different matrices that may correspond to one linear transformation in various coordinate systems will be the object of study in much of what follows.
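
Here is a minimal numerical sketch of the correspondence: an orthonormal basis is manufactured by a QR factorization (an arbitrary choice), the matrix elements are computed as $a_{ij} = (Ax_j, x_i)$, and the coordinate formula above is verified.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))          # transformation, in standard coordinates

# an arbitrary orthonormal basis x_1, ..., x_n (the columns of Q)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))

# a_ij = (A x_j, x_i), assembled column by column
a = np.empty((n, n))
for j in range(n):
    Ax_j = A @ Q[:, j]
    for i in range(n):
        a[i, j] = Ax_j @ Q[:, i]

# applying A to x = sum_j xi_j x_j multiplies the coordinate vector by (a_ij)
xi = rng.standard_normal(n)
x = Q @ xi
assert np.allclose(A @ x, Q @ (a @ xi))
```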

4. Isomorphism of matrices and transformations

Although the matrix associated with a linear transformation depends on a particular coordinate system, several properties of the correspondence between transformations and matrices are the same in all coordinate systems. In this section we study some of these properties. Throughout we assume that $x_1, \ldots, x_n$ is an arbitrary but fixed coordinate system and that the matrices we discuss are related to linear transformations by means of this system.

If $B = \alpha A$, where $\alpha$ is any complex number, then $Bx_j = \alpha(Ax_j)$, and

$$\alpha(Ax_j) = \alpha\sum_i a_{ij}x_i = \sum_i (\alpha a_{ij})x_i,$$

so that the matrix $[B]$, corresponding to $B = \alpha A$, has the elements $b_{ij} = \alpha a_{ij}$.

Similarly

$$(A + B)x_j = Ax_j + Bx_j = \sum_i (a_{ij} + b_{ij})x_i,$$

so that the matrix $[A + B]$ has elements $a_{ij} + b_{ij}$.

Also

$$(AB)x_j = A(Bx_j) = A\sum_k b_{kj}x_k = \sum_k b_{kj}Ax_k = \sum_i \Big(\sum_k a_{ik}b_{kj}\Big)x_i,$$

so that the matrix $[AB]$ has elements $\sum_k a_{ik}b_{kj}$.

Finally if $B = A^*$ then we have, (using Parseval's identity, (I.11.5)),

$$b_{ij} = (Bx_j, x_i) = (A^*x_j, x_i) = (x_j, Ax_i) = \overline{(Ax_i, x_j)} = \bar a_{ji},$$

where $a_{ij} = (Ax_j, x_i)$.

We observe that for $A = O$, $a_{ij} = 0$ for all $i$ and $j$, and for $A = I$, $a_{ij} = \delta_{ij}$ (where $\delta_{ij}$ is $1$ or $0$ according as $i = j$ or $i \neq j$).

A simple way of summing up the results of this section is the following. For a matrix $[A] = (a_{ij})$, (not for a linear transformation!) we define $\alpha[A]$ by $\alpha[A] = (\alpha a_{ij})$; and we define the conjugate transpose $[A]^*$ by $[A]^* = (\bar a_{ji})$. If moreover for any two matrices $[A]$ and $[B]$ we define their sum by $[A] + [B] = (a_{ij} + b_{ij})$, and their product by $[A][B] = (\sum_k a_{ik}b_{kj})$, then our result is that the correspondence $A \to [A]$, established by means of an arbitrary coordinate system, between the set of all linear transformations of $\mathfrak{V}$ and the set of all square matrices of $n$ rows and columns is an isomorphism: i.e., it preserves addition, multiplication, and scalar multiplication, and makes the dual of a transformation correspond to the conjugate transpose of its matrix.
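
The isomorphism may be spot-checked numerically; in the sketch below (arbitrary matrices, basis from a QR factorization) the helper `mat(T)` computes the matrix of $T$ in the chosen coordinate system.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
alpha = 2.0 - 1.0j

# matrix of a transformation in the orthonormal basis given by the columns of Q
Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 0j)
mat = lambda T: Q.conj().T @ T @ Q

assert np.allclose(mat(A + B), mat(A) + mat(B))         # sums
assert np.allclose(mat(A @ B), mat(A) @ mat(B))         # products
assert np.allclose(mat(alpha * A), alpha * mat(A))      # scalar multiples
assert np.allclose(mat(A.conj().T), mat(A).conj().T)    # dual -> conjugate transpose
```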

We return now to our general study of linear transformations without reference to any orthonormal basis. Periodically we shall stop to interpret our results in terms of the language and notation of matrices.

5. The forms $(Ax, x)$ and $(Ax, y)$

With every linear transformation $A$ we may associate the expressions $(Ax, x)$ and $(Ax, y)$. We may consider these as numerical valued functions, defined by means of $A$, of a single vector $x$ or of a pair of vectors $x$ and $y$, respectively. Properties of these functions are intimately connected with properties of the linear transformation $A$: we shall study this subject in more detail later. At present we observe only two simple facts. First: if $(Ax, y) = 0$ for all $x$ and $y$ then $A = O$. For in particular we may choose $y = Ax$ and obtain $(Ax, Ax) = 0$, whence $Ax = 0$ for all $x$. Second: if $(Ax, x) = 0$ for all $x$ then $A = O$. The proof of this statement is less trivial: it depends on a standard technique called polarization. If $(Ax, x) = 0$ for all $x$ then for every pair of vectors $x$ and $y$ and every complex number $\lambda$ we have

$$0 = (A(x + \lambda y), x + \lambda y) = (Ax, x) + |\lambda|^2(Ay, y) + \lambda(Ay, x) + \bar\lambda(Ax, y),$$

so that, since the first two terms of the right member vanish, we obtain

$$\lambda(Ay, x) + \bar\lambda(Ax, y) = 0.$$

Choosing first $\lambda = 1$ and then $\lambda = i$ we obtain the equations

$$(Ay, x) + (Ax, y) = 0, \qquad i(Ay, x) - i(Ax, y) = 0.$$

Dividing the second equation by $i$ and then forming the arithmetic mean of the two shows that $(Ay, x) = 0$ for all $x$ and $y$, so that, by our first result, $A = O$.

We observe that our second result is not true if we restrict ourselves to real spaces: the proof of course breaks down at our choice $\lambda = i$. For example a rotation of the plane through $90°$ clearly has the property that it sends every vector $x$ into a vector $Ax$ which is orthogonal to $x$.
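
Both the complex polarization argument and its failure in real spaces can be exhibited in a few lines (a sketch with arbitrary data; the $90°$ rotation is the counter example just mentioned):

```python
import numpy as np

# complex case: (Ax, y) is recoverable from the quadratic form (Az, z)
rng = np.random.default_rng(3)
n = 3
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

q = lambda z: (A @ z) @ np.conj(z)      # the form (Az, z)
polarized = (q(x + y) - q(x - y) + 1j * q(x + 1j * y) - 1j * q(x - 1j * y)) / 4
assert np.isclose(polarized, (A @ x) @ np.conj(y))

# real case: rotation through 90 degrees has (Ax, x) = 0 for every x, yet A != O
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])
v = rng.standard_normal(2)
assert np.isclose((R @ v) @ v, 0.0)
```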

6. Hermitian transformations

In many fundamental respects the algebraic system of all linear transformations on an inner product space resembles the set of all complex numbers. In both systems the notions of addition, multiplication, and inversion are defined and have similar properties; and, moreover, in both systems there is defined a conjugation ($\alpha \to \bar\alpha$ and $A \to A^*$, respectively): i.e., an involutory conjugate automorphism of the system on itself. We shall use this analogy as a heuristic principle and we shall attempt to carry over to linear transformations some well known concepts of the complex domain.

When is a complex number $\alpha$ real? Clearly a necessary and sufficient condition for the reality of $\alpha$ is that $\alpha = \bar\alpha$. We might accordingly (remembering that the analog of the complex conjugate for linear transformations is the dual) define a linear transformation $A$ to be real if $A = A^*$. Actually linear transformations for which $A = A^*$ are called Hermitian. (Since the dual of a transformation is sometimes called its adjoint, Hermitian transformations are sometimes called self-adjoint. Other terms are symmetric and Hermitian symmetric. The reason for the latter terminology will appear later when we examine the corresponding matricial concept.) The following theorem shows that Hermitian transformations are tied up with reality in more ways than through the formal analogy that suggested their definition.

Theorem 1. A necessary and sufficient condition that $A$ be Hermitian is that $(Ax, x)$ be real for all $x$.

Proof. For if $A = A^*$, then

$$(Ax, x) = (x, A^*x) = (x, Ax) = \overline{(Ax, x)},$$

so that $(Ax, x)$ is equal to its own conjugate and is therefore real. Conversely, if $(Ax, x)$ is always real then

$$(Ax, x) = \overline{(Ax, x)} = \overline{(x, A^*x)} = (A^*x, x),$$

so that $((A - A^*)x, x) = 0$ for all $x$, whence, by (II.5), $A = A^*$.

We remark that this theorem is also false in real spaces. For in the first place its proof depends on a lemma (the second result of (II.5)) that is valid only in complex inner product spaces, and in the second place in a real space the reality of $(Ax, x)$ (in fact of $(Ax, y)$) is a condition automatically satisfied by all $A$, whereas the condition $A = A^*$, or equivalently $(Ax, y) = (x, Ay)$, need not be satisfied. It is not difficult to verify that the example given in (II.5) is a counter example to this theorem in real spaces.
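
A numerical illustration of Theorem 1 (a sketch; the Hermitian matrix is manufactured, arbitrarily, as $B + B^*$):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = B + B.conj().T                         # Hermitian by construction: H = H*

for _ in range(5):
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    form = (H @ x) @ np.conj(x)            # the form (Hx, x)
    assert np.isclose(form.imag, 0.0)      # real for every x

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
print(((B @ x) @ np.conj(x)).imag)         # for non-Hermitian B: generally nonzero
```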

7. Positive definite transformations

When is a complex number $\alpha$ non-negative? Two equally natural necessary and sufficient conditions are that $\alpha$ may be written in the form $\alpha = \beta^2$ for some real $\beta$ or that $\alpha$ may be written in the form $\alpha = \bar\beta\beta$ for an arbitrary complex $\beta$. Remembering also the fact that the Hermitian character of a transformation $A$ can be described in terms of the function $(Ax, x)$, we may consider any one of three conditions and attempt to use them as the definition of a transformation $A$ being non-negative.

7.1. $A = B^2$, $B$ Hermitian,

7.2. $A = B^*B$, $B$ arbitrary,

7.3. $(Ax, x) \geq 0$ for all $x$.

Before deciding on which one of these three conditions to use as definition we prove that the following implication relations hold:

$$(7.1) \Rightarrow (7.2) \Rightarrow (7.3).$$

For if $A = B^2$ with a Hermitian $B$, i.e., with a $B$ for which $B^* = B$, then $A = BB = B^*B$. And if $A = B^*B$ then

$$(Ax, x) = (B^*Bx, x) = (Bx, Bx) \geq 0.$$

It is actually true that (7.3) implies (7.1), so that the three conditions are equivalent, but we shall not be able to prove this till much later. We adopt as our definition the third condition: a linear transformation $A$ is non-negative, in symbols $A \geq 0$, if $(Ax, x) \geq 0$ for all $x$.

(Non-negative transformations are usually called positive semidefinite. If $A \geq 0$ and $(Ax, x) = 0$ implies $x = 0$, we call $A$ positive definite.) It follows from the theorem in (II.6) that $A \geq 0$ implies that $A$ is Hermitian. The transformations $O$ and $I$ are non-negative.
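
Condition (7.2) is the easiest way to manufacture examples: any matrix of the form $B^*B$ is non-negative. A sketch with an arbitrary $B$:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = B.conj().T @ B                   # A = B*B, hence A >= 0 by (7.2) => (7.3)

for _ in range(5):
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    form = (A @ x) @ np.conj(x)      # (Ax, x) = (Bx, Bx)
    assert np.isclose(form.imag, 0.0) and form.real >= -1e-12

# equivalently: all eigenvalues of the Hermitian matrix A are non-negative
assert np.all(np.linalg.eigvalsh(A) >= -1e-12)
```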

8. Algebraic combinations of Hermitian and definite transformations

We discuss the relation of the two concepts just defined to our preceding notions. If $A$ and $B$ are both Hermitian then, since $(A + B)^* = A^* + B^* = A + B$, $A + B$ is also. If $A$ is Hermitian then $\alpha A$ is Hermitian if and only if $\alpha$ is real. This follows immediately from the fact that $(\alpha A)^* = \bar\alpha A^* = \bar\alpha A$. If $A$ and $B$ are both Hermitian then $AB$ is Hermitian if and only if $AB = BA$. This is a consequence of the relations $(AB)^* = B^*A^* = BA$. Since $A^{**} = A$, $A^*$ is Hermitian if and only if $A$ is. Similarly the relation $(B^*AB)^* = B^*A^*B$ implies that $B^*AB$ is Hermitian along with $A$. A converse to the last statement is that if $B$ has an inverse and $B^*AB$ is Hermitian then so is $A$. (We remark that both the direct and converse statements are valid for not necessarily Hermitian $B$.) For if $B$ has an inverse, every vector $x$ may be written in the form $x = By$, and since

$$(Ax, x) = (ABy, By) = (B^*ABy, y),$$

the reality of the last term for all $y$ implies the reality of the first for all $x$, so that the theorem of (II.6) applies.

The formulas used in the proofs in the preceding paragraph prove also that $A + B$ is non-negative if both $A$ and $B$ are, that if $\alpha > 0$ then $\alpha A$ is non-negative if and only if $A$ is, that $B^*AB$ is non-negative along with $A$, and that, conversely, if $B$ has an inverse then the non-negativeness of $B^*AB$ implies that of $A$. It is true also that if $A$ and $B$ are non-negative and commutative then $AB$ is non-negative, but we shall have to postpone the proof of this statement until later. Neither this statement nor the one concerning Hermitian transformations is true without the restriction of commutativity.

9. Matricial characterizations of Hermitian transformations

If $x_1, \ldots, x_n$ is any coordinate system in $\mathfrak{V}$ then a necessary and sufficient condition that a linear transformation $A$ be Hermitian is that the matrix $(a_{ij})$ corresponding to $A$ in this coordinate system satisfy the equation $[A] = [A]^*$, or, in other words, that we have $a_{ij} = \bar a_{ji}$ for all $i$ and $j$. This explains, incidentally, why Hermitian transformations are sometimes called Hermitian symmetric. A similar matricial characterization of non-negative matrices is possible, but the conditions on the $a_{ij}$ are much more complicated and since we shall not have any occasion to use them we do not enter on this subject here. We shall, however, refer to matrices (not linear transformations!) as Hermitian (or non-negative) if their associated linear transformations are Hermitian (or non-negative).

10. Unitary transformations

When does a complex number $\alpha$ have absolute value $1$? Clearly $\bar\alpha\alpha = 1$, i.e., $\bar\alpha = \alpha^{-1}$, is a necessary and sufficient condition: guided by our heuristic principle we are led to consider linear transformations $U$ for which $U^* = U^{-1}$, i.e., for which $U^*U = UU^* = I$. Such transformations are called unitary. Concerning unitary transformations we prove the following theorem.

Theorem 2. 10.1. A necessary and sufficient condition that $U$ be unitary is that $(Ux, Uy) = (x, y)$ for all $x$ and $y$.

10.2. A necessary and sufficient condition that $U$ be unitary is that $(Ux, Ux) = (x, x)$ for all $x$.

Proof. Since the condition of (10.1) is obviously stronger than that of (10.2), (i.e., if $(Ux, Uy) = (x, y)$ for all $x$ and $y$ then in particular $(Ux, Ux) = (x, x)$ for all $x$), it will be sufficient to prove that the condition of (10.1) is necessary and that the condition of (10.2) is sufficient. If $(Ux, Ux) = (x, x)$ for all $x$, then $(U^*Ux, x) = (x, x)$, so that $((U^*U - I)x, x) = 0$ for all $x$. It follows from (II.5) that $U^*U = I$; since this relation implies that $Ux = 0$ only when $x = 0$, $U$ has an inverse (II.1), and $U^* = U^{-1}$, so that $U$ is unitary. Conversely if $U$ is unitary, so that $U^*U = I$, then for all $x$ and $y$,

$$(Ux, Uy) = (U^*Ux, y) = (x, y).$$
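
A numerical illustration of Theorem 2 (a sketch; the unitary matrix is the Q factor of a QR factorization of an arbitrary complex matrix):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 3
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
U, _ = np.linalg.qr(M)                     # unitary factor: U*U = I

assert np.allclose(U.conj().T @ U, np.eye(n))

inner = lambda u, v: np.sum(u * np.conj(v))
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

assert np.isclose(inner(U @ x, U @ y), inner(x, y))      # (Ux, Uy) = (x, y)
assert np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x))
```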

11. Automorphisms of inner product spaces

In any algebraic system, in particular in vector spaces and in inner product spaces, it is of interest to consider the automorphisms of the system: i.e., to consider those one to one mappings of the system on itself which preserve all relations between elements of the system. The most general automorphism of a vector space is a one to one transformation that preserves addition and scalar multiplication: in other words it is an arbitrary linear transformation which has an inverse. Of an automorphism $U$ of an inner product space we should also require that it preserve inner products: i.e., that we have $(Ux, Uy) = (x, y)$. But this, as we have seen, is equivalent to the requirement that $U$ be unitary. Thus the two questions — 'What linear transformations are the analogs of complex numbers of absolute value one?' and 'What are the most general automorphisms of an inner product space?' — have the same answer: unitary transformations. In the next paragraph we shall see that unitary transformations furnish the answer to a third question also: 'What happens to the matrix of a linear transformation when we change coordinate systems?'

12. Change of basis in an inner product space

We start with the comment that a necessary and sufficient condition that a linear transformation $U$ be unitary is that whenever $x_1, \ldots, x_n$ is a complete orthonormal set then so is $Ux_1, \ldots, Ux_n$. For the condition $(Ux_i, Ux_j) = (x_i, x_j) = \delta_{ij}$ is merely the statement of (10.1) for $x$ and $y$ lying in a complete orthonormal set, and by linearity the condition extends to all $x$ and $y$. Suppose then that $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$ are two coordinate systems (i.e., complete orthonormal sets). If $A$ is an arbitrary linear transformation then throughout this paragraph we shall denote by $[A]_x = (a_{ij})$ (or $[A]_y = (b_{ij})$) the matrix of $A$ in the coordinate system $x_1, \ldots, x_n$ (or $y_1, \ldots, y_n$). The matrices $[A]_x$ and $[A]_y$ are characterized, respectively, by the following two equations:

$$Ax_j = \sum_i a_{ij}x_i, \qquad Ay_j = \sum_i b_{ij}y_i.$$

Let us denote by $U$ the linear transformation defined by the relations $Ux_1 = y_1, \ldots, Ux_n = y_n$, and generally, $U(\xi_1x_1 + \cdots + \xi_nx_n) = \xi_1y_1 + \cdots + \xi_ny_n$. It follows from the comment made at the beginning of this paragraph that $U$ is unitary, and the second of the two equations above implies that $AUx_j = \sum_i b_{ij}Ux_i$, or, in other words, that

$$U^{-1}AUx_j = \sum_i b_{ij}x_i.$$

But this means exactly that $[U^{-1}AU]_x = [A]_y$. Summing up: if $[A]_x$ and $[A]_y$ are two matrices corresponding to the same linear transformation $A$ in different coordinate systems, then there exists a unitary transformation $U$ such that $[A]_y = [U^{-1}AU]_x$, or, equivalently, there exists a unitary matrix $[U] = [U]_x$ such that $[A]_y = [U]^{-1}[A]_x[U]$.
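
The change-of-basis formula can be checked directly; in the sketch below the two coordinate systems are the columns of arbitrary real orthogonal matrices `X` and `Y` (so that the dual is the transpose and unitary means orthogonal).

```python
import numpy as np

rng = np.random.default_rng(7)
n = 3
A = rng.standard_normal((n, n))                    # transformation, standard coordinates

X, _ = np.linalg.qr(rng.standard_normal((n, n)))   # basis x_1..x_n (columns)
Y, _ = np.linalg.qr(rng.standard_normal((n, n)))   # basis y_1..y_n (columns)

A_X = X.T @ A @ X        # matrix of A in the X coordinate system
A_Y = Y.T @ A @ Y        # matrix of A in the Y coordinate system

U = X.T @ Y              # matrix (in X) of the transformation taking x_i to y_i
assert np.allclose(U.T @ U, np.eye(n))             # U is unitary (real orthogonal)
assert np.allclose(A_Y, np.linalg.inv(U) @ A_X @ U)
```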

13. Matricial characterization of unitary transformations

If $U$ is a unitary transformation and $x_1, \ldots, x_n$ a coordinate system in which the matrix of $U$ is $(u_{ij})$, then it follows from the multiplication rule for matrices and the equation $U^*U = I$ that

$$\sum_k \bar u_{ki}u_{kj} = \delta_{ij}.$$

Clearly this condition characterizes unitary matrices: i.e., it is necessary and sufficient in order that $U$ be unitary. We terminate, temporarily, our discussion of unitary transformations, and turn to the discussion of another special class of linear transformations that will be of great interest to us.

14. Orthogonal projections in an inner product space

We have seen (in I.13) that if $\mathfrak{M}$ is a linear subspace then every vector $x$ may be written, uniquely, in the form $x = y + z$ with $y$ in $\mathfrak{M}$ and $z$ in the orthogonal complement $\mathfrak{M}^\perp$; $y$ is called the projection of $x$ on $\mathfrak{M}$. We consider the correspondence $P_{\mathfrak{M}}$ which assigns to every vector $x$ its projection $y$ on $\mathfrak{M}$: $P_{\mathfrak{M}}$ is called a projection transformation or simply a projection. On occasions when it is not necessary to denote the dependence of the projection on the linear subspace in terms of which it was defined we shall use the letters $P$, $Q$, $R$ for projections. We prove first that $P = P_{\mathfrak{M}}$ is a linear transformation. For if $x_1 = y_1 + z_1$, $x_2 = y_2 + z_2$, with $y_1, y_2$ in $\mathfrak{M}$ and $z_1, z_2$ in $\mathfrak{M}^\perp$, then

$$\alpha x_1 + \beta x_2 = (\alpha y_1 + \beta y_2) + (\alpha z_1 + \beta z_2),$$

and from the fact that $\mathfrak{M}$ and $\mathfrak{M}^\perp$ are linear subspaces it follows therefore that

$$P(\alpha x_1 + \beta x_2) = \alpha y_1 + \beta y_2 = \alpha Px_1 + \beta Px_2.$$

The following theorem gives a complete algebraic characterization of projections.

Theorem 3. If $P = P_{\mathfrak{M}}$ is a projection then $P = P^2 = P^*$; conversely if $P$ is a linear transformation for which $P = P^2 = P^*$ then $P = P_{\mathfrak{M}}$, where $\mathfrak{M}$ is the linear subspace of all vectors of the form $Px$.

Proof. If $P = P_{\mathfrak{M}}$ and $x = y + z$, with $y$ in $\mathfrak{M}$ and $z$ in $\mathfrak{M}^\perp$, then $Px = y$. Since $y$ has the representation $y = y + 0$, with $y$ in $\mathfrak{M}$ and $0$ in $\mathfrak{M}^\perp$, and since this representation is unique, it follows that $Py = y$, whence $P^2x = Px$ for all $x$. To prove that $P = P^*$, let $x_1$ and $x_2$ be any two vectors and denote their projections on $\mathfrak{M}$ (or on $\mathfrak{M}^\perp$) by $y_1$ and $y_2$ (or $z_1$ and $z_2$) respectively. Then

$$(Px_1, x_2) = (y_1, y_2 + z_2) = (y_1, y_2) = (y_1 + z_1, y_2) = (x_1, Px_2) = (P^*x_1, x_2),$$

and this implies that $P = P^*$.

Conversely suppose that $P = P^2 = P^*$, and let $\mathfrak{M}$ be the linear subspace of all vectors of the form $Px$. Since for any $x$ we have $x = Px + (I - P)x$, with $Px$ in $\mathfrak{M}$, the proof of the theorem will be complete when we succeed in proving that, for every $x$, $(I - P)x$ is in $\mathfrak{M}^\perp$. But this follows from the equations

$$((I - P)x, Py) = (P(I - P)x, y) = ((P - P^2)x, y) = (Ox, y) = 0.$$

(We remark that the proof shows incidentally that $\mathfrak{M}^\perp$ consists precisely of all vectors of the form $(I - P)x$.)

In different words this theorem states that the characteristic properties of a projection are that it is Hermitian ($P = P^*$) and idempotent ($P^2 = P$). As a corollary of this theorem we obtain the fact that if $P = P_{\mathfrak{M}}$ then $I - P = P_{\mathfrak{M}^\perp}$. Incidentally this theorem establishes a one to one correspondence between the class of all projections (idempotent and Hermitian transformations) and the class of all linear subspaces. In the following paragraphs we shall investigate this correspondence more closely and obtain conditions in order that certain algebraic combinations of projection transformations be themselves projections.
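
To construct the projection on the subspace spanned by given vectors one convenient way (a sketch, not the only possible construction) is to orthonormalize the vectors, collect them as the columns of $Q$, and form $P = QQ^*$; the following code verifies the characteristic properties and the orthogonal decomposition.

```python
import numpy as np

rng = np.random.default_rng(8)
n, k = 4, 2
M = rng.standard_normal((n, k))           # spanning vectors of the subspace (columns)
Q, _ = np.linalg.qr(M)                    # orthonormal basis of the subspace
P = Q @ Q.T                               # orthogonal projection onto the subspace

assert np.allclose(P, P.T)                # Hermitian (real symmetric here)
assert np.allclose(P, P @ P)              # idempotent

x = rng.standard_normal(n)
y, z = P @ x, (np.eye(n) - P) @ x         # y in the subspace, z in its complement
assert np.isclose(y @ z, 0.0)             # the two parts are orthogonal
assert np.allclose(x, y + z)
```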

15. Products of projections

Theorem 4. A necessary and sufficient condition that the product $PQ$ of two projections, $P = P_{\mathfrak{M}}$ and $Q = P_{\mathfrak{N}}$, be a projection is that $PQ = QP$; if this commutativity condition is satisfied then $PQ = P_{\mathfrak{M} \cap \mathfrak{N}}$.

Proof. If $PQ$ is a projection, along with $P$ and $Q$, then

$$PQ = (PQ)^* = Q^*P^* = QP,$$

so that $P$ and $Q$ commute. (In fact we have already seen in (II.8) that if $P$ and $Q$ are Hermitian then $PQ$ is Hermitian if and only if $P$ and $Q$ commute.) Conversely if $PQ = QP$ then

$$(PQ)^* = Q^*P^* = QP = PQ \qquad \text{and} \qquad (PQ)^2 = PQPQ = PPQQ = PQ,$$

so that $PQ$ is Hermitian and idempotent. Finally suppose that $PQ = QP$; we know from (II.14) that $PQ = P_{\mathfrak{K}}$, where $\mathfrak{K}$ is the set of all vectors of the form $PQx$. Since $PQx = P(Qx) = Q(Px)$, every vector in $\mathfrak{K}$ is simultaneously of the form $Py$ and of the form $Qz$, so that $\mathfrak{K}$ is contained in the intersection $\mathfrak{M} \cap \mathfrak{N}$. On the other hand if $x$ is any vector in $\mathfrak{M} \cap \mathfrak{N}$ then $Px = x$ and $Qx = x$, whence $PQx = x$, so that $\mathfrak{M} \cap \mathfrak{N}$ is contained in $\mathfrak{K}$. This proves that $\mathfrak{K} = \mathfrak{M} \cap \mathfrak{N}$.

16. Sums of projections

Theorem 5. A necessary and sufficient condition that the sum $P + Q$ of two projections, $P = P_{\mathfrak{M}}$ and $Q = P_{\mathfrak{N}}$, be a projection is that $PQ = QP = O$; if this condition is satisfied then $\mathfrak{M}$ and $\mathfrak{N}$ are orthogonal and $P + Q = P_{\mathfrak{M} + \mathfrak{N}}$ (where $\mathfrak{M} + \mathfrak{N}$ is the linear subspace of all sums $y + z$ with $y$ in $\mathfrak{M}$ and $z$ in $\mathfrak{N}$).

Proof. If $P + Q$ is a projection then $(P + Q)^2 = P + Q$, whence $PQ + QP = O$. Multiplying this equation first on the right and then on the left by $P$ we obtain the equations

$$PQP + QP = O, \qquad PQ + PQP = O.$$

Upon subtraction it follows that $PQ = QP$, and this, combined with our original equation, yields $PQ = QP = O$. Conversely, if $PQ = QP = O$, then

$$(P + Q)^2 = P^2 + PQ + QP + Q^2 = P + Q,$$

so that $P + Q$ is idempotent; being the sum of two Hermitian transformations it is also Hermitian and therefore it is a projection. Finally suppose that $PQ = QP = O$. Since $(P + Q)x = Px + Qx$, and since $Px$ is in $\mathfrak{M}$ and $Qx$ is in $\mathfrak{N}$, it follows that the subspace belonging to $P + Q$ is contained in $\mathfrak{M} + \mathfrak{N}$. On the other hand if $x$ is any vector in $\mathfrak{M} + \mathfrak{N}$ then $x$ has the form $x = Py + Qz$, whence (since $PQ = QP = O$)

$$(P + Q)x = P^2y + PQz + QPy + Q^2z = Py + Qz = x,$$

so that every $x$ in $\mathfrak{M} + \mathfrak{N}$ has the form $(P + Q)x$ and is therefore in the subspace belonging to $P + Q$. This proves that $P + Q = P_{\mathfrak{M} + \mathfrak{N}}$. In order to show that $\mathfrak{M}$ and $\mathfrak{N}$ are orthogonal we must show that $y$ in $\mathfrak{M}$ and $z$ in $\mathfrak{N}$ implies $(y, z) = 0$. This follows from the equations

$$(y, z) = (Py, Qz) = (Q^*Py, z) = (QPy, z) = (0, z) = 0.$$

17. Differences of projections

Theorem 6. A necessary and sufficient condition that the difference $P - Q$ of two projections, $P = P_{\mathfrak{M}}$ and $Q = P_{\mathfrak{N}}$, be a projection is that $PQ = QP = Q$; if this condition is satisfied then $\mathfrak{N}$ is contained in $\mathfrak{M}$ and $P - Q = P_{\mathfrak{M} \cap \mathfrak{N}^\perp}$.

Proof. A necessary and sufficient condition that $P - Q$ be a projection is that $I - (P - Q)$ be one: i.e., that $(I - P) + Q$ be a projection. According to (II.16) above, this is equivalent to $(I - P)Q = Q(I - P) = O$, i.e., to $PQ = QP = Q$. If, moreover, this condition is satisfied, then we know, still from (II.16), that $\mathfrak{M}^\perp$ and $\mathfrak{N}$ are orthogonal, whence $\mathfrak{N}$ is contained in $\mathfrak{M}$, or, equivalently, $\mathfrak{M}^\perp$ is contained in $\mathfrak{N}^\perp$, and finally

$$P - Q = I - \big((I - P) + Q\big) = P_{(\mathfrak{M}^\perp + \mathfrak{N})^\perp} = P_{\mathfrak{M} \cap \mathfrak{N}^\perp}.$$

18. Relation between projections and involutions

We conclude our discussion of projections by indicating their relation to certain other classes of transformations. We shall call a linear transformation $U$ for which $U^2 = I$ an involution, or an involutory transformation. We assert now that if a linear transformation has any two of the three properties of being Hermitian ($U = U^*$), involutory ($U^2 = I$), and unitary ($U^*U = I$), then it has the third.

(i). If $U = U^*$ and $U^2 = I$ then $U^*U = U^2 = I$, so that $U$ is unitary.

(ii). If $U^2 = I$ and $U^*U = I$ then

$$U^* = U^*U^2 = (U^*U)U = U,$$

whence $U$ is Hermitian.

(iii). If $U^*U = I$ and $U = U^*$, then $U^2 = U^*U = I$, so that $U$ is involutory.

Transformations having these three properties are related to projections through the following theorem.

Theorem 7. If two transformations $P$ and $U$ are related by the two (equivalent) conditions

$$U = 2P - I, \qquad P = \tfrac{1}{2}(I + U),$$

then a necessary and sufficient condition that $P$ be a projection is that $U$ be Hermitian, unitary, and involutory.

Proof. If $P$ is a projection then, since $P = P^*$, we must also have $U = U^*$, and from $P^2 = P$ it follows that

$$U^2 = (2P - I)^2 = 4P^2 - 4P + I = I.$$

Conversely if $U$ is Hermitian then $P = \tfrac{1}{2}(I + U)$ is, and if $U^2 = I$ then

$$P^2 = \tfrac{1}{4}(I + 2U + U^2) = \tfrac{1}{2}(I + U) = P.$$

19. The rank of a linear transformation

We conclude this chapter by a discussion of two important numerical invariants of linear transformations: the rank and the norm.

If $A$ is any linear transformation we define two linear subspaces $\mathfrak{R}$ and $\mathfrak{N}$ as follows: $\mathfrak{R}$ is the set of all vectors of the form $Ax$, and $\mathfrak{N}$ is the set of all vectors $x$ for which $Ax = 0$. Let $n$ be the dimension of the vector space under consideration, and let $\rho$ and $\nu$ be the dimensions of $\mathfrak{R}$ and $\mathfrak{N}$, respectively. We shall show that $\rho + \nu = n$. The non-negative integer $\rho$ is the rank of the linear transformation $A$. Let $x_1, \ldots, x_\nu$ be a linear basis in $\mathfrak{N}$ (which is of course not necessarily a coordinate system in the sense of this chapter, but merely a maximal linearly independent set in $\mathfrak{N}$), and let the vectors $x_{\nu+1}, \ldots, x_n$ be such that $x_1, \ldots, x_n$ is a linear basis in $\mathfrak{V}$. Then every vector $x$ in $\mathfrak{V}$ may be written (uniquely) in the form $x = \xi_1x_1 + \cdots + \xi_nx_n$, and it follows (since $Ax_1 = \cdots = Ax_\nu = 0$) that

$$Ax = \xi_{\nu+1}Ax_{\nu+1} + \cdots + \xi_nAx_n.$$

In other words every vector of the form $Ax$ is a linear combination of the $n - \nu$ vectors $Ax_{\nu+1}, \ldots, Ax_n$, so that $\rho \leq n - \nu$. We shall prove that these vectors are linearly independent, thereby proving that $\rho = n - \nu$. If we had

$$\alpha_{\nu+1}Ax_{\nu+1} + \cdots + \alpha_nAx_n = 0,$$

then we should have $\alpha_{\nu+1}x_{\nu+1} + \cdots + \alpha_nx_n$ belonging to $\mathfrak{N}$. Accordingly this vector would have an expression of the form $\alpha_1x_1 + \cdots + \alpha_\nu x_\nu$, so that

$$\alpha_1x_1 + \cdots + \alpha_\nu x_\nu - \alpha_{\nu+1}x_{\nu+1} - \cdots - \alpha_nx_n = 0.$$

The linear independence of the $x_i$, $i = 1, \ldots, n$, implies then that $\alpha_{\nu+1} = \cdots = \alpha_n = 0$.

If for the moment we denote the dependence of $\mathfrak{R}$ and $\mathfrak{N}$ on $A$ by writing $\mathfrak{R} = \mathfrak{R}_A$ and $\mathfrak{N} = \mathfrak{N}_A$, and if $B$ is an arbitrary linear transformation which has an inverse, then it is very easy to characterize $\mathfrak{R}_{BA}$ and $\mathfrak{N}_{AB}$. For $y$ lies in $\mathfrak{R}_{BA}$ if and only if $B^{-1}y$ lies in $\mathfrak{R}_A$, and $ABx = 0$ if and only if $Bx$ lies in $\mathfrak{N}_A$, i.e., if and only if $x$ lies in $B^{-1}\mathfrak{N}_A$. Since the image of a subspace under $B$ (or $B^{-1}$) has the same dimension as the subspace, it follows in particular that $AB$ and $BA$ have the same rank as $A$. If $A$ is a linear transformation in an inner product space and if $x$ is any vector in $\mathfrak{N}_{A^*}$ then $(Ay, x) = (y, A^*x) = 0$ for all $y$. In other words every vector of the form $Ay$ is orthogonal to $\mathfrak{N}_{A^*}$. It follows that $\mathfrak{R}_A$ is contained in $\mathfrak{N}_{A^*}^\perp$, whence, denoting the rank of $A^*$ by $\rho^*$ (and its nullity by $\nu^*$), $\rho \leq n - \nu^* = \rho^*$. Since this is generally true we may apply this result to $A^*$, obtaining $\rho^* \leq \rho^{**} = \rho$, so that $\rho = \rho^*$. Finally we notice that if $A$ and $B$ are arbitrary linear transformations and $\rho_A$ and $\rho_B$ their ranks then, since $\mathfrak{R}_{AB}$ is contained in $\mathfrak{R}_A$, it follows that the rank of $AB$ is $\leq \rho_A$. Applying our previous result on duals and denoting the rank of $AB$ by $\rho_{AB}$, we obtain

$$\rho_{AB} = \rho_{(AB)^*} = \rho_{B^*A^*} \leq \rho_{B^*} = \rho_B,$$

and we obtain the result that the rank of a product of two transformations does not exceed the rank of either factor. (This is Sylvester's law of nullity: the terminology arises from the fact that if $A$ is a transformation of rank $\rho$, $\nu = n - \rho$ is called the nullity of $A$.)

We observe that in virtue of (II.14) the rank of a projection $P_{\mathfrak{M}}$ is the dimension of the linear subspace $\mathfrak{M}$.
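
The relation $\rho + \nu = n$, the equality of the ranks of $A$ and $A^*$, and the product inequality can all be checked numerically (a sketch; the rank-2 matrix is an arbitrary construction):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 4
# an arbitrary transformation of rank 2, built as a product of thin factors
A = rng.standard_normal((n, 2)) @ rng.standard_normal((2, n))

rho = np.linalg.matrix_rank(A)                           # dimension of the range
nu = np.sum(np.linalg.svd(A, compute_uv=False) < 1e-10)  # dimension of the null space
assert rho + nu == n                                     # rho + nu = n

B = rng.standard_normal((n, n))
rank = np.linalg.matrix_rank
assert rank(A @ B) <= min(rho, rank(B))                  # rank of a product
assert rank(A.T) == rho                                  # A and its dual have equal rank
```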

20. The norm of a linear transformation

In order to define the norm of a linear transformation $A$ defined in an inner product space of dimension $n$ we prove first the following theorem. (Here and below $\|x\|$ denotes the length $(x, x)^{1/2}$ of a vector $x$.)

Theorem 8. There exists a constant $K$ such that for all $x$, $\|Ax\| \leq K\|x\|$.

Proof. Let $x_1, \ldots, x_n$ be an orthonormal basis in $\mathfrak{V}$, and choose a constant $K_0$ so that $\|Ax_i\| \leq K_0$ for all $i$. Since an arbitrary vector $x$ may be written in the form $x = \sum_i \xi_ix_i$, we have

$$\|Ax\| = \Big\|\sum_i \xi_iAx_i\Big\| \leq \sum_i |\xi_i|\,\|Ax_i\| \leq K_0\sum_i |\xi_i|,$$

whence it follows, remembering that $\xi_i = (x, x_i)$ and applying the Schwarz inequality ($|\xi_i| \leq \|x\|\,\|x_i\| = \|x\|$), that $\|Ax\| \leq nK_0\|x\|$, so that we may choose $K = nK_0$.

The least number $K$ with the property described in the theorem is called the norm of $A$, in symbols $\|A\|$: more formally the norm of $A$ can be defined as

$$\|A\| = \sup \frac{\|Ax\|}{\|x\|},$$

with the supremum taken over all vectors $x \neq 0$.
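
Numerically, for matrices the norm so defined coincides with the largest singular value (which is what `np.linalg.norm(A, 2)` computes); a sketch comparing it with the supremum definition over random trial vectors:

```python
import numpy as np

rng = np.random.default_rng(10)
n = 3
A = rng.standard_normal((n, n))

# estimate sup ||Ax|| / ||x|| over many random trial vectors
X = rng.standard_normal((n, 10000))
ratios = np.linalg.norm(A @ X, axis=0) / np.linalg.norm(X, axis=0)

exact = np.linalg.norm(A, 2)               # the norm: largest singular value
assert ratios.max() <= exact + 1e-12       # every ratio is a lower estimate
print(ratios.max(), exact)                 # the estimates crowd up to the norm
```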

21. Expressions for the norm of a transformation

Along with the norm $\|A\|$ of the transformation $A$ we may consider the following three constants:

$$p = \sup\,\{\|Ax\| : \|x\| = 1\}, \qquad q = \sup\,\{|(Ax, y)| : \|x\| = \|y\| = 1\}, \qquad r = \sup \frac{|(Ax, y)|}{\|x\|\,\|y\|},$$

the last supremum being extended over all pairs of non-vanishing vectors $x$ and $y$.

We shall prove that $\|A\| = p = q = r$. We first remark, however, that since the expressions defining $p$ and $q$ are bounded (in virtue of Theorem 8 and the Schwarz inequality), the suprema $p$ and $q$ are finite. It follows from this comment, and from the definitions of $q$ and $r$, that $q \leq r$, and we also have $p \leq \|A\|$ (each $\|Ax\|$ with $\|x\| = 1$ is $\leq \|A\|$). Accordingly we will have proved the equality of all four constants involved if we succeed in proving that $\|A\| \leq p$, $p \leq q$, and $r \leq \|A\|$.

Since for any $x \neq 0$, $\|Ax\|/\|x\| = \|Ax'\|$, where $x' = x/\|x\|$, and since $\|x'\| = 1$, it follows that $\|Ax\|/\|x\| \leq p$, whence the supremum, $\|A\|$, of the expression on the left is also $\leq p$.

If for any vector $x$ for which neither $x = 0$ nor $Ax = 0$ we write $y = Ax/\|Ax\|$, we obtain the equation

$$\|Ax\| = \frac{(Ax, Ax)}{\|Ax\|} = (Ax, y) = |(Ax, y)|.$$

It follows (the vectors $x$ with $Ax = 0$ clearly causing no difficulty) that the supremum, $p$, of the expression on the left, taken over all $x$ with $\|x\| = 1$, is $\leq q$.

If for any pair $x$, $y$ of non-vanishing vectors we define $x' = x/\|x\|$ and $y' = y/\|y\|$, then we obtain, since $\|x'\| = \|y'\| = 1$,

$$\frac{|(Ax, y)|}{\|x\|\,\|y\|} = |(Ax', y')| \leq \|Ax'\|\,\|y'\| = \|Ax'\| \leq \|A\|$$

(the middle step being the Schwarz inequality), so that the supremum, $r$, of the expression on the left is also $\leq \|A\|$.

The equation $|(Ax, y)| = |(x, A^*y)| = |(A^*y, x)|$ implies immediately that $\|A\| = \|A^*\|$. It is easy to verify that if $U$ is unitary then $\|U\| = 1$, and that if $P$ is a projection, $P \neq O$, then $\|P\| = 1$.

22. Upper and lower bounds of a Hermitian transformation

For Hermitian transformations $A$ we can find still another interesting expression for the norm, $\|A\|$. If $\alpha$ is an arbitrary real number we shall write $A \leq \alpha$ if the transformation $\alpha I - A$ is non-negative (cf. II.7), and $\alpha \leq A$ if $A - \alpha I$ is non-negative. In this sense we may write $-\|A\| \leq A \leq \|A\|$, and we may define the upper bound, $M$, of the Hermitian transformation $A$, to be the least (i.e., the infimum) of the numbers $\alpha$ for which $A \leq \alpha$; similarly we define the lower bound, $m$, of $A$ to be the greatest (i.e., the supremum) of the numbers $\alpha$ for which $\alpha \leq A$. In other words, remembering the definition of non-negative transformations, $m$ and $M$ are, respectively, the infimum and the supremum of the set of all real numbers of the form $(Ax, x)/(x, x)$, $x \neq 0$. Our main result is that the upper and lower bounds of $A$ are related to the norm, $\|A\|$, of $A$ by the relation

$$\|A\| = k, \qquad k = \max(|m|, |M|).$$

Since we have already observed that $-\|A\| \leq A \leq \|A\|$, it follows that $-\|A\| \leq m$ and $M \leq \|A\|$, so that $k \leq \|A\|$. From the definition of $k$ it follows, conversely, that $-k \leq A \leq k$, or, in other words, that the transformations $kI + A$ and $kI - A$ are non-negative. It follows from (II.8) that the transformations $(kI + A)(kI - A)(kI + A)$ and $(kI - A)(kI + A)(kI - A)$ are also non-negative. Hence so also is their sum, $2k(k^2I - A^2)$. Since $k = 0$ implies $A = O$ (by (II.5)), the theorem is trivial in this case; in any other case $k$ is positive and we obtain therefore the result that $k^2I - A^2$ is a non-negative linear transformation. In other words

$$0 \leq ((k^2I - A^2)x, x) = k^2(x, x) - (Ax, Ax),$$

whence $\|Ax\| \leq k\|x\|$ for all $x$, so that $\|A\| \leq k$, and the proof of the theorem is complete. Since it is easy to prove, by the methods of the preceding paragraph, that

$$\sup \frac{|(Ax, x)|}{(x, x)} = \sup\,\{|(Ax, x)| : \|x\| = 1\}$$

(where the first supremum is extended over all $x \neq 0$ and the second one over all $x$ with $\|x\| = 1$), our result is equivalent to the statement that for Hermitian transformations $A$, $\|A\| = \sup |(Ax, x)|$ for $\|x\| = 1$.
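
For a Hermitian matrix the bounds $m$ and $M$ coincide with its extreme eigenvalues (a standard fact about the Rayleigh quotient, not proved in this chapter), which makes the relation $\|A\| = \max(|m|, |M|)$ easy to check numerically:

```python
import numpy as np

rng = np.random.default_rng(11)
n = 4
B = rng.standard_normal((n, n))
A = B + B.T                                # a Hermitian (real symmetric) example

eigs = np.linalg.eigvalsh(A)
m, M = eigs.min(), eigs.max()              # lower and upper bounds of A

x = rng.standard_normal(n)
rayleigh = (x @ A @ x) / (x @ x)           # a value of (Ax, x)/(x, x)
assert m - 1e-12 <= rayleigh <= M + 1e-12

assert np.isclose(np.linalg.norm(A, 2), max(abs(m), abs(M)))   # ||A|| = max(|m|, |M|)
```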