Table of Contents
- 1. Transformations of rank one
- 2. The Hadamard product of non-negative matrices
- 3. The dual space of a vector space
- 4. The dual space of an inner product space
- 5. Reflexivity of inner product spaces
- 6. Direct sum of vector spaces
- 7. Tensor product of vector spaces
- 8. Dimension of a tensor product
- 9. The dual of a tensor product
- 10. Tensor product of inner product spaces
- 11. The inner product in a tensor product
- 12. Tensor product of transformations
- 13. Kronecker products of matrices
- 14. Properties of tensor product transformations
1. Transformations of rank one
Before beginning the proper subject matter of the present chapter we digress to a discussion of interest in itself whose results we shall need later. It follows easily from the spectral theory of normal transformations (or, equivalently, from the possibility of representing a normal transformation by a diagonal matrix) that every normal transformation is a sum of normal transformations of rank one; similarly every Hermitian (or non-negative) transformation is a sum of Hermitian (or non-negative) transformations of rank one. It becomes, therefore, of interest to investigate transformations of rank one.
Theorem 1. A necessary and sufficient condition that a linear transformation A has rank 1 is that in every coordinate system the matrix (a_{ij}) of A has the form a_{ij} = b_i c_j.
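Proof. If A has rank 1 then the set of all vectors of the form Ax is one dimensional, so that there exists a vector x_0 with the property that Ax is for every x a constant multiple (depending on x) of x_0: Ax = f(x) x_0. (It is easily verified that f(x) is a linear function of x, but we shall not need this fact.) Now if \{x_1, \ldots, x_n\} is a coordinate system the matrix (a_{ij}) is characterized by A x_j = \sum_i a_{ij} x_i, whence, writing x_0 = \sum_i b_i x_i and c_j = f(x_j), we have
\sum_i a_{ij} x_i = A x_j = c_j x_0 = \sum_i b_i c_j x_i,
so that a_{ij} = b_i c_j.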
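Conversely if a_{ij} = b_i c_j, we may find a linear function f for which f(x_j) = c_j, and we may define x_0 = \sum_i b_i x_i. The linear transformation A^{\prime} defined by A^{\prime} x = f(x) x_0 is clearly of rank 1, and we have A^{\prime} x_j = c_j \sum_i b_i x_i = \sum_i a_{ij} x_i = A x_j, so that A^{\prime} = A.■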
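If A has rank 1 and is Hermitian and if the matrix of A in some coordinate system is a_{ij} = b_i c_j, then we must have b_i c_j = \bar{b}_j \bar{c}_i. If, for some i, b_i \neq 0 and c_i = 0, then b_i c_j = \bar{b}_j \bar{c}_i = 0 and therefore c_j = 0 for all j, whence A = 0. Since we assumed that the rank of A is 1, this is impossible, and (since not all the b_i vanish) we can find an i for which b_i \neq 0 and c_i \neq 0. Using this i the relation b_i c_j = \bar{b}_j \bar{c}_i implies that c_j = k \bar{b}_j with some constant k = \bar{c}_i / b_i independent of j. Since the diagonal elements of a Hermitian matrix, i.e., the a_{jj} = k |b_j|^2, are real, we can even conclude that k is real, so that in this case a_{ij} has the form k b_i \bar{b}_j with a real k.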
If A has rank 1 and is non-negative, then the discussion of the preceding paragraph applies and the fact that the diagonal elements of a non-negative matrix are non-negative implies that k is non-negative. In this case we may write a_i = \sqrt{k}\, b_i and the relation a_{ij} = k b_i \bar{b}_j shows that a_{ij} has the form a_i \bar{a}_j.
It is easy to see that the conditions given in the last two paragraphs are not only necessary but also sufficient. If a_{ij} = k b_i \bar{b}_j with a real k \neq 0 then clearly A is Hermitian and has rank 1. If, moreover, a_{ij} = a_i \bar{a}_j and x = \sum_i c_i x_i then
\begin{aligned} \langle A x, x \rangle &= \sum_i \sum_{j} c_{j} \bar{c}_{i}\langle A x_{j}, x_{i} \rangle\\ &= \sum_i \sum_{j} c_{j} \bar{c}_{i} a_{i} \bar{a}_{j} \\ &= \Big(\sum_i a_{i} \bar{c}_{i}\Big) \overline{\Big(\sum_{j} a_{j} \bar{c}_{j}\Big)}\\ &= \Big|\sum_i a_{i} \bar{c}_{i}\Big|^{2} \\ &\geq 0 \end{aligned}
so that A is non-negative.
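The computation above can be checked numerically. The following is a minimal sketch, assuming an orthonormal coordinate system so that \langle A x_j, x_i \rangle = a_{ij}; the helper names `rank_one_matrix` and `quadratic_form` and the sample vectors are hypothetical, chosen only for illustration.

```python
def rank_one_matrix(a):
    """Return the matrix with entries a_ij = a_i * conj(a_j)."""
    return [[ai * aj.conjugate() for aj in a] for ai in a]


def quadratic_form(matrix, c):
    """Return <Ax, x> = sum_i sum_j c_j conj(c_i) a_ij for x = sum_j c_j x_j."""
    n = len(c)
    return sum(c[j] * c[i].conjugate() * matrix[i][j]
               for i in range(n) for j in range(n))


# Hypothetical sample data with integer real and imaginary parts,
# so that every product and sum below is exact in floating point.
a = [1 + 2j, 3 - 1j, 2j]
c = [2 - 1j, 1j, 1 + 1j]

matrix = rank_one_matrix(a)

# A matrix of the form a_i conj(a_j) is Hermitian.
is_hermitian = all(matrix[i][j] == matrix[j][i].conjugate()
                   for i in range(3) for j in range(3))

# <Ax, x> should equal |sum_i a_i conj(c_i)|^2.
s = sum(ai * ci.conjugate() for ai, ci in zip(a, c))
form_value = quadratic_form(matrix, c)
squared_modulus = s * s.conjugate()  # |s|^2, stored as a complex number

print(is_hermitian, form_value, squared_modulus)
```

The two printed numbers agree and have zero imaginary part, which is the content of the displayed chain of equalities for this particular choice of a and c.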
2. The Hadamard product of non-negative matrices
As a consequence of the preceding section it is very easy to prove a remarkable theorem on non-negative matrices, due to I. Schur.
Theorem 2. If A and B are non-negative linear transformations whose matrices in some coordinate system are (a_{ij}) and (b_{ij}) respectively then the linear transformation C whose matrix (c_{ij}) in this coordinate system is defined by c_{ij} = a_{ij} b_{ij} is also non-negative.
Proof. Since we may write both A and B as a sum of non-negative transformations of rank 1, C may be written as a sum of transformations the matrices of which are obtained from the matrices of two non-negative transformations of rank 1 in the same way as the matrix of C was obtained from the matrices of A and B. Since a sum of non-negative transformations is non-negative, it is therefore sufficient to prove the theorem in the case where A and B both have rank 1. In this case a_{ij} = a_i \bar{a}_j, b_{ij} = b_i \bar{b}_j, and therefore c_{ij} = c_i \bar{c}_j, where c_i = a_i b_i, whence it follows that C is non-negative (and has rank 1).■
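The structure of this proof can be traced on a small example. The sketch below is only an illustration under stated assumptions: the non-negative matrices are built as sums of two rank one pieces a_i \bar{a}_j, the helper names are hypothetical, and non-negativity of the elementwise product is tested on a few sample vectors rather than proved.

```python
def rank_one(a):
    """Matrix with entries a_i * conj(a_j); non-negative of rank at most 1."""
    return [[ai * aj.conjugate() for aj in a] for ai in a]


def add(m1, m2):
    """Entrywise sum of two matrices of the same shape."""
    return [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def hadamard(m1, m2):
    """Entrywise (Hadamard) product: c_ij = a_ij * b_ij."""
    return [[p * q for p, q in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def quadratic_form(matrix, c):
    """sum_i sum_j c_j conj(c_i) m_ij."""
    n = len(c)
    return sum(c[j] * c[i].conjugate() * matrix[i][j]
               for i in range(n) for j in range(n))


# Hypothetical data with integer parts, so all arithmetic is exact.
a_parts = [[1 + 1j, 2 + 0j, -1j], [0j, 1 - 2j, 3 + 0j]]
b_parts = [[2 + 0j, 1j, 1 - 1j], [1 + 1j, -1 + 0j, 2j]]

A = add(rank_one(a_parts[0]), rank_one(a_parts[1]))
B = add(rank_one(b_parts[0]), rank_one(b_parts[1]))
C = hadamard(A, B)

# As in the proof: C is the sum of the rank one matrices c_i conj(c_j)
# with c_i = a_i * b_i, one for each pair of rank one pieces.
pieces = [rank_one([ai * bi for ai, bi in zip(a, b)])
          for a in a_parts for b in b_parts]
total = pieces[0]
for piece in pieces[1:]:
    total = add(total, piece)
decomposition_ok = (total == C)

# Spot check of non-negativity on a few sample vectors.
samples = [[1 + 0j, 0j, 0j], [1 + 0j, -1 + 0j, 1j], [2 - 1j, 1j, -3 + 0j]]
values = [quadratic_form(C, c) for c in samples]
print(decomposition_ok, values)
```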
3. The dual space of a vector space
Definition 1. Let V be an arbitrary vector space; we denote by V^* the set of all linear functions f = f(x) defined on V. If in V^*, 0, f_1 + f_2, and af are defined by 0(x) = 0, (f_1 + f_2)(x) = f_1(x) + f_2(x), and (af)(x) = a f(x) respectively, then V^* becomes a vector space: we call V^* the dual space of V.
In the present chapter we shall discuss the theory of dual spaces. We call attention to the fact that all our definitions and theorems will be phrased without reference to any basis or coordinate system and that, although we shall make liberal use of bases, we use them only when that is unavoidable: namely in considerations of dimensionality, where bases enter by definition. Throughout this chapter we shall mean by a basis a linear basis, i.e., a maximal set of linearly independent elements: in case V is an inner product space we shall in each case specify whether or not we need an orthogonal basis.
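If V is of dimension n so is V^*. For let \{x_1, \ldots, x_n\} be a basis in V. For each i = 1, \ldots, n we may define a linear function f_i by the requirement that f_i(x_j) = \delta_{ij}. Then \sum_i a_i f_i = 0, i.e., \sum_i a_i f_i(x) = 0 for all x, implies a_j = \sum_i a_i f_i(x_j) = 0 so that the f_i are linearly independent. Moreover if f is arbitrary in V^* and x = \sum_i c_i x_i in V then f_i(x) = c_i and f(x) = \sum_i c_i f(x_i) = \sum_i f(x_i) f_i(x), so that f = \sum_i f(x_i) f_i. In other words \{f_1, \ldots, f_n\} is a basis in V^*, so that V^* has dimension n.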
Since V and V^* are both n dimensional vector spaces it is possible, in many ways, to set up a one to one correspondence between them that preserves 0, sum, and scalar product: in other words V and V^* are isomorphic. These isomorphisms, however, are perfectly arbitrary and yield no information about the structure of vector spaces. If, however, we consider not V^* but its dual space, which we may denote by V^{**} (i.e., V^{**} is the set of all linear functions X = X(f) defined on V^*), then it is possible to set up a ‘natural’ isomorphism between V and V^{**}.
Given any vector x in V, we make correspond to it an element X in V^{**} by defining, for every f in V^*, X(f) = f(x). (It is easy to verify that X(f) is indeed a linear function of f.) The correspondence x \to X is linear: i.e., if x_1 \to X_1, x_2 \to X_2, and x = a_1 x_1 + a_2 x_2 \to X, then X = a_1 X_1 + a_2 X_2. For, by definition we have for each f in V^*,
\begin{aligned} X(f)&=f(a_{1} x_{1}+a_{2} x_{2})\\ &=a_{1} f(x_{1})+a_{2} f(x_{2})\\ &=a_{1} X_{1}(f)+a_{2} X_{2}(f). \end{aligned}
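We now show that the correspondence is one to one. If x and x^{\prime} correspond to the same X, then we have for every f in V^*, f(x) = f(x^{\prime}), or f(x - x^{\prime}) = 0. If we introduce, as above, a basis \{x_1, \ldots, x_n\} in V and corresponding linear functions f_i in V^*, defined by f_i(x_j) = \delta_{ij}, then we see that f(y) = 0 for all f, where y = x - x^{\prime} = \sum_i c_i x_i, implies in particular that c_i = f_i(y) = 0, so that y = 0. Hence x and x^{\prime} can correspond to the same X only if x = x^{\prime}, as was to be proved. Finally we remark that every X in V^{**} corresponds in this correspondence to some x in V. The simplest proof of this fact is that since V and therefore V^* are n dimensional vector spaces the dual space V^{**} of V^* is also n dimensional. Hence if we can exhibit n linearly independent elements of V^{**} which do correspond to elements of V the desired result will follow. Let \{x_1, \ldots, x_n\} be a basis in V, with x_i \to X_i: then \{X_1, \ldots, X_n\} is a set of n linearly independent elements, and therefore a basis, in V^{**}. For \sum_i a_i X_i = 0, i.e., \sum_i a_i X_i(f) = 0 for all f, implies that
f\Big(\sum_i a_i x_i\Big) = 0 \quad \text{for all } f,
whence, as above, \sum_i a_i x_i = 0 and therefore a_1 = \cdots = a_n = 0.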
Thus the correspondence x \to X is an isomorphism, the so-called ‘natural isomorphism’, between V and V^{**}.
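The correspondence that sends a vector to "evaluation at that vector" can be modelled directly with functions. The following is a minimal sketch, assuming that vectors are n-tuples of numbers and that linear functions are given by coefficient lists; the names `functional`, `evaluation`, and `combine` are hypothetical.

```python
def functional(coefficients):
    """The linear function f(x) = sum_i coefficients[i] * x[i]."""
    return lambda x: sum(k * xi for k, xi in zip(coefficients, x))


def evaluation(x):
    """The element X of the second dual attached to x: X(f) = f(x)."""
    return lambda f: f(x)


def combine(a1, x1, a2, x2):
    """The vector a1*x1 + a2*x2."""
    return [a1 * u + a2 * v for u, v in zip(x1, x2)]


x1, x2 = [1, 2, -1], [0, 3, 4]
a1, a2 = 2, -5
x = combine(a1, x1, a2, x2)

X1, X2, X = evaluation(x1), evaluation(x2), evaluation(x)

# Linearity of the correspondence, tested on one sample linear function.
f = functional([7, -2, 3])
linear = (X(f) == a1 * X1(f) + a2 * X2(f))

# The dual basis f_i(x_j) = delta_ij of the standard basis recovers the
# coordinates of x from X, so distinct vectors give distinct X.
dual_basis = [functional([1 if i == j else 0 for j in range(3)])
              for i in range(3)]
recovered = [X(f_i) for f_i in dual_basis]
print(linear, recovered)
```

The last step mirrors the one to one argument in the text: X determines the numbers f_i(x), and these are the coordinates of x.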
4. The dual space of an inner product space
The considerations of the preceding section apply, of course, to inner product spaces. In the case of inner product spaces, however, it is not necessary to go to V^{**}: we shall establish a natural correspondence between V and V^*.
Let V be an n dimensional inner product space and V^* its dual space. The theorem on the representation of linear functions (cf. (I.15)) shows that every f in V^* has the form f(x) = \langle x, y \rangle. This relation establishes a correspondence, which we already know to be one to one, between f in V^* and y in V. If in this correspondence f_i corresponds to y_i, i = 1, 2, and if f = a_1 f_1 + a_2 f_2, then we have \begin{aligned} f(x) &= a_{1} f_{1}(x)+a_{2} f_{2}(x)\\ &=a_{1} \langle x, y_{1} \rangle + a_{2} \langle x, y_2 \rangle\\ &=\langle x, \bar{a}_{1} y_{1}+\bar{a}_{2} y_{2} \rangle, \end{aligned} so that a_1 f_1 + a_2 f_2 corresponds to \bar{a}_1 y_1 + \bar{a}_2 y_2. Thus the correspondence is not an isomorphism but a conjugate isomorphism between V^* and V.
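This correspondence can also be used to define an inner product in V^*. At first glance it might seem plausible to define \langle f_1, f_2 \rangle to be \langle y_1, y_2 \rangle, where f_i corresponds to y_i, but due to the fact that the correspondence is a conjugate isomorphism we have the relation
\langle a f_1, f_2 \rangle = \langle \bar{a} y_1, y_2 \rangle = \bar{a} \langle y_1, y_2 \rangle = \bar{a} \langle f_1, f_2 \rangle,
so that this definition does not satisfy the requirements of the definition of an inner product in (I.3). If, however, we define \langle f_1, f_2 \rangle = \langle y_2, y_1 \rangle then it is readily verified that \langle f_1, f_2 \rangle is an inner product in V^* so that V^* is an inner product space.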
5. Reflexivity of inner product spaces
If we apply the results of the preceding section not to the inner product space V but to its dual V^* we obtain a conjugate isomorphism between V^* and V^{**}. Thereby we have induced a one to one correspondence between V itself and V^{**}: it is readily verified (since the operation of conjugation is involutory) that this correspondence is an isomorphism. We now show that this isomorphism is the same as the natural isomorphism between V and V^{**} described in (IV.3). Let y_0 be an arbitrary vector in V; to it there corresponds an element f_0 in V^*; to this element, in turn, there corresponds the element X_0 in V^{**}. We must show that X_0(f) = f(y_0). Let f = f(x) = \langle x, y \rangle be an arbitrary element of V^*; we have X_0(f) = \langle f, f_0 \rangle = \langle y_0, y \rangle = f(y_0), as was to be proved.
6. Direct sum of vector spaces
Definition 2. If V and W are arbitrary vector spaces we define the direct sum, V \oplus W, to be the set of all pairs (x, y) with x in V and y in W.
If in V \oplus W we define \begin{aligned} 0 &= (0, 0),\\ (x_{1}, y_{1})+(x_{2}, y_{2})&=(x_{1}+x_{2}, y_{1}+y_{2}),\\ a(x, y)&=(a x, a y), \end{aligned} then V \oplus W becomes a vector space. If, moreover, V and W are inner product spaces we may define in V \oplus W, \langle (x_1, y_1), (x_2, y_2) \rangle = \langle x_1, x_2 \rangle + \langle y_1, y_2 \rangle, and V \oplus W becomes thereby an inner product space. In fact although for vector spaces this definition yields something new, for inner product spaces it can be subsumed under the discussion of the projection theorem (I.13). In other words V and W can be thought of as two orthogonal linear subspaces in V \oplus W, and in an arbitrary inner product space a linear subspace and its orthogonal complement are a decomposition of the space into a direct sum.
If A and B are linear transformations in V and W respectively we may define a linear transformation C = A \oplus B, the direct sum of A and B, in V \oplus W by C(x, y) = (Ax, By). It is easy to discuss the matricial representation of A \oplus B, its relation to addition, multiplication, scalar multiplication, 0, 1, inverse, dual, etc. We omit this discussion here, and merely state without proof two propositions that will be useful to us later.
- (i) If V and W have dimensions n and m respectively the dimension of V \oplus W is n + m. If \{x_1, \ldots, x_n\} and \{y_1, \ldots, y_m\} are bases in V and W respectively then the totality of all vectors of either of the two forms (x_i, 0) or (0, y_j) is a basis in V \oplus W. If the matrix of the direct sum transformation A \oplus B is computed in this basis it will have the form
\begin{pmatrix} (a_{ij}) & 0 \\ 0 & (b_{\alpha\beta}) \end{pmatrix},
where (a_{ij}) and (b_{\alpha\beta}) are the matrices of A and B in the bases \{x_i\} and \{y_\alpha\} respectively, and where the zeros represent rectangular blocks each element of which is zero.
- (ii) The most general linear function on V \oplus W is of the form h(x, y) = f(x) + g(y), where f and g are linear functions in V and W. In other words the dual space of a direct sum is the direct sum of the dual spaces.
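The block form of the matrix of a direct sum can be illustrated concretely. The following sketch assumes that the transformations are given by square matrices acting on coordinate lists and that a pair (x, y) is stored as the concatenated list of coordinates; the names `direct_sum` and `mat_vec` are hypothetical.

```python
def mat_vec(matrix, vector):
    """Apply a matrix (list of rows) to a coordinate vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def direct_sum(a, b):
    """Block diagonal matrix with a in the upper left and b in the lower right."""
    n, m = len(a), len(b)
    top = [list(row) + [0] * m for row in a]
    bottom = [[0] * n + list(row) for row in b]
    return top + bottom


A = [[1, 2], [3, 4]]                    # acts on a space of dimension n = 2
B = [[0, 1, 0], [2, 0, 1], [1, 1, 1]]   # acts on a space of dimension m = 3
x, y = [5, -1], [1, 2, 3]

C = direct_sum(A, B)

# The pair (x, y) is stored as the concatenation x + y, so that
# C(x, y) = (Ax, By) becomes a statement about concatenated lists.
image_of_pair = mat_vec(C, x + y)
pair_of_images = mat_vec(A, x) + mat_vec(B, y)
print(len(C), image_of_pair, pair_of_images)
```

The matrix C has n + m rows and columns, in agreement with proposition (i) on the dimension of the direct sum.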
7. Tensor product of vector spaces
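The main purpose of this chapter is to define for vector spaces (and inner product spaces) the notion of a tensor product. In other words if V and W are given vector spaces we shall define for every vector x in V and y in W a product x \otimes y, which is to be an element of a suitable vector space, in such a way that x \otimes y depends linearly on either variable if the other one is fixed and so that (in case V and W are inner product spaces) we have
\langle x^{\prime} \otimes y^{\prime}, x^{\prime \prime} \otimes y^{\prime \prime} \rangle = \langle x^{\prime}, x^{\prime \prime} \rangle \langle y^{\prime}, y^{\prime \prime} \rangle.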
In order to clarify the definition we shall give, we proceed heuristically on the basis of the proposition (ii) in the preceding section. If we denote the (as yet undefined) tensor product of V and W by V \otimes W, we may expect that (V \otimes W)^* = V^* \otimes W^*. Since it is technically easier to do so, instead of defining V \otimes W itself we shall instead define (V \otimes W)^*; we shall then write, by definition, V \otimes W = ((V \otimes W)^*)^*. Also we may expect that if f and g are linear functions in V and W respectively then it is their product, f(x) g(y), that should in some sense be the general element of (V \otimes W)^*. This product is a function h = h(x, y), defined for x in V and y in W, with the property that for each fixed value of one variable it is a linear function of the other: in other words h is a bilinear function of x and y. This discussion is meant to motivate the formal work that we begin in the next paragraph.
Let V and W be vector spaces of dimensions n and m respectively; we denote by \mathcal{B} the set of all bilinear functions f = f(x, y) defined for x in V and y in W. Let V \otimes W be the dual space of \mathcal{B} (i.e., V \otimes W is the set of all linear functions z = z(f) defined for f in \mathcal{B}): we call V \otimes W the tensor product of V and W. To every pair of vectors (x, y) with x in V and y in W we make correspond the element z in V \otimes W defined by z(f) = f(x, y). (It is easy to verify that z(f) is a linear function of f.) We write z = x \otimes y and call z the tensor product of x and y. We shall consistently use the notation x for vectors of V, y for vectors of W, and z for vectors of the vector space V \otimes W.
8. Dimension of a tensor product
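We observe that the dimension of V \otimes W is nm. For, exactly as in (IV.3) above, we may choose bases \{x_1, \ldots, x_n\} and \{y_1, \ldots, y_m\} in V and W respectively, and then we may find bilinear functions f_{ij} subject to the requirement that f_{ij}(x_k, y_l) = \delta_{ik} \delta_{jl}. It is then easy to show that the f_{ij} are linearly independent and that every bilinear function is a linear combination of them, so that the space of bilinear functions, and therefore also its dual V \otimes W, has dimension nm.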
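We shall also need the fact that the nm elements z_{ij} = x_i \otimes y_j of V \otimes W are a basis in V \otimes W. According to the preceding paragraph we need only prove that they are linearly independent. If \sum_i \sum_j a_{ij} z_{ij}(f) = 0 for all f then we should have, in particular, \sum_i \sum_j a_{ij} z_{ij}(f_{kl}) = a_{kl} = 0 for all k and l, as was to be proved.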
9. The dual of a tensor product
If x_1 \otimes y = z_1 and x_2 \otimes y = z_2 and if z = (a_1 x_1 + a_2 x_2) \otimes y then z = a_1 z_1 + a_2 z_2. For we have, for every bilinear function f = f(x, y), \begin{aligned} z(f)&=f(a_{1} x_{1}+a_{2} x_{2}, y)\\ &=a_{1} f(x_{1}, y)+a_{2} f(x_{2}, y)\\ &=a_{1} z_{1}(f)+a_{2} z_{2}(f). \end{aligned} Similarly we can show that x \otimes (a_1 y_1 + a_2 y_2) = a_1 (x \otimes y_1) + a_2 (x \otimes y_2), so that x \otimes y depends linearly on each of its factors when the other is held fixed. It follows from the preceding paragraph that every element z in V \otimes W is a sum of tensor products x \otimes y (not necessarily uniquely). It is also easy to prove, using the bilinear character of x \otimes y, that every linear function of z (i.e., every element in the dual space of V \otimes W) is a bilinear function of x and y and consequently a sum of products of the form f(x) g(y), where f and g are linear functions defined on V and W respectively. Hence for general vector spaces our definition of tensor product fulfills the conditions (heuristically derived above) of our program. Before investigating the relation of tensor product spaces to linear transformations, we examine the situation in inner product spaces.
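In coordinates the facts of the last two sections become very concrete. The sketch below rests on an assumption not made in the text: a tensor product of two vectors is identified with the list of its nm coordinates relative to the basis of products of basis vectors, taken in lexicographic order, so that the coordinate with index (i, j) is the product of the i-th coordinate of the first vector and the j-th coordinate of the second. The names `tensor`, `combine`, and `unit` are hypothetical.

```python
def tensor(x, y):
    """Coordinates of x ⊗ y: the products x_i * y_j in lexicographic order."""
    return [xi * yj for xi in x for yj in y]


def combine(a1, v1, a2, v2):
    """The vector a1*v1 + a2*v2."""
    return [a1 * p + a2 * q for p, q in zip(v1, v2)]


def unit(k, size):
    """The k-th standard basis vector of the given size."""
    return [1 if i == k else 0 for i in range(size)]


n, m = 2, 3
x1, x2, y = [1, -2], [3, 4], [2, 0, 5]
a1, a2 = 3, -1

# Linearity in the first factor when the second is held fixed.
left = tensor(combine(a1, x1, a2, x2), y)
right = combine(a1, tensor(x1, y), a2, tensor(x2, y))

# The nm products of basis vectors are exactly the standard basis of the
# nm dimensional coordinate space, hence linearly independent.
basis_tensors = [tensor(unit(i, n), unit(j, m))
                 for i in range(n) for j in range(m)]
standard_basis = [unit(k, n * m) for k in range(n * m)]
print(left == right, basis_tensors == standard_basis)
```

Not every coordinate list of length nm is of the form `tensor(x, y)`; in agreement with the text, a general element is only a sum of such products.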
10. Tensor product of inner product spaces
If V and W are inner product spaces the construction of the preceding sections applies unaltered: the only new problem is to introduce into the tensor product an inner product related to the given inner products in the factor spaces in a suitable way. It is technically easier to define an inner product not in V \otimes W but in the space \mathcal{B} of bilinear functions, and then apply the general theory of duals of inner product spaces to find an inner product in V \otimes W.
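If f is any element of the space \mathcal{B} of bilinear functions, f can be written as a sum of products of the form g(x) h(y), or, since V and W are inner product spaces, f(x, y) can be written as a sum of expressions of the form \langle x, x_i \rangle \langle y, y_i \rangle. Hence if f^{\prime} and f^{\prime \prime} are any two elements of \mathcal{B} we may write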
f^{\prime}(x, y)=\sum_i\langle x, x_{i}^{\prime} \rangle \langle y, y_{i}^{\prime} \rangle \quad \text{ and } \quad f^{\prime \prime}(x, y)=\sum_{j}\langle x, x_{j}^{\prime\prime}\rangle \langle y, y_{j}^{\prime\prime} \rangle.
We write, by definition, \langle f^{\prime}, f^{\prime\prime} \rangle = \sum_i \sum_{j}\langle x_{j}^{\prime\prime}, x_{i}^{\prime} \rangle \langle y_{j}^{\prime\prime}, y_{i}^{\prime} \rangle.
(The conjugate nature of the relation between vectors and linear functions again necessitates putting x_{j}^{\prime \prime} before x_{i}^{\prime}.) Before we can even start to prove that this definition fulfills the conditions of the definition of an inner product, we must prove that it defines \langle f^{\prime}, {f}^{\prime \prime} \rangle independently of the representations as sums. To do this we observe that \sum_j \langle x_j^{\prime\prime}, x_i^\prime \rangle \langle y_j^{\prime\prime}, y_i^\prime \rangle = \overline{f^{\prime\prime}(x_i^\prime, y_i^\prime)}, so that \langle f^\prime, f^{\prime\prime} \rangle = \sum_i \overline{f^{\prime\prime}(x_i^\prime, y_i^\prime)}, whence \langle f^{\prime}, {f}^{\prime \prime} \rangle is independent of the particular representation of f^{\prime \prime}. Since, moreover, in any given representations of f^\prime and f^{\prime\prime}, \langle f^{\prime}, f^{\prime\prime} \rangle = \overline{\langle f^{\prime\prime}, f^{\prime} \rangle}, it follows that \langle f^\prime, f^{\prime\prime} \rangle is also independent of the representation of f^{\prime}.
It is easy to verify that the expression \langle f^\prime, f^{\prime\prime} \rangle is linear in f^\prime, conjugate linear in {f}^{\prime \prime}, and Hermitian symmetric. It remains to prove that it is positive definite: i.e., that \langle f^{\prime}, f^{\prime} \rangle \geq 0 for all f^{\prime}, and that \langle f^{\prime}, f^{\prime} \rangle = 0 if and only if f^{\prime}=0. This, surprisingly, is not trivial: it requires Schur’s theorem, proved in (IV.2).
We have \langle f^{\prime}, f^{\prime} \rangle = \sum_{i=1}^{p} \sum_{j=1}^{p}\langle x_{j}^{\prime}, x_{i}^{\prime} \rangle \langle y_{j}^{\prime}, y_{i}^{\prime} \rangle.
Let a_1, \ldots, a_p be arbitrary complex numbers. Then \begin{aligned} \sum_i \sum_{j}\langle x_{j}^{\prime}, x_{i}^{\prime} \rangle \bar{a}_{i} a_{j} &= \Big\langle \sum_{j} a_{j} x_{j}^{\prime}, \sum_i a_{i} x_{i}^{\prime} \Big\rangle\\ &= \Big\|\sum_i a_{i} x_{i}^{\prime}\Big\|^{2}\\ &\geq 0, \end{aligned} so that the matrix whose general element is \langle x_{j}^{\prime}, x_{i}^{\prime} \rangle is non-negative. Similarly we may show that the matrix whose general element is \langle y_{j}^{\prime}, y_{i}^\prime \rangle is non-negative; it follows from Schur’s theorem that the matrix whose general element is the product \langle x_{j}^{\prime}, x_{i}^{\prime} \rangle \langle y_{j}^{\prime}, y_{i}^{\prime} \rangle is also non-negative. Hence \sum_i \sum_{j}\langle x_{j}^{\prime}, x_{i}^{\prime} \rangle \langle y_{j}^{\prime}, y_{i}^{\prime} \rangle \bar{a}_{i} a_{j} \geq 0 for every choice of the complex numbers a_1, \ldots, a_p: choosing a_i = 1 for all i proves that \langle f^{\prime}, f^{\prime} \rangle \geq 0.
In order to prove that \langle f^{\prime}, f^{\prime} \rangle = 0 implies f^{\prime}=0 we proceed as follows. For the expression \langle f^{\prime}, f^{\prime\prime} \rangle, which now has all other properties of an inner product, we may prove the Schwarz inequality as in (I.4): |\langle f^{\prime}, f^{\prime\prime} \rangle| \leq \Big(\langle f^\prime, f^{\prime} \rangle \, \langle f^{\prime\prime}, f^{\prime\prime} \rangle \Big)^{1 / 2}.
It follows that the vanishing of \langle f^{\prime}, f^{\prime} \rangle implies the vanishing of \langle f^{\prime}, f^{\prime \prime} \rangle for all f^{\prime \prime}. Let x^{\prime \prime} and y^{\prime \prime} be arbitrary vectors and take, in particular, f^{\prime\prime}=f^{\prime\prime}(x, y)= \langle x, x^{\prime\prime} \rangle \langle y, y^{\prime\prime} \rangle. The vanishing of \langle f^{\prime}, f^{\prime\prime} \rangle implies that \begin{aligned} 0 &= \langle f^{\prime}, f^{\prime\prime} \rangle\\ &= \sum_i \langle x^{\prime\prime}, x_{i}^{\prime} \rangle \langle y^{\prime\prime}, y_{i}^{\prime} \rangle\\ &= f^{\prime}(x^{\prime\prime}, y^{\prime\prime}); \end{aligned} hence the vanishing of \langle f^{\prime}, f^{\prime\prime} \rangle for all f^{\prime \prime} implies that f^{\prime}(x^{\prime\prime}, y^{\prime \prime})=0, for every pair x^{\prime \prime}, y^{\prime \prime} of vectors, or, in other words, that f^{\prime}=0.
This concludes the introduction of an inner product in the space \mathcal{B} of bilinear functions. Applying the results of (IV.4) we obtain an inner product in the dual space V \otimes W of \mathcal{B}, so that V \otimes W becomes an inner product space.
11. The inner product in a tensor product
It is now easy to prove that the inner product defined in V \otimes W has the property that
\langle x^{\prime} \otimes y^{\prime}, x^{\prime \prime} \otimes y^{\prime \prime} \rangle = \langle x^{\prime}, x^{\prime \prime} \rangle \langle y^{\prime}, y^{\prime \prime} \rangle.
We write z^{\prime}=x^{\prime} \otimes y^{\prime}, \quad \text{ and } \quad z^{\prime \prime}=x^{\prime \prime} \otimes y^{\prime \prime}, and we define z_{0}^{\prime}=z_{0}^{\prime}(f)= \langle f, f^{\prime} \rangle, \qquad z_{0}^{\prime\prime}=z_{0}^{\prime\prime}(f)= \langle f, f^{\prime\prime} \rangle, where f^{\prime} and f^{\prime\prime} are the particular bilinear functions defined by f^{\prime}(x, y)= \langle x, x^{\prime} \rangle \langle y, y^{\prime} \rangle \quad \text{and} \quad f^{\prime \prime}(x, y)= \langle x, x^{\prime\prime} \rangle \langle y, y^{\prime\prime} \rangle.
For an arbitrary f(x, y) = \sum_i \langle x, x_{i} \rangle \langle y, y_{i} \rangle we have \begin{aligned} z_{0}^{\prime}(f) &= \sum_i \langle x^{\prime}, x_{i} \rangle \langle y^{\prime}, y_{i} \rangle\\ &= f(x^{\prime}, y^{\prime})\\ &= z^{\prime}(f) \end{aligned} and \begin{aligned} z_{0}^{\prime\prime}(f) &= \sum_i \langle x^{\prime\prime}, x_{i} \rangle \langle y^{\prime\prime}, y_{i} \rangle\\ &= f(x^{\prime\prime}, y^{\prime\prime})\\ &= z^{\prime\prime}(f). \end{aligned} (This is similar to the proof in (IV.5) of the equality of the two natural correspondences between an inner product space and its second dual.) Hence we have, finally, \langle z^{\prime}, z^{\prime\prime} \rangle = \langle z_{0}^{\prime}, z_{0}^{\prime\prime} \rangle = \langle f^{\prime\prime}, f^{\prime} \rangle = \langle x^{\prime}, x^{\prime\prime} \rangle \langle y^{\prime}, y^{\prime \prime} \rangle, as was to be proved.
The last proved fact justifies the terminology of tensor product and describes completely the structure of V \otimes W and its relation to V and W. It follows also that if \{x_1, \ldots, x_n\} and \{y_1, \ldots, y_m\} are orthogonal bases in V and W respectively then \langle x_i \otimes y_j, x_k \otimes y_l \rangle = \delta_{ik} \delta_{jl}, so that the x_i \otimes y_j form an orthonormal set in V \otimes W. Since we have already seen that they form a maximal linearly independent set it follows that they are a complete orthonormal set, or an orthogonal basis, in V \otimes W.
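The product rule for inner products is easy to test in coordinates. The sketch below assumes coordinates relative to orthonormal bases, the usual inner product that is linear in its first argument and conjugate linear in its second, and the identification of a tensor product of two vectors with the list of products of their coordinates; the names `inner`, `tensor`, and `unit` and the sample vectors are hypothetical.

```python
def inner(u, v):
    """<u, v> = sum_i u_i * conj(v_i): linear in u, conjugate linear in v."""
    return sum(ui * vi.conjugate() for ui, vi in zip(u, v))


def tensor(x, y):
    """Coordinates of x ⊗ y: the products x_i * y_j in lexicographic order."""
    return [xi * yj for xi in x for yj in y]


def unit(k, size):
    """The k-th standard basis vector, with complex entries."""
    return [1 + 0j if i == k else 0j for i in range(size)]


# Sample vectors with integer parts, so that the arithmetic is exact.
x1, y1 = [1 + 1j, 2 - 1j], [1j, 3 + 0j, 1 - 2j]
x2, y2 = [2 + 0j, -1 + 3j], [1 + 1j, -2j, 4 + 0j]

lhs = inner(tensor(x1, y1), tensor(x2, y2))
rhs = inner(x1, x2) * inner(y1, y2)

# The products of orthonormal basis vectors are again orthonormal.
n, m = 2, 3
basis = [tensor(unit(i, n), unit(j, m)) for i in range(n) for j in range(m)]
gram = [[inner(u, v) for v in basis] for u in basis]
identity = [[1 if r == c else 0 for c in range(n * m)] for r in range(n * m)]
print(lhs, rhs, gram == identity)
```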
12. Tensor product of transformations
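We are now in a position to examine the relation of linear transformations to the theory of tensor products. If A and B are linear transformations defined in V and W respectively we define a linear transformation \hat{C} in the space \mathcal{B} of bilinear functions by (\hat{C} f)(x, y) = f(Ax, By), and then a linear transformation C in V \otimes W by (Cz)(f) = z(\hat{C} f). In brief:
(Cz)(f(x, y)) = z(f(Ax, By)).
If we apply C to a particular z of the form x \otimes y (i.e., z(f) = f(x, y)) we obtain
(Cz)(f) = z(\hat{C} f) = f(Ax, By) = (Ax \otimes By)(f), \quad \text{i.e.,} \quad C(x \otimes y) = Ax \otimes By.
Since we have already remarked that every z is a sum of tensor products the relation C(x \otimes y) = Ax \otimes By completely characterizes C. The linear transformation C in the space V \otimes W is called the tensor product of the linear transformations A and B, C = A \otimes B.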
13. Kronecker products of matrices
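Let A and B be linear transformations and \{x_1, \ldots, x_n\} and \{y_1, \ldots, y_m\} orthogonal bases in V and W respectively, and let (a_{ij}) and (b_{\alpha\beta}) be the matrices of A and B in these bases. We find the matrix of the linear transformation C = A \otimes B in the orthogonal basis \{x_i \otimes y_\alpha\} of V \otimes W. Naturally the matrix depends on the way in which these vectors are ordered in a linear order: we suppose first that the order is the lexicographical one, i.e.,
x_1 \otimes y_1, \; x_1 \otimes y_2, \; \ldots, \; x_1 \otimes y_m, \; x_2 \otimes y_1, \; \ldots, \; x_2 \otimes y_m, \; \ldots, \; x_n \otimes y_1, \; \ldots, \; x_n \otimes y_m.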
We have \begin{aligned} C(x_{j} \otimes y_{\beta}) &= A x_{j} \otimes B y_{\beta}\\ &= \Big(\sum_i a_{ij} x_{i}\Big) \otimes \Big(\sum_{\alpha} b_{\alpha \beta} y_{\alpha}\Big)\\ &= a_{1j} b_{1\beta}(x_{1} \otimes y_{1}) + a_{1j} b_{2 \beta}(x_{1} \otimes y_{2}) + \cdots + a_{1j} b_{m \beta}(x_{1} \otimes y_{m})+\cdots \end{aligned}
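so that the matrix of C has the form
\begin{pmatrix} a_{11} b_{11} & \cdots & a_{11} b_{1m} & \cdots & a_{1n} b_{11} & \cdots & a_{1n} b_{1m} \\ \vdots & & \vdots & & \vdots & & \vdots \\ a_{11} b_{m1} & \cdots & a_{11} b_{mm} & \cdots & a_{1n} b_{m1} & \cdots & a_{1n} b_{mm} \\ \vdots & & \vdots & & \vdots & & \vdots \\ a_{n1} b_{11} & \cdots & a_{n1} b_{1m} & \cdots & a_{nn} b_{11} & \cdots & a_{nn} b_{1m} \\ \vdots & & \vdots & & \vdots & & \vdots \\ a_{n1} b_{m1} & \cdots & a_{n1} b_{mm} & \cdots & a_{nn} b_{m1} & \cdots & a_{nn} b_{mm} \end{pmatrix}
or, in a condensed notation whose meaning is clear,
\begin{pmatrix} a_{11} (b_{\alpha\beta}) & \cdots & a_{1n} (b_{\alpha\beta}) \\ \vdots & & \vdots \\ a_{n1} (b_{\alpha\beta}) & \cdots & a_{nn} (b_{\alpha\beta}) \end{pmatrix}.
If we had adopted, instead, the converse lexicographic ordering, i.e., x_1 \otimes y_1, \; x_2 \otimes y_1, \; \ldots, \; x_n \otimes y_1, \; x_1 \otimes y_2, \; \ldots, \; x_n \otimes y_m, we should have found the matrix of C to be
\begin{pmatrix} b_{11} (a_{ij}) & \cdots & b_{1m} (a_{ij}) \\ \vdots & & \vdots \\ b_{m1} (a_{ij}) & \cdots & b_{mm} (a_{ij}) \end{pmatrix}.
The first of these two matrices is known as the Kronecker product, (a_{ij}) \otimes (b_{\alpha\beta}), of (a_{ij}) and (b_{\alpha\beta}) (in this order!); the second one is (b_{\alpha\beta}) \otimes (a_{ij}). Since a permutation of the elements of an orthogonal basis is a trivial kind of change of basis (i.e. it is effected by a unitary transformation U) we obtain that
(b_{\alpha\beta}) \otimes (a_{ij}) = U^{-1} \big( (a_{ij}) \otimes (b_{\alpha\beta}) \big) U.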
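Both orderings can be examined on a small example. The following is a minimal sketch, assuming integer matrices and coordinate lists in the lexicographic ordering; the function names `kron`, `tensor`, and `mat_vec` are hypothetical, and the permutation is applied by reindexing rows and columns rather than by forming the unitary matrix explicitly.

```python
def kron(a, b):
    """Kronecker product: the entry in row (i, k), column (j, l) is a_ij * b_kl."""
    return [[aij * bkl for aij in arow for bkl in brow]
            for arow in a for brow in b]


def tensor(x, y):
    """Coordinates of x ⊗ y in the lexicographic ordering."""
    return [xi * yj for xi in x for yj in y]


def mat_vec(matrix, vector):
    """Apply a matrix (list of rows) to a coordinate vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


A = [[1, 2], [3, 4]]                     # n = 2
B = [[0, 1, 2], [1, 0, 3], [4, 5, 6]]    # m = 3
x, y = [1, -1], [2, 0, 1]

K = kron(A, B)

# The defining relation of the tensor product of transformations:
# the matrix K sends x ⊗ y to Ax ⊗ By.
action_ok = mat_vec(K, tensor(x, y)) == tensor(mat_vec(A, x), mat_vec(B, y))

# Passing to the converse ordering only permutes the basis vectors.
# sigma[a*n + i] is the lexicographic position i*m + a of the basis
# vector that occupies position a*n + i in the converse ordering.
n, m = len(A), len(B)
sigma = [i * m + a for a in range(m) for i in range(n)]
reordered = [[K[r][c] for c in sigma] for r in sigma]
swap_ok = (reordered == kron(B, A))
print(action_ok, swap_ok)
```

Reindexing rows and columns by the same permutation is the same thing as conjugating by a permutation matrix, which is unitary; this is the sense in which the two Kronecker products are unitarily equivalent.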
14. Properties of tensor product transformations
We now proceed to describe some of the elementary properties of tensor product transformations.
14.1. If A = \sum_i a_i A_i and B = \sum_j b_j B_j then A \otimes B = \sum_i \sum_j a_i b_j (A_i \otimes B_j). For we have \begin{aligned} (A \otimes B)(x \otimes y) &= Ax \otimes By\\ &= \Big(\sum_i a_i A_i x\Big) \otimes \Big(\sum_j b_j B_j y\Big)\\ &= \sum_i \sum_j a_i b_j (A_i \otimes B_j)(x \otimes y). \end{aligned}
14.2. If A = A_1 A_2 and B = B_1 B_2 then A \otimes B = (A_1 \otimes B_1)(A_2 \otimes B_2). For \begin{aligned} (A \otimes B)(x \otimes y) &= A x \otimes B y\\ &= A_{1} A_{2} x \otimes B_{1} B_{2} y\\ &= (A_{1} \otimes B_{1})(A_{2} x \otimes B_{2} y)\\ &= (A_{1} \otimes B_{1})(A_{2} \otimes B_{2})(x \otimes y). \end{aligned} As immediate consequences of this result we obtain the formulas \begin{aligned} (A \otimes B) &= (A \otimes 1)(1 \otimes B) = (1 \otimes B)(A \otimes 1), \\ (A \otimes B)^{-1} &= (A^{-1} \otimes B^{-1}). \end{aligned}
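The product rule and its two consequences can be verified on matrices. The sketch below assumes that the transformations are represented by their Kronecker product matrices in the lexicographic ordering; the function names and the sample integer matrices (chosen with determinant 1 so that the inverses are again integer matrices) are hypothetical.

```python
def mat_mul(a, b):
    """Ordinary matrix product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def kron(a, b):
    """Kronecker product: the entry in row (i, k), column (j, l) is a_ij * b_kl."""
    return [[aij * bkl for aij in arow for bkl in brow]
            for arow in a for brow in b]


def identity(n):
    """The n by n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


A1, A2 = [[1, 2], [3, 4]], [[0, 1], [1, 1]]
B1 = [[2, 0, 1], [1, 1, 0], [0, 3, 1]]
B2 = [[1, 0, 2], [0, 1, 1], [1, 1, 0]]

# Product rule: (A1 ⊗ B1)(A2 ⊗ B2) = (A1 A2) ⊗ (B1 B2).
mixed_ok = (mat_mul(kron(A1, B1), kron(A2, B2))
            == kron(mat_mul(A1, A2), mat_mul(B1, B2)))

# Factorization: A ⊗ B = (A ⊗ 1)(1 ⊗ B) = (1 ⊗ B)(A ⊗ 1).
left = mat_mul(kron(A1, identity(3)), kron(identity(2), B1))
right = mat_mul(kron(identity(2), B1), kron(A1, identity(3)))
factor_ok = (left == kron(A1, B1) and right == kron(A1, B1))

# Inverse: (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}, checked on matrices whose
# inverses are known exactly.
A, A_inv = [[2, 1], [1, 1]], [[1, -1], [-1, 2]]
B, B_inv = [[1, 2], [0, 1]], [[1, -2], [0, 1]]
inverse_ok = (mat_mul(kron(A, B), kron(A_inv, B_inv)) == identity(4))
print(mixed_ok, factor_ok, inverse_ok)
```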