An introduction to linear algebra
Our story of linear algebra begins with the concept of the vector space. For our discussion, we will let $F$ be some field, for instance the real numbers $\mathbb{R}$ or the complex numbers $\mathbb{C}$.
Definition. A vector space is a set $V$ (whose elements are called vectors) with addition of vectors and scalar multiplication of a vector by an element of $F$. That is,
- if $u, v \in V$, then there is an element $u + v \in V$;
- if $c \in F$ and $v \in V$, then there is an element $cv \in V$;
- there is a vector called $0$ such that $v + 0 = v$ for all $v \in V$; and
- everything respects the associative, commutative, and distributive laws ($(u + v) + w = u + (v + w)$, $u + v = v + u$, $c(u + v) = cu + cv$, and $(a + b)v = av + bv$).
Some consequences of this definition are that $0$ is unique, and there are unique additive inverses $-v$ such that $v + (-v) = 0$ (left as exercises).
There are many examples of vector spaces. One example is the noble $F^n$, that is, the set of all $n$-tuples $(a_1, \dots, a_n)$ with $a_i \in F$, where addition and scalar multiplication are componentwise. One can make nice diagrams of vector addition by the so-called “tip-to-tail method”: by imagining the vectors as arrows which can be translated in space without rotation, and then placing the “tail” of one vector at the “tip” of the last; the arrow which would start at the first tail and end at the last tip is the resulting sum. Scalar multiplication by $c$ then corresponds to scaling the arrow by a factor of $c$ (as it should, at least for integers: $2v = v + v$, for instance). When $n = 1$, notice that this is just $F$ as a vector space with scalars $F$.
A good example that is not $F^n$ is the vector space of polynomials (single or multivariable). Two polynomials may be added, and a polynomial may be multiplied by a scalar. Another good example is the vector space of continuous functions, say the set $C(\mathbb{R})$ of continuous functions $\mathbb{R} \to \mathbb{R}$. In calculus, one learns that the sum of continuous functions is continuous, as is a constant times a continuous function.
A vector subspace of a vector space is a subset of vectors which is itself a vector space. An example is the set of vectors $(a, 0)$ for all $a \in F$ inside of $F^2$. One can show that a nonempty subset $U$ of a vector space is a vector subspace if and only if it is closed under addition and scalar multiplication. For instance, it follows from this that $0 \in U$, because $0v = 0$ for any $v \in U$.
Let us look again at the definition of a vector space, and suppose we have some scalars $c_1, \dots, c_n$ and some vectors $v_1, \dots, v_n$. We may apply rule 2 for each pair to get vectors $c_1v_1, \dots, c_nv_n$, and then we may repeatedly apply rule 1 to get a vector $c_1v_1 + c_2v_2 + \dots + c_nv_n$ (one may prove that it does not matter in which order these vectors are added by appealing to the associative law). This is a key operation on a collection of vectors, and we call it a linear combination. In fact, the rules are set up so that this concept arises.
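To make this concrete, here is a minimal sketch in Python (using NumPy, with made-up scalars and vectors in $\mathbb{R}^3$) of forming a linear combination exactly as described: apply rule 2 to scale each vector, then rule 1 to add the results.

```python
import numpy as np

# Some scalars c_i and vectors v_i in R^3 (arbitrary example values).
scalars = [2.0, -1.0, 0.5]
vectors = [np.array([1.0, 0.0, 0.0]),
           np.array([0.0, 1.0, 0.0]),
           np.array([1.0, 1.0, 1.0])]

# Rule 2: scale each vector; rule 1: add the scaled vectors together.
combination = sum(c * v for c, v in zip(scalars, vectors))

print(combination)  # [ 2.5 -0.5  0.5]
```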
1. Linear transformations
At this point, we can start asking questions: which vectors in $V$ are linear combinations of some subset of vectors in $V$? Can we find a finite set such that everything is a linear combination of those vectors? How unique are representations as a linear combination? However, we will take a detour through the fundamental idea of a linear transformation.
Definition. A linear transformation is a function $T : V \to W$ between vector spaces $V$ and $W$ such that
- if $u, v \in V$ then $T(u + v) = T(u) + T(v)$; and
- if $c \in F$ and $v \in V$, then $T(cv) = cT(v)$.
Since $T(0) = T(0 \cdot 0) = 0 \cdot T(0) = 0$, it follows that $T(0) = 0$ for every linear transformation, and so this gives a quick test to check whether a function is not a linear transformation. One example would be a translation in the plane, $v \mapsto v + u$ for a fixed nonzero $u \in \mathbb{R}^2$.
An example of a linear transformation is $T : \mathbb{R}^2 \to \mathbb{R}^2$ being a rotation around the origin in the plane. Another is $V$ being polynomials and $T$ being the derivative. The most basic examples are the identity transformation $\mathrm{id} : V \to V$ (also known as $I$) and the zero transformation $0 : V \to W$.
There are two important properties of a linear transformation $T : V \to W$. The first is the kernel (or nullspace) of $T$, denoted $\ker T$, which is the set of all vectors $v \in V$ such that $T(v) = 0$. One can check that $\ker T$ is a vector subspace of $V$. The second is the image of $T$, denoted $\operatorname{im} T$, which is the set of all vectors $T(v)$, for all $v \in V$ (that is, the set of all vectors $w \in W$ where there is a $v \in V$ such that $T(v) = w$). One can also check that $\operatorname{im} T$ is a vector subspace of $W$.
Recall that a function $f : X \to Y$ is surjective (or “onto”) if for every $y \in Y$ there is an $x \in X$ such that $f(x) = y$. A clear observation is that a linear transformation $T : V \to W$ is surjective if and only if $\operatorname{im} T = W$. Recall also that a function is injective (or “one-to-one”) if whenever $f(x_1) = f(x_2)$ then $x_1 = x_2$ (alternatively, by the contrapositive, if whenever $x_1 \neq x_2$ then $f(x_1) \neq f(x_2)$). For a linear transformation, notice that this means that whenever $T(v_1 - v_2) = 0$ then $v_1 - v_2 = 0$, and, letting $v = v_1 - v_2$, that whenever $T(v) = 0$ then $v = 0$. This amounts to saying that $T$ is injective if and only if $\ker T = \{0\}$ (called a “trivial kernel”).
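As a computational aside (not part of the text's development), these criteria are easy to test for a map $F^n \to F^m$ given by a matrix: the image is the column space and the kernel is the null space, so surjectivity amounts to the rank being $m$ and injectivity to the null space being trivial. A sketch with an arbitrary example matrix:

```python
import numpy as np
from scipy.linalg import null_space

# T : R^3 -> R^2 given by an example matrix (acting as x -> A @ x).
A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0]])

rank = np.linalg.matrix_rank(A)
kernel_basis = null_space(A)               # columns span ker T

surjective = (rank == A.shape[0])          # im T all of R^2?
injective = (kernel_basis.shape[1] == 0)   # trivial kernel?

print(surjective, injective)  # True False
```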
Back to linear combinations. We call a tuple $\overline{v} = (v_1, \dots, v_n)$ of vectors a hypervector. In fact, $V^n$, the collection of such hypervectors, is itself a vector space. (As an aside: we may imagine $F^n$ to be a collection of hypervectors, but we will not do this for a reason which will be clear shortly.) Given a vector $a = (a_1, \dots, a_n) \in F^n$, we define the product $\overline{v}a$ to be the linear combination $a_1v_1 + \dots + a_nv_n$. For instance, if $\overline{v} = (v_1, v_2)$ and $a = (a_1, a_2)$, with $V = F^2$ so that $v_1 = (v_{1,1}, v_{1,2})$ and $v_2 = (v_{2,1}, v_{2,2})$, then
$$\overline{v}a = a_1(v_{1,1}, v_{1,2}) + a_2(v_{2,1}, v_{2,2}) = (a_1v_{1,1} + a_2v_{2,1},\ a_1v_{1,2} + a_2v_{2,2}).$$
An astute reader will recognize this as matrix multiplication with a column vector, if we had written the vectors of $\overline{v}$ vertically while forgetting to write the inner parentheses. In fact, if $\overline{v} \in V^n$ is a hypervector, the function $T_{\overline{v}} : F^n \to V$ defined by $T_{\overline{v}}(a) = \overline{v}a$ is a linear transformation, as one can easily check. Notice if we drop parentheses that $T_{\overline{v}}a = \overline{v}a$, which suggestively has it that $T_{\overline{v}} = \overline{v}$. We will speak of $\overline{v}$ itself as a linear transformation $F^n \to V$.
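In coordinates this really is matrix-vector multiplication: stacking the vectors of $\overline{v}$ as the columns of a matrix, the product $\overline{v}a$ is that matrix applied to $a$. A quick NumPy check with arbitrary vectors:

```python
import numpy as np

v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.0, 1.0, 0.0])
v3 = np.array([1.0, 1.0, 1.0])
a = np.array([2.0, -1.0, 0.5])

# The hypervector as a matrix: its columns are v1, v2, v3.
V = np.column_stack([v1, v2, v3])

explicit_sum = a[0] * v1 + a[1] * v2 + a[2] * v3
matrix_product = V @ a

print(np.allclose(explicit_sum, matrix_product))  # True
```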
2. Basis and dimension
Now we are in a position to ask those questions about linear combinations. When can a collection of vectors $\overline{v} = (v_1, \dots, v_n)$ represent every vector in $V$? It is when the linear transformation $\overline{v} : F^n \to V$ is surjective (i.e., when $\operatorname{im}\overline{v} = V$). When is a representation as a linear combination unique? It is when $\overline{v}$ is injective (i.e., when $\ker\overline{v}$ is trivial).[1] This leads to a natural question: can one find a collection $\overline{v}$ for which $\overline{v} : F^n \to V$ is an isomorphism (a linear transformation which is both surjective and injective)? This would give every element of $V$ a unique coordinate in $F^n$.
The answer to this question comes from observing two processes. The first is looking at hypervectors $\overline{v}$ which are surjective. We call such surjective hypervectors spanning sets[2] of $V$ since we imagine every element of $V$ is “spanned” by some linear combination of these vectors. The second process is looking at hypervectors $\overline{v}$ which are injective. We call the vectors in an injective $\overline{v}$ linearly independent vectors. The terminology comes about from the concept of linear dependence: if $\overline{v}$ is not injective, then the kernel is nontrivial, so there is some nonzero $a \in F^n$ with $a_1v_1 + \dots + a_nv_n = 0$, and one of the vectors (say $v_i$, with $a_i \neq 0$) can be written as $v_i = -\frac{1}{a_i}\sum_{j \neq i} a_jv_j$, and so “depends” on the rest; linearly independent vectors are ones that are not dependent.
We will limit this discussion to vector spaces which have a spanning set with finitely many vectors in it, since otherwise the theory is not so simple. Suppose $V$ is such a vector space. Now, it makes sense to speak of a minimal spanning set $\overline{v} = (v_1, \dots, v_n)$, one that has the fewest elements in it. There may very well be (infinitely) many minimal spanning sets, but we only care that there is one. Because it is a spanning set, every vector of $V$ may be represented as a linear combination, but it is not clear that the representation is unique. Suppose $\overline{v}$ is a minimal spanning set: is it an independent set? If there were a dependence, then one of the vectors, say $v_n$, would be in the span of $(v_1, \dots, v_{n-1})$, which would be a spanning set with fewer vectors, contradicting minimality; hence it is indeed an independent set. This means that such vector spaces have an isomorphism $\overline{v} : F^n \to V$ for some $n$.
What we have not settled is whether there are minimal spanning sets with different numbers of elements. If $\overline{v} \in V^m$ and $\overline{w} \in V^n$ are both minimal spanning sets, then there are isomorphisms $\overline{v} : F^m \to V$ and $\overline{w} : F^n \to V$. Then $\overline{w}^{-1} \circ \overline{v} : F^m \to F^n$ is an isomorphism.[3] Hence, it reduces to:
Lemma. If $T : F^m \to F^n$ is an isomorphism, then $m = n$.
Proof. Suppose it is not already the case that $m = n$, and without loss of generality, assume $m > n$ (replacing $T$ with $T^{-1}$ as necessary). Let $e_1, \dots, e_m$ be the standard basis vectors of $F^m$ (with $e_i$ having a $1$ in the $i$th coordinate and $0$ elsewhere), and construct an $n \times m$ matrix $A = (a_{ij})$ by choosing the $a_{ij}$ as the unique coefficients such that $T(e_j) = (a_{1j}, \dots, a_{nj})$. Since there are more unknowns than equations, by elimination we can solve the system $Ax = 0$ for some nonzero $x \in F^m$, and since $T(x) = x_1T(e_1) + \dots + x_mT(e_m) = Ax = 0$, this gives a linear dependence on the vectors $T(e_1), \dots, T(e_m)$, which in turn gives a linear dependence on the vectors $e_1, \dots, e_m$ since $T$ is injective. But the standard basis vectors are linearly independent, a contradiction. Therefore, $m = n$. ∎
Theorem. Every vector space with a finite spanning set has a minimal spanning set called a basis, and any two bases have the same number of elements, called the dimension of the vector space. That is, there is a hypervector $\overline{v} \in V^n$ which is an isomorphism $\overline{v} : F^n \to V$. Such vector spaces are called finite dimensional.
Proof. This follows from the lemma. ∎
Effectively, what we have said is that, up to isomorphism, there are not very many kinds of finite dimensional vector spaces, and in fact they are all isomorphic to some $F^n$. (“Dimension is all there is.”)
To finish off this discussion of dimension, we need the following key lemma:
Lemma. A vector subspace of a finite dimensional vector space is itself finite dimensional.
Proof. Let $U$ be a vector subspace of the finite dimensional vector space $V$. Since $V$ is finite dimensional, let us assume it is just $F^n$, and let $U$ be the corresponding subspace under the isomorphism (i.e., if $\overline{v} : F^n \to V$ is an isomorphism, then the original subspace is isomorphic to $\overline{v}^{-1}(U) \subseteq F^n$). Suppose $U$ has $m > n$ linearly independent vectors $u_1, \dots, u_m$. Then the $n \times m$ matrix with columns $u_1, \dots, u_m$ represents a system of equations with more unknowns than equations, so again by elimination there is a nonzero $a \in F^m$ such that $a_1u_1 + \dots + a_mu_m = 0$. Hence, there is a dependence among the $u_i$, a contradiction, so $U$ has at most $n$ linearly independent vectors. Let $(u_1, \dots, u_k)$ now represent a maximal set of linearly independent vectors in $U$, and suppose that it does not span $U$. Let $u \in U$ be an element not in the span of these vectors, and then $(u_1, \dots, u_k, u)$ is a larger linearly independent set, contradicting maximality. Therefore, $U$ is spanned by a finite set of vectors in $U$, and so it is finite dimensional. ∎
What this gives us is that a linearly independent set of vectors must have no more vectors than a spanning set. Thus, we may build a basis by adding vectors one at a time that are not yet in the span of the vectors already present, and the lemma guarantees this process will finish.
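This paragraph is effectively an algorithm. Below is a rough Python sketch of it for subspaces of $\mathbb{R}^n$ described by a list of spanning vectors (the function name is mine, and membership in the span is tested numerically via rank):

```python
import numpy as np

def greedy_basis(vectors):
    """Extract a basis from a list of vectors in R^n by adding one vector
    at a time whenever it is not yet in the span of those already kept."""
    basis = []
    for v in vectors:
        candidate = basis + [v]
        # v lies outside the current span exactly when the rank goes up.
        if np.linalg.matrix_rank(np.column_stack(candidate)) > len(basis):
            basis = candidate
    return basis

vecs = [np.array([1.0, 0.0, 1.0]),
        np.array([2.0, 0.0, 2.0]),   # dependent on the first
        np.array([0.0, 1.0, 0.0]),
        np.array([1.0, 1.0, 1.0])]   # dependent on the first and third

print(len(greedy_basis(vecs)))  # 2
```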
3. Rank-nullity
What is the relationship between a linear transformation $T : V \to W$, $\dim V$, $\ker T$, and $\operatorname{im} T$? If we think about it, $T$ “kills” the entire subspace of the kernel, and so, intuitively, it is collapsing those dimensions, and so we may think that the image consists of the dimensions which remain out of $\dim V$. This intuition is indeed correct, and gives us an important theorem for the theory of linear transformations:
Theorem. (Rank-nullity). Suppose $T : V \to W$ is a linear transformation between finite dimensional vector spaces. Then $\dim V = \dim\ker T + \dim\operatorname{im} T$. The dimension of the kernel is called the nullity and the dimension of the image is called the rank.
Proof. Let $(v_1, \dots, v_k)$ be a basis of $k$ elements for $\ker T$, which exists because $V$ is finite dimensional and $\ker T$ is a subspace of $V$. Let $(v_1, \dots, v_k, v_{k+1}, \dots, v_n)$ be a basis of $n$ elements of $V$ obtained by adding vectors to $(v_1, \dots, v_k)$ until it is a minimal spanning set. Then, $T(a_1v_1 + \dots + a_nv_n) = a_{k+1}T(v_{k+1}) + \dots + a_nT(v_n)$ because the first $k$ vectors in the basis are in the kernel. If this resulting vector in $W$ were $0$, then $a_{k+1}v_{k+1} + \dots + a_nv_n$ would be in $\ker T$ also, and hence a linear combination of $v_1, \dots, v_k$, but it cannot be (unless the $a_i$ are all zero) because $(v_1, \dots, v_n)$ is a basis. Hence, the vectors $T(v_{k+1}), \dots, T(v_n)$ are linearly independent and span $\operatorname{im} T$. This proves that the dimension of $V$ is the sum of the dimensions of $\ker T$ and $\operatorname{im} T$. ∎
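For a matrix $A$ acting on $F^n$, the theorem can be sanity-checked numerically: the rank plus the dimension of the null space should equal $n$, the number of columns. A quick check with an arbitrary random matrix:

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(4, 6)).astype(float)  # a map R^6 -> R^4

rank = np.linalg.matrix_rank(A)       # dim im A
nullity = null_space(A).shape[1]      # dim ker A

print(rank + nullity == A.shape[1])   # True: rank + nullity = 6
```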
There is a related concept underlying this theorem. Given two vector spaces $U$ and $W$, we may construct a new vector space called the direct sum of $U$ and $W$, denoted $U \oplus W$, which is the set of all pairs $(u, w)$ with $u \in U$ and $w \in W$. It is a vector space by component-wise addition and scalar multiplication. We have already seen $F^n$, which is actually $F \oplus F \oplus \dots \oplus F$ ($n$ times). Another construction is between vector subspaces $U$ and $W$ of a vector space $V$, namely the sum $U + W = \{u + w : u \in U,\ w \in W\}$, which one can check is also a vector subspace.
There is a canonical linear transformation $S : U \oplus W \to V$ defined by $S(u, w) = u + w$. If $U + W = V$ and $S$ is an isomorphism, then we say that $V$ is a direct sum of $U$ and $W$ (it is common in mathematics to speak of things being “the same,” even though they are not actually exactly the same, if there is some isomorphism between them).
Lemma. If $U, W$ are vector subspaces of $V$ such that $U + W = V$ and $U \cap W = \{0\}$ (i.e., they only share the zero vector), then $V$ is the direct sum $U \oplus W$.
Proof. Let $S : U \oplus W \to V$ be the canonical $S(u, w) = u + w$. It is surjective because $U + W = V$. Suppose $S(u, w) = 0$. Then $u + w = 0$, and so $u = -w$. This entails that $u \in W$ and $w \in U$, since $U$ and $W$ are both closed under multiplication by $-1$. Since $U \cap W = \{0\}$, $u = w = 0$, hence only $(0, 0)$ is in the kernel, and so $S$ is injective. Therefore, $S$ is an isomorphism. ∎
Lemma. If $V$ is isomorphic to $U \oplus W$, then $\dim V = \dim U + \dim W$.
Proof. If $(u_1, \dots, u_m)$ is a basis of $U$ and $(w_1, \dots, w_n)$ is a basis of $W$, then $((u_1, 0), \dots, (u_m, 0), (0, w_1), \dots, (0, w_n))$ is a basis of $U \oplus W$. This is essentially the proof. ∎
What the rank-nullity theorem is actually saying is that $V$ is isomorphic to $\ker T \oplus \operatorname{im} T$. We used the basis $(v_1, \dots, v_n)$ to construct a (non-unique) linear transformation $R : \operatorname{im} T \to V$ such that $T \circ R$ was the identity on $\operatorname{im} T$ (something like a partial inverse of $T$). With this, the explicit isomorphism $\ker T \oplus \operatorname{im} T \to V$ is defined by $(u, w) \mapsto u + R(w)$.
4. Change of basis and matrices
If we have two bases $\overline{v}$ and $\overline{w}$ of a vector space $V$, how do they relate? In each case, we have an isomorphism, $\overline{v} : F^n \to V$ and $\overline{w} : F^n \to V$ respectively, and so the composition $\overline{w}^{-1} \circ \overline{v} : F^n \to F^n$ is itself a linear transformation, which takes coordinates according to the first basis and converts them into coordinates according to the second. That is, if $a \in F^n$, then $\overline{v}a = \overline{w}(\overline{w}^{-1}\overline{v}a)$, so $\overline{w}^{-1}\overline{v}a$ really is the coordinates according to $\overline{w}$ for the vector $\overline{v}a$. This is illustrated by a square diagram with $F^n$ at both top corners and $V$ at both bottom corners: the vertical arrows are $\overline{v}$ and $\overline{w}$, the top arrow is the linear transformation $\overline{w}^{-1} \circ \overline{v}$, and the bottom arrow is the identity transformation $\mathrm{id} : V \to V$. The diagram is said to commute, in that $\overline{w}(\overline{w}^{-1}\overline{v}a) = \mathrm{id}(\overline{v}a)$ for all $a \in F^n$ (where $\mathrm{id}(v)$ is simply $v$).
Notice that $\overline{w}^{-1} \circ \overline{v}$ may be regarded as an $n \times n$ square matrix. This is a point of view which is sometimes useful when one works with the vector spaces in terms of coordinates rather than abstract vectors.
In fact, suppose $T : V \to W$ is a linear transformation, with $\overline{v}$ a basis for the $n$-dimensional $V$ and $\overline{w}$ a basis for the $m$-dimensional $W$. We have a similar diagram, with $F^n$ and $F^m$ on top and $V$ and $W$ on the bottom,
where the bottom arrow is $T$ and the top arrow is the obvious thing which makes the diagram commutative, namely $\overline{w}^{-1} \circ T \circ \overline{v} : F^n \to F^m$. We may regard $\overline{w}^{-1} \circ T \circ \overline{v}$ as an $m \times n$ matrix, which is called the matrix of the transformation. The matrix can vary wildly depending on exactly which bases are chosen for $V$ and $W$. When studying a particular linear transformation, it pays to find good bases for which the matrix is easy to analyze. The point of view which must be stressed is this: vector spaces may be coordinatized in many different ways (depending on the basis), and a linear transformation between vector spaces can be pulled back to be a linear transformation on the coordinates themselves, and likewise a linear transformation on coordinates can be pushed forward to a linear transformation of the vector spaces. There is much value in this duality of representation.
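Concretely, if $T$ is given in standard coordinates by a matrix $A$, and the bases are stored as invertible matrices $B_V$ and $B_W$ whose columns are the basis vectors, then the matrix of the transformation is $B_W^{-1} A B_V$. A small NumPy sketch with arbitrary example bases, checking that the diagram commutes:

```python
import numpy as np

# T : R^2 -> R^2 in standard coordinates (an arbitrary example).
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

B_V = np.array([[1.0, 1.0],   # columns: a basis of the source
                [0.0, 1.0]])
B_W = np.array([[1.0, 0.0],   # columns: a basis of the target
                [1.0, 1.0]])

# Matrix of T with respect to the bases B_V and B_W.
M = np.linalg.inv(B_W) @ A @ B_V

# Commutativity of the diagram on an arbitrary coordinate vector a.
a = np.array([0.5, -2.0])
print(np.allclose(B_W @ (M @ a), A @ (B_V @ a)))  # True
```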
5. Eigenvectors and eigenvalues
Consider a linear operator $T : V \to V$. In some cases, we may find non-zero vectors $v$ such that $T(v) = \lambda v$ for some (possibly zero) scalar $\lambda$. When this happens, $\lambda$ is the eigenvalue for the eigenvector $v$. The “eigen” refers to the German word for “own” or “innate,” in the sense that these data tell everything about the action of $T$.
For instance, suppose that $V$ has a basis $\overline{v} = (v_1, \dots, v_n)$ of eigenvectors with respective eigenvalues $\lambda_1, \dots, \lambda_n$. Then $T(a_1v_1 + \dots + a_nv_n) = a_1\lambda_1v_1 + \dots + a_n\lambda_nv_n$, and so the matrix of $T$ with basis $\overline{v}$ is a diagonal matrix $D$ with entries being the eigenvalues. (And, if we wish to find the matrix in the standard basis when $V = F^n$, by a suitable basis change, the matrix becomes $PDP^{-1}$, where $P$ is the matrix whose columns are the eigenvectors.)
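Numerically, this is what `numpy.linalg.eig` computes: the eigenvalues together with a matrix $P$ whose columns are eigenvectors, so that a diagonalizable matrix factors as $A = PDP^{-1}$. A quick check with an arbitrary symmetric matrix (symmetric matrices are always diagonalizable):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])          # eigenvalues 1 and 3

eigenvalues, P = np.linalg.eig(A)   # columns of P are eigenvectors
D = np.diag(eigenvalues)

print(np.allclose(A, P @ D @ np.linalg.inv(P)))  # True
```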
We have already seen one example of an eigenvector, namely non-zero vectors $v \in \ker T$, since $T(v) = 0 = 0 \cdot v$ for $v \in \ker T$. An easy class of examples from the above discussion is diagonal matrices. A more interesting example is the differential operator $\frac{d}{dx}$ on differentiable functions $\mathbb{R} \to \mathbb{R}$. Since $\frac{d}{dx}e^{\lambda x} = \lambda e^{\lambda x}$, we see that $e^{\lambda x}$ is an eigenvector with eigenvalue $\lambda$.
The analysis of eigenvectors begins with the observation that an eigenvector $v$ for an eigenvalue $\lambda$ is in the kernel of $T - \lambda I$ (which is a linear operator which takes a vector $v$ to $T(v) - \lambda v$). We define the eigenspace for an eigenvalue $\lambda$ to be $E_\lambda = \ker(T - \lambda I)$. Since this is a kernel, it is clear that it is indeed a vector subspace of $V$.
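This is also how eigenspaces are computed in practice: as the null space of $A - \lambda I$. A small sketch, assuming the eigenvalue $\lambda = 3$ of the arbitrary example matrix below is already known:

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

lam = 3.0                                 # an eigenvalue of A
E = null_space(A - lam * np.eye(2))       # columns span the eigenspace E_3

print(E.shape[1])                         # 1: the eigenspace is one-dimensional
print(np.allclose(A @ E, lam * E))        # True: A acts as multiplication by 3
```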
If $\lambda_1 \neq \lambda_2$ and $v \in E_{\lambda_1} \cap E_{\lambda_2}$, then $T(v) = \lambda_1 v$ and $T(v) = \lambda_2 v$ simultaneously, and so $(\lambda_1 - \lambda_2)v = 0$, which implies $v = 0$. This means that there are only finitely many eigenvalues for a given linear operator on a finite dimensional $V$, since the direct sum $E_{\lambda_1} \oplus \dots \oplus E_{\lambda_k}$ of distinct eigenspaces is isomorphic to the vector subspace $E_{\lambda_1} + \dots + E_{\lambda_k}$ of $V$, and each eigenspace has dimension at least one.
What we do not know yet is whether there even are any eigenvalues. Unfortunately, it is not always the case that there are eigenvalues (for instance, a rotation matrix in $\mathbb{R}^2$ by $90^\circ$),[4] but whether they are guaranteed to exist only depends on the field being algebraically closed (i.e., a field in which every non-constant polynomial has a root). One such field is $\mathbb{C}$, and so we will stick with this field for the rest of the section.
To show that $T : V \to V$ has an eigenvalue, let $v$ be a nonzero vector in a finite dimensional vector space $V$. Since $V$ is finite dimensional, there is some $n$ such that $(v, Tv, T^2v, \dots, T^nv)$ is a minimal set of linearly dependent vectors (where $T^2v$ represents $T(T(v))$, and so on). Let $c_0, c_1, \dots, c_n$ be such that $c_0v + c_1Tv + \dots + c_nT^nv = 0$ (not all of them zero). Consider the polynomial $p(x) = c_0 + c_1x + \dots + c_nx^n$. By the fundamental theorem of algebra, $p$ may be factored as $c_n(x - \lambda_1)(x - \lambda_2)\cdots(x - \lambda_n)$ for roots $\lambda_1, \dots, \lambda_n \in \mathbb{C}$. Then, $c_n(T - \lambda_1 I)(T - \lambda_2 I)\cdots(T - \lambda_n I)v = p(T)v = 0$ with $v \neq 0$, so at least one of the linear operators $T - \lambda_i I$ is not injective (a composition of injective maps is injective), hence there is at least one $\lambda_i$ such that $\ker(T - \lambda_i I)$ is nontrivial. Therefore, $T$ has at least one eigenvalue $\lambda_i$.[5]
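The proof is essentially an algorithm: collect $v, Tv, T^2v, \dots$ until they become dependent, read the coefficients of the dependence off as a polynomial, and test its roots. A rough NumPy sketch of this procedure (the helper function is mine, least squares is used to find the dependence, and floating-point round-off is handled crudely with a tolerance):

```python
import numpy as np

def one_eigenvalue(A, v):
    """Find an eigenvalue of A following the proof: take v, Av, A^2 v, ...
    until they become dependent, then factor the dependence polynomial."""
    krylov = [v]
    while np.linalg.matrix_rank(np.column_stack(krylov + [A @ krylov[-1]])) > len(krylov):
        krylov.append(A @ krylov[-1])
    k = len(krylov)
    # A^k v is a combination of v, ..., A^(k-1) v: solve for the coefficients.
    coeffs, *_ = np.linalg.lstsq(np.column_stack(krylov), A @ krylov[-1], rcond=None)
    # p(x) = x^k - c_{k-1} x^{k-1} - ... - c_0; np.roots wants highest degree first.
    poly = np.concatenate(([1.0], -coeffs[::-1]))
    roots = np.roots(poly)
    # A root lambda is an eigenvalue when A - lambda*I has nontrivial kernel.
    return next(l for l in roots
                if abs(np.linalg.det(A - l * np.eye(len(v)))) < 1e-8)

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
print(one_eigenvalue(A, np.array([1.0, 0.0])))  # approximately 3.0 (or 1.0)
```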
Unfortunately again, while this shows that there is an eigenvalue, we are not guaranteed to find a full set of eigenvalues such that $V = E_{\lambda_1} \oplus \dots \oplus E_{\lambda_k}$, but we will take what we can get. An example of this is the matrix
$$A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix},$$
which has the eigenvector $(1, 0)$ with eigenvalue $1$; however, the above technique for $v = (0, 1)$ gives $Av = (1, 1)$ and $A^2v = (2, 1)$, so there is a linear dependence $A^2v - 2Av + v = 0$, but $x^2 - 2x + 1 = (x - 1)^2$ has only $1$ as a root, and by rank-nullity, the kernel of $A - I$ has dimension $1$. At this point, we could discuss the relationship between determinants and eigenvalues, as well as trace and eigenvalues. Determinant and trace are useful tools for their study.
6. Linear recurrence relations
In this section, we discuss an example of using eigenvectors to analyze a linear recurrence relation and determine a closed formula.
Given a sequence of numbers $a_0, a_1, a_2, \dots$, a linear recurrence relation of order $k$ is a rule $a_{n+k} = c_{k-1}a_{n+k-1} + \dots + c_1a_{n+1} + c_0a_n$ with constants $c_0, \dots, c_{k-1}$. The Fibonacci numbers $F_{n+2} = F_{n+1} + F_n$ with $F_0 = 0$ and $F_1 = 1$ are an example of a linear recurrence relation.
Infinite sequences of numbers $(a_0, a_1, a_2, \dots)$ form a vector space $\mathbb{R}^\infty$, using componentwise addition and scalar multiplication. The zero vector is the sequence $(0, 0, 0, \dots)$. We define two linear operators on infinite sequences. The first is the identity $I$, where $(Ia)_n = a_n$, and the second is the shift operator $S$, where $(Sa)_n = a_{n+1}$. We may compose these to produce, for instance, the difference operator $\Delta = S - I$, where $(\Delta a)_n = a_{n+1} - a_n$.
Perhaps surprisingly, there is a nice class of eigenvectors of $S$. Since $S(1, \lambda, \lambda^2, \dots) = (\lambda, \lambda^2, \lambda^3, \dots) = \lambda(1, \lambda, \lambda^2, \dots)$, we see that the geometric sequence $(1, \lambda, \lambda^2, \dots)$ is an eigenvector with eigenvalue $\lambda$.[6]
Let us look at the Fibonacci sequence again. Notice that what it is saying is that $F_{n+2} = F_{n+1} + F_n$, and so $F_{n+2} - F_{n+1} - F_n = 0$. This is saying that $F$ is in the kernel of the operator $S^2 - S - I$. If we take a bit of a leap of faith, notice that the polynomial $x^2 - x - 1$ has two roots, $\varphi = \frac{1 + \sqrt{5}}{2}$ and $\psi = \frac{1 - \sqrt{5}}{2}$, and this lets us factor $S^2 - S - I = (S - \varphi I)(S - \psi I)$. So, the kernel of $S^2 - S - I$ includes eigenvectors of $S$ with eigenvalues $\varphi$ and $\psi$ (since the factorization could have gone in either order). A possibility then is that $F_n = a\varphi^n + b\psi^n$, a sum of two eigenvectors. Let us find $a$ and $b$ so that $F_0 = 0$ and $F_1 = 1$. In this case, $a + b = 0$ and $a\varphi + b\psi = 1$. Solving this system for $a$ and $b$, we have $a = \frac{1}{\sqrt{5}}$ and $b = -\frac{1}{\sqrt{5}}$, hence a possible closed form
$$F_n = \frac{1}{\sqrt{5}}\left(\left(\frac{1 + \sqrt{5}}{2}\right)^n - \left(\frac{1 - \sqrt{5}}{2}\right)^n\right).$$
Indeed, we constructed it so that it was in the kernel of $S^2 - S - I$ and so $F_{n+2} = F_{n+1} + F_n$, and furthermore $F_0 = 0$ and $F_1 = 1$. It must be the closed form for the Fibonacci sequence. Since $|\psi| = \left|\frac{1 - \sqrt{5}}{2}\right| < 1$, the term $\frac{1}{\sqrt{5}}\psi^n$ very quickly approaches zero. The limit of $F_{n+1}/F_n$ as $n \to \infty$ is thus $\varphi = \frac{1 + \sqrt{5}}{2}$, the golden ratio.
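A quick sanity check of this closed form against the recurrence, in Python (floating point is accurate enough for small $n$):

```python
import numpy as np

phi = (1 + np.sqrt(5)) / 2
psi = (1 - np.sqrt(5)) / 2

def fib_closed(n):
    return (phi**n - psi**n) / np.sqrt(5)

fib = [0, 1]
for _ in range(18):
    fib.append(fib[-1] + fib[-2])

print(all(round(fib_closed(n)) == fib[n] for n in range(20)))  # True
```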
It should be said that there is another way we could have analyzed this system. By the recurrence relation, we have the following relation:
$$\begin{pmatrix} F_{n+2} \\ F_{n+1} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} F_{n+1} \\ F_n \end{pmatrix}.$$
Let $A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$ be the matrix above. Then, if we wish to find $F_{n+1}$, we must calculate $A^n\begin{pmatrix} F_1 \\ F_0 \end{pmatrix} = A^n\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and take the first coordinate. Now, if there is an eigenbasis of $A$ (there is), we may diagonalize $A$ in that basis as $A = PDP^{-1}$ with $D$ diagonal. In this form, the dynamical properties of $A$ may be studied easily by observing that $A^n = PD^nP^{-1}$. One may check that $\varphi$ and $\psi$ are eigenvalues of $A$, and the eigenbasis follows.
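The same computation carried out with NumPy, using the eigendecomposition of the matrix to form $A^n = PD^nP^{-1}$ (a sketch; `round` is used to clean up floating-point error):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 0.0]])

eigenvalues, P = np.linalg.eig(A)
print(np.sort(eigenvalues))  # approximately [-0.618, 1.618]: psi and phi

# F_{n+1} is the first coordinate of A^n (1, 0)^T, computed via the eigenbasis.
n = 10
D_n = np.diag(eigenvalues ** n)
result = P @ D_n @ np.linalg.inv(P) @ np.array([1.0, 0.0])
print(round(result[0]))  # 89 = F_11
```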