An introduction to linear algebra
Our story of linear algebra begins with the concept of the vector space. For our discussion, we will let $F$ be some field, for instance the real numbers $\mathbb{R}$ or the complex numbers $\mathbb{C}$.
Definition. A vector space is a set $V$ (whose elements are called vectors) with addition of vectors and scalar multiplication of a vector by an element of $F$. That is,
- if $u, v \in V$, then there is an element $u + v \in V$;
- if $c \in F$ and $v \in V$, then there is an element $cv \in V$;
- there is a vector called $0$ such that $v + 0 = v$ for all $v \in V$; and
- everything respects the associative, commutative, and distributive laws ($(u + v) + w = u + (v + w)$, $u + v = v + u$, $c(u + v) = cu + cv$, and $(a + b)v = av + bv$).
Some consequences of this definition are that $0$ is unique, and there are unique additive inverses $-v$ such that $v + (-v) = 0$ (left as exercises).
There are many examples of vector spaces. One example is the noble $F^n$, that is, the set of all $n$-tuples $(a_1, \dots, a_n)$ with $a_i \in F$, where addition and scalar multiplication are componentwise. One can make nice diagrams of vector addition by the so-called “tip-to-tail method”: by imagining the vectors as arrows which can be translated in space without rotation, and then placing the “tail” of one vector at the “tip” of the last; the arrow which would start at the first tail and end at the last tip is the resulting sum. Scalar multiplication by $c$ then corresponds to scaling the arrow by a factor of $c$ (as it should, at least for integers: $2v = v + v$, for instance). When $n = 1$, notice that this is just $F$ as a vector space with scalars $F$.
A good example that is not $F^n$ is the vector space of polynomials (single or multivariable). Two polynomials may be added, and a polynomial may be multiplied by a scalar. Another good example is the vector space of continuous functions, say the set $C(\mathbb{R})$ of continuous functions $\mathbb{R} \to \mathbb{R}$. In calculus, one learns that the sum of continuous functions is continuous, as is a constant times a continuous function.
A vector subspace of a vector space is a subset of vectors which is itself a vector space. An example is the set of vectors $(a, 0)$ for all $a \in F$ inside of $F^2$. One can show that a nonempty subset $U$ of a vector space is a vector subspace if and only if it is closed under addition and scalar multiplication. For instance, it follows from this that $0 \in U$, because $0v = 0$ for any $v \in U$.
Let us look again at the definition of a vector space, and suppose we have some scalars $c_1, \dots, c_n$ and some vectors $v_1, \dots, v_n$. We may apply rule 2 for each pair to get vectors $c_1v_1, \dots, c_nv_n$, and then we may repeatedly apply rule 1 to get a vector $c_1v_1 + c_2v_2 + \dots + c_nv_n$ (one may prove that it does not matter in which order these vectors are added by appealing to the associative law). This is a key operation on a collection of vectors, and we call it a linear combination. In fact, the rules are set up so that this concept arises.
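To make this concrete, here is a minimal sketch in Python (using NumPy, with made-up scalars and vectors in $\mathbb{R}^3$) of forming a linear combination exactly as described: apply rule 2 to scale each vector, then rule 1 to add the results.

```python
import numpy as np

# Some scalars c_i and vectors v_i in R^3 (arbitrary example values).
scalars = [2.0, -1.0, 0.5]
vectors = [np.array([1.0, 0.0, 0.0]),
           np.array([0.0, 1.0, 0.0]),
           np.array([1.0, 1.0, 1.0])]

# Rule 2: scale each vector; rule 1: add the scaled vectors together.
combination = sum(c * v for c, v in zip(scalars, vectors))

print(combination)  # [ 2.5 -0.5  0.5]
```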
1. Linear transformations
At this point, we can start asking questions: which vectors in $V$ are linear combinations of some subset of vectors in $V$? Can we find a finite set such that everything is a linear combination of those vectors? How unique are representations as a linear combination? However, we will take a detour through the fundamental idea of a linear transformation.
Definition. A linear transformation is a function $T : V \to W$ between vector spaces $V$ and $W$ such that
- if $u, v \in V$ then $T(u + v) = T(u) + T(v)$; and
- if $c \in F$ and $v \in V$, then $T(cv) = cT(v)$.
Since $T(0) = T(0 \cdot 0) = 0 \cdot T(0) = 0$, it follows that $T(0) = 0$ for every linear transformation, and so this gives a quick test to check whether a function is not a linear transformation. One example would be a translation in the plane, $v \mapsto v + u$ for a fixed nonzero $u \in \mathbb{R}^2$.
An example of a linear transformation is $T : \mathbb{R}^2 \to \mathbb{R}^2$ being a rotation around the origin in the plane. Another is $V$ being polynomials and $T$ being the derivative. The most basic examples are the identity transformation $\mathrm{id} : V \to V$ (also known as $I$) and the zero transformation $0 : V \to W$.
There are two important properties of a linear transformation $T : V \to W$. The first is the kernel (or nullspace) of $T$, denoted $\ker T$, which is the set of all vectors $v \in V$ such that $T(v) = 0$. One can check that $\ker T$ is a vector subspace of $V$. The second is the image of $T$, denoted $\operatorname{im} T$, which is the set of all vectors $T(v)$, for all $v \in V$ (that is, the set of all vectors $w \in W$ where there is a $v \in V$ such that $T(v) = w$). One can also check that $\operatorname{im} T$ is a vector subspace of $W$.
Recall that a function $f : X \to Y$ is surjective (or “onto”) if for every $y \in Y$ there is an $x \in X$ such that $f(x) = y$. A clear observation is that a linear transformation $T : V \to W$ is surjective if and only if $\operatorname{im} T = W$. Recall also that a function is injective (or “one-to-one”) if whenever $f(x_1) = f(x_2)$ then $x_1 = x_2$ (alternatively, by the contrapositive, if whenever $x_1 \neq x_2$ then $f(x_1) \neq f(x_2)$). For a linear transformation, notice that this means that whenever $T(v_1 - v_2) = 0$ then $v_1 - v_2 = 0$, and, letting $v = v_1 - v_2$, that whenever $T(v) = 0$ then $v = 0$. This amounts to saying that $T$ is injective if and only if $\ker T = \{0\}$ (called a “trivial kernel”).
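As a computational aside (not part of the text's development), these criteria are easy to test for a map $F^n \to F^m$ given by a matrix: the image is the column space and the kernel is the null space, so surjectivity amounts to the rank being $m$ and injectivity to the null space being trivial. A sketch with an arbitrary example matrix:

```python
import numpy as np
from scipy.linalg import null_space

# T : R^3 -> R^2 given by an example matrix (acting as x -> A @ x).
A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0]])

rank = np.linalg.matrix_rank(A)
kernel_basis = null_space(A)               # columns span ker T

surjective = (rank == A.shape[0])          # im T all of R^2?
injective = (kernel_basis.shape[1] == 0)   # trivial kernel?

print(surjective, injective)  # True False
```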
Back to linear combinations. We call a tuple $\overline{v} = (v_1, \dots, v_n)$ of vectors a hypervector. In fact, $V^n$, the collection of such hypervectors, is itself a vector space. (As an aside: we may imagine $F^n$ to be a collection of hypervectors, but we will not do this for a reason which will be clear shortly.) Given a vector $a = (a_1, \dots, a_n) \in F^n$, we define the product $\overline{v}a$ to be the linear combination $a_1v_1 + \dots + a_nv_n$. For instance, if $\overline{v} = (v_1, v_2)$ and $a = (a_1, a_2)$, with $V = F^2$ so that $v_1 = (v_{1,1}, v_{1,2})$ and $v_2 = (v_{2,1}, v_{2,2})$, then
$$\overline{v}a = a_1(v_{1,1}, v_{1,2}) + a_2(v_{2,1}, v_{2,2}) = (a_1v_{1,1} + a_2v_{2,1},\ a_1v_{1,2} + a_2v_{2,2}).$$
An astute reader will recognize this as matrix multiplication with a column vector, if we had written the vectors of $\overline{v}$ vertically while forgetting to write the inner parentheses. In fact, if $\overline{v} \in V^n$ is a hypervector, the function $T_{\overline{v}} : F^n \to V$ defined by $T_{\overline{v}}(a) = \overline{v}a$ is a linear transformation, as one can easily check. Notice if we drop parentheses that $T_{\overline{v}}a = \overline{v}a$, which suggestively has it that $T_{\overline{v}} = \overline{v}$. We will speak of $\overline{v}$ itself as a linear transformation $F^n \to V$.
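In coordinates this really is matrix-vector multiplication: stacking the vectors of $\overline{v}$ as the columns of a matrix, the product $\overline{v}a$ is that matrix applied to $a$. A quick NumPy check with arbitrary vectors:

```python
import numpy as np

v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.0, 1.0, 0.0])
v3 = np.array([1.0, 1.0, 1.0])
a = np.array([2.0, -1.0, 0.5])

# The hypervector as a matrix: its columns are v1, v2, v3.
V = np.column_stack([v1, v2, v3])

explicit_sum = a[0] * v1 + a[1] * v2 + a[2] * v3
matrix_product = V @ a

print(np.allclose(explicit_sum, matrix_product))  # True
```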
2. Basis and dimension
Now we are in a position to ask those questions about linear combinations. When can a collection of vectors $\overline{v} = (v_1, \dots, v_n)$ represent every vector in $V$? It is when the linear transformation $\overline{v} : F^n \to V$ is surjective (i.e., when $\operatorname{im}\overline{v} = V$). When is a representation as a linear combination unique? It is when $\overline{v}$ is injective (i.e., when $\ker\overline{v}$ is trivial).[1] This leads to a natural question: can one find a collection $\overline{v}$ for which $\overline{v} : F^n \to V$ is an isomorphism (a linear transformation which is both surjective and injective)? This would give every element of $V$ a unique coordinate in $F^n$.
The answer to this question comes from observing two processes. The first is looking at hypervectors $\overline{v}$ which are surjective. We call such surjective hypervectors spanning sets[2] of $V$ since we imagine every element of $V$ is “spanned” by some linear combination of these vectors. The second process is looking at hypervectors $\overline{v}$ which are injective. We call the vectors in an injective $\overline{v}$ linearly independent vectors. The terminology comes about from the concept of linear dependence: if $\overline{v}$ is not injective, then the kernel is nontrivial, so there is some nonzero $a \in F^n$ with $a_1v_1 + \dots + a_nv_n = 0$, and one of the vectors (say $v_i$, with $a_i \neq 0$) can be written as $v_i = -\frac{1}{a_i}\sum_{j \neq i} a_jv_j$, and so “depends” on the rest; linearly independent vectors are ones that are not dependent.
We will limit this discussion to vector spaces which have a spanning set with finitely many vectors in it, since otherwise the theory is not so simple. Suppose $V$ is such a vector space. Now, it makes sense to speak of a minimal spanning set $\overline{v} = (v_1, \dots, v_n)$, one that has the fewest elements in it. There may very well be (infinitely) many minimal spanning sets, but we only care that there is one. Because it is a spanning set, every vector of $V$ may be represented as a linear combination, but it is not clear that the representation is unique. Suppose $\overline{v}$ is a minimal spanning set: is it an independent set? If there were a dependence, then one of the vectors, say $v_n$, would be in the span of $(v_1, \dots, v_{n-1})$, which would be a spanning set with fewer vectors, contradicting minimality; hence it is indeed an independent set. This means that such vector spaces have an isomorphism $\overline{v} : F^n \to V$ for some $n$.
What we have not settled is whether there are minimal spanning sets with different numbers of elements. If $\overline{v} \in V^m$ and $\overline{w} \in V^n$ are both minimal spanning sets, then there are isomorphisms $\overline{v} : F^m \to V$ and $\overline{w} : F^n \to V$. Then $\overline{w}^{-1} \circ \overline{v} : F^m \to F^n$ is an isomorphism.[3] Hence, it reduces to:
Lemma. If $T : F^m \to F^n$ is an isomorphism, then $m = n$.
Proof. Suppose it is not already the case that $m = n$, and without loss of generality, assume $m > n$ (replacing $T$ with $T^{-1}$ as necessary). Let $e_1, \dots, e_m$ be the standard basis vectors of $F^m$ (with $e_i$ having a $1$ in the $i$th coordinate and $0$ elsewhere), and construct an $n \times m$ matrix $A = (a_{ij})$ by choosing the $a_{ij}$ as the unique coefficients such that $T(e_j) = (a_{1j}, \dots, a_{nj})$. Since there are more unknowns than equations, by elimination we can solve the system $Ax = 0$ for some nonzero $x \in F^m$, and since $T(x) = x_1T(e_1) + \dots + x_mT(e_m) = Ax = 0$, this gives a linear dependence on the vectors $T(e_1), \dots, T(e_m)$, which in turn gives a linear dependence on the vectors $e_1, \dots, e_m$ since $T$ is injective. But the standard basis vectors are linearly independent, a contradiction. Therefore, $m = n$. ∎
Theorem. Every vector space with a finite spanning set has a minimal spanning set called a basis, and any two bases have the same number of elements, called the dimension of the vector space. That is, there is a hypervector $\overline{v} \in V^n$ which is an isomorphism $\overline{v} : F^n \to V$. Such vector spaces are called finite dimensional.
Proof. This follows from the lemma. ∎
Effectively, what we have said is that, up to isomorphism, there are not very many kinds of finite dimensional vector spaces, and in fact they are all isomorphic to some $F^n$. (“Dimension is all there is.”)
To finish off this discussion of dimension, we need the following key lemma:
Lemma. A vector subspace of a finite dimensional vector space is itself finite dimensional.
Proof. Let $U$ be a vector subspace of the finite dimensional vector space $V$. Since $V$ is finite dimensional, let us assume it is just $F^n$, and let $U$ be the corresponding subspace under the isomorphism (i.e., if $\overline{v} : F^n \to V$ is an isomorphism, then the original subspace is isomorphic to $\overline{v}^{-1}(U) \subseteq F^n$). Suppose $U$ has $m > n$ linearly independent vectors $u_1, \dots, u_m$. Then the $n \times m$ matrix with columns $u_1, \dots, u_m$ represents a system of equations with more unknowns than equations, so again by elimination there is a nonzero $a \in F^m$ such that $a_1u_1 + \dots + a_mu_m = 0$. Hence, there is a dependence among the $u_i$, a contradiction, so $U$ has at most $n$ linearly independent vectors. Let $(u_1, \dots, u_k)$ now represent a maximal set of linearly independent vectors in $U$, and suppose that it does not span $U$. Let $u \in U$ be an element not in the span of these vectors, and then $(u_1, \dots, u_k, u)$ is a larger linearly independent set, contradicting maximality. Therefore, $U$ is spanned by a finite set of vectors in $U$, and so it is finite dimensional. ∎
What this gives us is that a linearly independent set of vectors must have no more vectors than a spanning set. Thus, we may build a basis by adding vectors one at a time that are not yet in the span of the vectors already present, and the lemma guarantees this process will finish.
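This paragraph is effectively an algorithm. Below is a rough Python sketch of it for subspaces of $\mathbb{R}^n$ described by a list of spanning vectors (the function name is mine, and membership in the span is tested numerically via rank):

```python
import numpy as np

def greedy_basis(vectors):
    """Extract a basis from a list of vectors in R^n by adding one vector
    at a time whenever it is not yet in the span of those already kept."""
    basis = []
    for v in vectors:
        candidate = basis + [v]
        # v lies outside the current span exactly when the rank goes up.
        if np.linalg.matrix_rank(np.column_stack(candidate)) > len(basis):
            basis = candidate
    return basis

vecs = [np.array([1.0, 0.0, 1.0]),
        np.array([2.0, 0.0, 2.0]),   # dependent on the first
        np.array([0.0, 1.0, 0.0]),
        np.array([1.0, 1.0, 1.0])]   # dependent on the first and third

print(len(greedy_basis(vecs)))  # 2
```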
3. Rank-nullity
What is the relationship between a linear transformation $T : V \to W$, $\dim V$, $\ker T$, and $\operatorname{im} T$? If we think about it, $T$ “kills” the entire subspace of the kernel, and so, intuitively, it is collapsing those dimensions, and so we may think that the image consists of the dimensions which remain out of $\dim V$. This intuition is indeed correct, and gives us an important theorem for the theory of linear transformations:
Theorem. (Rank-nullity). Suppose $T : V \to W$ is a linear transformation between finite dimensional vector spaces. Then $\dim V = \dim\ker T + \dim\operatorname{im} T$. The dimension of the kernel is called the nullity and the dimension of the image is called the rank.
Proof. Let $(v_1, \dots, v_k)$ be a basis of $k$ elements for $\ker T$, which exists because $V$ is finite dimensional and $\ker T$ is a subspace of $V$. Let $(v_1, \dots, v_k, v_{k+1}, \dots, v_n)$ be a basis of $n$ elements of $V$ obtained by adding vectors to $(v_1, \dots, v_k)$ until it is a minimal spanning set. Then, $T(a_1v_1 + \dots + a_nv_n) = a_{k+1}T(v_{k+1}) + \dots + a_nT(v_n)$ because the first $k$ vectors in the basis are in the kernel. If this resulting vector in $W$ were $0$, then $a_{k+1}v_{k+1} + \dots + a_nv_n$ would be in $\ker T$ also, and hence a linear combination of $v_1, \dots, v_k$, but it cannot be (unless the $a_i$ are all zero) because $(v_1, \dots, v_n)$ is a basis. Hence, the vectors $T(v_{k+1}), \dots, T(v_n)$ are linearly independent and span $\operatorname{im} T$. This proves that the dimension of $V$ is the sum of the dimensions of $\ker T$ and $\operatorname{im} T$. ∎
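For a matrix $A$ acting on $F^n$, the theorem can be sanity-checked numerically: the rank plus the dimension of the null space should equal $n$, the number of columns. A quick check with an arbitrary random matrix:

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(4, 6)).astype(float)  # a map R^6 -> R^4

rank = np.linalg.matrix_rank(A)       # dim im A
nullity = null_space(A).shape[1]      # dim ker A

print(rank + nullity == A.shape[1])   # True: rank + nullity = 6
```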
There is a related concept underlying this theorem. Given two vector spaces $U$ and $W$, we may construct a new vector space called the direct sum of $U$ and $W$, denoted $U \oplus W$, which is the set of all pairs $(u, w)$ with $u \in U$ and $w \in W$. It is a vector space by component-wise addition and scalar multiplication. We have already seen $F^n$, which is actually $F \oplus F \oplus \dots \oplus F$ ($n$ times). Another construction is between vector subspaces $U$ and $W$ of a vector space $V$, namely the sum $U + W = \{u + w : u \in U,\ w \in W\}$, which one can check is also a vector subspace.
There is a canonical linear transformation $S : U \oplus W \to V$ defined by $S(u, w) = u + w$. If $U + W = V$ and $S$ is an isomorphism, then we say that $V$ is a direct sum of $U$ and $W$ (it is common in mathematics to speak of things being “the same,” even though they are not actually exactly the same, if there is some isomorphism between them).
Lemma. If $U, W$ are vector subspaces of $V$ such that $U + W = V$ and $U \cap W = \{0\}$ (i.e., they only share the zero vector), then $V$ is the direct sum $U \oplus W$.
Proof. Let $S : U \oplus W \to V$ be the canonical $S(u, w) = u + w$. It is surjective because $U + W = V$. Suppose $S(u, w) = 0$. Then $u + w = 0$, and so $u = -w$. This entails that $u \in W$ and $w \in U$, since $U$ and $W$ are both closed under multiplication by $-1$. Since $U \cap W = \{0\}$, $u = w = 0$, hence only $(0, 0)$ is in the kernel, and so $S$ is injective. Therefore, $S$ is an isomorphism. ∎
Lemma. If $V$ is isomorphic to $U \oplus W$, then $\dim V = \dim U + \dim W$.
Proof. If $(u_1, \dots, u_m)$ is a basis of $U$ and $(w_1, \dots, w_n)$ is a basis of $W$, then $((u_1, 0), \dots, (u_m, 0), (0, w_1), \dots, (0, w_n))$ is a basis of $U \oplus W$. This is essentially the proof. ∎
What the rank-nullity theorem is actually saying is that $V$ is isomorphic to $\ker T \oplus \operatorname{im} T$. We used the basis $(v_1, \dots, v_n)$ to construct a (non-unique) linear transformation $R : \operatorname{im} T \to V$ such that $T \circ R$ was the identity on $\operatorname{im} T$ (something like a partial inverse of $T$). With this, the explicit isomorphism $\ker T \oplus \operatorname{im} T \to V$ is defined by $(u, w) \mapsto u + R(w)$.
4. Change of basis and matrices
If we have two bases $\overline{v}$ and $\overline{w}$ of a vector space $V$, how do they relate? In each case, we have an isomorphism, $\overline{v} : F^n \to V$ and $\overline{w} : F^n \to V$ respectively, and so the composition $\overline{w}^{-1} \circ \overline{v} : F^n \to F^n$ is itself a linear transformation, which takes coordinates according to the first basis and converts them into coordinates according to the second. That is, if $a \in F^n$, then $\overline{v}a = \overline{w}(\overline{w}^{-1}\overline{v}a)$, so $\overline{w}^{-1}\overline{v}a$ really is the coordinates according to $\overline{w}$ for the vector $\overline{v}a$. This is illustrated by a square diagram with $F^n$ at both top corners and $V$ at both bottom corners: the vertical arrows are $\overline{v}$ and $\overline{w}$, the top arrow is the linear transformation $\overline{w}^{-1} \circ \overline{v}$, and the bottom arrow is the identity transformation $\mathrm{id} : V \to V$. The diagram is said to commute, in that $\overline{w}(\overline{w}^{-1}\overline{v}a) = \mathrm{id}(\overline{v}a)$ for all $a \in F^n$ (where $\mathrm{id}(v)$ is simply $v$).
Notice that $\overline{w}^{-1} \circ \overline{v}$ may be regarded as an $n \times n$ square matrix. This is a point of view which is sometimes useful when one works with the vector spaces in terms of coordinates rather than abstract vectors.
In fact, suppose $T : V \to W$ is a linear transformation, with $\overline{v}$ a basis for the $n$-dimensional $V$ and $\overline{w}$ a basis for the $m$-dimensional $W$. We have a similar diagram, with $F^n$ and $F^m$ on top and $V$ and $W$ on the bottom,
where the bottom arrow is $T$ and the top arrow is the obvious thing which makes the diagram commutative, namely $\overline{w}^{-1} \circ T \circ \overline{v} : F^n \to F^m$. We may regard $\overline{w}^{-1} \circ T \circ \overline{v}$ as an $m \times n$ matrix, which is called the matrix of the transformation. The matrix can vary wildly depending on exactly which bases are chosen for $V$ and $W$. When studying a particular linear transformation, it pays to find good bases for which the matrix is easy to analyze. The point of view which must be stressed is this: vector spaces may be coordinatized in many different ways (depending on the basis), and a linear transformation between vector spaces can be pulled back to be a linear transformation on the coordinates themselves, and likewise a linear transformation on coordinates can be pushed forward to a linear transformation of the vector spaces. There is much value in this duality of representation.
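Concretely, if $T$ is given in standard coordinates by a matrix $A$, and the bases are stored as invertible matrices $B_V$ and $B_W$ whose columns are the basis vectors, then the matrix of the transformation is $B_W^{-1} A B_V$. A small NumPy sketch with arbitrary example bases, checking that the diagram commutes:

```python
import numpy as np

# T : R^2 -> R^2 in standard coordinates (an arbitrary example).
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

B_V = np.array([[1.0, 1.0],   # columns: a basis of the source
                [0.0, 1.0]])
B_W = np.array([[1.0, 0.0],   # columns: a basis of the target
                [1.0, 1.0]])

# Matrix of T with respect to the bases B_V and B_W.
M = np.linalg.inv(B_W) @ A @ B_V

# Commutativity of the diagram on an arbitrary coordinate vector a.
a = np.array([0.5, -2.0])
print(np.allclose(B_W @ (M @ a), A @ (B_V @ a)))  # True
```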
5. Eigenvectors and eigenvalues
Consider a linear operator $T : V \to V$. In some cases, we may find non-zero vectors $v$ such that $T(v) = \lambda v$ for some (possibly zero) scalar $\lambda$. When this happens, $\lambda$ is the eigenvalue for the eigenvector $v$. The “eigen” refers to the German word for “own” or “innate,” in the sense that these data tell everything about the action of $T$.
For instance, suppose that $V$ has a basis $\overline{v} = (v_1, \dots, v_n)$ of eigenvectors with respective eigenvalues $\lambda_1, \dots, \lambda_n$. Then $T(a_1v_1 + \dots + a_nv_n) = a_1\lambda_1v_1 + \dots + a_n\lambda_nv_n$, and so the matrix of $T$ with basis $\overline{v}$ is a diagonal matrix $D$ with entries being the eigenvalues. (And, if we wish to find the matrix in the standard basis when $V = F^n$, by a suitable basis change, the matrix becomes $PDP^{-1}$, where $P$ is the matrix whose columns are the eigenvectors.)
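Numerically, this is what `numpy.linalg.eig` computes: the eigenvalues together with a matrix $P$ whose columns are eigenvectors, so that a diagonalizable matrix factors as $A = PDP^{-1}$. A quick check with an arbitrary symmetric matrix (symmetric matrices are always diagonalizable):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])          # eigenvalues 1 and 3

eigenvalues, P = np.linalg.eig(A)   # columns of P are eigenvectors
D = np.diag(eigenvalues)

print(np.allclose(A, P @ D @ np.linalg.inv(P)))  # True
```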
We have already seen one example of an eigenvector, namely non-zero vectors $v \in \ker T$, since $T(v) = 0 = 0 \cdot v$ for $v \in \ker T$. An easy class of examples from the above discussion is diagonal matrices. A more interesting example is the differential operator $\frac{d}{dx}$ on differentiable functions $\mathbb{R} \to \mathbb{R}$. Since $\frac{d}{dx}e^{\lambda x} = \lambda e^{\lambda x}$, we see that $e^{\lambda x}$ is an eigenvector with eigenvalue $\lambda$.
The analysis of eigenvectors begins with the observation that an eigenvector $v$ for an eigenvalue $\lambda$ is in the kernel of $T - \lambda I$ (which is a linear operator which takes a vector $v$ to $T(v) - \lambda v$). We define the eigenspace for an eigenvalue $\lambda$ to be $E_\lambda = \ker(T - \lambda I)$. Since this is a kernel, it is clear that it is indeed a vector subspace of $V$.
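This is also how eigenspaces are computed in practice: as the null space of $A - \lambda I$. A small sketch, assuming the eigenvalue $\lambda = 3$ of the arbitrary example matrix below is already known:

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

lam = 3.0                                 # an eigenvalue of A
E = null_space(A - lam * np.eye(2))       # columns span the eigenspace E_3

print(E.shape[1])                         # 1: the eigenspace is one-dimensional
print(np.allclose(A @ E, lam * E))        # True: A acts as multiplication by 3
```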
If $\lambda_1 \neq \lambda_2$ and $v \in E_{\lambda_1} \cap E_{\lambda_2}$, then $T(v) = \lambda_1 v$ and $T(v) = \lambda_2 v$ simultaneously, and so $(\lambda_1 - \lambda_2)v = 0$, which implies $v = 0$. This means that there are only finitely many eigenvalues for a given linear operator on a finite dimensional $V$, since the direct sum $E_{\lambda_1} \oplus \dots \oplus E_{\lambda_k}$ of distinct eigenspaces is isomorphic to the vector subspace $E_{\lambda_1} + \dots + E_{\lambda_k}$ of $V$, and each eigenspace has dimension at least one.
What we do not know yet is whether there even are any eigenvalues. Unfortunately, it is not always the case that there are eigenvalues (for instance, a rotation matrix in $\mathbb{R}^2$ by $90^\circ$),[4] but whether they are guaranteed to exist only depends on the field being algebraically closed (i.e., a field in which every non-constant polynomial has a root). One such field is $\mathbb{C}$, and so we will stick with this field for the rest of the section.
To show that $T : V \to V$ has an eigenvalue, let $v$ be a nonzero vector in a finite dimensional vector space $V$. Since $V$ is finite dimensional, there is some $n$ such that $(v, Tv, T^2v, \dots, T^nv)$ is a minimal set of linearly dependent vectors (where $T^2v$ represents $T(T(v))$, and so on). Let $c_0, c_1, \dots, c_n$ be such that $c_0v + c_1Tv + \dots + c_nT^nv = 0$ (not all of them zero). Consider the polynomial $p(x) = c_0 + c_1x + \dots + c_nx^n$. By the fundamental theorem of algebra, $p$ may be factored as $c_n(x - \lambda_1)(x - \lambda_2)\cdots(x - \lambda_n)$ for roots $\lambda_1, \dots, \lambda_n \in \mathbb{C}$. Then, $c_n(T - \lambda_1 I)(T - \lambda_2 I)\cdots(T - \lambda_n I)v = p(T)v = 0$ with $v \neq 0$, so at least one of the linear operators $T - \lambda_i I$ is not injective (a composition of injective maps is injective), hence there is at least one $\lambda_i$ such that $\ker(T - \lambda_i I)$ is nontrivial. Therefore, $T$ has at least one eigenvalue $\lambda_i$.[5]
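The proof is essentially an algorithm: collect $v, Tv, T^2v, \dots$ until they become dependent, read the coefficients of the dependence off as a polynomial, and test its roots. A rough NumPy sketch of this procedure (the helper function is mine, least squares is used to find the dependence, and floating-point round-off is handled crudely with a tolerance):

```python
import numpy as np

def one_eigenvalue(A, v):
    """Find an eigenvalue of A following the proof: take v, Av, A^2 v, ...
    until they become dependent, then factor the dependence polynomial."""
    krylov = [v]
    while np.linalg.matrix_rank(np.column_stack(krylov + [A @ krylov[-1]])) > len(krylov):
        krylov.append(A @ krylov[-1])
    k = len(krylov)
    # A^k v is a combination of v, ..., A^(k-1) v: solve for the coefficients.
    coeffs, *_ = np.linalg.lstsq(np.column_stack(krylov), A @ krylov[-1], rcond=None)
    # p(x) = x^k - c_{k-1} x^{k-1} - ... - c_0; np.roots wants highest degree first.
    poly = np.concatenate(([1.0], -coeffs[::-1]))
    roots = np.roots(poly)
    # A root lambda is an eigenvalue when A - lambda*I has nontrivial kernel.
    return next(l for l in roots
                if abs(np.linalg.det(A - l * np.eye(len(v)))) < 1e-8)

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
print(one_eigenvalue(A, np.array([1.0, 0.0])))  # approximately 3.0 (or 1.0)
```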
Unfortunately again, while this shows that there is an eigenvalue, we are not guaranteed to find a full set of eigenvalues such that $V = E_{\lambda_1} \oplus \dots \oplus E_{\lambda_k}$, but we will take what we can get. An example of this is the matrix
$$A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix},$$
which has the eigenvector $(1, 0)$ with eigenvalue $1$; however, the above technique for $v = (0, 1)$ gives $Av = (1, 1)$ and $A^2v = (2, 1)$, so there is a linear dependence $A^2v - 2Av + v = 0$, but $x^2 - 2x + 1 = (x - 1)^2$ has only $1$ as a root, and by rank-nullity, the kernel of $A - I$ has dimension $1$. At this point, we could discuss the relationship between determinants and eigenvalues, as well as trace and eigenvalues. Determinant and trace are useful tools for their study.
6. Linear recurrence relations
In this section, we discuss an example of using eigenvectors to analyze a linear recurrence relation and determine a closed formula.
Given a sequence of numbers $a_0, a_1, a_2, \dots$, a linear recurrence relation of order $k$ is a rule $a_{n+k} = c_{k-1}a_{n+k-1} + \dots + c_1a_{n+1} + c_0a_n$ with constants $c_0, \dots, c_{k-1}$. The Fibonacci numbers $F_{n+2} = F_{n+1} + F_n$ with $F_0 = 0$ and $F_1 = 1$ are an example of a linear recurrence relation.
Infinite sequences of numbers $(a_0, a_1, a_2, \dots)$ form a vector space $\mathbb{R}^\infty$, using componentwise addition and scalar multiplication. The zero vector is the sequence $(0, 0, 0, \dots)$. We define two linear operators on infinite sequences. The first is the identity $I$, where $(Ia)_n = a_n$, and the second is the shift operator $S$, where $(Sa)_n = a_{n+1}$. We may compose these to produce, for instance, the difference operator $\Delta = S - I$, where $(\Delta a)_n = a_{n+1} - a_n$.
Perhaps surprisingly, there is a nice class of eigenvectors of $S$. Since $S(1, \lambda, \lambda^2, \dots) = (\lambda, \lambda^2, \lambda^3, \dots) = \lambda(1, \lambda, \lambda^2, \dots)$, we see that the geometric sequence $(1, \lambda, \lambda^2, \dots)$ is an eigenvector with eigenvalue $\lambda$.[6]
Let us look at the Fibonacci sequence again. Notice that what it is saying is that $F_{n+2} = F_{n+1} + F_n$, and so $F_{n+2} - F_{n+1} - F_n = 0$. This is saying that $F$ is in the kernel of the operator $S^2 - S - I$. If we take a bit of a leap of faith, notice that the polynomial $x^2 - x - 1$ has two roots, $\varphi = \frac{1 + \sqrt{5}}{2}$ and $\psi = \frac{1 - \sqrt{5}}{2}$, and this lets us factor $S^2 - S - I = (S - \varphi I)(S - \psi I)$. So, the kernel of $S^2 - S - I$ includes eigenvectors of $S$ with eigenvalues $\varphi$ and $\psi$ (since the factorization could have gone in either order). A possibility then is that $F_n = a\varphi^n + b\psi^n$, a sum of two eigenvectors. Let us find $a$ and $b$ so that $F_0 = 0$ and $F_1 = 1$. In this case, $a + b = 0$ and $a\varphi + b\psi = 1$. Solving this system for $a$ and $b$, we have $a = \frac{1}{\sqrt{5}}$ and $b = -\frac{1}{\sqrt{5}}$, hence a possible closed form
$$F_n = \frac{1}{\sqrt{5}}\left(\left(\frac{1 + \sqrt{5}}{2}\right)^n - \left(\frac{1 - \sqrt{5}}{2}\right)^n\right).$$
Indeed, we constructed it so that it was in the kernel of $S^2 - S - I$ and so $F_{n+2} = F_{n+1} + F_n$, and furthermore $F_0 = 0$ and $F_1 = 1$. It must be the closed form for the Fibonacci sequence. Since $|\psi| = \left|\frac{1 - \sqrt{5}}{2}\right| < 1$, the term $\frac{1}{\sqrt{5}}\psi^n$ very quickly approaches zero. The limit of $F_{n+1}/F_n$ as $n \to \infty$ is thus $\varphi = \frac{1 + \sqrt{5}}{2}$, the golden ratio.
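A quick sanity check of this closed form against the recurrence, in Python (floating point is accurate enough for small $n$):

```python
import numpy as np

phi = (1 + np.sqrt(5)) / 2
psi = (1 - np.sqrt(5)) / 2

def fib_closed(n):
    return (phi**n - psi**n) / np.sqrt(5)

fib = [0, 1]
for _ in range(18):
    fib.append(fib[-1] + fib[-2])

print(all(round(fib_closed(n)) == fib[n] for n in range(20)))  # True
```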
It should be said that there is another way we could have analyzed this system. By the recurrence relation, we have the following relation:
$$\begin{pmatrix} F_{n+2} \\ F_{n+1} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} F_{n+1} \\ F_n \end{pmatrix}.$$
Let $A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$ be the matrix above. Then, if we wish to find $F_{n+1}$, we must calculate $A^n\begin{pmatrix} F_1 \\ F_0 \end{pmatrix} = A^n\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and take the first coordinate. Now, if there is an eigenbasis of $A$ (there is), we may diagonalize $A$ in that basis as $A = PDP^{-1}$ with $D$ diagonal. In this form, the dynamical properties of $A$ may be studied easily by observing that $A^n = PD^nP^{-1}$. One may check that $\varphi$ and $\psi$ are eigenvalues of $A$, and the eigenbasis follows.
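The same computation carried out with NumPy, using the eigendecomposition of the matrix to form $A^n = PD^nP^{-1}$ (a sketch; `round` is used to clean up floating-point error):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 0.0]])

eigenvalues, P = np.linalg.eig(A)
print(np.sort(eigenvalues))  # approximately [-0.618, 1.618]: psi and phi

# F_{n+1} is the first coordinate of A^n (1, 0)^T, computed via the eigenbasis.
n = 10
D_n = np.diag(eigenvalues ** n)
result = P @ D_n @ np.linalg.inv(P) @ np.array([1.0, 0.0])
print(round(result[0]))  # 89 = F_11
```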