\( \definecolor{colordef}{RGB}{249,49,84} \definecolor{colorprop}{RGB}{18,102,241} \)
CommeUnJeu · L2 MP

Further linear algebra

⌚ ~67 min ▢ 8 blocks ✓ 17 exercises Prerequisites : Finite-dimensional vector spaces, Matrix representation of linear maps, Determinants
The MPSI chapters Finite-dimensional vector spaces and Matrix representation of linear maps built the working language of linear algebra: dimension, bases, the sum \(F + G\) and the direct sum \(F \oplus G\) of two subspaces, and the matrix of a linear map. This chapter is their direct continuation. It gives three tools, each one a way to break a large object into smaller pieces.
First: the sum and the direct sum of a finite family \(F_1, \dots, F_p\) of subspaces, with the bases adapted to a decomposition \(E = F_1 \oplus \dots \oplus F_p\). Second: block matrices --- a matrix cut into rectangular blocks, on which sum and product are computed block by block, and whose determinant collapses to a product of block determinants when the matrix is block-triangular. Third: the stable subspaces of an endomorphism and the induced endomorphism, the exact bridge between a stable decomposition of \(E\) and a block-diagonal matrix.
The chapter has three sections, and they converge on one headline result: when \(E\) splits as a direct sum of subspaces each stable by an endomorphism \(u\), the matrix of \(u\) in an adapted basis is block-diagonal. Breaking \(u\) into independent pieces this way is exactly what the reduction theory of the next chapters will do.
Standing notation. \(\mathbb{K}\) denotes a subfield of \(\mathbb{C}\) --- in practice \(\mathbb{R}\) or \(\mathbb{C}\). \(E\) is a \(\mathbb{K}\)-vector space, of finite dimension unless stated otherwise. Subspaces are written \(F, G, F_1, \dots, F_p\) with \(p \geq 1\). We write \(\dim E\) for the dimension, \(\operatorname{Vect}(\cdot)\) for the span, \(\mathcal{L}(E)\) for the endomorphisms of \(E\), and \(\operatorname{Mat}_{\mathcal{B}}(u)\) for the matrix of \(u \in \mathcal{L}(E)\) in a basis \(\mathcal{B}\). The notions of sum of two subspaces, direct sum of two subspaces, supplementary subspaces, the Grassmann formula, the determinant and its multiplicativity are those of the prerequisite chapters and are used here without reproof.
I Sum and direct sum of a finite family of subspaces
I.1 Sum of a finite family of subspaces
In Finite-dimensional vector spaces the sum \(F + G\) of two subspaces was defined as the set of all \(x + y\) with \(x \in F\) and \(y \in G\), and shown to be the smallest subspace containing both. Nothing in that idea is special to two subspaces. This subsection generalises it to a finite family \(F_1, \dots, F_p\): the sum is again a subspace, again the smallest one containing every \(F_i\).
Definition — Sum of a finite family of subspaces
Let \(F_1, \dots, F_p\) be subspaces of \(E\). Their sum, denoted \(\sum_{i=1}^p F_i\) or \(F_1 + \dots + F_p\), is the set of all sums of one vector taken from each \(F_i\): $$ \sum_{i=1}^p F_i = \left\{\, x_1 + x_2 + \dots + x_p \ :\ x_i \in F_i \text{ for every } i \,\right\}. $$
Example — Three coordinate lines of \(\mathbb{R}^3\)
Let \((e_1, e_2, e_3)\) be the canonical basis of \(\mathbb{R}^3\) and \(D_i = \operatorname{Vect}(e_i)\). A sum \(x_1 e_1 + x_2 e_2 + x_3 e_3\) with \(x_i \in \mathbb{R}\) is the general vector of \(\mathbb{R}^3\), so \(D_1 + D_2 + D_3 = \mathbb{R}^3\): three lines through the origin, not contained in a common plane, already fill the whole space.
Example — Two planes of \(\mathbb{R}^3\)
Take \(P_1 = \operatorname{Vect}(e_1, e_2)\) and \(P_2 = \operatorname{Vect}(e_2, e_3)\), two distinct planes of \(\mathbb{R}^3\). Any \(x = (a, b, c)\) is \((a, b - c, 0) + (0, c, c)\) with the first term in \(P_1\) and the second in \(P_2\), so \(P_1 + P_2 = \mathbb{R}^3\). Two planes whose intersection is only the line \(\operatorname{Vect}(e_2)\) already span the space.
Proposition — The sum is the smallest subspace containing every term
Let \(F_1, \dots, F_p\) be subspaces of \(E\). Then \(\sum_{i=1}^p F_i\) is \textcolor{colorprop}{a subspace of \(E\)}, it \textcolor{colorprop}{contains every \(F_i\)}, and it is \textcolor{colorprop}{contained in every subspace of \(E\) that contains every \(F_i\)} --- it is their smallest common upper bound.

Write \(S = \sum_{i=1}^p F_i\). The zero vector is \(0 = 0 + \dots + 0\) with each term in \(F_i\), so \(0 \in S\). If \(x = \sum x_i\) and \(y = \sum y_i\) lie in \(S\) (with \(x_i, y_i \in F_i\)) and \(\lambda \in \mathbb{K}\), then \(\lambda x + y = \sum (\lambda x_i + y_i)\), and each \(\lambda x_i + y_i \in F_i\) since \(F_i\) is a subspace; so \(\lambda x + y \in S\). Hence \(S\) is a subspace. Each \(F_j\) is contained in \(S\): a vector \(x_j \in F_j\) equals \(0 + \dots + x_j + \dots + 0 \in S\). Finally, let \(H\) be a subspace containing every \(F_i\); for \(x = \sum x_i \in S\) each \(x_i \in F_i \subset H\), and \(H\) is stable by sum, so \(x \in H\). Thus \(S \subset H\).

A vector of the sum is, by definition, expressible as a sum of pieces taken from the \(F_i\). The figure shows one such decomposition, the pieces laid tip to tail.
Here \(x = x_1 + x_2 + x_3\) with \(x_i \in F_i\).
Method — Show that a vector lies in a sum\(\virgule\) or compute a sum
To prove \(x \in F_1 + \dots + F_p\), exhibit one decomposition \(x = x_1 + \dots + x_p\) with each \(x_i \in F_i\). To prove an equality \(F_1 + \dots + F_p = H\), prove the two inclusions: \(\subset\) holds, once \(H\) is checked to be a subspace containing every \(F_i\), by the « smallest subspace » property; \(\supset\) amounts to decomposing every vector of \(H\).
Skills to practice
  • Computing a sum of subspaces
I.2 Direct sum and its characterisations
For two subspaces, \(F \oplus G\) meant that every vector of \(F + G\) splits in exactly one way. The same demand --- uniqueness of the decomposition --- defines the direct sum of a finite family. The interest is practical: the criteria. Three of them are given here, one through the zero vector, one stepwise, one through dimension; each turns « the sum is direct » into something checkable.
Definition — Direct sum of a finite family
The subspaces \(F_1, \dots, F_p\) are said to be in direct sum when every vector \(x\) of \(\sum_{i=1}^p F_i\) has a unique decomposition \(x = x_1 + \dots + x_p\) with \(x_i \in F_i\). The sum is then written \(\bigoplus_{i=1}^p F_i\) or \(F_1 \oplus \dots \oplus F_p\).
Example — A direct sum and a non-direct sum
In \(\mathbb{R}^3\) with the canonical basis, \(\operatorname{Vect}(e_1) \oplus \operatorname{Vect}(e_2) \oplus \operatorname{Vect}(e_3) = \mathbb{R}^3\): a vector \((a, b, c)\) writes only as \(a e_1 + b e_2 + c e_3\), the coordinates being unique. By contrast, with \(F_1 = \operatorname{Vect}(e_1)\), \(F_2 = \operatorname{Vect}(e_2)\) and \(F_3 = \operatorname{Vect}(e_1 + e_2)\), the vector \(e_1 + e_2\) of \(F_1 + F_2 + F_3\) writes both as \(e_1 + e_2 + 0\) and as \(0 + 0 + (e_1 + e_2)\): the sum \(F_1 + F_2 + F_3\) is not direct.
Proposition — Characterisation by the zero vector
The sum \(\sum_{i=1}^p F_i\) is direct if and only if $$ \textcolor{colorprop}{x_1 + \dots + x_p = 0 \ \text{ with } x_i \in F_i \quad \Longrightarrow \quad x_1 = \dots = x_p = 0.} $$ That is: the only decomposition of the zero vector is the trivial one.

If the sum is direct, the zero vector has a unique decomposition; since \(0 = 0 + \dots + 0\) is one, it is the only one, which is the stated implication. Conversely, assume the implication holds, and let \(x \in \sum F_i\) have two decompositions \(x = \sum x_i = \sum x_i'\) with \(x_i, x_i' \in F_i\). Subtracting, \(\sum (x_i - x_i') = 0\) with \(x_i - x_i' \in F_i\); the implication forces \(x_i - x_i' = 0\), i.e. \(x_i = x_i'\) for every \(i\). So the decomposition of \(x\) is unique, and the sum is direct.

Proposition — Stepwise characterisation
The sum \(\sum_{i=1}^p F_i\) is direct if and only if, for every \(k \in \{2, \dots, p\}\), $$ \textcolor{colorprop}{(F_1 + \dots + F_{k-1}) \cap F_k = \{0\}.} $$ Each step adds one subspace, asking that it meet the sum of the previous ones only at \(0\).

For \(p = 1\) there is no condition to check and the sum \(F_1\) is trivially direct, so assume \(p \geq 2\). Suppose first the sum is direct, and fix \(k\). Let \(y \in (F_1 + \dots + F_{k-1}) \cap F_k\); write \(y = x_1 + \dots + x_{k-1}\) with \(x_i \in F_i\). Then \(x_1 + \dots + x_{k-1} + (-y) + 0 + \dots + 0 = 0\) is a decomposition of \(0\) over \(F_1, \dots, F_p\) (the term \(-y\) in \(F_k\)). By the zero-vector characterisation every term is \(0\), in particular \(y = 0\); so the intersection is \(\{0\}\).
Conversely, assume all the intersection conditions hold, and let \(x_1 + \dots + x_p = 0\) with \(x_i \in F_i\). Then \(x_1 + \dots + x_{p-1} = -x_p\) belongs to \((F_1 + \dots + F_{p-1}) \cap F_p = \{0\}\), so \(x_p = 0\) and \(x_1 + \dots + x_{p-1} = 0\). Repeating with \(k = p-1, p-2, \dots, 2\) peels off \(x_{p-1}, \dots, x_2\) in turn, and \(x_1 = 0\) remains. Hence the zero vector has only the trivial decomposition, and the sum is direct.

A pitfall. The stepwise condition is not the same as « the \(F_i\) are pairwise in direct sum ». Three lines can be pairwise independent yet not in direct sum.
Take \(D_1, D_2, D_3\) three distinct lines of \(\mathbb{R}^2\), with \(D_3 = \operatorname{Vect}(e_1 + e_2)\). Pairwise, any two distinct lines meet only at \(0\). Yet \(e_1 + e_2 + \bigl(-(e_1 + e_2)\bigr) = 0\) is a non-trivial decomposition of \(0\) over \(D_1, D_2, D_3\), so \(D_1 + D_2 + D_3\) is not direct: here \((D_1 + D_2) \cap D_3 = D_3 \neq \{0\}\).
Proposition — Characterisation by dimension
If \(F_1, \dots, F_p\) are finite-dimensional, then $$ \dim\left(\sum_{i=1}^p F_i\right) \leq \sum_{i=1}^p \dim F_i, $$ and \textcolor{colorprop}{equality holds if and only if the sum is direct}.

Consider the map \(\Phi : F_1 \times \dots \times F_p \to \sum_{i=1}^p F_i\), \((x_1, \dots, x_p) \mapsto x_1 + \dots + x_p\). It is linear, and surjective by the definition of the sum. The product space \(F_1 \times \dots \times F_p\) has dimension \(\sum \dim F_i\) (dimension of a product, recalled from Finite-dimensional vector spaces). The rank theorem applied to \(\Phi\) gives $$ \sum_{i=1}^p \dim F_i = \dim \operatorname{Im} \Phi + \dim \operatorname{Ker} \Phi = \dim\left(\sum_{i=1}^p F_i\right) + \dim \operatorname{Ker} \Phi. $$ Hence \(\dim(\sum F_i) \leq \sum \dim F_i\), with equality exactly when \(\dim \operatorname{Ker} \Phi = 0\), i.e. \(\Phi\) injective. And \(\Phi\) is injective if and only if \((x_1, \dots, x_p) \mapsto \sum x_i\) has trivial kernel, i.e. \(\sum x_i = 0 \Rightarrow\) all \(x_i = 0\) --- exactly the zero-vector characterisation of directness.

Method — Prove that a sum is direct
Three standard routes, to pick according to the data:
  • Zero vector. Assume \(x_1 + \dots + x_p = 0\) with \(x_i \in F_i\) and deduce that every \(x_i = 0\). The default method, always available.
  • Stepwise. Check \((F_1 + \dots + F_{k-1}) \cap F_k = \{0\}\) for \(k = 2, \dots, p\). Convenient when the partial sums are easy to describe.
  • Dimension. In finite dimension, once \(\dim(\sum F_i) = \sum \dim F_i\) is established, the sum is direct. The fastest route when all dimensions are known.
Never replace these by « pairwise direct sum » --- the pitfall above shows it is strictly weaker.
Skills to practice
  • Proving a sum is direct
I.3 Decomposition of a space and adapted bases
When the direct sum of the \(F_i\) fills the whole space, \(E\) is said to be decomposed. The MPSI chapter built, for \(E = F \oplus G\), a basis « adapted » to the splitting, by concatenating a basis of \(F\) with a basis of \(G\). The same construction works for any number of summands, and gives the criterion that connects a decomposition to a basis.
Definition — Decomposition and adapted basis
One says that \(E\) is decomposed into the direct sum of \(F_1, \dots, F_p\) when \(E = \bigoplus_{i=1}^p F_i\). A basis \(\mathcal{B}\) of \(E\) is adapted to this decomposition when it is the concatenation, in the order \(1, \dots, p\), of a basis \(\mathcal{B}_1\) of \(F_1\), then a basis \(\mathcal{B}_2\) of \(F_2\), \dots, then a basis \(\mathcal{B}_p\) of \(F_p\).
Example — The canonical basis of \(\mathbb{K}^n\)
Let \(D_i = \operatorname{Vect}(e_i)\) be the \(i\)-th coordinate line of \(\mathbb{K}^n\). Then \(\mathbb{K}^n = D_1 \oplus \dots \oplus D_n\), and the canonical basis \((e_1, \dots, e_n)\) --- the concatenation of the one-vector bases \((e_i)\) of the \(D_i\) --- is adapted to this decomposition.
Example — Polynomials by degree
In \(\mathbb{K}_n[X]\), set \(L_k = \operatorname{Vect}(X^k)\) for \(k = 0, \dots, n\). A polynomial of degree at most \(n\) writes uniquely as \(a_0 X^0 + a_1 X^1 + \dots + a_n X^n\), so \(\mathbb{K}_n[X] = L_0 \oplus L_1 \oplus \dots \oplus L_n\), and the monomial basis \((1, X, \dots, X^n)\) is adapted to this decomposition.
Proposition — Adapted-basis criterion
Let \(F_1, \dots, F_p\) be finite-dimensional subspaces of \(E\). Then \(E = \bigoplus_{i=1}^p F_i\) if and only if, for one --- equivalently, for every --- choice of a basis \(\mathcal{B}_i\) of each \(F_i\), the concatenation \(\mathcal{B}_1, \dots, \mathcal{B}_p\) (in this order) is \textcolor{colorprop}{a basis of \(E\)}.

Fix a basis \(\mathcal{B}_i\) of each \(F_i\) and let \(\mathcal{C}\) be their concatenation.
  • If \(E = \bigoplus F_i\). The family \(\mathcal{C}\) generates \(E\), since \(E = \sum F_i\) and each vector of \(F_i\) is a combination of \(\mathcal{B}_i\). It is free: a vanishing combination of \(\mathcal{C}\) groups, by the \(F_i\), into \(y_1 + \dots + y_p = 0\) with \(y_i \in F_i\) (the \(F_i\)-part of the combination); directness forces each \(y_i = 0\), and freeness of \(\mathcal{B}_i\) then forces every coefficient inside \(\mathcal{B}_i\) to vanish. So \(\mathcal{C}\) is a basis of \(E\).
  • If \(\mathcal{C}\) is a basis of \(E\). Then \(E\) is spanned by \(\mathcal{C}\), hence \(E = \sum F_i\). To see the sum is direct, let \(y_1 + \dots + y_p = 0\) with \(y_i \in F_i\); expanding each \(y_i\) on \(\mathcal{B}_i\) turns this into a vanishing combination of \(\mathcal{C}\), so all coefficients are \(0\) by freeness, hence every \(y_i = 0\). The zero-vector characterisation gives \(E = \bigoplus F_i\).
Since the argument never used which bases \(\mathcal{B}_i\) were chosen, the « one choice » and « every choice » versions are equivalent.

Remark (optional reading). When \(E = \bigoplus_{i=1}^p F_i\), the regrouping \(E = F_i \oplus \bigl(\bigoplus_{j \neq i} F_j\bigr)\) holds for each \(i\), so the projection \(p_i\) onto \(F_i\) parallel to \(\bigoplus_{j \neq i} F_j\) is well defined. These projectors satisfy \(p_i^2 = p_i\) and \(p_1 + \dots + p_p = \operatorname{id}_E\). This is a side note; the chapter does not develop it further.
Method — Construct a basis adapted to a decomposition
Given \(E = \bigoplus_{i=1}^p F_i\): choose a basis of each \(F_i\) separately --- the summands are independent, no compatibility to respect --- then write them one after another in the order \(1, \dots, p\). The result is an adapted basis, and by the criterion --- \(E\) being finite-dimensional --- its cardinal is \(\dim F_1 + \dots + \dim F_p = \dim E\).
The figure shows the smallest non-trivial case: \(\mathbb{R}^3\) split as a line plus a plane, with an adapted basis --- one vector spanning the line, two spanning the plane.
\(\mathbb{R}^3 = D \oplus P\): \((e_1, e_2)\) is a basis of the plane \(P\), \((e_3)\) a basis of the line \(D\), and \((e_1, e_2, e_3)\) is a basis of \(\mathbb{R}^3\) adapted to the decomposition.
Skills to practice
  • Forming an adapted basis
II Block matrices
II.1 Block partition and block operations
A matrix can be cut by horizontal and vertical lines into a grid of smaller rectangular matrices, its blocks. The chapter Matrix representation of linear maps already used this cutting in the \(2 \times 2\) case, with the block-product rule. This subsection states the partition and the operations --- sum, transpose, product --- for a general grid: each is computed block by block, exactly as if the blocks were scalars, the order being kept for the product.
Definition — Block partition
Let \(A \in \mathcal{M}_{n,m}(\mathbb{K})\). A block partition of \(A\) is a choice of a splitting of the \(n\) rows into consecutive groups of positive sizes \(n_1, \dots, n_r\) (with \(n_1 + \dots + n_r = n\)) and of the \(m\) columns into consecutive groups of positive sizes \(m_1, \dots, m_s\). The resulting submatrices \(A_{ij} \in \mathcal{M}_{n_i, m_j}(\mathbb{K})\) are the blocks, and one writes \(A = (A_{ij})_{1 \leq i \leq r,\, 1 \leq j \leq s}\). Two partitioned matrices are compatible for the sum when they have the same row and column groupings, and compatible for the product when the column grouping of the left factor equals the row grouping of the right factor.
Example — A matrix partitioned two ways
The same \(3 \times 4\) matrix $$ A = \begin{pmatrix} 1 & 0 & 2 & 3 \\ 0 & 1 & 4 & 5 \\ 0 & 0 & 6 & 7 \end{pmatrix} $$ can be cut with rows \(2 + 1\) and columns \(2 + 2\), giving four blocks with \(A_{11} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\); or with rows \(3\) and columns \(1+1+1+1\), giving its four columns. The partition is a reading grid, not a property of \(A\): one chooses it to suit the computation.
Definition — Block-triangular and block-diagonal matrices
Let \(A \in \mathcal{M}_n(\mathbb{K})\) be a square matrix partitioned with identical row and column groupings \(n_1, \dots, n_r\), so that every diagonal block \(A_{ii}\) is square of size \(n_i\). Then \(A\) is block-upper-triangular if \(A_{ij} = 0\) for \(i > j\), block-lower-triangular if \(A_{ij} = 0\) for \(i < j\), and block-diagonal if \(A_{ij} = 0\) for \(i \neq j\). A block-diagonal matrix with diagonal blocks \(A_1, \dots, A_r\) is written \(\operatorname{Diag}(A_1, \dots, A_r)\).
Example — A block-triangular and a block-diagonal matrix
With row and column groupings \(2 + 1\), the matrix on the left is block-upper-triangular and the one on the right is block-diagonal: $$ \begin{pmatrix} 1 & 2 & 5 \\ 3 & 4 & 6 \\ 0 & 0 & 7 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 2 & 0 \\ 3 & 4 & 0 \\ 0 & 0 & 7 \end{pmatrix} = \operatorname{Diag}\!\left( \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, (7) \right). $$ The lower-left block, of size \(1 \times 2\), is zero in both; the diagonal blocks have sizes \(2\) and \(1\).
Proposition — Block operations
Let \(A = (A_{ij})\) and \(B = (B_{ij})\) be partitioned matrices.
  1. If \(A\) and \(B\) are compatible for the sum, then \(A + B\) has block \((i,j)\) equal to \(A_{ij} + B_{ij}\), and \(\lambda A\) has block \((i,j)\) equal to \(\lambda A_{ij}\).
  2. The transpose \(A^{\top}\) has block \((i,j)\) equal to \((A_{ji})^{\top}\) (the block grid is transposed, and each block transposed).
  3. If \(A\) and \(B\) are compatible for the product, then \(AB\) has block \((i,j)\) equal to \(\textcolor{colorprop}{\sum_{k} A_{ik} B_{kj}}\), the blocks multiplied in this order.

Parts (1) and (2) are immediate: a block of \(A + B\), of \(\lambda A\) or of \(A^{\top}\) is read off entry by entry, and each entry obeys the scalar rule, so the block obeys the matrix rule.
For (3), write the column grouping of \(A\) (equal to the row grouping of \(B\)) as \(p_1, \dots, p_t\). Fix a scalar entry of \(AB\), say in row-block \(i\) and column-block \(j\), at local positions \((s, u)\). Its value is \(\sum_{\ell} a_{s\ell} b_{\ell u}\), the sum over all columns \(\ell\) of \(A\). Split that index set according to the grouping: \(\ell\) runs over block \(1\), then block \(2\), \dots, then block \(t\). The part of the sum over block \(k\) is \(\sum_{\ell \in \text{block } k} a_{s\ell} b_{\ell u}\), which is exactly the \((s, u)\) entry of the matrix product \(A_{ik} B_{kj}\). Summing over \(k\), the \((s,u)\) entry of block \((i,j)\) of \(AB\) equals the \((s,u)\) entry of \(\sum_k A_{ik} B_{kj}\). As this holds for every local position, block \((i,j)\) of \(AB\) is \(\sum_k A_{ik} B_{kj}\).

Proposition — Product of block-triangular matrices
Let \(A\) and \(B\) be two square matrices sharing the same partition into square diagonal blocks of sizes \(n_1, \dots, n_r\).
  1. If \(A\) and \(B\) are block-upper-triangular, so is \(AB\), and its diagonal blocks are \(A_{ii} B_{ii}\). The same holds with « lower » in place of « upper ».
  2. In particular \(\operatorname{Diag}(A_1, \dots, A_r) \, \operatorname{Diag}(B_1, \dots, B_r) = \operatorname{Diag}(A_1 B_1, \dots, A_r B_r)\).

By the block-product rule, block \((i,j)\) of \(AB\) is \(\sum_k A_{ik} B_{kj}\). Suppose \(A\) and \(B\) block-upper-triangular: \(A_{ik} = 0\) when \(i > k\), and \(B_{kj} = 0\) when \(k > j\). Take \(i > j\). In the sum \(\sum_k A_{ik} B_{kj}\), a term with \(k < i\) has \(A_{ik} = 0\); a term with \(k \geq i\) has \(k \geq i > j\), hence \(k > j\) and \(B_{kj} = 0\). Every term vanishes, so block \((i,j)\) of \(AB\) is \(0\): \(AB\) is block-upper-triangular. For \(i = j\), the only possibly non-zero term is \(k = i\), giving the diagonal block \(A_{ii} B_{ii}\). The lower-triangular case is identical with the inequalities reversed. Part (2) is the special case where all off-diagonal blocks of \(A\) and \(B\) are zero.

Example — A block-triangular inverse
Let \(M = \begin{pmatrix} A & B \\ 0 & D \end{pmatrix}\) be block-upper-triangular with \(A\) and \(D\) invertible square blocks. Then \(M\) is invertible, with $$ M^{-1} = \begin{pmatrix} A^{-1} & -A^{-1} B D^{-1} \\ 0 & D^{-1} \end{pmatrix}. $$ Indeed the block product of \(M\) with this matrix gives \(\begin{pmatrix} A A^{-1} & -A A^{-1} B D^{-1} + B D^{-1} \\ 0 & D D^{-1} \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix}\), and likewise on the other side. (That \(A\) and \(D\) invertible is also necessary is shown below.)
Method — Compute with block matrices
Choose the partition so that the blocks are recognisable matrices (identity, zero, a known invertible block). Then sum, transpose and multiply block by block, treating blocks as scalars but keeping the left-to-right order in every product \(A_{ik} B_{kj}\). For an inverse, posit a block matrix of the same shape with unknown blocks, multiply, and solve the resulting block equations.
The figure fixes the vocabulary: a block partition is a grid, the block \(A_{ij}\) sitting at row-group \(i\) and column-group \(j\).
Here the rows are split \(n_1 + n_2\) and the columns \(m_1 + m_2 + m_3\).
Skills to practice
  • Computing with block matrices
II.2 Determinant of a block-triangular matrix
The determinant and its multiplicativity \(\det(XY) = \det X \det Y\) are recalled from Déterminants. When a square matrix is block-triangular --- a whole corner of off-diagonal blocks vanishing --- its determinant collapses to a product over the diagonal blocks --- the block analogue of « the determinant of a triangular matrix is the product of its diagonal entries ».
Theorem — Determinant of a block-triangular matrix
Let \(M \in \mathcal{M}_n(\mathbb{K})\) be block-triangular (upper or lower) with square diagonal blocks \(A_1, \dots, A_r\). Then $$ \textcolor{colorprop}{\det M = \det A_1 \times \det A_2 \times \dots \times \det A_r.} $$

Treat the block-upper-triangular case; the lower case then follows from \(\det M = \det M^{\top}\), since transposing turns a block-lower- into a block-upper-triangular matrix with the same diagonal blocks transposed, and \(\det A_k^{\top} = \det A_k\).
  • Base lemma. For \(\begin{pmatrix} I & C \\ 0 & B \end{pmatrix}\) with \(I\) an identity block, expand the determinant by cofactors along the first column: its only non-zero entry is the \(1\) in position \((1,1)\), whose cofactor is the determinant of the same shape with one fewer identity row and column. Repeating across the identity block, \(\det \begin{pmatrix} I & C \\ 0 & B \end{pmatrix} = \det B\). Likewise, for \(\begin{pmatrix} A & 0 \\ 0 & I \end{pmatrix}\), repeated cofactor expansion along the last column --- whose only non-zero entry is the bottom \(1\) --- gives \(\det \begin{pmatrix} A & 0 \\ 0 & I \end{pmatrix} = \det A\).
  • Two diagonal blocks. For \(M = \begin{pmatrix} A & C \\ 0 & B \end{pmatrix}\), the block product gives the factorisation \(M = \begin{pmatrix} I & C \\ 0 & B \end{pmatrix} \begin{pmatrix} A & 0 \\ 0 & I \end{pmatrix}\). By multiplicativity and the base lemma, \(\det M = \det B \times \det A = \det A \times \det B\).
  • General case. Group the diagonal blocks as \(A_1\) and the block-triangular matrix \(M'\) formed by \(A_2, \dots, A_r\): then \(M\) is block-upper-triangular with the two diagonal blocks \(A_1\) and \(M'\), so \(\det M = \det A_1 \times \det M'\). Induction on \(r\) gives \(\det M' = \det A_2 \times \dots \times \det A_r\), hence the formula.

Example — A numerical block-triangular determinant
The matrix $$ M = \begin{pmatrix} 2 & 1 & 5 & 9 \\ 3 & 2 & 7 & 4 \\ 0 & 0 & 1 & 6 \\ 0 & 0 & 2 & 1 \end{pmatrix} $$ is block-upper-triangular for the grouping \(2 + 2\), with diagonal blocks \(A_1 = \begin{pmatrix} 2 & 1 \\ 3 & 2 \end{pmatrix}\) and \(A_2 = \begin{pmatrix} 1 & 6 \\ 2 & 1 \end{pmatrix}\). Hence \(\det M = \det A_1 \times \det A_2 = (4 - 3)(1 - 12) = 1 \times (-11) = -11\), with no \(4 \times 4\) expansion.
Proposition — Invertibility of a block-triangular matrix
A block-triangular matrix \(M\) with square diagonal blocks \(A_1, \dots, A_r\) is invertible if and only if \textcolor{colorprop}{every diagonal block \(A_k\) is invertible}.

By the invertibility criterion (Déterminants), \(M\) is invertible if and only if \(\det M \neq 0\). The Theorem gives \(\det M = \det A_1 \times \dots \times \det A_r\), a product of scalars; it is non-zero exactly when every factor \(\det A_k\) is non-zero, i.e. when every \(A_k\) is invertible.

Example — The determinant of a symmetric block matrix
Let \(A, B \in \mathcal{M}_n(\mathbb{K})\). We evaluate \(\det \begin{pmatrix} A & B \\ B & A \end{pmatrix}\) by multiplying on the left by \(\begin{pmatrix} I & I \\ 0 & I \end{pmatrix}\), then on the right by \(\begin{pmatrix} I & -I \\ 0 & I \end{pmatrix}\): $$ \begin{pmatrix} I & I \\ 0 & I \end{pmatrix} \begin{pmatrix} A & B \\ B & A \end{pmatrix} \begin{pmatrix} I & -I \\ 0 & I \end{pmatrix} = \begin{pmatrix} A+B & A+B \\ B & A \end{pmatrix} \begin{pmatrix} I & -I \\ 0 & I \end{pmatrix} = \begin{pmatrix} A+B & 0 \\ B & A-B \end{pmatrix}. $$ Both block-triangular factors have determinant \(1\) (their diagonal blocks are identities), so by multiplicativity and the Theorem, $$ \det \begin{pmatrix} A & B \\ B & A \end{pmatrix} = \det \begin{pmatrix} A+B & 0 \\ B & A-B \end{pmatrix} = \det(A+B)\,\det(A-B). $$
Method — Compute a determinant by block-triangularisation
Faced with a determinant of a partitioned matrix: if one off-diagonal block is already zero, apply the Theorem directly. Otherwise, when the block structure allows it, multiply left and right by block-triangular matrices of determinant \(1\) (diagonal blocks all identities) to clear an off-diagonal block, then apply the Theorem --- the symmetric block matrix above is the model case. Multiplicativity guarantees the determinant is unchanged by such multiplications.
The figure summarises the shape the Theorem exploits: below the diagonal, only zeros; on the diagonal, the square blocks whose determinants multiply.
A block-upper-triangular matrix: the \(\ast\) blocks are arbitrary, the lower-left blocks are zero, and \(\det M\) is the product of \(\det A_1, \det A_2, \det A_3\).
Skills to practice
  • Computing a block-triangular determinant
III Stable subspaces and induced endomorphism
III.1 Stable subspaces
Let \(u\) be an endomorphism of \(E\). A subspace on which \(u\) « stays inside » --- sends every vector of the subspace back into the subspace --- can be studied on its own, without the rest of \(E\). Such a subspace is called stable. This subsection defines stability and gives two ways to recognise it.
Definition — Stable subspace
Let \(u \in \mathcal{L}(E)\) and \(F\) a subspace of \(E\). One says that \(F\) is stable by \(u\) --- or that \(u\) stabilises \(F\) --- when \(u(F) \subset F\), that is, when \(u(x) \in F\) for every \(x \in F\).
Example — Subspaces always stable
For any \(u \in \mathcal{L}(E)\), the subspaces \(\{0\}\) and \(E\) are stable, since \(u(\{0\}) = \{0\}\) and \(u(E) \subset E\). So are \(\operatorname{Ker} u\) --- if \(u(x) = 0\) then \(u(u(x)) = u(0) = 0\), so \(u(x) \in \operatorname{Ker} u\) --- and \(\operatorname{Im} u\) --- \(u(\operatorname{Im} u) \subset \operatorname{Im} u\) by definition of the image.
Example — Stable lines and a rotation
For \(x \neq 0\), the line \(\mathbb{K}x\) is stable by \(u\) if and only if \(u(x) \in \mathbb{K}x\), i.e. \(u(x) = \lambda x\) for some scalar \(\lambda\): a stable line is a direction that \(u\) merely rescales. Consequently the rotation of \(\mathbb{R}^2\) by an angle \(\theta \not\equiv 0 \ [\pi]\) stabilises no line: a rotation by such an angle sends no non-zero vector to a multiple of itself, so its only stable subspaces are \(\{0\}\) and \(\mathbb{R}^2\).
Proposition — Stability via a generating family
Let \(u \in \mathcal{L}(E)\) and let \(F = \operatorname{Vect}(e_1, \dots, e_d)\) be a subspace generated by the family \((e_1, \dots, e_d)\). Then $$ \textcolor{colorprop}{F \text{ is stable by } u \iff u(e_k) \in F \text{ for every } k.} $$

If \(F\) is stable, then each \(e_k \in F\) gives \(u(e_k) \in F\). Conversely, suppose \(u(e_k) \in F\) for every \(k\). A vector \(x \in F\) writes \(x = \sum_k \lambda_k e_k\), so by linearity \(u(x) = \sum_k \lambda_k u(e_k)\), a linear combination of the vectors \(u(e_k)\), all in \(F\); since \(F\) is a subspace, \(u(x) \in F\). Hence \(u(F) \subset F\).

Proposition — Commuting endomorphisms preserve kernels and images
Let \(u, v \in \mathcal{L}(E)\) with \(u \circ v = v \circ u\). Then \(\operatorname{Ker} v\) and \(\operatorname{Im} v\) are \textcolor{colorprop}{stable by \(u\)}.

  • Kernel. Let \(x \in \operatorname{Ker} v\), so \(v(x) = 0\). Then \(v(u(x)) = u(v(x)) = u(0) = 0\), so \(u(x) \in \operatorname{Ker} v\). Hence \(u(\operatorname{Ker} v) \subset \operatorname{Ker} v\).
  • Image. \(u(\operatorname{Im} v) = \operatorname{Im}(u \circ v) = \operatorname{Im}(v \circ u) \subset \operatorname{Im} v\), the middle equality being the commutation \(u \circ v = v \circ u\) and the last inclusion holding because \(\operatorname{Im}(v \circ u) = v(u(E)) \subset v(E) = \operatorname{Im} v\).

Example — A kernel stabilised by a commuting endomorphism
Fix \(u \in \mathcal{L}(E)\) and a scalar \(\lambda\), and set \(v = u - \lambda \operatorname{id}_E\). Since \(u\) commutes with \(v\) (both are polynomials in \(u\)), the previous Proposition applies: \(\operatorname{Ker}(u - \lambda \operatorname{id}_E)\) is stable by \(u\). More generally it is stable by every endomorphism commuting with \(u\). This kernel will reappear, named, in the next chapter.
Method — Prove that a subspace is stable
To prove \(F\) stable by \(u\): if \(F\) is given by a generating family, check that the image of each generator lies in \(F\) --- by the Proposition this suffices. If \(F\) is a kernel or an image, look for an endomorphism that commutes with the relevant map. To prove \(F\) not stable, exhibit one vector \(x \in F\) with \(u(x) \notin F\).
The figure shows stability geometrically: a plane \(F\) stable by \(u\) keeps every one of its vectors inside \(F\) after applying \(u\).
Both \(x\) and \(u(x)\) lie in the plane \(F\): applying \(u\) never leaves \(F\).
Skills to practice
  • Proving a subspace is stable
III.2 Induced endomorphism and block-triangular form
When \(F\) is stable by \(u\), restricting \(u\) to \(F\) produces a map from \(F\) to \(F\) --- an endomorphism of \(F\) in its own right, the induced endomorphism. Choosing a basis of \(E\) that starts with a basis of \(F\) then makes stability visible in the matrix: it becomes a zero block.
Definition — Induced endomorphism
Let \(u \in \mathcal{L}(E)\) and let \(F\) be a subspace stable by \(u\). The endomorphism induced by \(u\) on \(F\), denoted \(u_F\), is the map \(u_F : F \to F\), \(x \mapsto u(x)\). It is well defined --- because \(u(F) \subset F\), the value \(u(x)\) lies in \(F\) --- and it is an endomorphism of \(F\).
Example — Two induced endomorphisms
Take \(u \in \mathcal{L}(E)\). On the stable subspace \(\operatorname{Ker} u\), the induced endomorphism \(u_{\operatorname{Ker} u}\) is the zero map of \(\operatorname{Ker} u\), since \(u\) vanishes there. On a stable line \(\mathbb{K}x\) with \(u(x) = \lambda x\), the induced endomorphism is the homothety of ratio \(\lambda\) of that line.
Example — Induced endomorphism versus restriction
The induced endomorphism \(u_F\) and the restriction \(u_{|F}\) carry out the same assignment \(x \mapsto u(x)\) on \(F\); what distinguishes them is the declared target set. The restriction \(u_{|F} : F \to E\) has \(E\) as target set; it is a linear map from \(F\) to \(E\). The induced endomorphism \(u_F : F \to F\) is that same map corestricted to \(F\) as target set --- legitimate only because \(F\) is stable --- and it is an endomorphism, so it can be composed with itself, has a matrix in a basis of \(F\), and so on. The two maps take the same values; they differ in their declared target, and only \(u_F\) is an endomorphism.
Theorem — Matrix translation of stability
Let \(u \in \mathcal{L}(E)\), let \(\mathcal{B} = (e_1, \dots, e_n)\) be a basis of \(E\), and let \(F = \operatorname{Vect}(e_1, \dots, e_r)\) with \(1 \le r \le n-1\). Write the matrix in blocks for the grouping \(r + (n-r)\): $$ \operatorname{Mat}_{\mathcal{B}}(u) = \begin{pmatrix} A & C \\ B & D \end{pmatrix}. $$ Then \(F\) is \textcolor{colorprop}{stable by \(u\) if and only if \(B = 0\)}, and in that case \(A = \operatorname{Mat}_{(e_1, \dots, e_r)}(u_F)\) --- the top-left block is the matrix of the induced endomorphism.

A vector \(x \in F\) has coordinate column \(\begin{pmatrix} X \\ 0 \end{pmatrix}\) (the last \(n-r\) coordinates vanish, since \(x\) is a combination of \(e_1, \dots, e_r\)). The block product gives the coordinate column of \(u(x)\): $$ \operatorname{Mat}_{\mathcal{B}}(u(x)) = \begin{pmatrix} A & C \\ B & D \end{pmatrix}\begin{pmatrix} X \\ 0 \end{pmatrix} = \begin{pmatrix} AX \\ BX \end{pmatrix}. $$ Now \(u(x) \in F\) means its last \(n-r\) coordinates vanish, i.e. \(BX = 0\). So \(F\) is stable --- \(u(x) \in F\) for all \(x \in F\) --- if and only if \(BX = 0\) for every \(X \in \mathbb{K}^r\), which holds if and only if \(B = 0\). When \(B = 0\), the column of \(u(x)\) is \(\begin{pmatrix} AX \\ 0 \end{pmatrix}\), so the coordinates of \(u_F(x)\) in \((e_1, \dots, e_r)\) are \(AX\); as \(X\) ranges over the coordinate columns of \(F\), this says \(A = \operatorname{Mat}_{(e_1, \dots, e_r)}(u_F)\).

Example — Reading stability off a matrix
Let \(u \in \mathcal{L}(\mathbb{R}^3)\) have matrix \(\begin{pmatrix} 2 & 1 & 5 \\ 0 & 3 & 4 \\ 0 & 0 & 7 \end{pmatrix}\) in the canonical basis. For \(F = \operatorname{Vect}(e_1, e_2)\) the lower-left block is \(\begin{pmatrix} 0 & 0 \end{pmatrix}\), zero, so \(F\) is stable, and the induced endomorphism has matrix \(\begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix}\) in \((e_1, e_2)\). For \(G = \operatorname{Vect}(e_1)\) the column of \(u(e_1)\) is \((2, 0, 0)\), again in \(G\), so \(G\) is stable too.
Method — Use a stable subspace to triangularise by blocks
Given a subspace \(F\) stable by \(u\): complete a basis of \(F\) into a basis \(\mathcal{B}\) of \(E\). In \(\mathcal{B}\), the matrix of \(u\) is automatically block-upper-triangular, \(\begin{pmatrix} A & C \\ 0 & D \end{pmatrix}\), with \(A\) the matrix of the induced endomorphism \(u_F\). This is the standard first step in simplifying the matrix of an endomorphism.
The figure records the shape: a stable \(F\) spanned by the first basis vectors forces the lower-left block to vanish, leaving \(\operatorname{Mat}(u_F)\) on top.
\(F = \operatorname{Vect}(e_1, \dots, e_r)\) stable: the bottom-left block is zero.
Skills to practice
  • Identifying an induced endomorphism
III.3 Decomposition into stable subspaces
The chapter now closes its loop. A single stable subspace turned the matrix block-triangular; a whole decomposition of \(E\) into stable subspaces turns it block-diagonal. Each diagonal block is the matrix of an induced endomorphism, and the endomorphism \(u\) has been split into independent pieces, one per summand.
Theorem — Block-diagonal form on a stable decomposition
Let \(u \in \mathcal{L}(E)\), assume \(E = \bigoplus_{i=1}^p F_i\) with each \(F_i \neq \{0\}\), and fix a basis \(\mathcal{B}\) adapted to this decomposition. Then \textcolor{colorprop}{every \(F_i\) is stable by \(u\) if and only if \(\operatorname{Mat}_{\mathcal{B}}(u)\) is block-diagonal}, \(\operatorname{Mat}_{\mathcal{B}}(u) = \operatorname{Diag}(A_1, \dots, A_p)\) for the grouping \(\dim F_1, \dots, \dim F_p\). In that case each diagonal block \(A_i\) is the matrix of the endomorphism induced by \(u\) on \(F_i\), in the basis of \(F_i\) extracted from \(\mathcal{B}\).

The adapted basis \(\mathcal{B}\) groups its vectors into \(p\) consecutive runs, the \(i\)-th run being a basis of \(F_i\); partition \(\operatorname{Mat}_{\mathcal{B}}(u)\) into \(p \times p\) blocks accordingly, the block \((i, j)\) having size \(\dim F_i \times \dim F_j\). The column-block \(j\) holds the coordinate columns of the vectors \(u(e)\) for \(e\) in the basis of \(F_j\). By the generating-family criterion, \(F_j\) is stable by \(u\) exactly when each such \(u(e)\) lies in \(F_j\), i.e. exactly when the coordinate columns have non-zero entries only in row-block \(j\) --- that is, every off-diagonal block in column-block \(j\) vanishes. Therefore all \(F_j\) are stable if and only if every off-diagonal block vanishes, i.e. the matrix is block-diagonal. When this holds, the diagonal block \(A_i\) collects the coordinates, in the basis of \(F_i\), of the images \(u(e) = u_{F_i}(e)\) of the basis vectors of \(F_i\); so \(A_i\) is the matrix of \(u_{F_i}\) in that basis. This re-derives, in block-coordinate form, the same stability / zero-block correspondence as the previous subsection.

Example — Two stable planes in \(\mathbb{R}^4\)
Let \(u \in \mathcal{L}(\mathbb{R}^4)\) have matrix \(\begin{pmatrix} 1 & 2 & 0 & 0 \\ 3 & 4 & 0 & 0 \\ 0 & 0 & 5 & 6 \\ 0 & 0 & 7 & 8 \end{pmatrix}\) in the canonical basis. The planes \(F_1 = \operatorname{Vect}(e_1, e_2)\) and \(F_2 = \operatorname{Vect}(e_3, e_4)\) satisfy \(\mathbb{R}^4 = F_1 \oplus F_2\), and the matrix is block-diagonal for the grouping \(2 + 2\): both planes are stable, the induced endomorphism on \(F_1\) has matrix \(\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}\), and the one on \(F_2\) has matrix \(\begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}\).
Example — A stable line and a stable plane
Suppose \(u \in \mathcal{L}(\mathbb{R}^3)\) satisfies \(u(e_1) = 2 e_1\) and stabilises the plane \(P = \operatorname{Vect}(e_2, e_3)\). Then \(\mathbb{R}^3 = \operatorname{Vect}(e_1) \oplus P\) is a decomposition into stable subspaces, and in the adapted basis \((e_1, e_2, e_3)\) the matrix of \(u\) is block-diagonal \(\begin{pmatrix} 2 & 0 & 0 \\ 0 & \ast & \ast \\ 0 & \ast & \ast \end{pmatrix}\): a \(1 \times 1\) block \((2)\) for the line, a \(2 \times 2\) block for the plane.
Method — Block-diagonalise an endomorphism
To put the matrix of \(u\) in block-diagonal form: find a decomposition \(E = \bigoplus F_i\) into subspaces each stable by \(u\), build a basis adapted to it, and read off the diagonal blocks as the matrices of the induced endomorphisms \(u_{F_i}\). The whole effort lies in finding the stable decomposition --- the goal of the reduction theory that follows.
The figure shows the end state: \(E\) split into \(p\) stable pieces, the matrix of \(u\) a diagonal of blocks with zeros everywhere else.
\(\operatorname{Mat}_{\mathcal{B}}(u) = \operatorname{Diag}(A_1, A_2, A_3)\) with \(A_i = \operatorname{Mat}(u_{F_i})\): every off-diagonal block vanishes.
Going further
This chapter has built the machinery but left the central question open: how does one find a decomposition of \(E\) into subspaces stable by a given endomorphism \(u\)? The block-diagonal form is the prize; the stable decomposition is the work. The next chapters answer this --- through eigenvalues, eigenspaces, the characteristic and minimal polynomials --- and the headline result there, diagonalisability, is exactly the case where \(E\) decomposes into stable lines, making every diagonal block a single scalar.
Skills to practice
  • Decomposing into stable subspaces