CommeUnJeu · L1 PCSI

Matrix calculus

⌚ ~107 min ▢ 13 blocks ✓ 45 exercises ➣ Prerequisites : Matrices, Algebraic computations

A matrix is a rectangular array of scalars. As an object, it organizes coefficients of a system of linear equations, the components of a finite list of vectors, or the action of a linear map on a basis. This chapter formalizes the algebraic structure of matrices: $M_{n,p}(\mathbb{K})$ is a $\mathbb{K}$-vector space, the square case $M_n(\mathbb{K})$ is a (non-commutative) ring, invertible square matrices form the linear group $\mathrm{GL}_n(\mathbb{K})$. Throughout the chapter, $\mathbb{K} \in \{\mathbb{R}, \mathbb{C}\}$ and $n, p, q, r$ denote positive integers.
The plan has six sections. We first introduce $M_{n,p}(\mathbb{K})$ with addition and scalar multiplication, and decompose every matrix on the canonical family of elementary matrices $E_{ij}$. We next define the matrix product, prove bilinearity and associativity, identify the identity matrix $I_n$, and expose the three pathologies of matrix multiplication absent from scalar multiplication: non-commutativity, zero divisors, nilpotent elements. We then present the transpose and the symmetric / antisymmetric subspaces, followed by the diagonal and triangular families, stable under linear combination and product. We define invertibility, the linear group $\mathrm{GL}_n(\mathbb{K})$, and prove the closed-form $2 \times 2$ inverse formula. We close with the identification of elementary row and column operations as multiplications by invertible matrices, and use this to compute inverses by Gauss-Jordan elimination and to characterize triangular invertibility.
By convention, technicality is excluded in two places --- the inversion algorithm and the elementary-operations-via-pivot calculation. Cours examples respect this: they stay $2 \times 2$ or simple $3 \times 3$ with small integer entries; the exo file carries heavier drill. The vector-space structure of $M_{n,p}(\mathbb{K})$ is stated « admis » here and proven rigorously in chapter Vector spaces. Solutions of linear systems and the Gauss algorithm itself are deferred to the next chapter Linear systems; this chapter treats only the matrix side. Prerequisites are Algebraic computations ($\sum$-notation, scalar binomial formula) and the rudiments of Logic, Sets, Maps.

I The space $M_{n\virgule p}(\mathbb{K})$

A matrix of size $n \times p$ stores $n p$ scalars in a rectangular grid: $n$ rows and $p$ columns. The set of all such matrices, denoted $M_{n,p}(\mathbb{K})$, carries a natural $\mathbb{K}$-vector space structure (componentwise addition and scalar multiplication). The most economical description writes any matrix as a linear combination of « elementary » matrices $E_{ij}$, each having a single 1 at position $(i, j)$ and zeros elsewhere; this elementary basis recurs throughout the rest of the chapter (in the row × column rule, in the encoding of elementary operations, etc.).

Definition — Matrix

A matrix of size $n \times p$ with coefficients in $\mathbb{K}$ is a family $A = (a_{ij})_{\substack{1 \le i \le n \\ 1 \le j \le p}}$ of $n p$ elements of $\mathbb{K}$ indexed by $\llbracket 1, n \rrbracket \times \llbracket 1, p \rrbracket$, written in the rectangular form $$ A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{np} \end{pmatrix}. $$ The scalar $a_{ij}$ is the coefficient of $A$ at position $(i, j)$. The set of all matrices of size $n \times p$ with coefficients in $\mathbb{K}$ is denoted $M_{n, p}(\mathbb{K})$.
Special shapes. A matrix of size $n \times n$ is a square matrix of size $n$; the set is denoted $M_n(\mathbb{K})$. A matrix of size $n \times 1$ is a column matrix (or vector), and a matrix of size $1 \times p$ is a row matrix. The matrix of size $n \times p$ all of whose coefficients are zero is the zero matrix, denoted $0$ (or $0_{n, p}$ when the size matters).

Example

The matrix $$ A = \begin{pmatrix} 2 & -1 & 0 \\ 3 & 5 & 1 \end{pmatrix} \in M_{2, 3}(\mathbb{R}) $$ has coefficients $a_{11} = 2$, $a_{12} = -1$, $a_{13} = 0$, $a_{21} = 3$, $a_{22} = 5$, $a_{23} = 1$. Its first row is $\begin{pmatrix} 2 & -1 & 0 \end{pmatrix}$ and its second column is $\begin{pmatrix} -1 \\ 5 \end{pmatrix}$.

Example

Three matrices in special shapes: $$ \begin{pmatrix} 1 & 2 & 3 \end{pmatrix} \in M_{1, 3}(\mathbb{R}) \text{ (row)}, \qquad \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \in M_{3, 1}(\mathbb{R}) \text{ (column)}, \qquad 0_{2, 3} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \text{ (zero)}. $$

Definition — Equality of matrices

Two matrices $A = (a_{ij})$ and $B = (b_{ij})$ are equal if and only if they have the same size and $a_{ij} = b_{ij}$ for every $(i, j)$.

Definition — Addition$\virgule$ scalar multiplication$\virgule$ linear combination

For $A, B \in M_{n, p}(\mathbb{K})$ and $\lambda, \mu \in \mathbb{K}$, the linear combination $\lambda A + \mu B$ is the matrix of $M_{n, p}(\mathbb{K})$ whose coefficient at $(i, j)$ is $\lambda a_{ij} + \mu b_{ij}$: $$ \lambda A + \mu B = \begin{pmatrix} \lambda a_{11} + \mu b_{11} & \cdots & \lambda a_{1p} + \mu b_{1p} \\ \vdots & \ddots & \vdots \\ \lambda a_{n1} + \mu b_{n1} & \cdots & \lambda a_{np} + \mu b_{np} \end{pmatrix}. $$ The special case $\lambda = \mu = 1$ gives the sum $A + B$, and the case $\mu = 0$ gives the scalar multiple $\lambda A$.

Example

With $A = \begin{pmatrix} 2 & 1 \\ 0 & 4 \end{pmatrix}$ and $B = \begin{pmatrix} 1 & 1 \\ -2 & 3 \end{pmatrix}$: $$ 3 A - 2 B = \begin{pmatrix} 3 \cdot 2 - 2 \cdot 1 & 3 \cdot 1 - 2 \cdot 1 \\ 3 \cdot 0 - 2 \cdot (-2) & 3 \cdot 4 - 2 \cdot 3 \end{pmatrix} = \begin{pmatrix} 4 & 1 \\ 4 & 6 \end{pmatrix}. $$

Proposition — Vector space structure of $M_{n\virgule p}(\mathbb{K})$

Equipped with the addition $A + B$ and the scalar multiplication $\lambda A$ defined above, $M_{n, p}(\mathbb{K})$ is a $\mathbb{K}$-vector space. Concretely: addition is associative and commutative, the zero matrix $0_{n, p}$ is a neutral element, every matrix $A$ has an opposite $-A$, and the standard scalar-distributivity rules hold. Admis at this stage; full justification is given in chapter Vector spaces.

Definition — Elementary matrices

For $(i, j) \in \llbracket 1, n \rrbracket \times \llbracket 1, p \rrbracket$, the elementary matrix $E_{ij}$ of $M_{n, p}(\mathbb{K})$ is the matrix all of whose coefficients are zero, except the coefficient at position $(i, j)$, which equals $1$.

Example

The elementary matrices of $M_{2, 3}(\mathbb{R})$ are: $$ \begin{aligned} &E_{11} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad E_{21} = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}, \quad E_{12} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \\ &E_{22} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \quad E_{13} = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}, \quad E_{23} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \end{aligned} $$ Every matrix $M \in M_{2, 3}(\mathbb{R})$ is a linear combination of these: $$ M = \begin{pmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \end{pmatrix} = m_{11} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} + m_{21} \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix} + m_{12} \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} + \ldots = \sum_{\substack{1 \le i \le 2 \\ 1 \le j \le 3}} m_{ij} E_{ij}. $$

Proposition — Decomposition on the elementary matrices

Every matrix $M = (m_{ij}) \in M_{n, p}(\mathbb{K})$ writes uniquely as a linear combination of the elementary matrices: $$ M = \sum_{\substack{1 \le i \le n \\ 1 \le j \le p}} m_{ij} E_{ij}. $$ The scalars $m_{ij}$ are the coefficients of $M$ themselves: every matrix decomposes uniquely on the family of elementary matrices. We will later call $(E_{ij})_{i, j}$ the canonical basis of the $\mathbb{K}$-vector space $M_{n, p}(\mathbb{K})$ (vocabulary deferred to chapter Vector spaces).

Proof

The coefficient of the right-hand side at position $(k, l) \in \llbracket 1, n \rrbracket \times \llbracket 1, p \rrbracket$ is $$ \sum_{i, j} m_{ij} (E_{ij})_{kl} = \sum_{i, j} m_{ij} \cdot \delta_{ik} \delta_{jl} = m_{kl}, $$ since $(E_{ij})_{kl} = 1$ if and only if $(i, j) = (k, l)$, and $0$ otherwise. This matches the coefficient of $M$ at $(k, l)$, hence the equality $M = \sum_{i, j} m_{ij} E_{ij}$.
For uniqueness, suppose $M = \sum_{i, j} \lambda_{ij} E_{ij}$. Reading the coefficient at $(k, l)$ on both sides gives $m_{kl} = \lambda_{kl}$, so the scalars are forced.

Example

In $M_{2, 3}(\mathbb{R})$, the matrix $$ M = \begin{pmatrix} 2 & -1 & 0 \\ 3 & 5 & 1 \end{pmatrix} $$ decomposes as $$ M = 2 E_{11} - E_{12} + 0 \cdot E_{13} + 3 E_{21} + 5 E_{22} + E_{23}. $$

Skills to practice

Computing a linear combination
Decomposing on the elementary matrices

II Matrix product

The matrix product is the operation that links matrices to systems of linear equations (every system $A X = B$ multiplies the column $X$ on the left by $A$), to compositions of linear maps (the matrix of $g \circ f$ is the product of the matrices of $g$ and $f$), and to changes of basis. It is defined only when the inner sizes match: $A$ has size $p \times q$ and $B$ has size $q \times r$ --- a single $q$. The product is bilinear, associative, and admits the identity matrix $I_n$ as a neutral element on the square case. But it is not commutative, and contrary to scalars two matrices may multiply to zero without either being zero --- phenomena that drive much of what follows.

Definition — Matrix product

For $A \in M_{p, q}(\mathbb{K})$ and $B \in M_{q, r}(\mathbb{K})$, the matrix product $A B \in M_{p, r}(\mathbb{K})$ is defined by $$ (A B)_{ij} = \sum_{k = 1}^{q} a_{ik} b_{kj} \qquad \text{for } i \in \llbracket 1, p \rrbracket, \ j \in \llbracket 1, r \rrbracket. $$ The product $A B$ is defined only when the number of columns of $A$ equals the number of rows of $B$ (both equal $q$). The size of the product is then $(\text{rows of } A) \times (\text{columns of } B) = p \times r$.

Example

With $A = \begin{pmatrix} 2 & 1 \\ 1 & 3 \\ 2 & 0 \end{pmatrix} \in M_{3, 2}(\mathbb{R})$ and $B = \begin{pmatrix} 1 & 0 & 1 \\ 2 & -1 & 3 \end{pmatrix} \in M_{2, 3}(\mathbb{R})$, the product $A B \in M_{3, 3}(\mathbb{R})$: $$ A B = \begin{pmatrix} 2 \cdot 1 + 1 \cdot 2 & 2 \cdot 0 + 1 \cdot (-1) & 2 \cdot 1 + 1 \cdot 3 \\ 1 \cdot 1 + 3 \cdot 2 & 1 \cdot 0 + 3 \cdot (-1) & 1 \cdot 1 + 3 \cdot 3 \\ 2 \cdot 1 + 0 \cdot 2 & 2 \cdot 0 + 0 \cdot (-1) & 2 \cdot 1 + 0 \cdot 3 \end{pmatrix} = \begin{pmatrix} 4 & -1 & 5 \\ 7 & -3 & 10 \\ 2 & 0 & 2 \end{pmatrix}. $$

Method — Row $\times$ column rule

The coefficient $(A B)_{ij}$ is the « dot product » of the $i$-th row of $A$ with the $j$-th column of $B$: write the row horizontally above the column vertically and pair them term-by-term, then add. This is the most useful mental shortcut to compute a single coefficient of a product without writing out the full matrix.

Proposition — Column interpretation of $A X$

Let $A \in M_{n, p}(\mathbb{K})$ with columns $C_1(A), \ldots, C_p(A) \in M_{n, 1}(\mathbb{K})$, and let $X = \begin{pmatrix} x_1 \\ \vdots \\ x_p \end{pmatrix} \in M_{p, 1}(\mathbb{K})$. Then $$ A X = \sum_{j = 1}^{p} x_j \, \textcolor{colorprop}{C_j(A)}. $$ The product $A X$ is the linear combination of the columns of $A$ with coefficients given by the components of $X$. More generally, for $B \in M_{p, r}(\mathbb{K})$ with columns $C_1(B), \ldots, C_r(B)$, $$ A B = \begin{pmatrix} A C_1(B) & \cdots & A C_r(B) \end{pmatrix}. $$

Example

With $A = \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix}$ and $X = \begin{pmatrix} 5 \\ 6 \end{pmatrix}$: $$ A X = 5 \begin{pmatrix} 1 \\ 2 \end{pmatrix} + 6 \begin{pmatrix} 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 5 + 18 \\ 10 + 24 \end{pmatrix} = \begin{pmatrix} 23 \\ 34 \end{pmatrix}. $$

Proposition — Bilinearity and associativity of the matrix product

Let $A, A' \in M_{p, q}(\mathbb{K})$, $B, B' \in M_{q, r}(\mathbb{K})$, $C \in M_{r, s}(\mathbb{K})$, $\lambda, \mu \in \mathbb{K}$. Then:

Bilinearity: $(\lambda A + \mu A') B = \lambda A B + \mu A' B$ and $A (\lambda B + \mu B') = \lambda A B + \mu A B'$.
Associativity: $(A B) C = A (B C)$.

Proof

Bilinearity. For any $(i, j) \in \llbracket 1, p \rrbracket \times \llbracket 1, r \rrbracket$, $$ \begin{aligned} \bigl( (\lambda A + \mu A') B \bigr)_{ij} &= \sum_{k = 1}^{q} (\lambda a_{ik} + \mu a'_{ik}) b_{kj} && \text{(definition of the product, with } (\lambda A + \mu A')_{ik} = \lambda a_{ik} + \mu a'_{ik}\text{)} \\ &= \lambda \sum_{k = 1}^{q} a_{ik} b_{kj} + \mu \sum_{k = 1}^{q} a'_{ik} b_{kj} && \text{(linearity of } \sum\text{)} \\ &= \lambda (A B)_{ij} + \mu (A' B)_{ij}. \end{aligned} $$ The right-bilinearity is symmetric.
Associativity. For $(i, j) \in \llbracket 1, p \rrbracket \times \llbracket 1, s \rrbracket$, $$ \begin{aligned} \bigl( (A B) C \bigr)_{ij} &= \sum_{l = 1}^{r} (A B)_{il} c_{lj} && \text{(definition of the outer product)} \\ &= \sum_{l = 1}^{r} \left( \sum_{k = 1}^{q} a_{ik} b_{kl} \right) c_{lj} && \text{(definition of the inner product } A B\text{)} \\ &= \sum_{l = 1}^{r} \sum_{k = 1}^{q} a_{ik} b_{kl} c_{lj} && \text{(distributivity over } c_{lj}\text{)} \\ &= \sum_{k = 1}^{q} \sum_{l = 1}^{r} a_{ik} b_{kl} c_{lj} && \text{(swap of finite sums)} \\ &= \sum_{k = 1}^{q} a_{ik} \sum_{l = 1}^{r} b_{kl} c_{lj} && \text{(factor } a_{ik}\text{ out of the inner sum)} \\ &= \sum_{k = 1}^{q} a_{ik} (B C)_{kj} && \text{(recognize } B C\text{)} \\ &= \bigl( A (B C) \bigr)_{ij} && \text{(definition of the product } A (B C)\text{)}. \end{aligned} $$

Definition — Identity matrix

The identity matrix of size $n$ is the square matrix $$ I_n = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} \in M_n(\mathbb{K}), $$ that is, $(I_n)_{ij} = \delta_{ij}$ where $\delta_{ij}$ is the Kronecker symbol ($1$ if $i = j$, $0$ otherwise).

Proposition — Identity matrix is neutral

For every $A \in M_{n, p}(\mathbb{K})$: $$ I_n A = A \qquad \text{and} \qquad A I_p = A. $$

Proof

For $(i, j) \in \llbracket 1, n \rrbracket \times \llbracket 1, p \rrbracket$, $$ \begin{aligned} (I_n A)_{ij} &= \sum_{k = 1}^{n} (I_n)_{ik} a_{kj} && \text{(definition of the product)} \\ &= \sum_{k = 1}^{n} \delta_{ik} a_{kj} && \text{(definition of } I_n\text{)} \\ &= a_{ij} && \text{(only the term } k = i \text{ survives, where } \delta_{ii} = 1\text{)}. \end{aligned} $$ The proof of $A I_p = A$ is symmetric.

Proposition — Product of elementary matrices

For $E_{ij} \in M_{n, p}(\mathbb{K})$ and $E_{kl} \in M_{p, q}(\mathbb{K})$, $$ \textcolor{colorprop}{E_{ij} E_{kl} = \delta_{jk} E_{il}}. $$ The product of two elementary matrices vanishes unless the inner indices match ($j = k$); when they match, the result is the elementary matrix carrying the outer indices ($i, l$).

Proof

For $(s, t) \in \llbracket 1, n \rrbracket \times \llbracket 1, q \rrbracket$, $$ \begin{aligned} (E_{ij} E_{kl})_{st} &= \sum_{r = 1}^{p} (E_{ij})_{sr} (E_{kl})_{rt} && \text{(definition of the product)} \\ &= \sum_{r = 1}^{p} \delta_{is} \delta_{jr} \cdot \delta_{kr} \delta_{lt} && \text{(definition of $E_{ij}$ and $E_{kl}$)} \\ &= \delta_{is} \delta_{lt} \sum_{r = 1}^{p} \delta_{jr} \delta_{kr} && \text{(factor out the indices independent of $r$)} \\ &= \delta_{is} \delta_{lt} \cdot \delta_{jk} && \text{(only the term $r = j$ survives, contributing $\delta_{kj}$)} \\ &= \delta_{jk} (E_{il})_{st} && \text{(recognize $(E_{il})_{st} = \delta_{is} \delta_{lt}$)}. \end{aligned} $$

Proposition — Ring structure of $M_n(\mathbb{K})$

Equipped with matrix addition and matrix product, $M_n(\mathbb{K})$ is a ring with identity element $I_n$. For $n \ge 2$, this ring is not commutative in general.

Example — Non-commutativity

The matrix product is not commutative. With $A = \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}$ and $B = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$: $$ A B = \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix} \neq \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = B A. $$ This is a fundamental difference from scalar multiplication, where $\lambda \mu = \mu \lambda$ always.

Example — Zero divisors

A product of matrices can be zero without either factor being zero. With $A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ and $B = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}$: $$ A B = \begin{pmatrix} 0 \cdot 1 + 1 \cdot 0 & 0 \cdot 1 + 1 \cdot 0 \\ 0 \cdot 1 + 0 \cdot 0 & 0 \cdot 1 + 0 \cdot 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, $$ even though $A \neq 0$ and $B \neq 0$. This exhibits non-zero zero divisors in the ring $M_2(\mathbb{K})$. The phenomenon is impossible in $\mathbb{K}$: if $\lambda \mu = 0$ in $\mathbb{R}$ or $\mathbb{C}$, then $\lambda = 0$ or $\mu = 0$.

Definition — Matrix powers and nilpotent matrices

For $A \in M_n(\mathbb{K})$, the powers of $A$ are defined by $A^0 = I_n$ and $A^{k + 1} = A \cdot A^k$ for $k \in \mathbb{N}$. When $A$ is invertible, this is extended to negative integers by $A^{-k} = (A^{-1})^k$ for $k \in \mathbb{N}^*$, so that $A^k$ is defined for every $k \in \mathbb{Z}$. The matrix $A$ is nilpotent if $A^p = 0$ for some $p \in \mathbb{N}^*$. The smallest such $p$ is the index of nilpotence.

Example

The matrix $N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ is nilpotent of index $2$: $$ N^2 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 \cdot 0 + 1 \cdot 0 & 0 \cdot 1 + 1 \cdot 0 \\ 0 \cdot 0 + 0 \cdot 0 & 0 \cdot 1 + 0 \cdot 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}. $$ Hence $N^k = 0$ for all $k \ge 2$, and $N \neq 0$.

Proposition — Binomial formula and difference of powers for commuting matrices

Let $A, B \in M_n(\mathbb{K})$ with $A B = B A$ (the matrices commute). Then for every $k \in \mathbb{N}$, $$ \textcolor{colorprop}{(A + B)^k = \sum_{i = 0}^{k} \binom{k}{i} A^i B^{k - i}}, $$ and for every $k \in \mathbb{N}^*$, $$ \textcolor{colorprop}{A^k - B^k = (A - B) \sum_{i = 0}^{k - 1} A^i B^{k - 1 - i}}. $$ Warning. The hypothesis $A B = B A$ is essential: without it, $(A + B)^2 = A^2 + A B + B A + B^2 \neq A^2 + 2 A B + B^2$ in general, and the difference-of-powers identity also fails.

Proof

The proof is identical to the scalar binomial proof in Algebraic calculus, since the only ingredient (beyond ring axioms) is the commutation $A B = B A$ used to reorder factors.
Binomial formula. Induction on $k \in \mathbb{N}$.

Initialization. For $k = 0$: $(A + B)^0 = I_n$ and $\sum_{i = 0}^{0} \binom{0}{i} A^0 B^0 = \binom{0}{0} I_n = I_n$.
Heredity. Assume the formula at rank $k$. Multiply by $(A + B)$ on the right: $$ \begin{aligned} (A + B)^{k + 1} &= (A + B)^k (A + B) && \text{(power decomposition)} \\ &= \left( \sum_{i = 0}^{k} \binom{k}{i} A^i B^{k - i} \right) (A + B) && \text{(induction hypothesis)} \\ &= \sum_{i = 0}^{k} \binom{k}{i} A^i B^{k - i} A + \sum_{i = 0}^{k} \binom{k}{i} A^i B^{k - i + 1} && \text{(distributivity)} \\ &= \sum_{i = 0}^{k} \binom{k}{i} A^{i + 1} B^{k - i} + \sum_{i = 0}^{k} \binom{k}{i} A^i B^{k - i + 1} && \text{(commutation } A^i B^{k - i} A = A^{i + 1} B^{k - i}\text{)} \\ &= \sum_{i = 1}^{k + 1} \binom{k}{i - 1} A^i B^{k - i + 1} + \sum_{i = 0}^{k} \binom{k}{i} A^i B^{k - i + 1} && \text{(reindex first sum: } i \mapsto i - 1\text{)} \\ &= A^{k + 1} + \sum_{i = 1}^{k} \left[ \binom{k}{i - 1} + \binom{k}{i} \right] A^i B^{k - i + 1} + B^{k + 1} && \text{(separate end terms)} \\ &= A^{k + 1} + \sum_{i = 1}^{k} \binom{k + 1}{i} A^i B^{k + 1 - i} + B^{k + 1} && \text{(Pascal's rule)} \\ &= \sum_{i = 0}^{k + 1} \binom{k + 1}{i} A^i B^{k + 1 - i} && \text{(reabsorb end terms)}. \end{aligned} $$
Conclusion. The formula holds for all $k \in \mathbb{N}$ by induction.

Difference of powers. For $k \in \mathbb{N}^*$, expand the right-hand side using $A B = B A$ to commute factors: $$ \begin{aligned} (A - B) \sum_{i = 0}^{k - 1} A^i B^{k - 1 - i} &= \sum_{i = 0}^{k - 1} A^{i + 1} B^{k - 1 - i} - \sum_{i = 0}^{k - 1} B \cdot A^i B^{k - 1 - i} && \text{(distributivity)} \\ &= \sum_{i = 0}^{k - 1} A^{i + 1} B^{k - 1 - i} - \sum_{i = 0}^{k - 1} A^i B^{k - i} && \text{(commutation: } B \cdot A^i = A^i B \text{)} \\ &= \sum_{j = 1}^{k} A^{j} B^{k - j} - \sum_{i = 0}^{k - 1} A^i B^{k - i} && \text{(reindex first sum: } j = i + 1\text{)} \\ &= A^k - B^k && \text{(telescoping: matched terms } j = i \text{ cancel)}. \end{aligned} $$

Method — Powers via $\alpha I_n + N$ with $N$ nilpotent

To compute $A^k$ when $A = \alpha I_n + N$ with $\alpha \in \mathbb{K}$ and $N \in M_n(\mathbb{K})$ nilpotent of index $p$, observe that $\alpha I_n$ commutes with every matrix (in particular with $N$). Apply the binomial formula: $$ A^k = \sum_{i = 0}^{k} \binom{k}{i} (\alpha I_n)^{k - i} N^i = \sum_{i = 0}^{\min(k, p - 1)} \binom{k}{i} \alpha^{k - i} N^i. $$ The sum truncates at $i = p - 1$ because $N^p = 0$ --- only finitely many terms contribute.

Example

Let $A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$. Write $A = I_3 + N$ with $N = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$. Compute $N^2$: $$ N^2 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, $$ so $N$ is nilpotent of index $2$. Since $I_3$ and $N$ commute, the binomial formula truncates at $i = 1$: $$ A^k = (I_3 + N)^k = \binom{k}{0} I_3 + \binom{k}{1} N = I_3 + k N = \begin{pmatrix} 1 & k & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. $$

Skills to practice

Computing a matrix product
Identifying non-commutativity and zero divisors
Computing matrix powers

III Transpose$\virgule$ symmetric and antisymmetric matrices

Swapping rows and columns produces the transpose of a matrix. The transpose is a linear involution that turns a product into the reversed product: $(A B)^\top = B^\top A^\top$. Two natural subspaces of $M_n(\mathbb{K})$ emerge: matrices fixed by transposition (symmetric) and matrices reversed in sign by transposition (antisymmetric).

Definition — Transpose

For $A = (a_{ij}) \in M_{n, p}(\mathbb{K})$, the transpose of $A$ is the matrix $A^\top = (a_{ji}) \in M_{p, n}(\mathbb{K})$. Concretely, the rows of $A$ become the columns of $A^\top$ (and vice versa); the coefficient of $A^\top$ at $(i, j)$ is the coefficient of $A$ at $(j, i)$.

Example

$$ \begin{pmatrix} 3 & 0 & 1 \\ 5 & 2 & 7 \end{pmatrix}^\top = \begin{pmatrix} 3 & 5 \\ 0 & 2 \\ 1 & 7 \end{pmatrix}. $$ The transpose of a column matrix is a row matrix, and conversely.

Proposition — Properties of the transpose

For $A, B \in M_{n, p}(\mathbb{K})$, $C \in M_{p, q}(\mathbb{K})$, $\lambda, \mu \in \mathbb{K}$:

Linearity: $(\lambda A + \mu B)^\top = \lambda A^\top + \mu B^\top$.
Involutivity: $(A^\top)^\top = A$.
Reversal of product: $(A C)^\top = C^\top A^\top$.

Proof

Linearity and involutivity are direct consequences of the index swap $(a_{ij}) \mapsto (a_{ji})$. The product reversal is the only non-trivial point. For $(i, j) \in \llbracket 1, q \rrbracket \times \llbracket 1, n \rrbracket$, $$ \begin{aligned} \bigl( (A C)^\top \bigr)_{ij} &= (A C)_{ji} && \text{(definition of transpose)} \\ &= \sum_{k = 1}^{p} a_{jk} c_{ki} && \text{(definition of the matrix product)} \\ &= \sum_{k = 1}^{p} (A^\top)_{kj} (C^\top)_{ik} && \text{(}a_{jk} = (A^\top)_{kj}\text{ and }c_{ki} = (C^\top)_{ik}\text{)} \\ &= \sum_{k = 1}^{p} (C^\top)_{ik} (A^\top)_{kj} && \text{(reorder the scalar product in } \mathbb{K}\text{)} \\ &= (C^\top A^\top)_{ij} && \text{(recognize the matrix product } C^\top A^\top\text{)}. \end{aligned} $$

Definition — Symmetric and antisymmetric matrices

A matrix $A \in M_n(\mathbb{K})$ is symmetric if $A^\top = A$ and antisymmetric if $A^\top = -A$. The set of symmetric (resp. antisymmetric) matrices is denoted $S_n(\mathbb{K})$ (resp. $A_n(\mathbb{K})$). Remark. An antisymmetric matrix has all diagonal coefficients equal to zero (since $a_{ii} = -a_{ii}$).

Example

The matrix $\begin{pmatrix} 1 & 2 & 3 \\ 2 & 5 & 0 \\ 3 & 0 & 6 \end{pmatrix}$ is symmetric (mirror across the diagonal); the matrix $\begin{pmatrix} 0 & 1 & -5 \\ -1 & 0 & 2 \\ 5 & -2 & 0 \end{pmatrix}$ is antisymmetric (sign flips across the diagonal, zeros on it).

Skills to practice

Manipulating the transpose
Working with symmetric and antisymmetric matrices

IV Diagonal and triangular matrices

Two distinguished families of square matrices are stable under linear combination and matrix product: diagonal matrices (zero outside the diagonal) and triangular matrices (zero below or above the diagonal). Their stability under product makes them useful subrings of $M_n(\mathbb{K})$, and their diagonal coefficients control invertibility (treated in the sections on invertibility and Gauss-Jordan below).

Definition — Diagonal$\virgule$ scalar$\virgule$ triangular matrices

Let $A = (a_{ij}) \in M_n(\mathbb{K})$.

$A$ is diagonal if $a_{ij} = 0$ whenever $i \neq j$. The matrix is then noted $\mathrm{diag}(a_{11}, \ldots, a_{nn})$.
$A$ is a scalar matrix if $A = \lambda I_n$ for some $\lambda \in \mathbb{K}$ (a diagonal matrix with all diagonal coefficients equal).
$A$ is upper triangular if $a_{ij} = 0$ whenever $i > j$ (zeros strictly below the diagonal); lower triangular if $a_{ij} = 0$ whenever $i < j$ (zeros strictly above).

A matrix is diagonal if and only if it is both upper and lower triangular.

Example

Three concrete examples in $M_3(\mathbb{R})$: $$ \mathrm{diag}(2, -1, 5) = \begin{pmatrix} 2 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 5 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 4 & 7 \\ 0 & 2 & 8 \\ 0 & 0 & 3 \end{pmatrix} \text{ (upper triangular)}, \qquad \begin{pmatrix} 1 & 0 & 0 \\ 4 & 2 & 0 \\ 7 & 8 & 3 \end{pmatrix} \text{ (lower triangular)}. $$

Proposition — Scalar matrices commute with everything

For every $\lambda \in \mathbb{K}$ and every $A \in M_{n, p}(\mathbb{K})$, $$ \textcolor{colorprop}{(\lambda I_n) A = A (\lambda I_p) = \lambda A}. $$ Scalar matrices act exactly as the underlying scalar; in particular, they commute with every square matrix of compatible size.

Proposition — Stability of triangular and diagonal families

Let $A, B \in M_n(\mathbb{K})$ both upper triangular (resp. both lower triangular, resp. both diagonal), and $\lambda, \mu \in \mathbb{K}$. Then $\lambda A + \mu B$ and $A B$ are upper triangular (resp. lower triangular, resp. diagonal). Moreover, the diagonal coefficients of the product satisfy $$ \textcolor{colorprop}{(A B)_{ii} = a_{ii} b_{ii}} \qquad \text{for all } i \in \llbracket 1, n \rrbracket. $$

Proof

Treat the upper triangular case; the lower case follows by transposition, and the diagonal case is the intersection.

Linear combination. For $i > j$, $(\lambda A + \mu B)_{ij} = \lambda a_{ij} + \mu b_{ij} = \lambda \cdot 0 + \mu \cdot 0 = 0$, so $\lambda A + \mu B$ is upper triangular.
Product. For $i, j \in \llbracket 1, n \rrbracket$ with $i > j$, splitting the sum into $k \le i - 1$ and $k \ge i$: $$ \begin{aligned} (A B)_{ij} &= \sum_{k = 1}^{n} a_{ik} b_{kj} = \sum_{k = 1}^{i - 1} \underbrace{a_{ik}}_{= 0 \text{ since } k < i} b_{kj} + \sum_{k = i}^{n} a_{ik} \underbrace{b_{kj}}_{= 0 \text{ since } k \ge i > j} && \text{(zone analysis)} \\ &= 0 + 0 = 0, \end{aligned} $$ so $A B$ is upper triangular. For $i \in \llbracket 1, n \rrbracket$, the same zone analysis at $i = j$ leaves only the term $k = i$: $$ (A B)_{ii} = \sum_{k = 1}^{i - 1} \underbrace{a_{ik}}_{= 0} b_{ki} + a_{ii} b_{ii} + \sum_{k = i + 1}^{n} a_{ik} \underbrace{b_{ki}}_{= 0} = a_{ii} b_{ii}. $$

Method — Product of diagonal matrices

For $\alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_n \in \mathbb{K}$: $$ \mathrm{diag}(\alpha_1, \ldots, \alpha_n) \cdot \mathrm{diag}(\beta_1, \ldots, \beta_n) = \mathrm{diag}(\alpha_1 \beta_1, \ldots, \alpha_n \beta_n). $$ The product is computed entry-wise on the diagonal --- no row $\times$ column gymnastics. This makes diagonal matrices the simplest stable family inside $M_n(\mathbb{K})$ to manipulate.

Example

$$ \mathrm{diag}(2, 3, 5) \cdot \mathrm{diag}(1, 4, 1) = \mathrm{diag}(2 \cdot 1, \ 3 \cdot 4, \ 5 \cdot 1) = \mathrm{diag}(2, 12, 5). $$

Skills to practice

Computing products of diagonal and triangular matrices

V Invertible matrices

The scalar version of « invertibility » is « non-zero » : every non-zero scalar has a multiplicative inverse. The matrix version is more delicate: only square matrices can be inverted, and not all non-zero ones admit an inverse (Example $\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}$). When the inverse exists, it is unique. The set of invertible matrices of $M_n(\mathbb{K})$ forms a group under multiplication, the linear group $\mathrm{GL}_n(\mathbb{K})$. For $2 \times 2$ matrices a closed-form inverse formula is available; for larger sizes one uses elementary operations (treated in the Gauss-Jordan section below).

Definition — Invertible matrix$\virgule$ inverse$\virgule$ linear group

A matrix $A \in M_n(\mathbb{K})$ is invertible if there exists $B \in M_n(\mathbb{K})$ such that $$ A B = B A = I_n. $$ Such a matrix $B$, when it exists, is unique (proof below); it is called the inverse of $A$ and denoted $A^{-1}$. The set of invertible matrices of $M_n(\mathbb{K})$ is denoted $\mathrm{GL}_n(\mathbb{K})$ and called the linear group of dimension $n$ over $\mathbb{K}$.

Proposition — Uniqueness of the inverse

If $A \in M_n(\mathbb{K})$ is invertible, then the matrix $B$ such that $A B = B A = I_n$ is unique.

Proof

Suppose $B$ and $B'$ both satisfy $A B = B A = I_n$ and $A B' = B' A = I_n$. Then $$ B = B I_n = B (A B') = (B A) B' = I_n B' = B', $$ using successively the neutral element, the hypothesis $A B' = I_n$, associativity, the hypothesis $B A = I_n$, and the neutral element. Hence $B = B'$.

Example

The matrix $A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$ is invertible with inverse $A^{-1} = \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix}$, since $$ A A^{-1} = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix} = \begin{pmatrix} 2 \cdot 1 + 1 \cdot (-1) & 2 \cdot (-1) + 1 \cdot 2 \\ 1 \cdot 1 + 1 \cdot (-1) & 1 \cdot (-1) + 1 \cdot 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I_2, $$ and similarly $A^{-1} A = I_2$.

Theorem — Inverse of a $2 \times 2$ matrix

Let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in M_2(\mathbb{K})$. Define the determinant $$ \det A = a d - b c \in \mathbb{K}. $$ Then $A$ is invertible if and only if $\det A \neq 0$, and in that case $$ \textcolor{colorprop}{A^{-1} = \frac{1}{\det A} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}}. $$

Proof

Compute the product $$ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} = \begin{pmatrix} a d - b c & -a b + a b \\ c d - c d & -b c + a d \end{pmatrix} = (a d - b c) I_2 = (\det A) I_2. $$ Symmetrically, $\begin{pmatrix} d & -b \\ -c & a \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = (\det A) I_2$.

($\Leftarrow$) If $\det A \neq 0$, divide both products by $\det A$ to get $A \cdot \frac{1}{\det A} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} = I_2$ and the symmetric identity. So $A$ is invertible with the announced inverse.
($\Rightarrow$) Conversely, suppose by contradiction that $\det A = 0$ and $A$ is invertible. Multiply both sides of $A \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} = (\det A) I_2 = 0_{2, 2}$ on the left by $A^{-1}$: $$ \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} = A^{-1} \cdot 0_{2, 2} = 0_{2, 2}, $$ which forces $a = b = c = d = 0$, hence $A = 0_{2, 2}$. But the zero matrix is not invertible (its product with any matrix is zero, not $I_2$), contradiction. So $\det A \neq 0$.

Method — Invert a $2 \times 2$ matrix

Given $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$:

Compute $\det A = a d - b c$.
If $\det A = 0$: $A$ is not invertible. Stop.
If $\det A \neq 0$: $A^{-1} = \dfrac{1}{\det A} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$ --- swap the diagonal, negate the anti-diagonal, divide by the determinant.

Example

For $A = \begin{pmatrix} 3 & 5 \\ 1 & 2 \end{pmatrix}$: $\det A = 3 \cdot 2 - 5 \cdot 1 = 6 - 5 = 1$, non-zero, so $A$ is invertible with $$ A^{-1} = \frac{1}{1} \begin{pmatrix} 2 & -5 \\ -1 & 3 \end{pmatrix} = \begin{pmatrix} 2 & -5 \\ -1 & 3 \end{pmatrix}. $$ Check: $A A^{-1} = \begin{pmatrix} 3 \cdot 2 + 5 \cdot (-1) & 3 \cdot (-5) + 5 \cdot 3 \\ 1 \cdot 2 + 2 \cdot (-1) & 1 \cdot (-5) + 2 \cdot 3 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I_2$. $\checkmark$

Proposition — Operations on invertible matrices

Let $A, B \in \mathrm{GL}_n(\mathbb{K})$.

$A^{-1} \in \mathrm{GL}_n(\mathbb{K})$ and $(A^{-1})^{-1} = A$.
$A B \in \mathrm{GL}_n(\mathbb{K})$ and $(A B)^{-1} = B^{-1} A^{-1}$ --- the inverse reverses the order.
$A^\top \in \mathrm{GL}_n(\mathbb{K})$ and $(A^\top)^{-1} = (A^{-1})^\top$.
For every $k \in \mathbb{Z}$, $A^k \in \mathrm{GL}_n(\mathbb{K})$ and $(A^k)^{-1} = (A^{-1})^k$.

In particular, $\mathrm{GL}_n(\mathbb{K})$ is a group for the matrix product, with identity element $I_n$.

Proof

Self-inverse. The relations $A A^{-1} = A^{-1} A = I_n$ show, by symmetry, that $A^{-1}$ is invertible with inverse $A$.
Product. Compute $(A B)(B^{-1} A^{-1}) = A (B B^{-1}) A^{-1} = A I_n A^{-1} = A A^{-1} = I_n$ (and the symmetric identity $(B^{-1} A^{-1})(A B) = I_n$). So $A B$ is invertible with inverse $B^{-1} A^{-1}$.
Transpose. Apply transpose to $A A^{-1} = I_n$ and use $(X Y)^\top = Y^\top X^\top$: $(A^{-1})^\top A^\top = I_n^\top = I_n$, and similarly $A^\top (A^{-1})^\top = I_n$. So $A^\top$ is invertible with inverse $(A^{-1})^\top$.
Powers. For $k \ge 0$, induct on $k$ using the product result. For $k < 0$, set $A^k = (A^{-1})^{-k}$ and apply the previous case.

Example — Involutory matrices

If $A \in M_n(\mathbb{K})$ satisfies $A^2 = I_n$, then $A$ is invertible with $A^{-1} = A$. Indeed, in the definition of inverse the two conditions $A B = I_n$ and $B A = I_n$ both reduce, with $B = A$, to the single equation $A^2 = I_n$ --- which holds by hypothesis. Such a matrix is called an involution. Examples: $I_n$, $-I_n$, the « swap » matrix $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$.

Skills to practice

Inverting a $2 \times 2$ matrix
Using $(A B)^{-1}$ and $(A^\top)^{-1}$

VI Elementary operations and computing the inverse

Three elementary operations on the rows of a matrix --- swapping two rows, multiplying a row by a non-zero scalar, adding a multiple of one row to another --- are exactly the operations used in Gauss elimination. The same three exist on columns. The key fact: each elementary row operation amounts to left-multiplication by an invertible matrix, and each elementary column operation amounts to right-multiplication by an invertible matrix. This identification is the bridge between the « algorithmic » view of Gauss reduction and the algebraic structure of $\mathrm{GL}_n(\mathbb{K})$, and gives a constructive way to compute inverses: reduce the augmented array $(A | I_n)$ to $(I_n | A^{-1})$. The triangular case admits a sharp criterion (invertibility iff all diagonal coefficients are non-zero), which we prove directly.

Definition — Elementary operations

Let $i, j \in \llbracket 1, n \rrbracket$ with $i \neq j$, and $\lambda \in \mathbb{K}$. The three elementary row operations on a matrix $A \in M_{n, p}(\mathbb{K})$ are:

Swap: $L_i \leftrightarrow L_j$ --- exchange row $i$ and row $j$.
Scaling: $L_i \leftarrow \lambda L_i$ --- multiply row $i$ by $\lambda \neq 0$.
Transvection: $L_i \leftarrow L_i + \lambda L_j$ --- add $\lambda$ times row $j$ to row $i$ (with $i \neq j$).

The three elementary column operations are defined analogously, replacing rows by columns: $C_i \leftrightarrow C_j$, $C_i \leftarrow \lambda C_i$, $C_i \leftarrow C_i + \lambda C_j$.

Theorem — Elementary operation $\equal$ multiplication by an invertible matrix

Each elementary row operation on $A \in M_{n, p}(\mathbb{K})$ is realized by left-multiplication by an invertible matrix $P \in \mathrm{GL}_n(\mathbb{K})$. The matrix $P$ is obtained by applying the same operation to the identity matrix $I_n$.
Symmetrically, each elementary column operation on $A$ is realized by right-multiplication by an invertible matrix $Q \in \mathrm{GL}_p(\mathbb{K})$, where $Q$ is obtained by applying the same operation to $I_p$.

Proposition — Elementary operations preserve invertibility

Let $A \in M_n(\mathbb{K})$. Applying any elementary row or column operation to $A$ preserves invertibility: the result is invertible if and only if $A$ itself is invertible.

Proof

A row operation transforms $A$ into $P A$ with $P \in \mathrm{GL}_n(\mathbb{K})$ (Theorem above). Since the product of two invertible matrices is invertible (Proposition « operations on invertible matrices » above), $P A$ is invertible iff $A$ is. The argument for column operations is symmetric: $A$ becomes $A Q$ with $Q \in \mathrm{GL}_n(\mathbb{K})$.

Method — The three elementary matrices

Let $i \neq j \in \llbracket 1, n \rrbracket$ and $\lambda \in \mathbb{K}$. The matrix realizing the row operation, applied to $I_n$:

Swap matrix $P_{i \leftrightarrow j}$: $I_n$ with rows $i$ and $j$ exchanged. Inverse: $P_{i \leftrightarrow j}$ itself (involution).
Dilatation $D_i(\lambda)$ ($\lambda \neq 0$): $I_n$ with the diagonal $1$ at position $i$ replaced by $\lambda$. Inverse: $D_i(1 / \lambda)$.
Transvection $T_{ij}(\lambda) = I_n + \lambda E_{ij}$ (with $i \neq j$): $I_n$ with an extra $\lambda$ at position $(i, j)$. Inverse: $T_{ij}(-\lambda)$.

For columns: the same matrices, but they multiply $A$ on the right (and $T_{ij}(\lambda)$ on the right realizes $C_j \leftarrow C_j + \lambda C_i$, i.e.\ adding $\lambda C_i$ to $C_j$ --- note the index swap).

Example — Swap matrix on $M_3$

The matrix realizing $L_1 \leftrightarrow L_2$ on a $3 \times p$ matrix is $$ P_{1 \leftrightarrow 2} = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}. $$ Verify on $A = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}$: $$ P_{1 \leftrightarrow 2} A = \begin{pmatrix} d & e & f \\ a & b & c \\ g & h & i \end{pmatrix}. $$ The matrix is its own inverse: $P_{1 \leftrightarrow 2}^2 = I_3$.

Example — Transvection matrix on $M_3$

The matrix realizing $L_2 \leftarrow L_2 + 3 L_1$ on a $3 \times p$ matrix is $$ T_{21}(3) = I_3 + 3 E_{21} = \begin{pmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad T_{21}(3)^{-1} = T_{21}(-3) = \begin{pmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. $$

Method — Invert a matrix by row operations (Gauss-Jordan)

To compute $A^{-1}$ for $A \in M_n(\mathbb{K})$:

Form the augmented array $(A | I_n) \in M_{n, 2n}(\mathbb{K})$.
Apply elementary row operations to reduce $A$ to $I_n$. Apply the same operations to the right block.
If $A$ reduces to $I_n$: the right block has become $A^{-1}$. The array reads $(I_n | A^{-1})$.
If a zero row appears on the left: $A$ is not invertible. Stop.

Justification: each row operation amounts to left-multiplication by an invertible matrix (Theorem above). If $P_r \cdots P_1$ is the composite that takes $A$ to $I_n$, then $P_r \cdots P_1 \cdot A = I_n$, so $A^{-1} = P_r \cdots P_1 = P_r \cdots P_1 \cdot I_n$ --- exactly the right block at the end.

Example — Invert a $3 \times 3$ matrix by Gauss-Jordan

Compute the inverse of $A = \begin{pmatrix} 1 & 3 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 1 \end{pmatrix}$.

Answer

Form $(A | I_3)$ and reduce one elementary row operation at a time: $$ \begin{aligned} & \left( \begin{array}{ccc|ccc} 1 & 3 & 1 & 1 & 0 & 0 \\ 1 & 2 & 2 & 0 & 1 & 0 \\ 1 & 2 & 1 & 0 & 0 & 1 \end{array} \right) && \text{(initial)} \\ \xrightarrow[L_2 \leftarrow L_2 - L_1]{} \ & \left( \begin{array}{ccc|ccc} 1 & 3 & 1 & 1 & 0 & 0 \\ 0 & -1 & 1 & -1 & 1 & 0 \\ 1 & 2 & 1 & 0 & 0 & 1 \end{array} \right) && \text{(transvection: zero out } a_{21}\text{)} \\ \xrightarrow[L_3 \leftarrow L_3 - L_1]{} \ & \left( \begin{array}{ccc|ccc} 1 & 3 & 1 & 1 & 0 & 0 \\ 0 & -1 & 1 & -1 & 1 & 0 \\ 0 & -1 & 0 & -1 & 0 & 1 \end{array} \right) && \text{(transvection: zero out } a_{31}\text{)} \\ \xrightarrow[L_3 \leftarrow L_3 - L_2]{} \ & \left( \begin{array}{ccc|ccc} 1 & 3 & 1 & 1 & 0 & 0 \\ 0 & -1 & 1 & -1 & 1 & 0 \\ 0 & 0 & -1 & 0 & -1 & 1 \end{array} \right) && \text{(transvection: zero out the new } a_{32}\text{)} \\ \xrightarrow[L_2 \leftarrow -L_2]{} \ & \left( \begin{array}{ccc|ccc} 1 & 3 & 1 & 1 & 0 & 0 \\ 0 & 1 & -1 & 1 & -1 & 0 \\ 0 & 0 & -1 & 0 & -1 & 1 \end{array} \right) && \text{(scaling by } -1\text{: pivot $1$ at $(2,2)$)} \\ \xrightarrow[L_3 \leftarrow -L_3]{} \ & \left( \begin{array}{ccc|ccc} 1 & 3 & 1 & 1 & 0 & 0 \\ 0 & 1 & -1 & 1 & -1 & 0 \\ 0 & 0 & 1 & 0 & 1 & -1 \end{array} \right) && \text{(scaling by } -1\text{: pivot $1$ at $(3,3)$)} \\ \xrightarrow[L_2 \leftarrow L_2 + L_3]{} \ & \left( \begin{array}{ccc|ccc} 1 & 3 & 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 0 & 1 & -1 \end{array} \right) && \text{(transvection: zero out } a_{23}\text{ above the pivot)} \\ \xrightarrow[L_1 \leftarrow L_1 - L_3]{} \ & \left( \begin{array}{ccc|ccc} 1 & 3 & 0 & 1 & -1 & 1 \\ 0 & 1 & 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 0 & 1 & -1 \end{array} \right) && \text{(transvection: zero out } a_{13}\text{ above the pivot)} \\ \xrightarrow[L_1 \leftarrow L_1 - 3 L_2]{} \ & \left( \begin{array}{ccc|ccc} 1 & 0 & 0 & -2 & -1 & 4 \\ 0 & 1 & 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 0 & 1 & -1 \end{array} \right) && \text{(transvection: zero out } a_{12}\text{ above the pivot)}. \end{aligned} $$ The left block has become $I_3$, so $A^{-1} = \begin{pmatrix} -2 & -1 & 4 \\ 1 & 0 & -1 \\ 0 & 1 & -1 \end{pmatrix}$.
Verification. $A A^{-1} = \begin{pmatrix} 1 & 3 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 1 \end{pmatrix} \begin{pmatrix} -2 & -1 & 4 \\ 1 & 0 & -1 \\ 0 & 1 & -1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = I_3$. $\checkmark$

Method — Compute the inverse by solving $A X \equal Y$

An alternative to Gauss-Jordan: introduce a generic right-hand side $Y = (y_1, \ldots, y_n)^\top$ and solve the system $A X = Y$ for $X$ in terms of $Y$. If the resolution produces a closed form $X = B Y$ valid for every $Y \in \mathbb{K}^n$, then $A B = B A = I_n$, hence $A^{-1} = B$. If the resolution shows that some choices of $Y$ make the system inconsistent, $A$ is not invertible.
Why it works: $A^{-1}$ is the unique matrix such that $A^{-1} Y$ solves $A X = Y$ for every $Y$. Reading off the linear dependence of $X$ on $Y$ is reading off the rows of $A^{-1}$.

Example — Inverse via $A X \equal Y$

Invert $A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$ by solving $A X = Y$ for $X = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ in terms of $Y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$.

Answer

The system reads $$ \begin{cases} 2 x_1 + x_2 = y_1, \\ x_1 + x_2 = y_2. \end{cases} $$ Subtract: $x_1 = y_1 - y_2$. Substitute into the second: $x_2 = y_2 - x_1 = -y_1 + 2 y_2$. So $$ X = \begin{pmatrix} y_1 - y_2 \\ -y_1 + 2 y_2 \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix} Y, $$ which gives $A^{-1} = \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix}$. (The reader can check that this matches the closed-form $2 \times 2$ inverse with $\det A = 2 \cdot 1 - 1 \cdot 1 = 1$.)

Theorem — Triangular invertibility

Let $A \in M_n(\mathbb{K})$ be triangular (upper or lower). Then $A$ is invertible if and only if $\text{all its diagonal coefficients are non-zero}$. In that case, $A^{-1}$ is triangular of the same type, with diagonal coefficients $1 / a_{ii}$ for $i \in \llbracket 1, n \rrbracket$. Special case (diagonal). $\mathrm{diag}(\alpha_1, \ldots, \alpha_n)$ is invertible iff $\alpha_i \neq 0$ for every $i$, with inverse $\mathrm{diag}(1 / \alpha_1, \ldots, 1 / \alpha_n)$.

Proof

Treat the upper triangular case; the lower case follows by transposition (Proposition « operations on invertible matrices » above).

($\Rightarrow$) If $A$ is invertible, all diagonal coefficients are non-zero. Set $B = A^{-1}$. Both $A$ and $B$ are square of size $n$; we will read $A B = I_n$ row by row, from the bottom up, to extract $a_{nn} \ne 0, a_{n - 1, n - 1} \ne 0, \ldots, a_{11} \ne 0$. The bottom-right entry of $A B = I_n$ reads $\sum_{k = 1}^{n} a_{nk} b_{kn} = (I_n)_{nn} = 1$. Since $A$ is upper triangular, $a_{nk} = 0$ for $k < n$, and the sum collapses to $a_{nn} b_{nn} = 1$. In particular $a_{nn} \ne 0$ (and $b_{nn} = 1 / a_{nn}$). For descending induction, assume $a_{ii} \ne 0$ for $i = m + 1, \ldots, n$. Read entry $(m, m)$ of $A B = I_n$: $$ 1 = (A B)_{mm} = \sum_{k = 1}^{n} a_{mk} b_{km} = \sum_{k = m}^{n} a_{mk} b_{km}, $$ where the lower bound jumped to $k = m$ because $a_{mk} = 0$ for $k < m$ (upper triangularity). Reading entry $(i, m)$ for $i > m$ gives $0 = (A B)_{im} = \sum_{k = i}^{n} a_{ik} b_{km}$; with the previously known $a_{ii} \ne 0$ ($i = m + 1, \ldots, n$), descending induction on $i$ from $i = n$ down to $i = m + 1$ forces $b_{km} = 0$ for $k > m$. Substituting back into the entry-$(m, m)$ equation, only the term $k = m$ survives: $1 = a_{mm} b_{mm}$, so $a_{mm} \ne 0$. The induction step is complete; iterating to $m = 1$ gives $a_{ii} \ne 0$ for every $i$.
($\Leftarrow$) If all diagonal coefficients are non-zero, $A$ is invertible. Apply column operations to $A$ to reduce it to $I_n$, while the corresponding right-multiplications by elementary matrices accumulate into $A^{-1}$.
- Step 1. Apply $C_1 \leftarrow (1 / a_{11}) C_1$ (legal since $a_{11} \ne 0$) to put $1$ in position $(1, 1)$. Then for each $j \ge 2$, apply $C_j \leftarrow C_j - a_{1j} C_1$ to zero out the rest of row 1.
- Step $k$. The first $k - 1$ rows of the current matrix already match $I_n$ in their first $k - 1$ entries. The $k$-th diagonal entry is still $a_{kk}$ (column operations on earlier columns did not touch entry $(k, k)$ because those columns had a zero on row $k$ above the original diagonal --- upper triangularity). Apply $C_k \leftarrow (1 / a_{kk}) C_k$ then $C_j \leftarrow C_j - (\text{current}_{k j}) C_k$ for $j > k$.
- Conclusion. After $n$ steps, $A$ has been transformed into $I_n$ by right-multiplication by elementary (hence invertible) matrices. So $A$ is invertible (product of invertible matrices), and applying the same operations to $I_n$ yields $A^{-1}$. Each elementary operation in the above algorithm preserves the upper-triangular shape (column $j$ is modified only by columns of index $\le j$, which carry zeros strictly below row $j$), so $A^{-1}$ is upper triangular. Its diagonal coefficients are exactly $1 / a_{11}, \ldots, 1 / a_{nn}$ (each step rescales one diagonal entry by its inverse and leaves the rest fixed).

Method — Invert a triangular matrix

Given $A$ triangular (upper or lower):

Read the diagonal $(a_{11}, \ldots, a_{nn})$.
If any $a_{ii} = 0$: $A$ is not invertible. Stop.
Otherwise: $A^{-1}$ is triangular of the same type, with diagonal $(1 / a_{11}, \ldots, 1 / a_{nn})$. Compute the off-diagonal coefficients by Gauss-Jordan, restricting attention to the appropriate triangle.

Example — Invert an upper triangular $3 \times 3$ matrix

Invert $A = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 3 & 1 \\ 0 & 0 & 4 \end{pmatrix}$.

Answer

The diagonal coefficients $2, 3, 4$ are all non-zero, so $A$ is invertible. The inverse is upper triangular with diagonal $(1/2, 1/3, 1/4)$. Apply Gauss-Jordan one row operation at a time to find the off-diagonal entries: $$ \begin{aligned} & \left( \begin{array}{ccc|ccc} 2 & 1 & 0 & 1 & 0 & 0 \\ 0 & 3 & 1 & 0 & 1 & 0 \\ 0 & 0 & 4 & 0 & 0 & 1 \end{array} \right) && \text{(initial)} \\ \xrightarrow[L_1 \leftarrow L_1 / 2]{} \ & \left( \begin{array}{ccc|ccc} 1 & 1/2 & 0 & 1/2 & 0 & 0 \\ 0 & 3 & 1 & 0 & 1 & 0 \\ 0 & 0 & 4 & 0 & 0 & 1 \end{array} \right) && \text{(scale row 1 to put $1$ in position $(1,1)$)} \\ \xrightarrow[L_2 \leftarrow L_2 / 3]{} \ & \left( \begin{array}{ccc|ccc} 1 & 1/2 & 0 & 1/2 & 0 & 0 \\ 0 & 1 & 1/3 & 0 & 1/3 & 0 \\ 0 & 0 & 4 & 0 & 0 & 1 \end{array} \right) && \text{(scale row 2 to put $1$ in position $(2,2)$)} \\ \xrightarrow[L_3 \leftarrow L_3 / 4]{} \ & \left( \begin{array}{ccc|ccc} 1 & 1/2 & 0 & 1/2 & 0 & 0 \\ 0 & 1 & 1/3 & 0 & 1/3 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1/4 \end{array} \right) && \text{(scale row 3 to put $1$ in position $(3,3)$)} \\ \xrightarrow[L_2 \leftarrow L_2 - L_3 / 3]{} \ & \left( \begin{array}{ccc|ccc} 1 & 1/2 & 0 & 1/2 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1/3 & -1/12 \\ 0 & 0 & 1 & 0 & 0 & 1/4 \end{array} \right) && \text{(zero out } a_{23}\text{ above the pivot)} \\ \xrightarrow[L_1 \leftarrow L_1 - L_2 / 2]{} \ & \left( \begin{array}{ccc|ccc} 1 & 0 & 0 & 1/2 & -1/6 & 1/24 \\ 0 & 1 & 0 & 0 & 1/3 & -1/12 \\ 0 & 0 & 1 & 0 & 0 & 1/4 \end{array} \right) && \text{(zero out } a_{12}\text{ above the pivot)}. \end{aligned} $$ Hence $A^{-1} = \begin{pmatrix} 1/2 & -1/6 & 1/24 \\ 0 & 1/3 & -1/12 \\ 0 & 0 & 1/4 \end{pmatrix}$, upper triangular with the announced diagonal.

Skills to practice

Writing the matrix of an elementary row operation
Inverting a matrix by row operations
Inverting a triangular matrix