CommeUnJeu · L1 MPSI
Real pre-Hilbert spaces
At secondary school the scalar product was introduced through angles and lengths: \(\vec{u} \cdot \vec{v} = \|\vec{u}\| \|\vec{v}\| \cos(\widehat{\vec{u}, \vec{v}})\). In this chapter we invert that theory. We define the scalar product first as an abstract algebraic object on a real vector space, and we then derive from it the notions of norm, distance, orthogonality, and angle. The pay-off is that the same theory applies to function spaces \(C^0([a, b], \mathbb{R})\), polynomial spaces \(\mathbb{R}[X]\), matrix spaces \(\mathcal{M}_{n, p}(\mathbb{R})\) --- far beyond the geometric \(\mathbb{R}^2\) and \(\mathbb{R}^3\) context of the lycée.
Angle formula --- secondary-school recollection and preview. In \(\mathbb{R}^2\) and \(\mathbb{R}^3\) one remembers the formula \(\langle u, v \rangle = \|u\| \|v\| \cos \theta\). Later in this chapter, the Cauchy-Schwarz inequality below will give \(-1 \le \frac{\langle u, v \rangle}{\|u\| \|v\|} \le 1\) when \(u, v \ne 0\); the formula can then be inverted to define the angle by $$ \theta = \arccos\!\left( \frac{\langle u, v \rangle}{\|u\| \|v\|} \right). $$ At this stage this is only a preview --- not yet a definition --- since the right-hand side is not yet known to be in \([-1, 1]\). The figure below is a geometric recollection from secondary school, with \(\theta\) as the visible angle between \(\vec{u}\) and \(\vec{v}\).
Conventions for the chapter.
Angle formula --- secondary-school recollection and preview. In \(\mathbb{R}^2\) and \(\mathbb{R}^3\) one remembers the formula \(\langle u, v \rangle = \|u\| \|v\| \cos \theta\). Later in this chapter, the Cauchy-Schwarz inequality below will give \(-1 \le \frac{\langle u, v \rangle}{\|u\| \|v\|} \le 1\) when \(u, v \ne 0\); the formula can then be inverted to define the angle by $$ \theta = \arccos\!\left( \frac{\langle u, v \rangle}{\|u\| \|v\|} \right). $$ At this stage this is only a preview --- not yet a definition --- since the right-hand side is not yet known to be in \([-1, 1]\). The figure below is a geometric recollection from secondary school, with \(\theta\) as the visible angle between \(\vec{u}\) and \(\vec{v}\).
- Throughout, \(E\) is an \(\mathbb{R}\)-vector space, finite or infinite-dimensional unless stated otherwise.
- The scalar product is denoted \(\langle x, y \rangle\) throughout the chapter. The notations \((x | y)\) and \(x \cdot y\) are alternative; we mention them once in the Definition below and then keep \(\langle x, y \rangle\).
- The induced norm is \(\|x\| = \sqrt{\langle x, x \rangle}\); the distance is \(d(x, y) = \|x - y\|\).
- Orthogonality: \(x \perp y\) if and only if \(\langle x, y \rangle = 0\); \(X^\perp = \{ t \in E \mid \forall x \in X, \langle t, x \rangle = 0 \}\).
- Kronecker symbol: \(\delta_{i, j} = 1\) if \(i = j\), \(0\) otherwise.
I
Scalar product
I.1
Definition and first examples
A scalar product is an application \(E \times E \to \mathbb{R}\) that captures four algebraic properties: bilinearity (linearity in each variable), symmetry, positivity, and separation (or definiteness). The four properties are the bones of the theory; everything that follows --- norms, orthogonality, projections --- is derived from them. We start by isolating the four properties, then state two pedagogical Remarks (a verification shortcut and a Method for checking that a candidate is a scalar product), and finally we test the definition on four stacked examples: one canonical, two counter-examples (one fails separation, one fails positivity), and one valid non-canonical via a symmetric positive-definite matrix.
Definition — Scalar product
A scalar product on an \(\mathbb{R}\)-vector space \(E\) is an application \(\langle \cdot, \cdot \rangle : E \times E \to \mathbb{R}\) satisfying the four properties below. - Bilinear: for all \(x, x', y, y' \in E\) and \(\lambda, \mu \in \mathbb{R}\), $$ \langle \lambda x + \mu x', y \rangle = \lambda \langle x, y \rangle + \mu \langle x', y \rangle, \quad \langle x, \lambda y + \mu y' \rangle = \lambda \langle x, y \rangle + \mu \langle x, y' \rangle. $$
- Symmetric: for all \(x, y \in E\), \(\langle y, x \rangle = \langle x, y \rangle\).
- Positive: for all \(x \in E\), \(\langle x, x \rangle \ge 0\).
- Separated (or definite): for all \(x \in E\), \(\langle x, x \rangle = 0\) implies \(x = 0_E\).
Definition — Real pre-Hilbert space\(\virgule\) Euclidean space
A real pre-Hilbert space is an \(\mathbb{R}\)-vector space \(E\) equipped with a scalar product \(\langle \cdot, \cdot \rangle\). A Euclidean space is a real pre-Hilbert space of finite dimension.
Pedagogical shortcut. The four properties look like four independent checks. In practice, once symmetry is established, the linearity-in-the-first-variable axiom implies the linearity-in-the-second-variable axiom (by symmetry). So one usually proves symmetry first, then linearity in one variable only, and bilinearity is obtained for free. Positivity is then a direct sign check on \(\langle x, x \rangle\), and separation requires the « if \(\langle x, x \rangle = 0\) then \(x = 0_E\) » implication, often the most delicate step.
Method — Verify that an application is a scalar product
Given a candidate \(\varphi : E \times E \to \mathbb{R}\), check the four properties in the order below. - Step 1 --- Symmetry. Show \(\varphi(y, x) = \varphi(x, y)\) for all \(x, y \in E\).
- Step 2 --- Linearity in the first variable only. Show \(\varphi(\lambda x + \mu x', y) = \lambda \varphi(x, y) + \mu \varphi(x', y)\). By Step 1, linearity in the second variable follows automatically, hence bilinearity.
- Step 3 --- Positivity. Show \(\varphi(x, x) \ge 0\) for all \(x \in E\).
- Step 4 --- Separation. Show that \(\varphi(x, x) = 0\) implies \(x = 0_E\). This is often the only nontrivial step.
Example — Canonical preview on \(\mathbb{R}^2\)
The map \(\varphi((x_1, x_2), (y_1, y_2)) = x_1 y_1 + x_2 y_2\) is a scalar product on \(\mathbb{R}^2\). The four-step check: - Symmetry: \(\varphi(y, x) = y_1 x_1 + y_2 x_2 = x_1 y_1 + x_2 y_2 = \varphi(x, y)\).
- Linearity in the first variable: direct from the formula.
- Positivity: \(\varphi(x, x) = x_1^2 + x_2^2 \ge 0\).
- Separation: if \(\varphi(x, x) = x_1^2 + x_2^2 = 0\), then \(x_1 = x_2 = 0\), so \(x = 0_{\mathbb{R}^2}\).
Example — Counter-example: lack of separation
The map \(\varphi((x_1, x_2), (y_1, y_2)) = x_1 y_1\) on \(\mathbb{R}^2\) is bilinear, symmetric, and positive, but not separated: \(\varphi((0, 1), (0, 1)) = 0 \cdot 0 = 0\) while \((0, 1) \ne 0_{\mathbb{R}^2}\). The « positive semi-definite » bilinear form \(\varphi\) is therefore not a scalar product. The separation axiom is the one that fails: \(\varphi\) « forgets » the second coordinate. Example — Counter-example: lack of positivity
The map \(\varphi((x_1, x_2), (y_1, y_2)) = x_1 y_1 - x_2 y_2\) on \(\mathbb{R}^2\) is bilinear and symmetric, but not positive: \(\varphi((0, 1), (0, 1)) = -1 < 0\). So \(\varphi\) is not a scalar product. Such a symmetric bilinear form is indefinite, not positive definite. Example — Valid non-canonical scalar product via an SPD matrix
The map \(\varphi(X, Y) = X^\top \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} Y\) is a scalar product on \(\mathbb{R}^2\). The matrix is symmetric positive definite (SPD), and bilinearity, symmetry are direct. For positivity and separation, expand for \(X = (x_1, x_2)^\top\): $$ X^\top \begin{pmatrix} 2 & 1 \\
1 & 2 \end{pmatrix} X = 2 x_1^2 + 2 x_1 x_2 + 2 x_2^2 = x_1^2 + x_2^2 + (x_1 + x_2)^2. $$ The right-hand side is a sum of three squares, so \(\ge 0\) (positivity); it vanishes only when \(x_1 = x_2 = x_1 + x_2 = 0\), hence \(X = 0\) (separation). This example will reappear in the orthonormal-families subsection below to show that the canonical basis of \(\mathbb{R}^2\) is not orthonormal for this \(\varphi\). Skills to practice
- Checking the four axioms
I.2
Canonical scalar products on \(\mathbb{R}^n\) and \(\mathcal{M}_{n\virgule p}(\mathbb{R})\)
We promote the \(\mathbb{R}^2\) preview of the previous subsection to general \(\mathbb{R}^n\) and to general matrix spaces \(\mathcal{M}_{n, p}(\mathbb{R})\), with one canonical scalar product on each. These two products recover the secondary-school dot-product as a special case and are the working scalar products of the projection-and-distance section below.
Proposition — Canonical scalar product on \(\mathbb{R}^n\)
The application $$ \textcolor{colorprop}{(X, Y) \longmapsto X^\top Y = \sum_{i = 1}^n x_i y_i} $$ is a scalar product on \(\mathbb{R}^n\), called the canonical scalar product. Here \(X = (x_1, \dots, x_n)^\top\) and \(Y = (y_1, \dots, y_n)^\top\) are viewed as column vectors, so \(X^\top Y\) is a \(1 \times 1\) matrix identified with its single entry.
Apply the Method above, four steps.
- Symmetry. \(X^\top Y\) is a \(1 \times 1\) matrix, hence equal to its transpose: \(X^\top Y = (X^\top Y)^\top = Y^\top X\).
- Linearity in the first variable. For \(\lambda \in \mathbb{R}\), \(X, X', Y \in \mathbb{R}^n\), \((\lambda X + X')^\top Y = \lambda X^\top Y + (X')^\top Y\) by linearity of matrix product in the first factor.
- Positivity. \(X^\top X = \sum_{i = 1}^n x_i^2 \ge 0\).
- Separation. If \(X^\top X = \sum x_i^2 = 0\), then each \(x_i^2 = 0\) (sum of non-negatives), so each \(x_i = 0\), so \(X = 0_{\mathbb{R}^n}\).
Example — Numerical illustration on \(\mathbb{R}^3\)
On \(\mathbb{R}^3\) with the canonical scalar product, \(\langle (1, 2, 3), (4, 5, 6) \rangle = 1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6 = 4 + 10 + 18 = 32\). Proposition — Canonical scalar product on \(\mathcal{M}_{n\virgule p}(\mathbb{R})\)
The application $$ \textcolor{colorprop}{(A, B) \longmapsto \operatorname{tr}(A^\top B) = \sum_{1 \le i \le n,\ 1 \le j \le p} a_{i, j} b_{i, j}} $$ is a scalar product on \(\mathcal{M}_{n, p}(\mathbb{R})\), called the canonical (Frobenius) scalar product.
First the equality \(\operatorname{tr}(A^\top B) = \sum_{i, j} a_{i, j} b_{i, j}\): by the definition of matrix product, \((A^\top B)_{j, j} = \sum_{i = 1}^n (A^\top)_{j, i} B_{i, j} = \sum_{i = 1}^n a_{i, j} b_{i, j}\), then sum over \(j\) for the trace. Now the four-step Method.
- Symmetry. \(\operatorname{tr}(A^\top B) = \operatorname{tr}((A^\top B)^\top) = \operatorname{tr}(B^\top A)\), using \(\operatorname{tr}(M^\top) = \operatorname{tr}(M)\).
- Linearity in the first variable. For \(\lambda \in \mathbb{R}\) and \(A, A', B \in \mathcal{M}_{n, p}(\mathbb{R})\), \((\lambda A + A')^\top = \lambda A^\top + (A')^\top\) by linearity of transposition, then \(\operatorname{tr}((\lambda A^\top + (A')^\top) B) = \lambda \operatorname{tr}(A^\top B) + \operatorname{tr}((A')^\top B)\) by linearity of trace and matrix product.
- Positivity. \(\operatorname{tr}(A^\top A) = \sum_{i, j} a_{i, j}^2 \ge 0\).
- Separation. If \(\sum_{i, j} a_{i, j}^2 = 0\), each \(a_{i, j}^2 = 0\), so each \(a_{i, j} = 0\), so \(A = 0_{\mathcal{M}_{n, p}(\mathbb{R})}\).
Skills to practice
- Computing canonical scalar products
I.3
Integral scalar product on continuous functions on a segment
We break the lycée-only mental model with an infinite-dimensional example: the integral scalar product on the space \(C^0([a, b], \mathbb{R})\) of continuous functions on a segment. The key separation argument is the lemma from Integration on a segment: if a continuous non-negative function on \([a, b]\) has zero integral, it is identically zero.
Proposition — Integral scalar product
Let \(a, b \in \mathbb{R}\) with \(a < b\). The application $$ \textcolor{colorprop}{(f, g) \longmapsto \langle f, g \rangle = \int_a^b f(t) g(t)\, \mathrm{d}t} $$ is a scalar product on \(C^0([a, b], \mathbb{R})\).
Apply the four-step Method.
- Symmetry. \(\int_a^b f(t) g(t)\, \mathrm{d}t = \int_a^b g(t) f(t)\, \mathrm{d}t\).
- Linearity in the first variable. By linearity of the integral with respect to its integrand.
- Positivity. For \(f \in C^0([a, b], \mathbb{R})\), \(\int_a^b f(t)^2\, \mathrm{d}t \ge 0\) since \(f(t)^2 \ge 0\) for all \(t\).
- Separation. If \(\int_a^b f(t)^2\, \mathrm{d}t = 0\), since \(f^2\) is continuous and non-negative on \([a, b]\), the prerequisite lemma from Integration on a segment gives \(f^2 = 0\) on \([a, b]\), hence \(f = 0\) on \([a, b]\).
Caveat. The space \(C^0([a, b], \mathbb{R})\) is real pre-Hilbert but not Euclidean, because it has infinite dimension (the family \((t \mapsto t^n)_{n \in \mathbb{N}}\) is free, hence the dimension is \(\ge n\) for every \(n\)). All notions of scalar product, norm, and orthogonality (defined above) still apply --- we just cannot use the finite-dim-specific Theorems on orthonormal bases and orthogonal projection (below) directly on the whole space \(C^0([a, b], \mathbb{R})\) ; they will apply to its finite-dim sub-spaces.
Example — Integral scalar product on the polynomial ring
Let \(a < b\). The formula \(\langle P, Q \rangle = \int_a^b P(t) Q(t)\, \mathrm{d}t\) defines a scalar product on \(\mathbb{R}[X]\). Indeed, every polynomial restricts to a continuous function on \([a, b]\), so \(\mathbb{R}[X] \subset C^0([a, b], \mathbb{R})\) by restriction, and the four properties checked in the previous Proposition transfer. The only non-trivial point is separation: if \(\int_a^b P(t)^2\, \mathrm{d}t = 0\), then \(P\) vanishes on \([a, b]\) (by the continuous-positivity lemma), so \(P\) has infinitely many roots and is therefore the zero polynomial.This is the working scalar product behind the orthogonal-polynomial families (Legendre on \([-1, 1]\), etc.) that appear in the exo.
Example — Variant on polynomial space via evaluation points
Let \(x_0, \dots, x_n \in \mathbb{R}\) be \(n + 1\) pairwise distinct points. The application \((P, Q) \mapsto \sum_{k = 0}^n P(x_k) Q(x_k)\) is a scalar product on \(\mathbb{R}_n[X]\). Symmetry, bilinearity, positivity are immediate. Separation: if \(\sum P(x_k)^2 = 0\), then each \(P(x_k) = 0\), so \(P\) has \(n + 1\) distinct roots; since \(\deg P \le n\), this forces \(P = 0\). Skills to practice
- Computing integral scalar products
II
Norm associated with a scalar product
II.1
Norm\(\virgule\) distance\(\virgule\) and basic properties
The scalar product induces a candidate norm, a candidate length, a candidate distance. We define them here as derived objects; the four norm-axioms (non-negativity, separation, homogeneity, triangle inequality) are then checked step by step --- the last axiom (triangle inequality) requires the Cauchy-Schwarz inequality below, so we state the first three explicitly now and defer the triangle inequality to the dedicated subsection below.
Definition — Associated norm\(\virgule\) unit vector\(\virgule\) distance
Let \((E, \langle \cdot, \cdot \rangle)\) be a real pre-Hilbert space. For \(x, y \in E\): - the associated norm of \(x\) is \(\|x\| = \sqrt{\langle x, x \rangle}\);
- \(x\) is called unit vector (or normalised vector) if \(\|x\| = 1\);
- the distance between \(x\) and \(y\) is \(d(x, y) = \|x - y\|\).
Proposition — Basic norm properties
For all \(x \in E\) and all \(\lambda \in \mathbb{R}\): - (i) \textcolor{colorprop}{\(\|x\| \ge 0\)} (non-negativity).
- (ii) \textcolor{colorprop}{\(\|x\| = 0 \iff x = 0_E\)} (separation of the norm).
- (iii) \textcolor{colorprop}{\(\|\lambda x\| = |\lambda| \cdot \|x\|\)} (positive homogeneity).
- (i) By definition \(\|x\| = \sqrt{\langle x, x \rangle}\); the square root is non-negative.
- (ii) \(\|x\|^2 = \langle x, x \rangle = 0 \iff x = 0_E\) by the separation axiom of the scalar product.
- (iii) \(\|\lambda x\|^2 = \langle \lambda x, \lambda x \rangle = \lambda^2 \langle x, x \rangle = \lambda^2 \|x\|^2\) by bilinearity; taking the square root, \(\|\lambda x\| = \sqrt{\lambda^2} \cdot \|x\| = |\lambda| \cdot \|x\|\).
Caveat. The distance \(d(x, y) = \|x - y\|\) depends on the choice of scalar product. For the non-canonical scalar product \(\varphi(X, Y) = X^\top \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} Y\) introduced earlier, \(\|(1, 0)\|^2 = 2\), hence \(\|(1, 0)\| = \sqrt{2} \ne 1\). The same vector has different norms in different pre-Hilbert structures on the same underlying vector space.
Skills to practice
- Computing norms in various scalar products
II.2
Remarkable identity and polarisation formula
Bilinearity gives algebraic identities relating the norm of a sum and the scalar product. The first is the remarkable identity --- a direct expansion. The second is its inversion: starting from the norm alone, we can recover the scalar product. This is the polarisation formula.
Proposition — Remarkable identity
For all \(x, y \in E\): $$ \textcolor{colorprop}{\|x + y\|^2 = \|x\|^2 + 2 \langle x, y \rangle + \|y\|^2}, \qquad \textcolor{colorprop}{\langle x + y, x - y \rangle = \|x\|^2 - \|y\|^2}. $$ - First identity. $$ \begin{aligned} \|x + y\|^2 &= \langle x + y, x + y \rangle && \text{(definition of \(\|\cdot\|\))} \\ &= \langle x, x \rangle + \langle x, y \rangle + \langle y, x \rangle + \langle y, y \rangle && \text{(bilinearity)} \\ &= \|x\|^2 + 2 \langle x, y \rangle + \|y\|^2 && \text{(symmetry: \(\langle y, x \rangle = \langle x, y \rangle\)).} \end{aligned} $$
- Second identity. $$ \begin{aligned} \langle x + y, x - y \rangle &= \langle x, x \rangle - \langle x, y \rangle + \langle y, x \rangle - \langle y, y \rangle && \text{(bilinearity)} \\ &= \|x\|^2 - \|y\|^2 && \text{(symmetry cancels the cross terms).} \end{aligned} $$
Proposition — Polarisation formulas
For all \(x, y \in E\): $$ \textcolor{colorprop}{\langle x, y \rangle = \frac{1}{2} \big( \|x + y\|^2 - \|x\|^2 - \|y\|^2 \big) = \frac{1}{4} \big( \|x + y\|^2 - \|x - y\|^2 \big)}. $$ - First formula. Isolate \(\langle x, y \rangle\) from the first remarkable identity: \(2 \langle x, y \rangle = \|x + y\|^2 - \|x\|^2 - \|y\|^2\), then divide by \(2\).
- Second formula. Apply the first remarkable identity with \(-y\) in place of \(y\): \(\|x - y\|^2 = \|x\|^2 - 2 \langle x, y \rangle + \|y\|^2\). Subtract this from the original: \(\|x + y\|^2 - \|x - y\|^2 = 4 \langle x, y \rangle\), then divide by \(4\).
Parallelogram identity (one-line corollary). Adding the remarkable identity for \(\|x + y\|^2\) and for \(\|x - y\|^2\): $$ \|x + y\|^2 + \|x - y\|^2 = 2 \big( \|x\|^2 + \|y\|^2 \big). $$ Geometric interpretation: in any real pre-Hilbert space, the sum of the squared diagonals of a parallelogram equals twice the sum of the squared sides.
Skills to practice
- Using the parallelogram identity
II.3
Cauchy-Schwarz inequality and equality case
We now establish the central inequality of the chapter: \(|\langle x, y \rangle| \le \|x\| \|y\|\). The proof technique --- standard at this level --- is to study the function \(\varphi(t) = \|x + t y\|^2\) as a real polynomial in \(t\) of degree at most \(2\), exploit its non-negativity on \(\mathbb{R}\), and read off the discriminant condition. The equality case characterises colinearity.
Theorem — Cauchy-Schwarz inequality
For all \(x, y \in E\): $$ \textcolor{colorprop}{|\langle x, y \rangle| \le \|x\| \cdot \|y\|}, $$ with equality if and only if \(x\) and \(y\) are colinear.Convention. \(0_E\) is considered colinear with every vector of \(E\); this makes the equality case correct without exception when \(x = 0_E\) or \(y = 0_E\).
Define \(\varphi : \mathbb{R} \to \mathbb{R}\) by \(\varphi(t) = \|x + t y\|^2\). Expanding via the remarkable identity: $$ \varphi(t) = \|x\|^2 + 2 t \langle x, y \rangle + t^2 \|y\|^2. $$ This is a polynomial in \(t\) of degree at most \(2\). By positivity of \(\|\cdot\|^2\), \(\varphi(t) \ge 0\) for all \(t \in \mathbb{R}\). We split into two cases.
Equality case (case 2), converse. Conversely, if \(x, y\) are colinear with \(y \ne 0_E\), then \(x = \lambda y\) for some \(\lambda \in \mathbb{R}\). Computing \(\varphi(-\lambda) = \|x - \lambda y\|^2 = 0\) exhibits a real root of \(\varphi\), so \(\Delta \ge 0\); combined with the always-true \(\Delta \le 0\), we get \(\Delta = 0\), hence \(|\langle x, y \rangle|^2 = \|x\|^2 \|y\|^2\), i.e. Cauchy-Schwarz equality.
- Case 1: \(\|y\|^2 = 0\). Then \(y = 0_E\) by separation. We have \(\langle x, y \rangle = 0\) and \(\|y\| = 0\), so both sides of Cauchy-Schwarz are zero --- equality holds. By the convention, \(x\) and \(y = 0_E\) are colinear.
- Case 2: \(\|y\|^2 \ne 0\). Then \(\varphi\) is a polynomial of degree exactly \(2\) in \(t\), with leading coefficient \(\|y\|^2 > 0\), and \(\varphi(t) \ge 0\) for all \(t \in \mathbb{R}\). Its discriminant satisfies $$ \Delta = 4 \langle x, y \rangle^2 - 4 \|x\|^2 \|y\|^2 \le 0, $$ which gives \(\langle x, y \rangle^2 \le \|x\|^2 \|y\|^2\), hence \(|\langle x, y \rangle| \le \|x\| \|y\|\).
Equality case (case 2), converse. Conversely, if \(x, y\) are colinear with \(y \ne 0_E\), then \(x = \lambda y\) for some \(\lambda \in \mathbb{R}\). Computing \(\varphi(-\lambda) = \|x - \lambda y\|^2 = 0\) exhibits a real root of \(\varphi\), so \(\Delta \ge 0\); combined with the always-true \(\Delta \le 0\), we get \(\Delta = 0\), hence \(|\langle x, y \rangle|^2 = \|x\|^2 \|y\|^2\), i.e. Cauchy-Schwarz equality.
Method — Apply Cauchy-Schwarz to derive an inequality
To derive an inequality between sums or integrals by Cauchy-Schwarz: - Step 1. Identify the candidate pre-Hilbert space and its scalar product (typically \(\mathbb{R}^n\) canonical, or \(C^0([a, b], \mathbb{R})\) integral).
- Step 2. Choose two vectors \(x, y\) in that space whose scalar product \(\langle x, y \rangle\) equals the expression to bound, and whose norms \(\|x\|, \|y\|\) are computable.
- Step 3. Write \(|\langle x, y \rangle| \le \|x\| \cdot \|y\|\) and square or simplify as needed.
Example — Sum of values bounded by the sum of squares
For all \(x_1, \dots, x_n \in \mathbb{R}\), \(\big( \sum_{k = 1}^n x_k \big)^2 \le n \sum_{k = 1}^n x_k^2\), with equality if and only if \(x_1 = \dots = x_n\).
Apply Cauchy-Schwarz on \(\mathbb{R}^n\) canonical to \(X = (x_1, \dots, x_n)\) and \(Y = (1, \dots, 1)\): $$ \langle X, Y \rangle = \sum_{k = 1}^n x_k, \qquad \|X\|^2 = \sum_{k = 1}^n x_k^2, \qquad \|Y\|^2 = n. $$ Cauchy-Schwarz: \(|\langle X, Y \rangle| \le \|X\| \|Y\|\), then square: \(\big( \sum x_k \big)^2 \le n \sum x_k^2\). Equality iff \(X\) and \(Y\) colinear, i.e. \(X = \lambda Y = (\lambda, \dots, \lambda)\) for some \(\lambda \in \mathbb{R}\), i.e. \(x_1 = \dots = x_n\).
Skills to practice
- Applying classical Cauchy-Schwarz inequalities
II.4
Triangle inequality and equality cases
Cauchy-Schwarz now lets us prove the triangle inequality, the last missing norm-axiom. We state both bounds (the usual one and its reverse, sometimes called the inverse triangle inequality), and we describe both equality cases: same-sense colinear for the upper bound, opposite-sense colinear for the lower bound.
Theorem — Triangle inequality
For all \(x, y \in E\): $$ \textcolor{colorprop}{\big| \|x\| - \|y\| \big| \le \|x + y\| \le \|x\| + \|y\|}. $$ Equality cases. - Right equality \(\|x + y\| = \|x\| + \|y\|\) holds if and only if \(x\) and \(y\) are positively colinear, i.e. one of them is a non-negative scalar multiple of the other.
- Left equality \(\|x + y\| = \big| \|x\| - \|y\| \big|\) holds if and only if \(x\) and \(y\) are negatively colinear, i.e. one of them is a non-positive scalar multiple of the other.
- Right inequality. Square and apply Cauchy-Schwarz: $$ \begin{aligned} \|x + y\|^2 &= \|x\|^2 + 2 \langle x, y \rangle + \|y\|^2 && \text{(remarkable identity)} \\ &\le \|x\|^2 + 2 \|x\| \|y\| + \|y\|^2 && \text{(\(\langle x, y \rangle \le |\langle x, y \rangle| \le \|x\| \|y\|\))} \\ &= (\|x\| + \|y\|)^2. \end{aligned} $$ Taking the non-negative square root: \(\|x + y\| \le \|x\| + \|y\|\).
- Right-equality case. \(\|x + y\| = \|x\| + \|y\|\) iff equality at the Cauchy-Schwarz step, i.e. \(\langle x, y \rangle = \|x\| \|y\|\). Cauchy-Schwarz equality gives \(x, y\) colinear: say \(y = \lambda x\). Then \(\langle x, y \rangle = \lambda \|x\|^2\) and \(\|x\| \|y\| = |\lambda| \|x\|^2\). Equality forces \(\lambda = |\lambda|\), i.e. \(\lambda \ge 0\) (or \(x = 0_E\)). So \(x, y\) are positively colinear.
- Left inequality. Apply the right inequality to \(x + y\) and \(-y\): \(\|x\| = \|(x + y) + (-y)\| \le \|x + y\| + \|y\|\), so \(\|x\| - \|y\| \le \|x + y\|\). Similarly, swapping roles, \(\|y\| - \|x\| \le \|x + y\|\). Together: \(\big| \|x\| - \|y\| \big| \le \|x + y\|\).
- Left-equality case. Suppose without loss of generality \(\|x\| \ge \|y\|\) (the other case is symmetric). Equality in the left inequality means \(\|x\| - \|y\| = \|x + y\|\), i.e. \(\|x\| = \|x + y\| + \|y\|\). This is exactly equality in the triangle inequality applied to the decomposition \(x = (x + y) + (-y)\). By the right-equality case (already proved above), \(x + y\) and \(-y\) are positively colinear: \(x + y = \mu (-y)\) for some \(\mu \ge 0\), hence \(x = -(1 + \mu) y\) with \(1 + \mu \ge 0\). So \(x\) is a non-positive scalar multiple of \(y\), i.e. \(x\) and \(y\) are negatively colinear.
Corollary --- the associated norm is a norm; the distance is a distance. Combining the basic norm properties established earlier (non-negativity, separation, homogeneity) with the triangle inequality just proved, the application \(\|\cdot\| : E \to \mathbb{R}_+\) satisfies the four norm axioms in the sense of normed-vector-space theory. The application \(d(x, y) = \|x - y\|\) inherits the corresponding distance axioms. From now on we call \(\|\cdot\|\) « the norm » and \(d\) « the distance » without further caveat.
Skills to practice
- Bounding norms with the triangle inequality
III
Orthogonality
III.1
Orthogonal vectors and parts
Our geometric theory has its head upside-down. In secondary school, orthogonality was primary --- defined by right angles between sticks --- and the scalar product was a derived computation. Here we invert: the scalar product is primary, and orthogonality is its zero-set, \(x \perp y\) if and only if \(\langle x, y \rangle = 0\). In particular, each scalar product brings its own notion of orthogonality, and the canonical basis of \(\mathbb{R}^2\) may be orthonormal for one scalar product and not for another.
Definition — Orthogonal vectors\(\virgule\) orthogonal parts
Let \((E, \langle \cdot, \cdot \rangle)\) be a real pre-Hilbert space, let \(x, y \in E\), and let \(X, Y\) be parts of \(E\). - \(x\) and \(y\) are orthogonal, written \(x \perp y\), if \(\langle x, y \rangle = 0\).
- \(X\) and \(Y\) are orthogonal, written \(X \perp Y\), if for all \(x \in X\) and \(y \in Y\), \(\langle x, y \rangle = 0\).
Example — Orthogonal vectors in \(\mathbb{R}^3\) canonical
On \(\mathbb{R}^3\) with the canonical scalar product, \((1, 2, 3)\) and \((2, -1, 0)\) are orthogonal since \(\langle (1, 2, 3), (2, -1, 0) \rangle = 1 \cdot 2 + 2 \cdot (-1) + 3 \cdot 0 = 0\). By contrast, \((1, 2, 3)\) and \((1, 1, 1)\) are not orthogonal since \(\langle (1, 2, 3), (1, 1, 1) \rangle = 6 \ne 0\). Example — Orthogonal parts in matrix space
On \(\mathcal{M}_n(\mathbb{R})\) with the Frobenius scalar product, let \(X\) be the set of diagonal matrices with zero trace and \(Y\) the set of strictly upper-triangular matrices. Then \(X \perp Y\): for any diagonal \(D \in X\) and any strictly upper-triangular \(U \in Y\), \(D^\top U = D U\) is strictly upper-triangular (its diagonal is zero), so \(\langle D, U \rangle = \operatorname{tr}(D U) = 0\). The parts \(X\) and \(Y\) as a whole are orthogonal, even though neither is reduced to \(\{0_E\}\).
Key remark. Among all vectors of \(E\), only the null vector \(0_E\) is orthogonal to itself, and only the null vector is orthogonal to every vector of \(E\). Indeed, \(x \perp x\) means \(\langle x, x \rangle = 0\), which forces \(x = 0_E\) by the separation axiom. And if \(x\) is orthogonal to every \(y \in E\), take \(y = x\) to get \(\langle x, x \rangle = 0\), hence \(x = 0_E\). This twin property of \(0_E\) is used constantly.
Skills to practice
- Checking orthogonality
III.2
Orthogonal and orthonormal families\(\virgule\) Pythagoras
We extend orthogonality to indexed families and state the most useful consequence: the Pythagoras Theorem in its generalised \(n\)-aire form, plus the fact that orthogonal families of nonzero vectors are automatically free. Both will power Gram-Schmidt and the orthogonal-projection theory developed later in this chapter.
Definition — Orthogonal and orthonormal families
Let \((x_i)_{i \in I}\) be a family of vectors of \(E\). - \((x_i)_{i \in I}\) is orthogonal if for all \(i, j \in I\) with \(i \ne j\), \(\langle x_i, x_j \rangle = 0\).
- \((x_i)_{i \in I}\) is orthonormal if it is orthogonal and each \(x_i\) has unit norm; equivalently, for all \(i, j \in I\), \(\langle x_i, x_j \rangle = \delta_{i, j}\).
Example — Canonical basis of \(\mathbb{R}^n\) is orthonormal
For the canonical scalar product on \(\mathbb{R}^n\), the canonical basis \((E_1, \dots, E_n)\) satisfies \(E_i^\top E_j = \delta_{i, j}\): the \(E_i\) are pairwise orthogonal and each has unit norm. Example — A non-canonical scalar product breaks the orthonormality
For the non-canonical scalar product \(\varphi(X, Y) = X^\top \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} Y\) introduced earlier in this chapter, the canonical basis of \(\mathbb{R}^2\) is not orthonormal: \(\varphi(E_1, E_1) = 2 \ne 1\), and \(\varphi(E_1, E_2) = 1 \ne 0\). The pair \(\left( \frac{(1, 0)}{\sqrt{2}}, \frac{(1, -2)}{\sqrt{6}} \right)\) is orthonormal for \(\varphi\), as a direct \(2 \times 2\) check shows. Example — Sine family on continuous functions
For the rescaled integral product \(\langle f, g \rangle = \frac{1}{\pi} \int_0^{2\pi} f(t) g(t)\, \mathrm{d}t\) on \(C^0([0, 2\pi], \mathbb{R})\), the family \(s_n : t \mapsto \sin(n t)\) for \(n \in \mathbb{N}^*\) is orthonormal. By standard trig integrals, \(\|s_n\|^2 = \frac{1}{\pi} \int_0^{2\pi} \sin^2(nt)\, \mathrm{d}t = 1\), and for \(m \ne n\), \(\langle s_m, s_n \rangle = \frac{1}{2\pi} \int_0^{2\pi} \big[ \cos((m-n)t) - \cos((m+n)t) \big]\, \mathrm{d}t = 0\). Theorem — Pythagoras
Let \(x, y \in E\). Then \(x \perp y\) if and only if \(\|x + y\|^2 = \|x\|^2 + \|y\|^2\).More generally, for an orthogonal family \((x_1, \dots, x_n)\) of \(E\): $$ \textcolor{colorprop}{\Big\| \sum_{i = 1}^n x_i \Big\|^2 = \sum_{i = 1}^n \|x_i\|^2}. $$
For the binary form: by the remarkable identity, \(\|x + y\|^2 = \|x\|^2 + 2 \langle x, y \rangle + \|y\|^2\). The cross term \(2 \langle x, y \rangle\) vanishes if and only if \(\langle x, y \rangle = 0\), i.e. \(x \perp y\).
For the \(n\)-aire form: expand by bilinearity: $$ \begin{aligned} \Big\| \sum_{i = 1}^n x_i \Big\|^2 &= \Big\langle \sum_{i = 1}^n x_i, \sum_{j = 1}^n x_j \Big\rangle && \text{(definition of \(\|\cdot\|^2\))} \\ &= \sum_{i = 1}^n \sum_{j = 1}^n \langle x_i, x_j \rangle && \text{(bilinearity)} \\ &= \sum_{i = 1}^n \|x_i\|^2 + 2 \sum_{1 \le i < j \le n} \langle x_i, x_j \rangle && \text{(split diagonal terms via \(\langle x_j, x_i \rangle = \langle x_i, x_j \rangle\))} \\ &= \sum_{i = 1}^n \|x_i\|^2 && \text{(orthogonality kills cross terms).} \end{aligned} $$
For the \(n\)-aire form: expand by bilinearity: $$ \begin{aligned} \Big\| \sum_{i = 1}^n x_i \Big\|^2 &= \Big\langle \sum_{i = 1}^n x_i, \sum_{j = 1}^n x_j \Big\rangle && \text{(definition of \(\|\cdot\|^2\))} \\ &= \sum_{i = 1}^n \sum_{j = 1}^n \langle x_i, x_j \rangle && \text{(bilinearity)} \\ &= \sum_{i = 1}^n \|x_i\|^2 + 2 \sum_{1 \le i < j \le n} \langle x_i, x_j \rangle && \text{(split diagonal terms via \(\langle x_j, x_i \rangle = \langle x_i, x_j \rangle\))} \\ &= \sum_{i = 1}^n \|x_i\|^2 && \text{(orthogonality kills cross terms).} \end{aligned} $$
Proposition — Orthogonal family of nonzero vectors is free
Let \((x_1, \dots, x_n)\) be an orthogonal family of nonzero vectors of \(E\). Then \((x_1, \dots, x_n)\) is free (linearly independent). In particular, every orthonormal family is free.
Suppose \(\sum_{i = 1}^n \lambda_i x_i = 0_E\) for some \(\lambda_1, \dots, \lambda_n \in \mathbb{R}\). Fix \(j \in \{1, \dots, n\}\) and take the scalar product with \(x_j\): $$ 0 = \langle 0_E, x_j \rangle = \Big\langle \sum_{i = 1}^n \lambda_i x_i, x_j \Big\rangle = \sum_{i = 1}^n \lambda_i \langle x_i, x_j \rangle = \lambda_j \|x_j\|^2, $$ where the last equality uses orthogonality \(\langle x_i, x_j \rangle = 0\) for \(i \ne j\). Since \(x_j \ne 0_E\), \(\|x_j\|^2 > 0\) by separation, hence \(\lambda_j = 0\). This holds for every \(j\), so the family is free.
Skills to practice
- Using Pythagoras and verifying orthonormal families
III.3
Orthogonal of a part
The orthogonal of a part \(X\) packages all vectors perpendicular to everything in \(X\). We will see that this packaging is automatically a sub-space, and that it is monotone (larger \(X\) gives smaller \(X^\perp\)). These structural properties are used continuously in exercises: \((F + G)^\perp = F^\perp \cap G^\perp\), computation of orthogonals of explicit sub-spaces, characterisation of hyperplanes, etc.
Definition — Orthogonal of a part
Let \(X\) be a part of \(E\). The orthogonal of \(X\) is $$ \textcolor{colordef}{X^\perp = \{ t \in E \mid \forall x \in X, \langle t, x \rangle = 0 \}}. $$ Proposition — Properties of \(X^\perp\)
Let \(X, Y\) be parts of \(E\). - (i) \(X^\perp\) is a sub-space of \(E\), orthogonal to \(X\).
- (ii) \(X \cap X^\perp \subset \{ 0_E \}\) (for any part \(X\)). When \(X\) is a sub-space containing \(0_E\), this becomes \(X \cap X^\perp = \{ 0_E \}\).
- (iii) \(X^\perp = \operatorname{Vect}(X)^\perp\), and \(X \subset X^{\perp\perp}\).
- (iv) Monotonicity: if \(X \subset Y\), then \(Y^\perp \subset X^\perp\).
- (i) Sub-space. \(0_E \in X^\perp\) since \(\langle 0_E, x \rangle = 0\) for all \(x \in X\). For closure: for \(t, t' \in X^\perp\) and \(\lambda \in \mathbb{R}\), \(\langle \lambda t + t', x \rangle = \lambda \langle t, x \rangle + \langle t', x \rangle = 0 + 0 = 0\) for all \(x \in X\), so \(\lambda t + t' \in X^\perp\).
- (ii) \(X \cap X^\perp \subset \{ 0_E \}\). If \(x \in X \cap X^\perp\), then \(x \in X\) and \(x \perp X\), so in particular \(x \perp x\), i.e. \(\langle x, x \rangle = 0\), i.e. \(x = 0_E\) by separation. When \(X\) is a sub-space containing \(0_E\), the inclusion becomes equality.
- (iii) \(X^\perp = \operatorname{Vect}(X)^\perp\). If \(t \in X^\perp\) and \(y \in \operatorname{Vect}(X)\), write \(y = \sum \mu_i x_i\) with \(x_i \in X\); then \(\langle t, y \rangle = \sum \mu_i \langle t, x_i \rangle = 0\), so \(t \in \operatorname{Vect}(X)^\perp\). Conversely, if \(t \in \operatorname{Vect}(X)^\perp\), then for every \(x \in X \subset \operatorname{Vect}(X)\), \(\langle t, x \rangle = 0\); hence \(t \in X^\perp\). For \(X \subset X^{\perp\perp}\): every \(x \in X\) is orthogonal to every \(t \in X^\perp\) by definition, so \(x \in X^{\perp\perp}\).
- (iv) Monotonicity. If \(t \in Y^\perp\) and \(x \in X \subset Y\), then \(\langle t, x \rangle = 0\) since \(x \in Y\), so \(t \in X^\perp\).
A picture of the orthogonal in \(\mathbb{R}^3\): given a nonzero vector \(a\), its orthogonal \(\{a\}^\perp\) is the plane through the origin perpendicular to \(a\).
Example — Hyperplane via a normal vector
In the canonical Euclidean \(\mathbb{R}^3\), the plane \(H\) of equation \(3x - y + 2z = 0\) is the orthogonal of \(\{(3, -1, 2)\}\): indeed, for \((x, y, z) \in \mathbb{R}^3\), \(\langle (3, -1, 2), (x, y, z) \rangle = 3x - y + 2z\), hence \(H = \{ (x, y, z) \mid \langle (3, -1, 2), (x, y, z) \rangle = 0 \} = \{(3, -1, 2)\}^\perp\). Example — Symmetric vs skew-symmetric matrices
Equip \(\mathcal{M}_n(\mathbb{R})\) with its canonical Frobenius scalar product. Then \(\mathcal{S}_n(\mathbb{R})^\perp = \mathcal{A}_n(\mathbb{R})\) (the orthogonal of the symmetric matrices is the skew-symmetric ones).Inclusion \(\mathcal{A}_n(\mathbb{R}) \subset \mathcal{S}_n(\mathbb{R})^\perp\). For \(S\) symmetric and \(A\) skew-symmetric, \(\langle S, A \rangle = \operatorname{tr}(S^\top A) = \operatorname{tr}(S A) = \operatorname{tr}(A S) = -\operatorname{tr}(A^\top S) = -\langle A, S \rangle = -\langle S, A \rangle\), so \(\langle S, A \rangle = 0\).
Inclusion \(\mathcal{S}_n(\mathbb{R})^\perp \subset \mathcal{A}_n(\mathbb{R})\). Let \(M \in \mathcal{S}_n(\mathbb{R})^\perp\). Decompose \(M = S + A\) with \(S = (M + M^\top)/2 \in \mathcal{S}_n(\mathbb{R})\) and \(A = (M - M^\top)/2 \in \mathcal{A}_n(\mathbb{R})\). Since \(A \in \mathcal{A}_n(\mathbb{R}) \subset \mathcal{S}_n(\mathbb{R})^\perp\) (by the first inclusion), we have \(\langle A, S \rangle = 0\). Then $$ 0 = \langle M, S \rangle = \langle S + A, S \rangle = \|S\|^2 + \langle A, S \rangle = \|S\|^2, $$ hence \(S = 0\) by separation, so \(M = A \in \mathcal{A}_n(\mathbb{R})\).
Skills to practice
- Computing \(X^\perp\) explicitly
III.4
Gram-Schmidt orthonormalisation algorithm
Given an arbitrary free family \((e_1, \dots, e_n)\) of \(E\), can we always produce an orthonormal family with the same span? Yes. The procedure, due to Gram and Schmidt, is iterative: build \(u_1\) from \(e_1\) alone by normalising; then for each \(k \ge 2\), kill the components of \(e_k\) along the previous \(u_1, \dots, u_{k-1}\) to obtain a vector \(\hat u_k\) orthogonal to them, and normalise. We first isolate the key one-step lemma (« component along a vector »), then state and prove the full algorithm.
Proposition — Component along a vector
- (i) Single unit vector. Let \(u \in E\) be a unit vector. For every \(x \in E\), the vector \(x - \langle x, u \rangle u\) is orthogonal to \(u\). The vector \(\langle x, u \rangle u\) is called the component of \(x\) along \(u\).
- (ii) Orthonormal family. Let \((u_1, \dots, u_n)\) be an orthonormal family of \(E\). For every \(x \in E\), the vector \(x - \sum_{i = 1}^n \langle x, u_i \rangle u_i\) is orthogonal to each of \(u_1, \dots, u_n\).
- (i). \(\langle x - \langle x, u \rangle u, u \rangle = \langle x, u \rangle - \langle x, u \rangle \langle u, u \rangle = \langle x, u \rangle - \langle x, u \rangle \cdot 1 = 0\), using \(\|u\|^2 = 1\).
- (ii). Fix \(j\) and compute \(\langle x - \sum_i \langle x, u_i \rangle u_i, u_j \rangle = \langle x, u_j \rangle - \sum_i \langle x, u_i \rangle \langle u_i, u_j \rangle = \langle x, u_j \rangle - \sum_i \langle x, u_i \rangle \delta_{i, j} = \langle x, u_j \rangle - \langle x, u_j \rangle = 0\).
Theorem — Gram-Schmidt orthonormalisation
Let \((e_1, \dots, e_n)\) be a free family of \(E\). There exists an orthonormal family \((u_1, \dots, u_n)\) of \(E\) such that for every \(k \in \{1, \dots, n\}\), $$ \textcolor{colorprop}{\operatorname{Vect}(u_1, \dots, u_k) = \operatorname{Vect}(e_1, \dots, e_k)}. $$ The \(u_k\) are built inductively. Setting \(\hat u_k = e_k - \sum_{i = 1}^{k - 1} \langle e_k, u_i \rangle u_i\) (with the convention \(\hat u_1 = e_1\)), one has \(\hat u_k \ne 0_E\) and \(u_k = \pm \hat u_k / \|\hat u_k\|\) (up to a sign choice).
By induction on \(k\).
Initialisation (\(k = 1\)). The family \((e_1)\) is free, so \(e_1 \ne 0_E\), hence \(\|e_1\| \ne 0\). Set \(u_1 = e_1 / \|e_1\|\), which has unit norm. Then \(\operatorname{Vect}(u_1) = \operatorname{Vect}(e_1)\) since \(u_1\) is a nonzero scalar multiple of \(e_1\).
Heredity. Suppose that for some \(k \in \{2, \dots, n\}\) we have built an orthonormal family \((u_1, \dots, u_{k - 1})\) such that \(\operatorname{Vect}(u_1, \dots, u_{k - 1}) = \operatorname{Vect}(e_1, \dots, e_{k - 1})\). Define \(\hat u_k = e_k - \sum_{i = 1}^{k - 1} \langle e_k, u_i \rangle u_i\). By the previous Proposition, \(\hat u_k\) is orthogonal to each of \(u_1, \dots, u_{k - 1}\). We claim \(\hat u_k \ne 0_E\): otherwise \(e_k = \sum_{i = 1}^{k - 1} \langle e_k, u_i \rangle u_i \in \operatorname{Vect}(u_1, \dots, u_{k - 1}) = \operatorname{Vect}(e_1, \dots, e_{k - 1})\), contradicting the freeness of \((e_1, \dots, e_n)\). Define \(u_k = \hat u_k / \|\hat u_k\|\) (or its negation). Then \((u_1, \dots, u_k)\) is orthonormal. Finally, \(u_k \in \operatorname{Vect}(u_1, \dots, u_{k - 1}, e_k) = \operatorname{Vect}(e_1, \dots, e_k)\) since \(u_k\) is a linear combination of the \(u_i\) and \(e_k\). Conversely \(e_k = \hat u_k + \sum_{i = 1}^{k - 1} \langle e_k, u_i \rangle u_i = \pm \|\hat u_k\| u_k + \sum_{i = 1}^{k - 1} \langle e_k, u_i \rangle u_i \in \operatorname{Vect}(u_1, \dots, u_k)\) (the \(\pm\) tracks the sign choice in \(u_k = \pm \hat u_k / \|\hat u_k\|\)). So the two spans are equal.
Conclusion. By induction, \((u_1, \dots, u_n)\) exists with the required properties.
Initialisation (\(k = 1\)). The family \((e_1)\) is free, so \(e_1 \ne 0_E\), hence \(\|e_1\| \ne 0\). Set \(u_1 = e_1 / \|e_1\|\), which has unit norm. Then \(\operatorname{Vect}(u_1) = \operatorname{Vect}(e_1)\) since \(u_1\) is a nonzero scalar multiple of \(e_1\).
Heredity. Suppose that for some \(k \in \{2, \dots, n\}\) we have built an orthonormal family \((u_1, \dots, u_{k - 1})\) such that \(\operatorname{Vect}(u_1, \dots, u_{k - 1}) = \operatorname{Vect}(e_1, \dots, e_{k - 1})\). Define \(\hat u_k = e_k - \sum_{i = 1}^{k - 1} \langle e_k, u_i \rangle u_i\). By the previous Proposition, \(\hat u_k\) is orthogonal to each of \(u_1, \dots, u_{k - 1}\). We claim \(\hat u_k \ne 0_E\): otherwise \(e_k = \sum_{i = 1}^{k - 1} \langle e_k, u_i \rangle u_i \in \operatorname{Vect}(u_1, \dots, u_{k - 1}) = \operatorname{Vect}(e_1, \dots, e_{k - 1})\), contradicting the freeness of \((e_1, \dots, e_n)\). Define \(u_k = \hat u_k / \|\hat u_k\|\) (or its negation). Then \((u_1, \dots, u_k)\) is orthonormal. Finally, \(u_k \in \operatorname{Vect}(u_1, \dots, u_{k - 1}, e_k) = \operatorname{Vect}(e_1, \dots, e_k)\) since \(u_k\) is a linear combination of the \(u_i\) and \(e_k\). Conversely \(e_k = \hat u_k + \sum_{i = 1}^{k - 1} \langle e_k, u_i \rangle u_i = \pm \|\hat u_k\| u_k + \sum_{i = 1}^{k - 1} \langle e_k, u_i \rangle u_i \in \operatorname{Vect}(u_1, \dots, u_k)\) (the \(\pm\) tracks the sign choice in \(u_k = \pm \hat u_k / \|\hat u_k\|\)). So the two spans are equal.
Conclusion. By induction, \((u_1, \dots, u_n)\) exists with the required properties.
The iconic figure of the \(n = 2\) case visualises the algorithm: \(e_1\) is normalised to \(u_1\) (same direction, unit length); \(e_2\) has its component \(\langle e_2, u_1 \rangle u_1\) subtracted to produce \(\hat u_2 \perp u_1\), then \(\hat u_2\) is normalised to \(u_2\).
Method — Orthonormalise a free family
Given a free family \((e_1, \dots, e_n)\), build \((u_1, \dots, u_n)\) orthonormal by iterating: - Step 1. \(u_1 = e_1 / \|e_1\|\).
- Step k (for \(k = 2, \dots, n\)). Compute \(\hat u_k = e_k - \sum_{i = 1}^{k - 1} \langle e_k, u_i \rangle u_i\), then \(u_k = \hat u_k / \|\hat u_k\|\).
Example — Orthonormalising the monomials of degree at most 2
Orthonormalise \((1, X, X^2)\) on \(\mathbb{R}[X]\) equipped with the integral product \(\langle P, Q \rangle = \int_0^1 P(t) Q(t)\, \mathrm{d}t\). - Step 1. \(\|1\|^2 = \int_0^1 1\, \mathrm{d}t = 1\), so \(u_1 = 1\).
- Step 2. \(\langle X, u_1 \rangle = \int_0^1 t\, \mathrm{d}t = \tfrac{1}{2}\), so \(\hat u_2 = X - \tfrac{1}{2}\). Then \(\|\hat u_2\|^2 = \int_0^1 (t - \tfrac{1}{2})^2\, \mathrm{d}t = \tfrac{1}{12}\), so \(u_2 = \sqrt{12} (X - \tfrac{1}{2}) = \sqrt{3} (2 X - 1)\).
- Step 3. \(\langle X^2, u_1 \rangle = \int_0^1 t^2\, \mathrm{d}t = \tfrac{1}{3}\), \(\langle X^2, u_2 \rangle = \sqrt{3} \int_0^1 t^2 (2 t - 1)\, \mathrm{d}t = \sqrt{3} (\tfrac{1}{2} - \tfrac{1}{3}) = \tfrac{\sqrt{3}}{6}\). So \(\hat u_3 = X^2 - \tfrac{1}{3} - \tfrac{\sqrt{3}}{6} \cdot \sqrt{3}(2 X - 1) = X^2 - \tfrac{1}{3} - \tfrac{1}{2}(2 X - 1) = X^2 - X + \tfrac{1}{6}\). Then \(\|\hat u_3\|^2 = \int_0^1 (t^2 - t + \tfrac{1}{6})^2\, \mathrm{d}t = \tfrac{1}{180}\), so \(u_3 = \sqrt{180}(X^2 - X + \tfrac{1}{6}) = \sqrt{5}(6 X^2 - 6 X + 1)\).
Skills to practice
- Orthonormalising a free family
IV
Orthonormal bases in finite dimension
IV.1
Existence of an orthonormal basis\(\virgule\) incomplete orthonormal basis
In finite dimension, two structural Theorems follow directly from Gram-Schmidt: every Euclidean space has an orthonormal basis (BON), and every orthonormal family can be completed to a BON. Both follow by applying Gram-Schmidt to a suitable basis input.
Theorem — Existence of an orthonormal basis
Every Euclidean space \(E\) (finite-dim real pre-Hilbert space) admits a orthonormal basis.
\(E\) has finite dimension \(n \ge 0\). If \(n = 0\), the empty family is orthonormal and is a basis of \(\{0_E\}\). If \(n \ge 1\), \(E\) has a basis \((e_1, \dots, e_n)\) (from the prerequisite chapter Dimension finie). Applying Gram-Schmidt to \((e_1, \dots, e_n)\) produces an orthonormal family \((u_1, \dots, u_n)\) with \(\operatorname{Vect}(u_1, \dots, u_n) = \operatorname{Vect}(e_1, \dots, e_n) = E\). Since \((u_1, \dots, u_n)\) is orthonormal of \(n\) vectors of \(E\) spanning \(E\), it is an orthonormal basis.
Theorem — Incomplete orthonormal basis
Let \(E\) be a Euclidean space and \((u_1, \dots, u_p)\) an orthonormal family of \(E\). Then \((u_1, \dots, u_p)\) can be completed to an orthonormal basis \((u_1, \dots, u_p, u_{p+1}, \dots, u_n)\) of \(E\).
The orthonormal family \((u_1, \dots, u_p)\) is free (Proposition « Orthogonal family of nonzero vectors is free » above). By the incomplete basis theorem from Dimension finie, complete it to a basis \((u_1, \dots, u_p, v_{p+1}, \dots, v_n)\) of \(E\). Apply Gram-Schmidt to this completed basis. For \(k \le p\): \(u_k\) is already in the desired form, and Gram-Schmidt at step \(k\) computes \(\hat u_k = u_k - \sum_{i = 1}^{k-1} \langle u_k, u_i \rangle u_i = u_k\) since the \(u_i\) are orthonormal and \(\langle u_k, u_i \rangle = \delta_{k, i} = 0\) for \(i < k\). So Gram-Schmidt does not modify \(u_1, \dots, u_p\). For \(k = p + 1, \dots, n\), Gram-Schmidt produces new orthonormal \(u_k\) from \(v_k\). The resulting family is an orthonormal basis extending \((u_1, \dots, u_p)\).
Skills to practice
- Constructing a BON of a small sub-space
IV.2
Coordinates\(\virgule\) scalar product\(\virgule\) and norm in an orthonormal basis
Once we have a BON \((e_1, \dots, e_n)\) of a Euclidean \(E\), scalars and vectors are tied by a simple formula: the \(i\)-th coordinate of \(x\) is \(\langle x, e_i \rangle\). Consequently, in BON coordinates, every Euclidean scalar product computes as the canonical scalar product of coordinates. This is the structural collapse that justifies the slogan « the canonical scalar product on \(\mathbb{R}^n\) is the archetype of every Euclidean scalar product ».
Theorem — Coordinates in an orthonormal basis
Let \(E\) be a Euclidean space, \((e_1, \dots, e_n)\) an orthonormal basis of \(E\), and \(x \in E\). Then $$ \textcolor{colorprop}{x = \sum_{i = 1}^n \langle x, e_i \rangle\, e_i}. $$ The scalar \(\langle x, e_i \rangle\) is the \(i\)-th coordinate of \(x\) in the BON.
The vector \(x - \sum_{i = 1}^n \langle x, e_i \rangle e_i\) is orthogonal to each \(e_j\) by the Proposition « Component along a vector » (ii) above. Since \((e_1, \dots, e_n)\) is a basis of \(E\), every vector of \(E\) is a linear combination of the \(e_j\), so \(x - \sum \langle x, e_i \rangle e_i\) is orthogonal to every vector of \(E\). By the key remark on orthogonal vectors above, only \(0_E\) has this property: \(x - \sum \langle x, e_i \rangle e_i = 0_E\), hence \(x = \sum \langle x, e_i \rangle e_i\).
Theorem — Scalar product and norm in a BON
Let \(E\) be a Euclidean space, \((e_1, \dots, e_n)\) an orthonormal basis of \(E\), and \(x, y \in E\) with coordinate vectors \(X = (x_1, \dots, x_n)^\top\) and \(Y = (y_1, \dots, y_n)^\top\) in the BON. Then $$ \textcolor{colorprop}{\langle x, y \rangle = \sum_{i = 1}^n x_i y_i = X^\top Y, \qquad \|x\|^2 = \sum_{i = 1}^n x_i^2 = X^\top X}. $$
By the previous Theorem, \(x = \sum_i x_i e_i\) and \(y = \sum_j y_j e_j\). By bilinearity: $$ \begin{aligned} \langle x, y \rangle &= \Big\langle \sum_i x_i e_i, \sum_j y_j e_j \Big\rangle \\
&= \sum_{i, j} x_i y_j \langle e_i, e_j \rangle && \text{(bilinearity)} \\
&= \sum_{i, j} x_i y_j \delta_{i, j} && \text{(orthonormality)} \\
&= \sum_i x_i y_i = X^\top Y. \end{aligned} $$ Take \(y = x\) to get \(\|x\|^2 = \sum_i x_i^2 = X^\top X\).
Caveat. The formula \(\langle x, y \rangle = \sum x_i y_i\) holds only in an orthonormal basis. If the basis is not orthonormal, the coordinates of \(x\) in it have nothing to do with \(\langle x, e_i \rangle\), and the scalar product does not reduce to the canonical product of coordinates.
Pedagogical takeaway. In BON coordinates, every Euclidean scalar product computes as the canonical scalar product of \(\mathbb{R}^n\). So abstract Euclidean theory is, computationally, \(\mathbb{R}^n\) in disguise --- once we fix a BON.
Pedagogical takeaway. In BON coordinates, every Euclidean scalar product computes as the canonical scalar product of \(\mathbb{R}^n\). So abstract Euclidean theory is, computationally, \(\mathbb{R}^n\) in disguise --- once we fix a BON.
Skills to practice
- Reading coordinates\(\virgule\) scalar product\(\virgule\) and norm in a BON
V
Orthogonal supplementary and orthogonal projection
V.1
Orthogonal supplementary of a finite-dim sub-space
The load-bearing hypothesis throughout this section is the finite-dimensionality of \(F\), not of \(E\). In an infinite-dim pre-Hilbert space, an infinite-dim sub-space \(F\) may fail to satisfy \(E = F \oplus F^\perp\) (counter-example in \(C^0([0, 1], \mathbb{R})\): take \(F = \{ f \mid f(0) = 0 \}\), then \(F^\perp = \{0\}\) while \(F \ne E\)). But as soon as \(F\) has finite dimension, the orthogonal supplementary exists, is unique, and is the algebraic core of orthogonal projection.
A picture: \(F\) as a line in \(\mathbb{R}^3\), \(F^\perp\) as the orthogonal plane.
A picture: \(F\) as a line in \(\mathbb{R}^3\), \(F^\perp\) as the orthogonal plane.
Theorem — Orthogonal supplementary of a finite-dim sub-space
Let \(E\) be a real pre-Hilbert space (possibly infinite-dim), and let \(F\) be a sub-space of \(E\) of finite dimension. Then: - (i) \textcolor{colorprop}{\(E = F \oplus F^\perp\)}.
- (ii) \textcolor{colorprop}{\(F^\perp\) is the unique sub-space \(G\) of \(E\) such that \(E = F \oplus G\) and \(G \perp F\)}. We call \(F^\perp\) the orthogonal supplementary of \(F\). (Note: \(F\) may have many algebraic supplementaries; among them, exactly one is orthogonal to \(F\).)
- (iii) If \(E\) has finite dim: \(\dim F^\perp = \dim E - \dim F\).
- (iv) \(F^{\perp\perp} = F\).
- (i) \(E = F \oplus F^\perp\). We already have \(F \cap F^\perp = \{0_E\}\) by the Proposition « Orthogonal of a part » (ii) above. For \(E = F + F^\perp\): \(F\) has finite dim, so by the Theorem « Existence of an orthonormal basis » above it has an orthonormal basis \((f_1, \dots, f_p)\). For every \(x \in E\), set \(f = \sum_{i = 1}^p \langle x, f_i \rangle f_i \in F\). By the Proposition « Component along a vector » (ii) above, \(x - f\) is orthogonal to each \(f_i\), hence to \(\operatorname{Vect}(f_1, \dots, f_p) = F\), i.e. \(x - f \in F^\perp\). So \(x = f + (x - f) \in F + F^\perp\).
- (ii) Unique orthogonal supplementary. Let \(G\) be any sub-space with \(E = F \oplus G\) and \(G \perp F\). By \(G \perp F\) definition, \(G \subset F^\perp\). Conversely, for \(x \in F^\perp\), decompose \(x = f + g \in F + G\) (since \(E = F + G\)). Then \(\langle f, f \rangle = \langle x, f \rangle - \langle g, f \rangle = 0 - 0 = 0\) (the first vanishes since \(x \in F^\perp\) and \(f \in F\); the second since \(g \in G\) and \(G \perp F\)). So \(f = 0_E\) by separation, and \(x = g \in G\). Thus \(F^\perp \subset G\), giving \(G = F^\perp\).
- (iii) Dimension formula. If \(E\) has finite dim, \(\dim(F \oplus F^\perp) = \dim F + \dim F^\perp = \dim E\), hence \(\dim F^\perp = \dim E - \dim F\).
- (iv) \(F^{\perp\perp} = F\). We already have \(F \subset F^{\perp\perp}\) by the Proposition « Orthogonal of a part » (iii) above. Conversely, for \(x \in F^{\perp\perp}\), decompose \(x = f + f'\) with \(f \in F\) and \(f' \in F^\perp\). Then \(\langle f', f' \rangle = \langle x, f' \rangle - \langle f, f' \rangle = 0 - 0 = 0\) (the first vanishes since \(x \in F^{\perp\perp}\) and \(f' \in F^\perp\); the second since \(f \in F \subset F^{\perp\perp}\) and \(f' \in F^\perp\), by \(F \perp F^\perp\)). So \(f' = 0_E\) by separation, hence \(x = f \in F\).
Caveat (finite-dim of \(F\) is essential). The simplest infinite-dim counterexample: in \(E = C^0([0, 1], \mathbb{R})\) with the integral scalar product, take \(F = \{ f \in E \mid f(0) = 0 \}\). One can show \(F^\perp = \{0\}\) (any continuous \(g\) orthogonal to every \(f\) vanishing at \(0\) must itself be zero on \((0, 1]\), hence on \([0, 1]\) by continuity), so \(F + F^\perp = F \subsetneq E\). This is why this section systematically requires the dimension of \(F\), not of \(E\), to be finite.
Skills to practice
- Computing \(F^\perp\) and verifying \(E \equal F \oplus F^\perp\)
V.2
Orthogonal projection on a finite-dim sub-space
Once \(E = F \oplus F^\perp\) for \(F\) finite-dim, we can project any \(x \in E\) on \(F\) along \(F^\perp\). The image \(p(x) \in F\) is the unique vector of \(F\) such that \(x - p(x) \in F^\perp\), i.e. \(x - p(x)\) is orthogonal to all of \(F\).
The picture: \(x\) decomposes into \(f \in F\) (the projection) and \(f' \in F^\perp\) (the residual).
The picture: \(x\) decomposes into \(f \in F\) (the projection) and \(f' \in F^\perp\) (the residual).
Definition — Orthogonal projection
Let \(F\) be a sub-space of \(E\) of finite dimension, so \(E = F \oplus F^\perp\). The projection on \(F\) parallel to \(F^\perp\) is called the orthogonal projection on \(F\), denoted \(p_F\) (or just \(p\) when \(F\) is clear). Theorem — Expression of the orthogonal projection in a BON of \(F\)
Let \(F\) be a sub-space of \(E\) of finite dimension, \((f_1, \dots, f_p)\) an orthonormal basis of \(F\), and \(x \in E\). Then $$ \textcolor{colorprop}{p_F(x) = \sum_{i = 1}^p \langle x, f_i \rangle\, f_i}. $$
Set \(f = \sum_{i = 1}^p \langle x, f_i \rangle f_i\). Then \(f \in F = \operatorname{Vect}(f_1, \dots, f_p)\). By the Proposition « Component along a vector » (ii) above, \(x - f\) is orthogonal to each \(f_i\), hence to \(F\), so \(x - f \in F^\perp\). The decomposition \(x = f + (x - f) \in F + F^\perp\) is exactly the unique \(F \oplus F^\perp\) decomposition of \(x\), with \(F\)-component \(f\). Hence \(p_F(x) = f\).
Method — Compute \(p_F(x)\) when the basis of \(F\) is not orthonormal
Two strategies. Strategy B is usually faster. - Strategy A (Gram-Schmidt route). Orthonormalise the basis \((f_1, \dots, f_p)\) of \(F\) via Gram-Schmidt into a BON \((u_1, \dots, u_p)\) of \(F\), then apply \(p_F(x) = \sum \langle x, u_i \rangle u_i\). Drawback: integral or matrix scalar products lead to ugly \(1 / \sqrt{\text{stuff}}\) normalisations that propagate.
- Strategy B (orthogonality system --- recommended). Write \(p_F(x) = \sum_{i = 1}^p \lambda_i f_i\) with unknowns \((\lambda_1, \dots, \lambda_p)\). Use the characterisation « \(p_F(x) \in F\) and \(x - p_F(x) \in F^\perp\) »: the second gives \(\langle x - p_F(x), f_j \rangle = 0\) for \(j = 1, \dots, p\). This yields the \(p \times p\) linear system $$ \sum_{i = 1}^p \lambda_i \langle f_i, f_j \rangle = \langle x, f_j \rangle, \qquad j = 1, \dots, p. $$ Solve it. Cleaner than Strategy A in practice.
Example — Project the identity on the span of cosine and sine
Compute the orthogonal projection of \(\operatorname{Id} : t \mapsto t\) on \(F = \operatorname{Vect}(\cos, \sin)\) in \(C^0([0, 2\pi], \mathbb{R})\) with the integral product \(\langle f, g \rangle = \int_0^{2\pi} f(t) g(t)\, \mathrm{d}t\).
Use Strategy B. Write \(p(\operatorname{Id}) = \lambda \cos + \mu \sin\) and require \(\langle \operatorname{Id} - p(\operatorname{Id}), \cos \rangle = 0\) and \(\langle \operatorname{Id} - p(\operatorname{Id}), \sin \rangle = 0\). Useful integrals: \(\|\cos\|^2 = \|\sin\|^2 = \pi\), \(\langle \cos, \sin \rangle = 0\) (so the family \((\cos, \sin)\) is already orthogonal, but not normalised); \(\langle \operatorname{Id}, \cos \rangle = \int_0^{2\pi} t \cos t\, \mathrm{d}t = 0\) (integration by parts) and \(\langle \operatorname{Id}, \sin \rangle = \int_0^{2\pi} t \sin t\, \mathrm{d}t = -2\pi\).
The system \(\lambda \|\cos\|^2 + \mu \langle \sin, \cos \rangle = \langle \operatorname{Id}, \cos \rangle\) and the symmetric companion become \(\lambda \pi = 0\) and \(\mu \pi = -2 \pi\). Hence \(\lambda = 0\) and \(\mu = -2\), giving \(p(\operatorname{Id}) = -2 \sin\).
The system \(\lambda \|\cos\|^2 + \mu \langle \sin, \cos \rangle = \langle \operatorname{Id}, \cos \rangle\) and the symmetric companion become \(\lambda \pi = 0\) and \(\mu \pi = -2 \pi\). Hence \(\lambda = 0\) and \(\mu = -2\), giving \(p(\operatorname{Id}) = -2 \sin\).
Skills to practice
- Projecting onto a small sub-space
V.3
Special case: projection on a hyperplane
For a hyperplane \(H\) of a finite-dim Euclidean space, \(H^\perp\) has dimension \(1\). So a single nonzero vector --- the normal vector --- characterises \(H^\perp\), and the projection formulas become especially clean.
Theorem — Normal vector\(\virgule\) projection\(\virgule\) reflection on a hyperplane
Let \(E\) be a Euclidean space of dimension \(n \ge 1\), and \(H\) a hyperplane of \(E\) (sub-space of dimension \(n - 1\)). - (i) \(H^\perp\) is a line of \(E\). Any nonzero \(a \in H^\perp\) is called a normal vector to \(H\). (Two normal vectors differ by a nonzero scalar.)
- (ii) For every \(x \in E\) and every normal vector \(a\) to \(H\): $$ \textcolor{colorprop}{p_H(x) = x - \frac{\langle x, a \rangle}{\|a\|^2}\, a}, $$ which simplifies to \(p_H(x) = x - \langle x, a \rangle a\) when \(a\) has unit norm.
- (iii) The reflection of \(E\) across \(H\) is \(s_H(x) = 2 p_H(x) - x\), which expands to $$ \textcolor{colorprop}{s_H(x) = x - 2\, \frac{\langle x, a \rangle}{\|a\|^2}\, a}. $$
- (i). \(H\) has dim \(n - 1\) in \(E\) of dim \(n\). By the dimension formula from the orthogonal-supplementary theorem above (item (iii)), \(\dim H^\perp = n - (n - 1) = 1\). Any nonzero element of \(H^\perp\) generates it.
- (ii). The vector \(a / \|a\|\) is a unit-norm basis of \(H^\perp\). By the projection formula on \(H^\perp\) (orthogonal-projection Theorem above applied to the dim-\(1\) sub-space \(H^\perp\)), \(p_{H^\perp}(x) = \langle x, a / \|a\| \rangle \cdot (a / \|a\|) = \frac{\langle x, a \rangle}{\|a\|^2} a\). By the decomposition \(x = p_H(x) + p_{H^\perp}(x)\) (since \(E = H \oplus H^\perp\) and projections are complementary), \(p_H(x) = x - p_{H^\perp}(x) = x - \frac{\langle x, a \rangle}{\|a\|^2} a\).
- (iii). The reflection across \(H\) is by definition \(s_H(x) = 2 p_H(x) - x\). Substituting (ii) gives \(s_H(x) = 2 x - 2 \frac{\langle x, a \rangle}{\|a\|^2} a - x = x - 2 \frac{\langle x, a \rangle}{\|a\|^2} a\).
Forward pointer. Reflections across hyperplanes are the building blocks of isometry theory, where they generate the orthogonal group \(\mathcal{O}(E)\) (Cartan-Dieudonné Theorem, MP). The equation \(a_1 x_1 + \dots + a_n x_n = 0\) of a hyperplane in BON coordinates is exactly \(\langle x, a \rangle = 0\), with the coefficient vector \((a_1, \dots, a_n)\) being the normal vector --- a recurring motif in Espaces affines.
Skills to practice
- Using the normal-vector formula
V.4
Distance to a finite-dim sub-space
The orthogonal projection \(p_F(x)\) realises the unique minimum of \(\|x - f\|\) over \(f \in F\). Pythagoras explains why: any \(f \in F\) decomposes \(x - f\) into the \(F^\perp\)-residual \(x - p_F(x)\) and the \(F\)-correction \(p_F(x) - f\), two orthogonal pieces. The \(F\)-correction grows the distance.
The picture: \(d(x, F) = \|x - p_F(x)\|\) is the length of the perpendicular drop from \(x\) to \(F\).
The picture: \(d(x, F) = \|x - p_F(x)\|\) is the length of the perpendicular drop from \(x\) to \(F\).
Definition — Distance to a part
Let \(A\) be a non-empty part of \(E\) and \(x \in E\). The distance from \(x\) to \(A\) is $$ d(x, A) = \inf_{a \in A} \|x - a\|. $$ In general, this is only an infimum (not necessarily a minimum). For \(A\) a finite-dim sub-space, the next Theorem shows it is a genuine minimum. Theorem — Distance to a finite-dim sub-space
Let \(F\) be a sub-space of \(E\) of finite dimension, \(x \in E\), and \(p_F\) the orthogonal projection on \(F\). Then \(d(x, F)\) is a minimum, attained uniquely at \(p_F(x)\): $$ \textcolor{colorprop}{d(x, F) = \|x - p_F(x)\|, \qquad d(x, F)^2 = \|x\|^2 - \|p_F(x)\|^2}. $$
For every \(f \in F\), decompose \(x - f = (x - p_F(x)) + (p_F(x) - f)\). The first summand lies in \(F^\perp\) (since \(x - p_F(x) \in F^\perp\) by definition of orthogonal projection), the second lies in \(F\) (since \(p_F(x), f \in F\)). By Pythagoras, $$ \|x - f\|^2 = \|x - p_F(x)\|^2 + \|p_F(x) - f\|^2 \ge \|x - p_F(x)\|^2, $$ with equality if and only if \(\|p_F(x) - f\|^2 = 0\), i.e. \(f = p_F(x)\) by separation. Taking square roots, \(\|x - f\| \ge \|x - p_F(x)\|\) with equality iff \(f = p_F(x)\). So \(d(x, F) = \inf_{f \in F} \|x - f\| = \|x - p_F(x)\|\) is a minimum, attained only at \(p_F(x)\).
For the alternative formula: apply Pythagoras to \(x = p_F(x) + (x - p_F(x))\) with the orthogonal decomposition \(p_F(x) \in F\) and \(x - p_F(x) \in F^\perp\): $$ \|x\|^2 = \|p_F(x)\|^2 + \|x - p_F(x)\|^2 = \|p_F(x)\|^2 + d(x, F)^2, $$ hence \(d(x, F)^2 = \|x\|^2 - \|p_F(x)\|^2\).
For the alternative formula: apply Pythagoras to \(x = p_F(x) + (x - p_F(x))\) with the orthogonal decomposition \(p_F(x) \in F\) and \(x - p_F(x) \in F^\perp\): $$ \|x\|^2 = \|p_F(x)\|^2 + \|x - p_F(x)\|^2 = \|p_F(x)\|^2 + d(x, F)^2, $$ hence \(d(x, F)^2 = \|x\|^2 - \|p_F(x)\|^2\).
Method — Compute \(d(x\virgule F)\)
Two steps. - Step 1. Compute \(p_F(x)\) via Strategy A or B from the orthogonal-projection subsection above.
- Step 2. Compute \(d(x, F) = \|x - p_F(x)\|\), or equivalently \(d(x, F)^2 = \|x\|^2 - \|p_F(x)\|^2\) (often the cleaner formula when \(\|x\|\) and \(\|p_F(x)\|\) are simpler to compute than \(\|x - p_F(x)\|\)).
Example — Minimisation problem as a squared distance
Compute \(I = \displaystyle \inf_{a, b \in \mathbb{R}} \int_0^{2\pi} (t - a \cos t - b \sin t)^2\, \mathrm{d}t\).
Reformulate as a squared distance in \(C^0([0, 2\pi], \mathbb{R})\) with the integral product: \(I = \inf_{f \in F} \|\operatorname{Id} - f\|^2 = d(\operatorname{Id}, F)^2\) where \(F = \operatorname{Vect}(\cos, \sin)\). We already computed \(p(\operatorname{Id}) = -2 \sin\) in the orthogonal-projection subsection above. So $$ I = \|\operatorname{Id}\|^2 - \|p(\operatorname{Id})\|^2 = \int_0^{2\pi} t^2\, \mathrm{d}t - 4 \|\sin\|^2 = \frac{8 \pi^3}{3} - 4 \pi. $$ The infimum is attained at \((a, b) = (0, -2)\).
Forward pointer. The « minimisation as a squared distance » template is the algebraic core of least squares --- the workhorse method of statistics, signal processing, and numerical analysis, treated in MP and MP* and present in every engineering curriculum. The lesson: many « find \(a, b\) that minimise such-and-such integral or sum » problems reduce, after a careful rewrite, to a projection on a finite-dim sub-space of a pre-Hilbert space.
Skills to practice
- Computing \(d(x\virgule F)\) via the projection
Jump to section