CommeUnJeu · L2 MP

Expectation, variance and generating functions

⌚ ~123 min ▢ 15 blocks ✓ 30 exercises ➣ Prerequisites : Discrete random variables, Summable families, Expectation, variance, covariance, Power series

Ex 1

Differentiability extends to functions $f : I \to \mathbb{C}$ via componentwise differentiability. The algebraic toolbox carries over almost verbatim. The crucial obstruction: Rolle's theorem and the equality form of the mean value theorem are false over $\mathbb{C}$. However the mean value inequality for $C^1$ functions still holds --- admitted here, the proof being deferred to Integration on a segment.

I Expectation of a discrete random variable

Definition — Complex-valued differentiability

$f : I \to \mathbb{C}$, $a \in I$. We say $f$ is differentiable at $a$ if the difference quotient $$ \tau_a(h) = \frac{f(a + h) - f(a)}{h} $$ admits a finite limit in $\mathbb{C}$ as $h \to 0$ with $a + h \in I$. The limit is denoted $\textcolor{colordef}{f'(a)}$.

I.1 Definition; existence

Proposition — Componentwise characterization

$f : I \to \mathbb{C}$ is differentiable at $a \in I$ if and only if $\mathrm{Re}\,f$ and $\mathrm{Im}\,f$ are both differentiable at $a$. In this case, $$ \textcolor{colorprop}{f'(a) = (\mathrm{Re}\,f)'(a) + i \, (\mathrm{Im}\,f)'(a)}. $$

Proof

For $h \ne 0$ with $a + h \in I$, $$ \tau_a^f(h) = \tau_a^{\mathrm{Re}\,f}(h) + i \, \tau_a^{\mathrm{Im}\,f}(h) $$ by $\mathbb{R}$-linearity of $\mathrm{Re}$ and $\mathrm{Im}$. By the componentwise characterization of complex-valued function limits (Limits and continuity, complex-valued functions section), $\tau_a^f(h)$ has a finite limit in $\mathbb{C}$ as $h \to 0$ iff $\tau_a^{\mathrm{Re}\,f}$ and $\tau_a^{\mathrm{Im}\,f}$ both have finite limits in $\mathbb{R}$, and the limit decomposes as $(\mathrm{Re}\,f)'(a) + i (\mathrm{Im}\,f)'(a)$.

Example

Compute the derivative of $f : \mathbb{R} \to \mathbb{C}$, $f(t) = \exp(i t)$.

Answer

$\mathrm{Re}\,f(t) = \cos t$, $\mathrm{Im}\,f(t) = \sin t$. Both are differentiable on $\mathbb{R}$ with $(\cos)'(t) = -\sin t$ and $(\sin)'(t) = \cos t$ (admitted from Real functions: lycée recap). By P7.1, $f$ is differentiable on $\mathbb{R}$ with $$ f'(t) = -\sin t + i \cos t = i (\cos t + i \sin t) = i \exp(i t). $$

Proposition — What does NOT change

Let $f, g : I \to \mathbb{C}$ differentiable on $I$, $\lambda, \mu \in \mathbb{C}$. Then:

$\lambda f + \mu g$ is differentiable on $I$ with $\textcolor{colorprop}{(\lambda f + \mu g)' = \lambda f' + \mu g'}$ ;
$f g$ is differentiable on $I$ with $\textcolor{colorprop}{(f g)' = f' g + f g'}$ ;
if $g$ does not vanish on $I$, $f/g$ is differentiable with $\textcolor{colorprop}{(f/g)' = (f' g - f g')/g^2}$ ;
if $\varphi : J \to I$ is a real-valued differentiable function, $f \circ \varphi$ is differentiable with $\textcolor{colorprop}{(f \circ \varphi)' = (f' \circ \varphi) \cdot \varphi'}$.

Each is deduced from the real case by componentwise decomposition via P7.1.

Proposition — What DOES change

Rolle's theorem and the equality form of the mean value theorem are false for complex-valued functions.

Counter-example

The function $f : [0, 2 \pi] \to \mathbb{C}$, $f(t) = \exp(i t)$, satisfies $f(0) = f(2 \pi) = 1$ and is $C^\infty$ on $[0, 2 \pi]$ with $f'(t) = i \exp(i t)$ (Ex7.1). But $|f'(t)| = 1$ everywhere, so $f'$ never vanishes --- Rolle fails. As a consequence, no value of $c \in {]}0, 2 \pi[$ satisfies $f(2\pi) - f(0) = f'(c) \cdot 2 \pi$ (the left side is $0$, the right side has modulus $2 \pi$), so the equality form of the mean value theorem fails too.

Theorem — Mean value inequality for complex-valued $C^1$ functions

Let $f : I \to \mathbb{C}$ of class $C^1$ on $I$ with $|f'(x)| \le K$ for all $x \in I$. Then for all $(x, y) \in I^2$, $$ \textcolor{colorprop}{|f(y) - f(x)| \le K |y - x|}. $$

Remark --- admitted here

T7.1 is admitted in this chapter. The inequality results from a simple integral upper bound, justified later in the Integration section. The full proof will be given in Integration on a segment (via $f(y) - f(x) = \int_x^y f'(t) dt$ and the triangle inequality for integrals). Until then, T7.1 is used as a black box.

Example

Use T7.1 to show $|\exp(i t) - 1| \le |t|$ for all $t \in \mathbb{R}$.

Answer

Set $f(t) = \exp(i t)$ on $\mathbb{R}$. Then $f$ is $C^1$ on $\mathbb{R}$ with $f'(t) = i \exp(i t)$ (Ex7.1), so $|f'(t)| = |i| \cdot |\exp(i t)| = 1$ for all $t$. By T7.1 with $K = 1$, $|f(t) - f(0)| \le 1 \cdot |t - 0|$, i.e. $|\exp(i t) - 1| \le |t|$.

Ex 2

Skills to practice

Determining whether $X$ has finite expectation

I.2 The transfer formula

Ex 3 Ex 4 Ex 5

Convexity captures, in a single inequality, the geometric idea of a curve that always lies below its chords. From this one definition flows a toolbox: a slope-monotonicity characterization, Jensen's inequality (finite-point version of convexity), a derivative-based test ($f'$ nondecreasing or $f'' \ge 0$), and the dual « graph above its tangents » statement that fuels classical bounds like $e^x \ge 1 + x$, $\ln(1+x) \le x$, and $\sin x \ge 2x/\pi$ on $[0, \pi/2]$. The chapter closes with a brief, scoped treatment of inflection points.
Throughout, $I$ denotes an interval of $\mathbb{R}$ (possibly unbounded, possibly closed or open). The duality « concave $\Leftrightarrow$ $-f$ convex » is stated once at the start and used implicitly thereafter; concave inequalities reverse the convex ones. Jensen's inequality is stated and proved for finite convex combinations; the H\"older and Minkowski inequalities belong to later integration / normed-spaces material. Inflection points are introduced in the closing section as a safe sufficient criterion.

The geometric idea: a function is convex if its graph stays below every chord drawn between two of its points. We translate this into the chord inequality, then show two characterizations: monotonicity of slopes and Jensen's inequality for finite convex combinations.

Skills to practice

Computing an expectation by transfer

I.3 Linearity$\virgule$ positivity$\virgule$ monotonicity

Definition — Convex and concave function

Let $f : I \to \mathbb{R}$. We say that $f$ is convex on $I$ if $$ \textcolor{colordef}{\forall x, y \in I, \ \forall \lambda \in [0, 1], \quad f((1 - \lambda) x + \lambda y) \le (1 - \lambda) f(x) + \lambda f(y)}. $$ We say that $f$ is concave on $I$ if $-f$ is convex on $I$, equivalently $$ \forall x, y \in I, \ \forall \lambda \in [0, 1], \quad f((1 - \lambda) x + \lambda y) \ge (1 - \lambda) f(x) + \lambda f(y). $$

Definition — Chord and secant line

For $x, y \in I$ with $x < y$, the chord of $f$ on $[x, y]$ is the line segment in the plane joining $(x, f(x))$ and $(y, f(y))$. The secant line through these two points is the (full) line containing the chord.

Figure --- convex and concave graphs

Side by side: a parabola $y = \tfrac{1}{2} x^2$ (convex, graph below the chord between any two of its points) and a (translated) logarithm $y = \ln x + 0.3$ (concave, graph above the chord between any two of its points). The vertical scaling and shift are cosmetic --- convexity/concavity is preserved.

The convex graph is below the chord $AB$; the concave graph is above the chord $AB$.

Proposition — Position of the graph with respect to a secant

Let $f : I \to \mathbb{R}$ be convex on $I$, and let $x, y \in I$ with $x < y$. Then

the graph of $f$ lies \textcolor{colorprop}{below} the secant line through $(x, f(x))$ and $(y, f(y))$ on the segment $[x, y]$;
the graph of $f$ lies \textcolor{colorprop}{above} that same secant line on $I \setminus [x, y]$.

Proof

The secant has equation $L(t) = f(x) + \frac{f(y) - f(x)}{y - x} (t - x)$.

Below on $[x, y]$. For $t \in [x, y]$, write $t = (1 - \lambda) x + \lambda y$ with $\lambda = (t - x)/(y - x) \in [0, 1]$. By convexity of $f$ (Definition D1.1), $$ f(t) \le (1 - \lambda) f(x) + \lambda f(y) = L(t). $$
Above on $I \setminus [x, y]$. Take $t \in I$ with $t > y$ (the case $t < x$ is analogous). Set $\theta = (y - x)/(t - x) \in \,]0, 1[$. Then $$ y = (1 - \theta) x + \theta t, $$ a genuine convex combination of $x$ and $t$. By convexity of $f$, $$ f(y) \le (1 - \theta) f(x) + \theta f(t). $$ Hence $$ f(t) \ge \frac{f(y) - (1 - \theta) f(x)}{\theta}. $$ Substituting $\theta = (y - x)/(t - x)$ and $1 - \theta = (t - y)/(t - x)$ and rearranging: \begin{align*} f(t) &\ge \frac{(t - x) f(y) - (t - y) f(x)}{y - x} && \text{(substitute $\theta$, $1-\theta$)}
&= f(x) + \frac{f(y) - f(x)}{y - x} (t - x) && \text{(rearrange into secant form)}
&= L(t). \end{align*}

Figure --- graph below/above its secant

The graph of a convex function lies below its secant on $[x, y]$ and above the secant outside $[x, y]$.

Proposition — Slope-monotonicity characterization

Let $f : I \to \mathbb{R}$. The following are equivalent:

(i) $f$ is convex on $I$;
(ii) for every $a \in I$ and every $u, v \in I \setminus \{a\}$ with $u < v$, $$ \textcolor{colorprop}{\frac{f(u) - f(a)}{u - a} \le \frac{f(v) - f(a)}{v - a}}. $$

In particular, in (ii), $t \mapsto (f(t) - f(a))/(t - a)$ is nondecreasing on $I \setminus \{a\}$ for every $a \in I$.

Proof

$(i) \Rightarrow (ii)$. Fix $a \in I$ and $u < v$ in $I \setminus \{a\}$.
- Case $a < u < v$. Write $u = (1 - \lambda) a + \lambda v$ with $\lambda = (u - a)/(v - a) \in {]}0, 1[$. Convexity gives $$ f(u) \le (1 - \lambda) f(a) + \lambda f(v), $$ i.e. $f(u) - f(a) \le \lambda (f(v) - f(a))$. Dividing by $u - a = \lambda (v - a) > 0$ yields the claim.
- Case $u < v < a$. Symmetric: write $v = (1 - \mu) u + \mu a$ with $\mu = (v - u)/(a - u) \in {]}0, 1[$. Convexity gives $f(v) \le (1 - \mu) f(u) + \mu f(a)$, equivalently $f(v) - f(a) \le (1 - \mu)(f(u) - f(a))$. Dividing by $v - a = -(1 - \mu)(a - u) < 0$ flips the inequality and produces the claim.
- Case $u < a < v$. Write $$ a = (1 - \theta) u + \theta v, \qquad \theta = \frac{a - u}{v - u} \in \,]0, 1[. $$ By convexity, $$ f(a) \le (1 - \theta) f(u) + \theta f(v). $$ Since $1 - \theta = (v - a)/(v - u)$ and $\theta = (a - u)/(v - u)$, multiplying by $v - u > 0$: $$ (v - u) f(a) \le (v - a) f(u) + (a - u) f(v). $$ Rearranging, $$ (v - a) (f(a) - f(u)) \le (a - u) (f(v) - f(a)). $$ Dividing by the strictly positive number $(a - u)(v - a) > 0$: $$ \frac{f(a) - f(u)}{a - u} \le \frac{f(v) - f(a)}{v - a}. $$ Since $(f(a) - f(u))/(a - u) = (f(u) - f(a))/(u - a)$, we conclude $$ \frac{f(u) - f(a)}{u - a} \le \frac{f(v) - f(a)}{v - a}. $$
$(ii) \Rightarrow (i)$. Take $x < y$ in $I$ and $\lambda \in {]}0, 1[$; set $t = (1 - \lambda) x + \lambda y$, so $x < t < y$. Applying (ii) with anchor $a = x$ on the pair $(t, y)$ (both in $I \setminus \{x\}$, with $t < y$): $$ \frac{f(t) - f(x)}{t - x} \le \frac{f(y) - f(x)}{y - x}. $$ Since $t - x = \lambda (y - x) > 0$, multiplying by $t - x$ gives $$ f(t) - f(x) \le \lambda (f(y) - f(x)), $$ i.e. $f(t) \le (1 - \lambda) f(x) + \lambda f(y)$. The boundary cases $\lambda \in \{0, 1\}$ are immediate. Hence $f$ is convex.

Skills to practice

Computing an expectation by linearity

I.4 Independence and the product formula

Jensen's inequality is the finite-point version of convexity: it extends the two-point chord inequality of D1.1 to any finite convex combination. We stay within finite combinations only --- any general development on barycentres is out of program.

Theorem — Jensen's inequality (finite case)

Let $f : I \to \mathbb{R}$ be convex. For every integer $n \ge 1$, every $x_1, \dots, x_n \in I$, and every $\lambda_1, \dots, \lambda_n \in [0, 1]$ with $\lambda_1 + \dots + \lambda_n = 1$, $$ \textcolor{colorprop}{f\!\left( \sum_{k=1}^n \lambda_k x_k \right) \le \sum_{k=1}^n \lambda_k f(x_k)}. $$

Proof

By induction on $n$.

Base $n = 1$. The sum reduces to $f(x_1) \le f(x_1)$, trivial.
Base $n = 2$. The inequality is exactly Definition D1.1.
Heredity. Assume the inequality at rank $n \ge 2$ and prove it at rank $n + 1$. Fix $x_1, \dots, x_{n+1} \in I$ and $\lambda_1, \dots, \lambda_{n+1} \in [0, 1]$ with $\lambda_1 + \dots + \lambda_{n+1} = 1$.
- If $\lambda_{n+1} = 1$, then $\lambda_1 = \dots = \lambda_n = 0$ and both sides reduce to $f(x_{n+1})$.
- Otherwise $\mu := 1 - \lambda_{n+1} = \lambda_1 + \dots + \lambda_n > 0$. Set $$ z := \frac{1}{\mu} \sum_{k=1}^n \lambda_k x_k. $$ The coefficients $\lambda_k / \mu$ ($k = 1, \dots, n$) are nonnegative (zero allowed) and sum to $1$, so $z$ is a convex combination of $x_1, \dots, x_n \in I$. Since $I$ is an interval (in particular, convex as a subset of $\mathbb{R}$), $z \in I$. Now $\sum_{k=1}^{n+1} \lambda_k x_k = \mu z + \lambda_{n+1} x_{n+1}$, with $\mu + \lambda_{n+1} = 1$, $\mu, \lambda_{n+1} \in [0, 1]$. Applying D1.1 (the two-point case) to $z, x_{n+1}$ with weights $\mu, \lambda_{n+1}$: $$ f\!\left( \sum_{k=1}^{n+1} \lambda_k x_k \right) = f(\mu z + \lambda_{n+1} x_{n+1}) \le \mu f(z) + \lambda_{n+1} f(x_{n+1}). $$ By the induction hypothesis applied to $z = \sum (\lambda_k/\mu) x_k$, $$ f(z) \le \sum_{k=1}^n \frac{\lambda_k}{\mu} f(x_k). $$ Therefore $$ \mu f(z) \le \sum_{k=1}^n \lambda_k f(x_k), $$ and summing with $\lambda_{n+1} f(x_{n+1})$ gives $$ f\!\left( \sum_{k=1}^{n+1} \lambda_k x_k \right) \le \sum_{k=1}^{n+1} \lambda_k f(x_k). $$

Remark --- continuity on an open interval

A convex function on an open interval is continuous. This useful theorem is admitted here.
(The proof uses the slope-monotonicity characterization P1.2 plus the monotone-limit theorem of Limits and continuity.)

Ex 6 Ex 7

Skills to practice

Applying independence to compute $\mathbb{E}(XY)$

I.5 Expectations of the standard laws

Ex 8 Ex 9 Ex 10

Skills to practice

Computing the expectation of a custom law

II Variance$\virgule$ covariance$\virgule$ inequalities and the weak law

Ex 11

II.1 Second moment$\virgule$ Cauchy-Schwarz

Ex 12

When $f$ is (twice) differentiable, convexity has clean derivative-level characterizations: $f$ convex $\Leftrightarrow$ $f'$ nondecreasing $\Leftrightarrow$ graph above all tangents. The $C^2$ version --- $f$ convex $\Leftrightarrow$ $f'' \ge 0$ --- gives the operational test used everywhere. The tangent inequality is the engine behind the classical bounds $e^x \ge 1 + x$, $\ln(1+x) \le x$, $\sin x \le x$, and $\sqrt{x} \le (x+1)/2$.
Derivative convention. In derivative statements below, either $I$ is open, or the assertions involving $f'(a)$ are read for interior points $a \in \mathring{I}$. When an endpoint is used in an example, the function is understood as the restriction of a differentiable function defined on a larger open interval.

Theorem — Characterization of convex differentiable functions

Let $f : I \to \mathbb{R}$ be differentiable on $I$. The following are equivalent:

(i) $f$ is convex on $I$;
(ii) $f'$ is nondecreasing on $I$;
(iii) For every $a \in I$ and every $x \in I$, $\textcolor{colorprop}{f(x) \ge f(a) + f'(a)(x - a)}$ (the graph of $f$ lies above all its tangents).

Proof

Cyclic implication $(i) \Rightarrow (ii) \Rightarrow (iii) \Rightarrow (i)$.

$(i) \Rightarrow (ii)$. Fix $u < v$ in $I$. For every $w \in \,]u, v[$, the three-point slope inequality P1.3 applied to $u < w < v$ gives $$ \frac{f(w) - f(u)}{w - u} \le \frac{f(v) - f(u)}{v - u} \le \frac{f(v) - f(w)}{v - w}. $$ Letting $w \to u^+$ in the left inequality gives $$ f'(u) \le \frac{f(v) - f(u)}{v - u}. $$ Letting $w \to v^-$ in the right inequality gives $$ \frac{f(v) - f(u)}{v - u} \le f'(v). $$ Chaining: $f'(u) \le f'(v)$. Hence $f'$ is nondecreasing.
$(ii) \Rightarrow (iii)$. Fix $a \in I$ and $x \in I$. Define $\varphi(t) := f(t) - f(a) - f'(a)(t - a)$ for $t \in I$. Then $\varphi$ is differentiable on $I$ with $\varphi'(t) = f'(t) - f'(a)$. Since $f'$ is nondecreasing (hypothesis ii), $\varphi'(t) \le 0$ for $t \le a$ and $\varphi'(t) \ge 0$ for $t \ge a$. By the sign-of-$f'$ $\Leftrightarrow$ monotonicity theorem from Differentiability P5.1, $\varphi$ is nonincreasing on $I \cap {]}-\infty, a]$ and nondecreasing on $I \cap [a, +\infty[$. Combined with $\varphi(a) = 0$, this gives $\varphi(t) \ge 0$ for all $t \in I$, i.e. $f(t) \ge f(a) + f'(a)(t - a)$.
$(iii) \Rightarrow (i)$. Fix $x, y \in I$ and $\lambda \in [0, 1]$; set $a := (1 - \lambda) x + \lambda y \in I$. By (iii) at $a$ evaluated at $t = x$ and $t = y$: $$ f(x) \ge f(a) + f'(a)(x - a), \qquad f(y) \ge f(a) + f'(a)(y - a). $$ Multiply the first by $1 - \lambda \ge 0$, the second by $\lambda \ge 0$, and add. The coefficients of $f'(a)$ combine as $$ (1 - \lambda)(x - a) + \lambda (y - a) = (1 - \lambda) x + \lambda y - a = 0, $$ so $f'(a)$ drops out. We obtain $$ (1 - \lambda) f(x) + \lambda f(y) \ge f(a) = f((1 - \lambda) x + \lambda y), $$ which is the convexity inequality D1.1. Hence $f$ is convex.

Remark --- concave dual

For $f$ concave differentiable, the inequalities reverse. In particular, the graph of $f$ lies below every one of its tangents: $$ \forall a, x \in I, \qquad f(x) \le f(a) + f'(a)(x - a). $$ This is the engine behind the bounds $\ln(1+x) \le x$, $\sqrt{x} \le (x+1)/2$, and $\sin x \le x$ proved below.

Method — Show that $f$ is convex (or concave) from $f''$

Given $f$ at least $C^2$ on $I$:

Compute $f''$ explicitly.
Study the sign of $f''$ on $I$ (factor, use known signs, monotonicity).
Conclude with P2.1: $f''(x) \ge 0$ on $I$ $\Rightarrow$ $f$ convex; $f''(x) \le 0$ on $I$ $\Rightarrow$ $f$ concave.

Caveat. The conclusion needs a sign that holds on the whole interval $I$. If $f''$ changes sign, $f$ is convex on the subinterval where $f'' \ge 0$ and concave on the subinterval where $f'' \le 0$ --- it is neither on $I$ as a whole.

Example

$\ln$ is concave on $\mathbb{R}_+^*$. Deduce $\ln(1 + x) \le x$ for $x > -1$ and $\ln x \le x - 1$ for $x > 0$.

Answer

$\ln$ is $C^2$ on $\mathbb{R}_+^*$ with $(\ln)''(x) = -1/x^2 < 0$, so by P2.1 (concave case), $\ln$ is concave on $\mathbb{R}_+^*$. Apply the concave-dual tangent inequality:

At $a = 1$: $(\ln)(1) = 0$, $(\ln)'(1) = 1$, so the tangent is $y = x - 1$. Hence $\ln x \le x - 1$ for $x > 0$.
Setting $u = 1 + x$ (so $u > 0$ iff $x > -1$) in the previous: $\ln(1 + x) \le (1 + x) - 1 = x$ for $x > -1$.

Example

$\sin$ is concave on $[0, \pi/2]$. Deduce the two inequalities $\sin x \ge \dfrac{2 x}{\pi}$ and $\sin x \le x$, valid for $x \in [0, \pi/2]$.

Answer

$\sin$ is $C^2$ on $[0, \pi/2]$ with $(\sin)''(x) = -\sin x \le 0$ on this interval, so by P2.1 (concave case), $\sin$ is concave on $[0, \pi/2]$. The two inequalities come from two distinct geometric facts:

Above-chord rule (concave version of P1.1). On $[0, \pi/2]$, the graph of $\sin$ lies above its chord between $(0, 0)$ and $(\pi/2, 1)$. That chord has equation $y = (2/\pi) x$, so $$ \sin x \ge \frac{2 x}{\pi} \qquad (x \in [0, \pi/2]). $$
Below-tangent rule (concave dual of T2.1(iii)). The tangent at $a = 0$ is $y = (\sin)(0) + (\sin)'(0) (x - 0) = x$. So $$ \sin x \le x \qquad (x \in [0, \pi/2]). $$

The inequality $\sin x \le x$ in fact holds for every $x \ge 0$: the function $h(x) = x - \sin x$ satisfies $h(0) = 0$ and $h'(x) = 1 - \cos x \ge 0$ on $\mathbb{R}$, hence $h$ is nondecreasing on $\mathbb{R}_+$, giving $h(x) \ge h(0) = 0$ for every $x \ge 0$.

Skills to practice

Applying Cauchy-Schwarz

II.2 Variance and König-Huygens

Method — Bound a numerical expression by a convexity inequality

To prove a bound of the form $f(x) \ge L(x)$ (or $\le$, for concave) where $L$ is affine:

Identify a function $f$ for which the bound is the tangent inequality (or the chord inequality) at a specific point $a$.
Verify that $f$ is convex (or concave) on the relevant interval, by sign of $f''$ via P2.1.
Compute $f(a)$ and $f'(a)$; check that $L(x) = f(a) + f'(a)(x - a)$ is indeed the target affine bound.
Conclude by T2.1(iii) for the tangent case, or by P1.1 / above-chord rule for the chord case.

Recognising the right anchor. The anchor $a$ is the point where the bound is sharpest (equality). For $e^x \ge 1 + x$ this is $a = 0$; for $\ln x \le x - 1$ this is $a = 1$; for $\sqrt{x} \le (x+1)/2$ this is $a = 1$.

Example

Show that $\sqrt{x} \le \dfrac{x + 1}{2}$ for every $x \ge 0$.

Answer

The function $g(x) = \sqrt{x}$ is $C^2$ on $\mathbb{R}_+^*$ with $g''(x) = -\frac{1}{4} x^{-3/2} < 0$. By P2.1 (concave case), $g$ is concave on $\mathbb{R}_+^*$. Apply the concave-dual tangent inequality at $a = 1$: $g(1) = 1$, $g'(1) = 1/2$, so the tangent is $y = 1 + \frac{1}{2}(x - 1) = (x + 1)/2$. Hence $$ \sqrt{x} \le \frac{x + 1}{2} \qquad (x > 0). $$ For $x = 0$, the inequality becomes $0 \le 1/2$, which is true. By continuity of $g$ at $0$ (or by direct check), the inequality extends to $x \ge 0$.

Remark --- strict convexity

A function $f : I \to \mathbb{R}$ is strictly convex on $I$ if, for every $x \ne y$ in $I$ and every $\lambda \in {]}0, 1[$, $f((1 - \lambda) x + \lambda y) < (1 - \lambda) f(x) + \lambda f(y)$ (strict inequality). For differentiable functions, $f'$ strictly increasing on $I$ implies $f$ strictly convex. For $f \in C^2$, the condition $f'' > 0$ on $I$ is sufficient but not necessary --- the function $x \mapsto x^4$ is strictly convex on $\mathbb{R}$ yet $f''(0) = 0$.

Ex 13 Ex 14

Skills to practice

Computing a variance

II.3 Covariance$\virgule$ bilinearity$\virgule$ Bienaymé identity

Ex 15 Ex 16 Ex 17 Ex 18

Definition — Inflection point

Let $f : I \to \mathbb{R}$ and $a$ an interior point of $I$. We say that $f$ has an inflection point at $a$ if there exists $\eta > 0$ such that $[a - \eta, a + \eta] \subset I$, $f$ is not affine on any neighborhood of $a$, and one of the following holds:

$f$ is convex on $[a - \eta, a]$ and concave on $[a, a + \eta]$, or
$f$ is concave on $[a - \eta, a]$ and convex on $[a, a + \eta]$.

The non-affine condition forces the convex/concave character of $f$ to change effectively at $a$: without it, every point of an affine function would qualify (since an affine function is both convex and concave on every interval).

Pitfall --- $f''(a) \equal 0$ is not sufficient

For a $C^2$ function satisfying the convex/concave-transition definition above, $f''(a) = 0$ is necessary but not sufficient. The standard counterexample is $f(x) = x^4$ at $a = 0$: $f''(x) = 12 x^2$ vanishes at $0$ but is nonnegative everywhere, so $f$ is convex on all of $\mathbb{R}$ --- there is no convex/concave transition, hence no inflection point. The exo file drills this pitfall explicitly.

Method — Find the inflection points of a function

For $f$ at least $C^2$ on $I$:

Compute $f''$ explicitly on $I$.
Solve $f''(x) = 0$ to obtain the candidate set $\{a_1, a_2, \dots\}$.
For each candidate $a_i$, study the sign of $f''$ on a neighborhood of $a_i$ and check whether it changes sign at $a_i$.
Conclude by P3.1: every candidate at which $f''$ changes sign gives an inflection point; candidates where the sign does not change are discarded in the usual $C^2$ setting.

Discard candidates where the sign does not change. A zero of $f''$ where the sign stays the same (e.g. a double zero of $f''$) is not an inflection point. The $x^4$ example above is the canonical illustration.

Figure --- cubic with its inflection point

The cubic $f(x) = x^3 - 3 x$ has $f''(x) = 6 x$, so the unique inflection point is at $x = 0$. The graph is concave on $]-\infty, 0]$ and convex on $[0, +\infty[$.

Ex 19

Skills to practice

Applying the Bienaymé identity

II.4 Markov and Bienaymé-Tchebychev inequalities

Ex 20 Ex 21 Ex 22 Ex 23 Ex 24 Ex 25

Skills to practice

Bounding a probability by Markov or Bienaymé-Tchebychev

II.5 Weak law of large numbers

The integers $\mathbb{Z}$ form, under their usual addition and multiplication, a self-contained world rich enough to host an entire theory of divisibility. This chapter develops that theory in five steps. We start from the divisibility relation $b \mid a$ and the Euclidean division theorem $a = bq + r$, the unique algorithmic device that converts the question « does $b$ divide $a$? » into the question « is the remainder zero? ». We then build on Euclidean division to construct the greatest common divisor $a \wedge b$, both via the iterative algorithm of Euclid and via the linear-combination identity of Bézout, $a \wedge b = a u + b v$. We extract the special case $a \wedge b = 1$ (coprime integers) and prove Bézout's theorem and Gauss's lemma --- the two engines that drive every divisibility argument in the rest of the chapter. We introduce the prime numbers, prove their infinitude (Euclid) and the existence/uniqueness of the prime factorization $n = \prod_p p^{v_p(n)}$, and use the $p$-adic valuation $v_p$ to give clean closed-form expressions for $a \wedge b$ and $a \vee b$. We close by returning to the divisibility relation in a different guise --- congruences modulo $n$ --- and use the previous machinery to characterize invertibility modulo $n$ ($a \wedge n = 1$) and to prove Fermat's little theorem $a^p \equiv a \pmod p$ for $p$ prime.
The approach throughout is elementary: by convention in this chapter, we do not call upon the language of algebraic structures. We therefore avoid, deliberately, all references to groups, rings, ideals, and morphisms; the set $\mathbb{Z}/n\mathbb{Z}$ appears in the section on congruences as a set of $n$ classes only, with an explicit reminder that its ring structure is hors programme and will be revisited in the chapter Algebraic structures. Two further chapters extend what we do here: Arithmetic of polynomials replays the entire story for $\mathbb{K}[X]$ in place of $\mathbb{Z}$ (Euclidean division, gcd, Bézout, Gauss, irreducibility), and Algebraic structures re-frames divisibility, ideals, and modular arithmetic in the abstract.
Standing convention. Throughout this chapter, a divisor is always a non-zero integer. Whenever we write « $d \mid a$ », we silently assume $d \in \mathbb{Z}^*$. In particular $\operatorname{div}(0) = \mathbb{Z}^*$ (every non-zero integer divides $0$). We use $\mathbb{N} = \{0, 1, 2, \dots\}$ and $\mathbb{N}^* = \mathbb{N} \setminus \{0\}$. The notations $a \wedge b$ (greatest common divisor) and $a \vee b$ (least common multiple) are standard.

Two definitions, one theorem. The definition of divisibility is the entry point of the chapter; Euclidean division is the algorithmic tool that turns « $b \mid a$? » into « is the remainder zero? ». The Grade 12 « Spécialité » baseline already covers both --- we recall them quickly, then state and prove rigorously and extend the Euclidean division statement to negative dividends.

Definition — Divisibility

Let $a \in \mathbb{Z}$ and $b \in \mathbb{Z}^*$. We say that $b$ divides $a$, and we write $b \mid a$, if there exists $k \in \mathbb{Z}$ such that $a = b k$. We then also say that $a$ is a multiple of $b$, or that $b$ is a divisor of $a$.
Notations:

$a \mathbb{Z} = \{a k : k \in \mathbb{Z}\}$ is the set of multiples of $a$;
$\operatorname{div}(a) = \{d \in \mathbb{Z}^* : d \mid a\}$ is the set of divisors of $a$ (a subset of $\mathbb{Z}^*$ by the standing convention).

Example

For every $a \in \mathbb{Z}^*$, $1 \mid a$, $-1 \mid a$, $a \mid a$, and $-a \mid a$. So $\{\pm 1, \pm a\} \subset \operatorname{div}(a)$ always; for $a \in \mathbb{N}$ with $a \geq 2$, the equality $\operatorname{div}(a) = \{\pm 1, \pm a\}$ characterizes the prime numbers (see the section on primes below).

Proposition — Properties of divisibility

Let $a, b, c, d \in \mathbb{Z}$ and $u, v \in \mathbb{Z}$.

Reflexivity and transitivity. For every $a \in \mathbb{Z}^*$, $a \mid a$. If $a \mid b$ and $b \mid c$ (with $a, b \in \mathbb{Z}^*$), then $a \mid c$.
Associated integers. For $a, b \in \mathbb{Z}^*$, $a \mid b \text{ and } b \mid a \iff |a| = |b|$. Such integers are called associated. In particular, restricted to $\mathbb{N}^*$ this gives the antisymmetry of $\mid$: $a \mid b$ and $b \mid a \iff a = b$.
Linear combinations. If $d \mid a$ and $d \mid b$, then $d \mid (a u + b v)$ for every $u, v \in \mathbb{Z}$.
Product. If $a \mid b$ and $c \mid d$ (with $a, c \in \mathbb{Z}^*$), then $a c \mid b d$. In particular $a^k \mid b^k$ for every $k \in \mathbb{N}^*$ when $a \mid b$.

Consequence: divisibility $\mid$ is an order relation on $\mathbb{N}^*$ (reflexive, transitive, antisymmetric), but only reflexive and transitive on $\mathbb{Z}^*$.

Proof

Reflexivity / transitivity. $a = a \cdot 1$ gives $a \mid a$. If $b = a k$ and $c = b k'$, then $c = a (k k')$ with $k k' \in \mathbb{Z}$, so $a \mid c$.
Antisymmetry on $\mathbb{N}^*$. If $a \mid b$ and $b \mid a$, write $b = a k$ and $a = b k'$. Then $a = a (k k')$, so $k k' = 1$ (since $a \neq 0$). With $k, k' \in \mathbb{Z}$, this forces $k = k' = \pm 1$, hence $|a| = |b|$. Conversely if $|a| = |b|$ then $a = \pm b$, so each divides the other.
Linear combinations. From $a = d k$ and $b = d k'$, $a u + b v = d (k u + k' v) \in d \mathbb{Z}$.
Product. From $b = a k$ and $d = c k'$, $b d = (a c)(k k')$, so $a c \mid b d$.

Example

$5 \mid 15$ and $5 \mid 25$, hence by the linear-combination property $5 \mid (15 \cdot 3 - 25 \cdot 2) = 45 - 50 = -5$, in agreement with $5 \mid (-5)$.

Skills to practice

Applying the weak law of large numbers

II.6 Synthesis: expectations and variances of the standard laws

Method — Decide whether $b \mid a$

Two equivalent tests for $b \in \mathbb{Z}^*$, $a \in \mathbb{Z}$:

Direct quotient test. Compute $a / b$ in $\mathbb{Q}$; check whether the result is an integer.
Remainder test (preferred when $a, b$ are large). Compute the Euclidean division $a = b q + r$ with $0 \leq r < |b|$; then $b \mid a \iff r = 0$ (next subsection).

For very large $|a|$, one can also use a congruence reduction: reduce $a$ modulo $b$ first using simpler intermediate steps (e.g.\ for $b = 9$, $a$ is congruent modulo $9$ to the sum of its decimal digits --- see the section on congruences below).

Ex 26

Skills to practice

Recognising a standard law from its moments

III Generating functions of an $\mathbb{N}$-valued random variable

Ex 27

III.1 Definition; radius of convergence

Ex 28

Euclidean division turns the divisibility question into an algorithmic one. The Grade 12 statement covers $a \in \mathbb{N}$, $b \in \mathbb{N}^*$; we extend to $a \in \mathbb{Z}$, with the same divisor $b \in \mathbb{N}^*$ and the same remainder constraint $0 \leq r < b$. The negative case is the classical trap: students must remember that $r$ is always non-negative.

Theorem — Euclidean division

For every $a \in \mathbb{Z}$ and every $b \in \mathbb{N}^*$, there exists a unique pair $(q, r) \in \mathbb{Z} \times \mathbb{N}$ such that $$ \textcolor{colorprop}{a = b q + r \quad \text{and} \quad 0 \leq r < b.} $$ The integer $a$ is the dividend, $b$ the divisor, $q$ the quotient and $r$ the remainder of the Euclidean division of $a$ by $b$. Moreover $q = \lfloor a / b \rfloor$ (where $\lfloor \cdot \rfloor$ is the floor function from $\mathbb{R}$ to $\mathbb{Z}$).

Proof

Existence. Set $R = \{a - b k : k \in \mathbb{Z}\} \cap \mathbb{N}$. The set $R$ is non-empty: if $a \geq 0$, then $a = a - b \cdot 0 \in R$; if $a < 0$, then $k = a$ gives $a - b a = a (1 - b) \geq 0$ since $1 - b \leq 0$ and $a < 0$, hence $a - b a \in R$. As a non-empty subset of $\mathbb{N}$, $R$ has a smallest element; call it $r$, and write $r = a - b q$ for some $q \in \mathbb{Z}$, i.e.\ $a = b q + r$ with $r \geq 0$. It remains to show $r < b$. Suppose for contradiction $r \geq b$. Then $r - b \in \mathbb{N}$ and $r - b = a - b (q + 1) \in R$, contradicting the minimality of $r$. Hence $r < b$.
Uniqueness. Suppose $a = b q + r = b q' + r'$ with $0 \leq r, r' < b$. Then $b (q - q') = r' - r$, so $|b (q - q')| = |r' - r| < b$, forcing $|q - q'| < 1$. Since $q - q' \in \mathbb{Z}$, $q = q'$, and then $r = a - b q = a - b q' = r'$.
Formula for $q$. From $a = b q + r$ with $0 \leq r < b$, divide by $b > 0$: $a / b = q + r / b$ with $0 \leq r/b < 1$, hence $q = \lfloor a / b \rfloor$.

Example — Negative dividend --- a classical trap

For $a = -23$, $b = 5$: not $-23 = 5 \cdot (-4) - 3$, because the remainder would then be $-3 < 0$. The correct Euclidean division is $$ \textcolor{colorprop}{-23 = 5 \cdot (-5) + 2,} $$ so $q = -5$ and $r = 2$. Check: $\lfloor -23 / 5 \rfloor = \lfloor -4{.}6 \rfloor = -5$ (the floor of $-4.6$ is $-5$, not $-4$). The remainder is always in $\{0, 1, \dots, b - 1\}$.

Ex 29 Ex 30

Skills to practice

Computing a generating function

III.2 Independence and the product formula

Ex 31

The greatest common divisor (gcd) of two integers $a$ and $b$ is, intuitively, the largest integer that divides both. Two subtleties stand out: (i) « largest » needs to be qualified --- largest in the natural order $\leq$ on $\mathbb{N}^*$ vs.\ largest in the divisibility order $\mid$ ---, and the gcd turns out to be the largest in both senses; (ii) the gcd is computed effectively by Euclid's algorithm, an iterated Euclidean division. We then prove the central identity of the chapter: Bézout's identity $a \wedge b = a u + b v$ for some $u, v \in \mathbb{Z}$. From there the « common-divisors-divide-the-gcd » characterization falls as a corollary, and the lcm $a \vee b$ (and the formula $(a \wedge b)(a \vee b) = |a b|$) follow naturally.

Definition — Greatest common divisor (gcd)

Let $a, b \in \mathbb{N}$ not both zero. The greatest common divisor of $a$ and $b$, denoted $a \wedge b$, is the largest element of $\{d \in \mathbb{N}^* : d \mid a \text{ and } d \mid b\}$ for the natural order $\leq$ on $\mathbb{N}^*$. By convention, $0 \wedge 0 = 0$.
Extension to $\mathbb{Z}$. For $a, b \in \mathbb{Z}$ not both zero, define $a \wedge b := |a| \wedge |b|$.
The set $\{d \in \mathbb{N}^* : d \mid a \text{ and } d \mid b\}$ is non-empty (it contains $1$). It is also finite: if both $a$ and $b$ are non-zero, every common divisor is bounded above by $\min(|a|, |b|)$; if for instance $a = 0$ and $b \neq 0$, the common positive divisors are exactly the positive divisors of $|b|$, hence are bounded above by $|b|$. Therefore the maximum exists.

The algorithmic computation of the gcd rests on a single observation: when one integer is reduced modulo the other, the gcd doesn't change. Iterated, this rule gives the algorithm of Euclid, and the gcd is read off as the last non-zero remainder.

Theorem — Existence of the gcd via Euclid's algorithm

Let $a, b \in \mathbb{N}$ with $0 < b \leq a$. Define a sequence of remainders by $$ r_0 = a, \quad r_1 = b, \quad r_{k+2} = \text{Euclidean remainder of } r_k \text{ divided by } r_{k+1} \quad \text{as long as } r_{k+1} > 0. $$ The sequence terminates: there exists $N \geq 1$ such that $r_N = 0$, and the gcd is the last non-zero remainder: $$ \textcolor{colorprop}{a \wedge b = r_{N - 1}.} $$

Proof

Termination. As long as $r_{k+1} > 0$, the construction yields $0 \leq r_{k+2} < r_{k+1}$, so the sequence of strictly positive remainders $r_1, r_2, \dots$ is strictly decreasing in $\mathbb{N}^*$. A strictly decreasing sequence in $\mathbb{N}$ is finite; some $r_N = 0$.
Correctness. By the « idée fondamentale » applied at each step (Euclidean division of $r_k$ by $r_{k+1}$ gives quotient $q_k$ and remainder $r_{k+2}$, with $r_k = r_{k+1} q_k + r_{k+2}$), $r_k \wedge r_{k+1} = r_{k+1} \wedge r_{k+2}$ for every $k \in \{0, \dots, N-2\}$. Chaining, $$ a \wedge b = r_0 \wedge r_1 = r_1 \wedge r_2 = \cdots = r_{N-1} \wedge r_N = r_{N-1} \wedge 0 = r_{N-1}. $$

Skills to practice

Determining the law of $X + Y$ by $G_X G_Y$

III.3 Moments from the generating function

Method — Compute $a \wedge b$ by Euclid's algorithm

Suppose $0 < b \leq a$. Tabulate the successive Euclidean divisions:

Step 1: $a = b \, q_1 + r_2$ with $0 \leq r_2 < b$.
Step 2: $b = r_2 \, q_2 + r_3$ with $0 \leq r_3 < r_2$.
Step $k$: $r_k = r_{k+1} \, q_{k+1} + r_{k+2}$ with $0 \leq r_{k+2} < r_{k+1}$.
Stop as soon as a remainder is zero. The last non-zero remainder is $a \wedge b$.

For $a, b \in \mathbb{Z}$ general, replace $a, b$ by $|a|, |b|$.

Example

Compute $1542 \wedge 58$.

Answer

Apply Euclid's algorithm: $$ \begin{aligned} 1542 &= 26 \cdot 58 + 34, \\ 58 &= 1 \cdot 34 + 24, \\ 34 &= 1 \cdot 24 + 10, \\ 24 &= 2 \cdot 10 + 4, \\ 10 &= 2 \cdot 4 + 2, \\ 4 &= 2 \cdot 2 + 0. \end{aligned} $$ The last non-zero remainder is $2$, so $1542 \wedge 58 = 2$.

Ex 32

Bézout's identity is the linear-combination side of the gcd: $a \wedge b$ writes as $a u + b v$ for some $u, v \in \mathbb{Z}$. The proof reuses the Euclid sequence: each remainder $r_k$ writes as a $\mathbb{Z}$-linear combination of $a$ and $b$, by induction. Practically, one runs Euclid's algorithm forward and then back-substitutes to extract a pair $(u, v)$.

Example — Non-uniqueness of Bézout coefficients

$4 \wedge 6 = 2$, and one can verify both $$ 2 = (-1) \cdot 4 + 1 \cdot 6 = 2 \cdot 4 + (-1) \cdot 6, $$ so $(u, v) = (-1, 1)$ and $(u, v) = (2, -1)$ are two distinct pairs of Bézout coefficients.

Method — Extended Euclidean algorithm

To compute a pair of Bézout coefficients, run the Euclid algorithm forward, then back-substitute from the last non-zero remainder up to the original integers.
Step 1 (forward). Tabulate $r_0 = a$, $r_1 = b$, $r_{k+2} = r_k - q_k r_{k+1}$ until some $r_N = 0$. The gcd is $r_{N-1}$.
Special case $b \mid a$. If the first remainder is $0$ (i.e.\ $a = b q_0$), then $a \wedge b = b$ and a Bézout identity is immediate: $b = 0 \cdot a + 1 \cdot b$. No back-substitution needed.
Step 2 (back-substitution). Otherwise (the algorithm has at least two non-trivial divisions), starting from $r_{N-1} = r_{N-3} - q_{N-3} r_{N-2}$, repeatedly substitute the previous equation $r_{N-2} = r_{N-4} - q_{N-4} r_{N-3}$ to express $r_{N-1}$ as a $\mathbb{Z}$-linear combination of $r_{N-3}$ and $r_{N-4}$, then of $r_{N-5}$ and $r_{N-4}$, \ldots, until reaching a combination of $r_0 = a$ and $r_1 = b$.

Example

Compute Bézout coefficients of $3080$ and $525$.

Answer

Step 1 (forward Euclid). $$ \begin{aligned} 3080 &= 5 \cdot 525 + 455, \\ 525 &= 1 \cdot 455 + 70, \\ 455 &= 6 \cdot 70 + 35, \\ 70 &= 2 \cdot 35 + 0. \end{aligned} $$ Last non-zero remainder: $35$, so $3080 \wedge 525 = 35$.
Step 2 (back-substitution). $$ \begin{aligned} 35 &= 455 - 6 \cdot 70 \\ &= 455 - 6 \cdot (525 - 1 \cdot 455) = 7 \cdot 455 - 6 \cdot 525 \\ &= 7 \cdot (3080 - 5 \cdot 525) - 6 \cdot 525 = 7 \cdot 3080 - 41 \cdot 525. \end{aligned} $$ Hence $(u, v) = (7, -41)$ is a pair of Bézout coefficients: $35 = 7 \cdot 3080 - 41 \cdot 525$. Verification: $7 \cdot 3080 = 21560$, $41 \cdot 525 = 21525$, $21560 - 21525 = 35$. $\checkmark$

Skills to practice

Computing $\mathbb{E}$ and $\mathrm{V}$ via $G_X$

III.4 Generating functions of the standard laws

Ex 33 Ex 34

With Bézout's identity in hand, the characterization of the gcd by its set of divisors is now an immediate corollary.

Proposition — Common divisors are divisors of the gcd

For all $a, b \in \mathbb{Z}$ not both zero, $$ \textcolor{colorprop}{\operatorname{div}(a) \cap \operatorname{div}(b) = \operatorname{div}(a \wedge b).} $$ Equivalently: a non-zero integer divides both $a$ and $b$ if and only if it divides $a \wedge b$. As a consequence, $a \wedge b$ is also the greatest common divisor in the divisibility order $\mid$: every common divisor of $a$ and $b$ divides $a \wedge b$.

Proof

Set $d = a \wedge b$. We show $\operatorname{div}(a) \cap \operatorname{div}(b) = \operatorname{div}(d)$ by double inclusion.

$(\supset)$ Since $d = a \wedge b$ divides both $a$ and $b$ (by definition), every divisor of $d$ also divides $a$ and $b$ by transitivity, hence belongs to $\operatorname{div}(a) \cap \operatorname{div}(b)$.
$(\subset)$ Let $\delta \in \operatorname{div}(a) \cap \operatorname{div}(b)$. By Bézout's identity, $d = a u + b v$ for some $u, v \in \mathbb{Z}$. Since $\delta \mid a$ and $\delta \mid b$, the linear-combination property gives $\delta \mid (a u + b v) = d$, so $\delta \in \operatorname{div}(d)$.

Skills to practice

Recognising a law from its generating function