CommeUnJeu · L1 MPSI

Functions of two variables

⌚ ~147 min ▢ 18 blocks ✓ 54 exercises ➣ Prerequisites: Differentiability, Real functions

In this chapter, a function of two variables is a map $f : \Omega \to \mathbb R$, where $\Omega$ is an open subset of $\mathbb R^2$. Its graph is a surface in $\mathbb R^3$; differentiating it means picking a direction in the plane and reading off the slope of the surface in that direction. The point of view is practical: compute partial derivatives, master the chain rule, build a geometric intuition for the gradient.
The chapter has four sections. The first section sets up the ambient space: the Euclidean norm on $\mathbb R^2$, open balls, open subsets, and the visual representation of $f$ by a surface and its level lines in the $(x, y)$-plane. The second section introduces the two partial derivatives $\partial f / \partial x$ and $\partial f / \partial y$ by "freezing" one variable at a time, the class $\mathcal C^1$ (partial derivatives that exist and are continuous), the first-order expansion for $\mathcal C^1$ functions, the tangent plane to the surface, and the gradient $\nabla f$. The third section is the technical heart of the chapter: the directional derivative $\mathrm D_v f(a)$, the central formula $\mathrm D_v f(a) = \langle \nabla f(a), v\rangle$ that ties it to the gradient, the chain rule along a parameterized arc $t \mapsto f(x(t), y(t))$, the geometric meaning of the gradient (steepest-ascent direction; orthogonal to level lines), and the chain rule for a change of variables $(u, v) \mapsto f(x(u, v), y(u, v))$. The fourth section treats extrema: local max and min, critical points ($\nabla f = (0, 0)$), the necessary condition "extremum $\Rightarrow$ critical point", and the standard analysis-synthesis method that studies a candidate critical point by exact algebraic rewriting of $f(a + h, b + k) - f(a, b)$.
The headline result of the chapter is the first-order expansion of a $\mathcal C^1$ function at a point $a$: $f(a + v) = f(a) + \langle \nabla f(a), v\rangle + o(\|v\|)$ as $v \to (0, 0)$. From this single expansion follow, in turn, the directional-derivative formula, both chain rules, the orthogonality of $\nabla f$ to the level lines, and the necessary condition for an extremum: five geometric or computational statements in all.
Three reflexes the reader should leave with:

freeze a variable to compute a partial derivative;
the gradient $\nabla f$ packages the two partial derivatives into one vector, so that the DL$_1$ and every geometric statement of the chapter are expressed through the inner product $\langle \nabla f, v\rangle$;
for extrema, study the sign of $f(a + h, b + k) - f(a, b)$ by exact algebraic rewriting, never by an asymptotic expansion.

I Open subsets of $\mathbb R^2$ and continuous functions

I.1 Euclidean norm and open ball

The plane $\mathbb R^2$ inherits its geometry from the identification $(x, y) \leftrightarrow x + iy \in \mathbb C$: every notion already built on $\mathbb C$ in the chapter Topology of $\mathbb R$ and $\mathbb C$ (modulus, ball, neighborhood, open subset) transports verbatim. We give the definitions directly in $\mathbb R^2$-coordinates, then describe an open subset as a set that contains a small ball around each of its points.

Definition — Euclidean norm on $\mathbb R^2$

The Euclidean norm on $\mathbb R^2$ is the map $$ \| \cdot \| \ : \ \mathbb R^2 \to \mathbb R_+, \qquad \|(x, y)\| \ = \ \sqrt{x^2 + y^2}. $$ For $a = (x_a, y_a)$ and $b = (x_b, y_b)$ in $\mathbb R^2$, the real number $\|b - a\| = \sqrt{(x_b - x_a)^2 + (y_b - y_a)^2}$ is the Euclidean distance from $a$ to $b$.

Definition — Open ball

Let $a \in \mathbb R^2$ and $r > 0$. The open ball of center $a$ and radius $r$ is the set $$ \mathrm B(a, r) \ = \ \bigl\{ \, p \in \mathbb R^2 \ \mid \ \|p - a\| < r \, \bigr\}. $$

Example — Unit open disk

The open ball $\mathrm B\bigl((0, 0), 1\bigr) = \{(x, y) \in \mathbb R^2 \mid x^2 + y^2 < 1\}$ is the open unit disk: the interior of the unit circle, without the boundary.
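As a minimal Python sketch of these two definitions, assuming a point of $\mathbb R^2$ is represented by a tuple `(x, y)`, membership in an open ball is a single strict comparison. The helper names `norm` and `in_open_ball` are illustrative only, not part of the course.

```python
import math

def norm(p):
    """Euclidean norm ||(x, y)|| = sqrt(x^2 + y^2)."""
    return math.hypot(p[0], p[1])

def in_open_ball(p, a, r):
    """True exactly when ||p - a|| < r, i.e. p lies in the open ball B(a, r)."""
    return norm((p[0] - a[0], p[1] - a[1])) < r

# The open unit disk B((0, 0), 1)
print(in_open_ball((0.5, 0.5), (0, 0), 1))   # True: the norm is about 0.707
print(in_open_ball((1, 0), (0, 0), 1))       # False: (1, 0) is on the unit circle
```

The strict `<` is what excludes the boundary circle; replacing it by `<=` would describe the closed disk instead.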

Skills to practice

Computing norms and open balls

I.2 Open subsets of $\mathbb R^2$

The chapter only deals with functions defined on open subsets of $\mathbb R^2$: the openness of the domain is what lets us move a little in every direction around any point, which is exactly what differential calculus needs. We recall the definition for ease of reference; the full topological framework is in Topology of $\mathbb R$ and $\mathbb C$.

Definition — Open subset of $\mathbb R^2$

A subset $\Omega \subset \mathbb R^2$ is called an open subset (or simply open) if, for every $a \in \Omega$, there exists $r > 0$ such that $\mathrm B(a, r) \subset \Omega$.

Example — Open ball is open

Every open ball $\mathrm B(a, r) \subset \mathbb R^2$ is itself an open subset of $\mathbb R^2$: for $p \in \mathrm B(a, r)$, the radius $\rho = r - \|p - a\| > 0$ works. Indeed, for every $q \in \mathrm B(p, \rho)$, the triangle inequality gives $\|q - a\| \le \|q - p\| + \|p - a\| < \rho + \|p - a\| = r$, so $\mathrm B(p, \rho) \subset \mathrm B(a, r)$.
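The argument can also be illustrated numerically. The Python sketch below is a hypothetical sanity check (the function names, the sampling scheme and the chosen ball are assumptions, not part of the course): it draws points $p$ of $\mathrm B(a, r)$, computes $\rho = r - \|p - a\|$, draws a point $q$ of $\mathrm B(p, \rho)$ and checks that $q$ lies in $\mathrm B(a, r)$. A passing run only illustrates the inclusion; the proof remains the triangle inequality.

```python
import math
import random

def dist(p, q):
    """Euclidean distance ||p - q|| between two points of R^2."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def inner_radius(p, a, r):
    """The radius rho = r - ||p - a|| of the Example (positive when p is in B(a, r))."""
    return r - dist(p, a)

def check_ball_is_open(a, r, trials=1000, seed=0):
    """Sample p in B(a, r), then q in B(p, rho), and check that q is in B(a, r).
    A numerical illustration only: the proof is the triangle inequality."""
    rng = random.Random(seed)
    for _ in range(trials):
        # p: uniform in the square around a, rejected until it falls in B(a, r)
        while True:
            p = (a[0] + rng.uniform(-r, r), a[1] + rng.uniform(-r, r))
            if dist(p, a) < r:
                break
        rho = inner_radius(p, a, r)
        # q: at distance s < rho from p (shrunk by 0.999 to stay clear of rounding)
        angle = rng.uniform(0.0, 2 * math.pi)
        s = 0.999 * rho * rng.random()
        q = (p[0] + s * math.cos(angle), p[1] + s * math.sin(angle))
        if dist(q, a) >= r:
            return False
    return True

print(check_ball_is_open((1.0, -2.0), 3.0))   # True
```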

Example — Open half-plane

The half-plane $\{(x, y) \in \mathbb R^2 \mid x > 0\}$ is open: for $(x_0, y_0)$ with $x_0 > 0$, the ball $\mathrm B\bigl((x_0, y_0), x_0\bigr)$ is contained in the half-plane (since every point $(x, y)$ in this ball satisfies $|x - x_0| \le \|(x, y) - (x_0, y_0)\| < x_0$, hence $x > 0$).

Example — Punctured plane

The set $\mathbb R^2 \setminus \{(0, 0)\}$ is open: for $a \ne (0, 0)$, the ball $\mathrm B(a, \|a\|)$ (of positive radius) does not contain $(0, 0)$, since $\|(0, 0) - a\| = \|a\|$ is not $< \|a\|$.

Skills to practice

Deciding whether a subset of $\mathbb R^2$ is open

I.3 Graphical representation by a surface

A function $f : \mathbb R \to \mathbb R$ is visualized by its graph, a curve in the plane. A function $f : \Omega \to \mathbb R$ with $\Omega \subset \mathbb R^2$ is visualized by its graph, a surface in $\mathbb R^3$. To picture this surface we intersect it with three families of planes: $x = \lambda$ gives a family of vertical cross-sections, $y = \lambda$ gives another, and $z = \lambda$ gives the level lines, which live in the $(x, y)$-plane and are the basic geometric tool of the chapter.

Definition — Graph and level lines

Let $\Omega \subset \mathbb R^2$ be an open subset and $f : \Omega \to \mathbb R$ a function. The graph of $f$ is the surface $$ \mathscr S \ = \ \bigl\{ \, (x, y, z) \in \mathbb R^3 \ \mid \ (x, y) \in \Omega \ \text{and} \ z = f(x, y) \, \bigr\} \ \subset \ \mathbb R^3. $$ For $\lambda \in \mathbb R$, the level line of $f$ at level $\lambda$ is the subset of $\Omega$ $$ \mathscr C_\lambda \ = \ \bigl\{ \, (x, y) \in \Omega \ \mid \ f(x, y) = \lambda \, \bigr\} \ \subset \ \mathbb R^2. $$ A level line is not drawn on the surface itself: it is drawn in the $(x, y)$-plane and consists of the points whose image has the same height $z = \lambda$.

Example — Paraboloid $z = x^2 + y^2$

Take $f(x, y) = x^2 + y^2$ on $\Omega = \mathbb R^2$. The graph is the paraboloid of revolution with vertex at the origin and axis $Oz$. For $\lambda < 0$, the level line $\mathscr C_\lambda$ is empty; for $\lambda = 0$, it reduces to the single point $(0, 0)$; for $\lambda > 0$, it is the circle of center $(0, 0)$ and radius $\sqrt \lambda$.

Example — Saddle $z = x^2 - y^2$ and its level lines

Take $f(x, y) = x^2 - y^2$ on $\Omega = \mathbb R^2$. The graph is a saddle surface. The level lines are: for $\lambda > 0$, the rectangular hyperbola $x^2 - y^2 = \lambda$, whose two branches cross the $x$-axis at $(\pm \sqrt \lambda, 0)$; for $\lambda < 0$, the rectangular hyperbola whose two branches cross the $y$-axis at $(0, \pm \sqrt{-\lambda})$; for $\lambda = 0$, the union of the two diagonals $y = x$ and $y = -x$ (a degenerate level line).
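A quick numerical check of these level lines, as a Python sketch assuming the standard parameterizations by hyperbolic functions (`hyperbola_points` is a hypothetical helper): since $\cosh^2 s - \sinh^2 s = 1$, the points $(\sqrt\lambda \cosh s, \sqrt\lambda \sinh s)$ lie on $x^2 - y^2 = \lambda$ when $\lambda > 0$, and the points $(\sqrt{-\lambda} \sinh s, \sqrt{-\lambda} \cosh s)$ lie on it when $\lambda < 0$. Each parameterization covers one branch; the other branch is its mirror image.

```python
import math

def f(x, y):
    return x**2 - y**2

def hyperbola_points(lam, s_values=(-1.0, 0.0, 0.5, 2.0)):
    """Points of one branch of the level line x^2 - y^2 = lam (lam != 0).

    lam > 0: branch through (sqrt(lam), 0), which crosses the x-axis.
    lam < 0: branch through (0, sqrt(-lam)), which crosses the y-axis.
    """
    if lam > 0:
        c = math.sqrt(lam)
        return [(c * math.cosh(s), c * math.sinh(s)) for s in s_values]
    if lam < 0:
        c = math.sqrt(-lam)
        return [(c * math.sinh(s), c * math.cosh(s)) for s in s_values]
    raise ValueError("lam = 0 gives the two diagonals, not a hyperbola")

for lam in (1.0, 4.0, -1.0, -9.0):
    worst = max(abs(f(x, y) - lam) for (x, y) in hyperbola_points(lam))
    print(lam, worst < 1e-9)   # heights equal lam up to rounding
```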

Level-line figure for $z = x^2 + y^2$

The level lines of $f(x, y) = x^2 + y^2$ at $\lambda = 1, 2, 3, 4$ in the $(x, y)$-plane are four concentric circles centered at $(0, 0)$, of radii $1$, $\sqrt 2$, $\sqrt 3$ and $2$.

For instance, $(1, 0)$, $(0, 1)$ and $(-1/\sqrt 2, 1/\sqrt 2)$ all lie on the level line $\lambda = 1$, because the corresponding points of the surface, $(1, 0, 1)$, $(0, 1, 1)$ and $(-1/\sqrt 2, 1/\sqrt 2, 1)$, all sit at the same height $z = 1$.
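The same check for the paraboloid, as a hedged Python sketch (helper names hypothetical): sample points on the circle of radius $\sqrt\lambda$ and confirm that $f$ takes the value $\lambda$ at each of them, up to floating-point rounding.

```python
import math

def f(x, y):
    return x**2 + y**2

def level_line_points(lam, n=8):
    """n equally spaced points of the level line C_lam of f, for lam > 0:
    the circle of center (0, 0) and radius sqrt(lam)."""
    r = math.sqrt(lam)
    return [(r * math.cos(2 * math.pi * j / n), r * math.sin(2 * math.pi * j / n))
            for j in range(n)]

for lam in (1, 2, 3, 4):
    worst = max(abs(f(x, y) - lam) for (x, y) in level_line_points(lam))
    print(lam, worst < 1e-12)   # every sampled point sits at height lam
```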

Skills to practice

Recognizing graphs and level lines

I.4 Limit and continuity

Continuity for a function of two variables is defined by the same $\varepsilon$--$\alpha$ recipe as in dimension one, with the Euclidean distance replacing the absolute value. The study of continuity of a function is not an objective of this chapter; we introduce continuity only as a tool for differential calculus and as a context for the class $\mathcal C^1$.

Definition — Limit at a point

Let $E \subset \mathbb R^2$, $f : E \to \mathbb R$ a function, $a \in \overline{E \setminus \{a\}}$ an accumulation point of $E$, and $\ell \in \mathbb R$. We say that $f$ admits $\ell$ as a limit at $a$, and we write $\lim_{p \to a} f(p) = \ell$, if $$ \forall \varepsilon > 0, \ \exists \alpha > 0, \ \forall p \in E, \quad 0 < \|p - a\| < \alpha \ \Rightarrow \ |f(p) - \ell| < \varepsilon. $$ The punctured inequality $0 < \|p - a\|$ allows us to consider a limit at $a$ even when $a \notin E$: the typical case is $a = (0, 0)$ and $E = \mathbb R^2 \setminus \{(0, 0)\}$. The hypothesis that $a$ is an accumulation point of $E$ is required for the limit to be unique: otherwise (for instance if $a$ were an isolated point of $E$), for $\alpha$ small enough no $p \in E$ would satisfy $0 < \|p - a\| < \alpha$, the condition would hold vacuously, and every $\ell$ would qualify.

Definition — Continuity at a point and on an open subset

Let $E \subset \mathbb R^2$, $f : E \to \mathbb R$ and $a \in E$. We say that $f$ is continuous at $a$ if $\lim_{p \to a} f(p) = f(a)$, i.e. $$ \forall \varepsilon > 0, \ \exists \alpha > 0, \ \forall p \in E, \quad \|p - a\| < \alpha \ \Rightarrow \ |f(p) - f(a)| < \varepsilon. $$ We say that $f$ is continuous on $E$ if $f$ is continuous at every point of $E$. The set of continuous real-valued functions on $E$ is denoted $\mathcal C(E, \mathbb R)$.

Proposition — Operations on continuous functions

Let $\Omega \subset \mathbb R^2$ be an open subset. Linear combinations, products, quotients (where the denominator does not vanish) of continuous functions $\Omega \to \mathbb R$ are continuous on $\Omega$. The composition with a continuous one-variable function preserves continuity: if $f \in \mathcal C(\Omega, \mathbb R)$ has values in an interval $I \subset \mathbb R$ and $\varphi \in \mathcal C(I, \mathbb R)$, then $\varphi \circ f \in \mathcal C(\Omega, \mathbb R)$.
Proof admitted: the recipes transport verbatim from the one-variable case, replacing $|x - x_0|$ by $\|p - a\|$ in the $\varepsilon$--$\alpha$ statements (cf. Topology of $\mathbb R$ and $\mathbb C$).

Example — Polynomials of two variables are continuous

The two coordinate projections $(x, y) \mapsto x$ and $(x, y) \mapsto y$ are continuous on $\mathbb R^2$: for the first, $|x - x_0| \le \|(x, y) - (x_0, y_0)\|$, so $\alpha = \varepsilon$ works, and the second is symmetric. By the operations Proposition, every polynomial in $x$ and $y$, e.g. $(x, y) \mapsto x^3 - 2 x y + 5 y^2 - 1$, is continuous on $\mathbb R^2$.

Example — Norm is continuous

The Euclidean norm $(x, y) \mapsto \|(x, y)\| = \sqrt{x^2 + y^2}$ is continuous on $\mathbb R^2$: it is the composition of the continuous polynomial $(x, y) \mapsto x^2 + y^2$ with the continuous function $t \mapsto \sqrt t$ on $\mathbb R_+$.

Method — Study a limit at $(0, 0)$ via polar coordinates

To study $\lim_{(x, y) \to (0, 0)} f(x, y)$, set $(x, y) = (r \cos \theta, r \sin \theta)$ with $r > 0$. In polar coordinates, "$(x, y) \to (0, 0)$" means "$r \to 0$" with $\theta$ arbitrary, so to prove that the limit exists, the convergence as $r \to 0$ must be uniform in $\theta$, i.e. controlled by a bound that does not depend on $\theta$. Two cases:

If there exists $\ell \in \mathbb R$ such that $|f(r \cos \theta, r \sin \theta) - \ell| \to 0$ as $r \to 0$ uniformly in $\theta$ (typically because $|f(r \cos \theta, r \sin \theta) - \ell| \le g(r)$ for some function $g$ with $g(r) \to 0$ independent of $\theta$), then the limit at $(0, 0)$ exists and equals $\ell$.
If there exist two fixed angles $\theta_1$ and $\theta_2$ for which the radial limits $\lim_{r \to 0} f(r \cos \theta_1, r \sin \theta_1)$ and $\lim_{r \to 0} f(r \cos \theta_2, r \sin \theta_2)$ exist and are different, then $f$ has no limit at $(0, 0)$.

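A radial expression that depends on $\theta$ does not by itself prevent the limit from existing (e.g. $f(r \cos \theta, r \sin \theta) = r \cos \theta$ satisfies $|r \cos \theta| \le r$, so the limit is $0$). The test works in one direction only: two angles with different radial limits rule out a limit, but equal radial limits for every $\theta$ are not enough to conclude; existence requires a bound uniform in $\theta$, as in the first case.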

Example — A function with no limit at $(0, 0)$

Take $f(x, y) = \dfrac{2 x y}{x^2 + y^2}$ on $\Omega = \mathbb R^2 \setminus \{(0, 0)\}$. In polar coordinates, $f(r \cos \theta, r \sin \theta) = \dfrac{2 \cdot r \cos \theta \cdot r \sin \theta}{r^2} = \sin(2 \theta)$, which is independent of $r$ but depends on $\theta$. Hence $f$ has no limit at $(0, 0)$: along the line $y = x$ (i.e. $\theta = \pi / 4$), $f$ takes the constant value $1$; along the axis $y = 0$ ($\theta = 0$), $f$ takes the constant value $0$.
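Polar coordinates also suggest a quick numerical experiment. The hedged Python sketch below (the helper `radial_values` and the sample radii are hypothetical) evaluates $f$ along two fixed rays: whatever $r$, the values stay at $\sin(2\theta)$, so the two radial limits differ. Sampling can only suggest a conclusion; the proof is the exact computation above.

```python
import math

def f(x, y):
    """f(x, y) = 2xy / (x^2 + y^2), defined off the origin."""
    return 2 * x * y / (x**2 + y**2)

def radial_values(func, theta, radii=(1e-1, 1e-3, 1e-6)):
    """Values of func along the ray of angle theta, at shrinking radii r."""
    return [func(r * math.cos(theta), r * math.sin(theta)) for r in radii]

print(radial_values(f, math.pi / 4))  # all close to 1   (line y = x)
print(radial_values(f, 0.0))          # [0.0, 0.0, 0.0]  (axis y = 0)
```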

Scope of this chapter

The study of continuity of a function is not an objective of this chapter. Continuity appears here only because the class $\mathcal C^1$ (next subsection) is defined as "partial derivatives exist and are continuous". We will not develop continuity beyond what is needed for differential calculus.

Skills to practice

Studying limits at $(0, 0)$ via polar coordinates

II Partial derivatives and the class $\mathcal C^1$

II.1 Partial derivatives at a point

A function of one variable has one derivative; a function of two variables has two partial derivatives. The trick is to "freeze" one of the variables and differentiate the resulting one-variable function. This gives two partial derivatives, one with respect to $x$ and one with respect to $y$, both of which depend on the point at which they are evaluated. They are denoted with the partial-derivative symbol $\partial$: $\partial f / \partial x$ and $\partial f / \partial y$.

Definition — Partial derivatives at a point

Let $\Omega \subset \mathbb R^2$ be an open subset, $f : \Omega \to \mathbb R$ a function, and $a = (x_0, y_0) \in \Omega$. The first partial derivative of $f$ at $a$, denoted $\dfrac{\partial f}{\partial x}(x_0, y_0)$ (or $\partial_1 f(a)$), is the derivative at $0$ of the one-variable function $t \mapsto f(x_0 + t, y_0)$, when it exists: $$ \frac{\partial f}{\partial x}(x_0, y_0) \ = \ \lim_{t \to 0} \ \frac{f(x_0 + t, y_0) - f(x_0, y_0)}{t}. $$ Similarly, the second partial derivative of $f$ at $a$, denoted $\dfrac{\partial f}{\partial y}(x_0, y_0)$ (or $\partial_2 f(a)$), is the derivative at $0$ of $t \mapsto f(x_0, y_0 + t)$, when it exists: $$ \frac{\partial f}{\partial y}(x_0, y_0) \ = \ \lim_{t \to 0} \ \frac{f(x_0, y_0 + t) - f(x_0, y_0)}{t}. $$

Method — Compute a partial derivative by freezing a variable

To compute $\dfrac{\partial f}{\partial x}(x_0, y_0)$, treat $y$ as a constant equal to $y_0$ and differentiate the resulting expression as a function of $x$ alone, using the one-variable rules (sum, product, quotient, chain rule). Then evaluate at $x = x_0$. Symmetrically for $\dfrac{\partial f}{\partial y}$.
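The Method is symbolic and exact. As a complementary numerical check (a different technique, swapped in here only for verification), freezing one variable also yields a finite-difference approximation. The Python sketch below uses a central difference with an arbitrarily chosen step `h`; `partial_x`, `partial_y` and `power` are hypothetical helpers, and the test function $x^y$ anticipates the next Example.

```python
import math

def partial_x(f, x0, y0, h=1e-6):
    """Central-difference approximation of df/dx(x0, y0): y is frozen at y0."""
    return (f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h)

def partial_y(f, x0, y0, h=1e-6):
    """Central-difference approximation of df/dy(x0, y0): x is frozen at x0."""
    return (f(x0, y0 + h) - f(x0, y0 - h)) / (2 * h)

def power(x, y):
    """f(x, y) = x^y on (0, +inf) x R."""
    return x ** y

x0, y0 = 2.0, 3.0
print(partial_x(power, x0, y0), y0 * x0 ** (y0 - 1))      # both close to 12
print(partial_y(power, x0, y0), x0 ** y0 * math.log(x0))  # both close to 8 ln 2
```

Such a check catches sign or factor slips in a hand computation, but it never replaces the exact formula.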

Example — Partial derivatives of $f(x, y) = x^y$

Compute the partial derivatives of $f(x, y) = x^y$ on $\Omega = (0, +\infty) \times \mathbb R$, the open set on which $x^y = e^{y \ln x}$ is defined.

Answer

Freezing $y$, the map $x \mapsto x^y = e^{y \ln x}$ is differentiable on $(0, +\infty)$ with derivative $y x^{y-1}$. Freezing $x$, the map $y \mapsto x^y = e^{y \ln x}$ is differentiable on $\mathbb R$ with derivative $x^y \ln x$. Hence $$ \frac{\partial f}{\partial x}(x, y) \ = \ y \, x^{y - 1}, \qquad \frac{\partial f}{\partial y}(x, y) \ = \ x^y \ln x. $$

Example — Partial derivatives of $f(x, y) = \mathrm{Arctan}(y / x)$

Take $f(x, y) = \mathrm{Arctan}(y / x)$ on $\Omega = \{(x, y) \in \mathbb R^2 \mid x \ne 0\}$. Freezing $y$, the chain rule gives $\partial_x f = \dfrac{1}{1 + (y/x)^2} \cdot \dfrac{-y}{x^2}$; multiplying numerator and denominator by $x^2$ gives the first formula below, and freezing $x$ gives the second: $$ \frac{\partial f}{\partial x}(x, y) \ = \ \frac{-y}{x^2 + y^2}, \qquad \frac{\partial f}{\partial y}(x, y) \ = \ \frac{1}{1 + (y/x)^2} \cdot \frac{1}{x} \ = \ \frac{x}{x^2 + y^2}. $$

Example — Partial derivatives of $f(x, y) = e^{x y^2}$

Take $f(x, y) = e^{x y^2}$ on $\Omega = \mathbb R^2$. Freezing $y$, $\partial_x f = y^2 e^{x y^2}$ (chain rule with inner function $x \mapsto x y^2$, of derivative $y^2$). Freezing $x$, $\partial_y f = 2 x y \, e^{x y^2}$.

Example — Partial derivatives of $f(x, y) = x^3 - 3 x y^2$

Take $f(x, y) = x^3 - 3 x y^2$ on $\Omega = \mathbb R^2$. The two partial derivatives are $$ \frac{\partial f}{\partial x}(x, y) \ = \ 3 x^2 - 3 y^2, \qquad \frac{\partial f}{\partial y}(x, y) \ = \ - 6 x y. $$ Since $f$ is not symmetric in $x$ and $y$, its two partial derivatives have different shapes: $\partial_x f$ is a difference of two squares, while $\partial_y f$ is a single monomial. We will reuse the gradient $\nabla f = (3 x^2 - 3 y^2, -6 x y)$ later when discussing the geometry of level lines.

Existence of partial derivatives does not imply continuity

The existence of partial derivatives does not imply continuity. The standard counter-example is the function defined on $\mathbb R^2$ by $f(x, y) = x y / (x^2 + y^2)$ for $(x, y) \ne (0, 0)$ and $f(0, 0) = 0$. At $(0, 0)$, freezing $y = 0$ gives $f(x, 0) = 0$ for every $x$, so $\partial_x f(0, 0) = 0$; similarly $\partial_y f(0, 0) = 0$. Yet $f$ is not continuous at $(0, 0)$: by the polar Method, $f(r \cos \theta, r \sin \theta) = \sin(2\theta) / 2$ for $r > 0$, so $f$ equals $1/2$ at every point of the line $y = x$ other than the origin and does not tend to $f(0, 0) = 0$. This is why the class $\mathcal C^1$, defined next, requires the partial derivatives not only to exist but also to be continuous.
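The two halves of this counter-example can be reproduced in a few lines of Python (a hypothetical illustration; the sample values of `t` are arbitrary). The difference quotients at the origin vanish exactly, while the values along the diagonal stay at $1/2$.

```python
def f(x, y):
    """x y / (x^2 + y^2) off the origin, extended by f(0, 0) = 0."""
    if x == 0 and y == 0:
        return 0.0
    return x * y / (x * x + y * y)

ts = (1e-1, 1e-4, 1e-8)

# f vanishes on both axes, so both difference quotients at the origin are exactly 0
print([(f(t, 0) - f(0, 0)) / t for t in ts])   # [0.0, 0.0, 0.0]
print([(f(0, t) - f(0, 0)) / t for t in ts])   # [0.0, 0.0, 0.0]

# Along the diagonal y = x, f stays at 1/2 and does not tend to f(0, 0) = 0
print([f(t, t) for t in ts])                   # [0.5, 0.5, 0.5]
```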

Skills to practice

Computing partial derivatives

II.2 The class $\mathcal C^1$

The right regularity hypothesis for the whole chapter is not "partial derivatives exist" (too weak, as the Remark above showed), but "partial derivatives exist and are continuous". This is the class $\mathcal C^1$. It is the hypothesis of the first-order expansion stated in the First-order expansion and tangent plane subsection below, and of every chain rule of the next section.

Definition — Function of class $\mathcal C^1$

Let $\Omega \subset \mathbb R^2$ be an open subset and $f : \Omega \to \mathbb R$ a function. We say that $f$ is of class $\mathcal C^1$ on $\Omega$ if both partial derivatives $\partial f / \partial x$ and $\partial f / \partial y$ exist at every point of $\Omega$ and define continuous functions $\Omega \to \mathbb R$. The set of $\mathcal C^1$ functions $\Omega \to \mathbb R$ is denoted $\mathcal C^1(\Omega, \mathbb R)$.

Example — Polynomials in two variables are $\mathcal C^1$

Every polynomial in $x$ and $y$ is of class $\mathcal C^1$ on $\mathbb R^2$: its partial derivatives are themselves polynomials, hence continuous on $\mathbb R^2$. For instance, $f(x, y) = x^3 - 2 x y + 5 y^2 - 1$ has $\partial_x f = 3 x^2 - 2 y$ and $\partial_y f = -2 x + 10 y$, both continuous on $\mathbb R^2$.

Example — Norm is $\mathcal C^1$ on the punctured plane, not at the origin

The Euclidean norm $f(x, y) = \sqrt{x^2 + y^2}$ is $\mathcal C^1$ on $\Omega = \mathbb R^2 \setminus \{(0, 0)\}$: its partial derivatives are $\partial_x f = x / \sqrt{x^2 + y^2}$ and $\partial_y f = y / \sqrt{x^2 + y^2}$, both continuous on $\Omega$. At $(0, 0)$, however, freezing $y = 0$ gives $f(x, 0) = |x|$, which is not differentiable at $0$; similarly, freezing $x = 0$ gives $f(0, y) = |y|$. So $\partial_x f(0, 0)$ and $\partial_y f(0, 0)$ do not exist, and $f$ is not $\mathcal C^1$ on any open subset containing the origin.
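Numerically, the failure at the origin shows up as two one-sided difference quotients that refuse to agree. The Python sketch below is a hypothetical illustration (the step values are arbitrary): the quotients of $t \mapsto f(t, 0) = |t|$ at $0$ are $1$ from the right and $-1$ from the left, for every step.

```python
import math

def norm(x, y):
    """Euclidean norm of (x, y)."""
    return math.hypot(x, y)

# Difference quotients of t -> f(t, 0) = |t| at t = 0
for t in (1e-2, 1e-5, 1e-8):
    right = (norm(t, 0.0) - norm(0.0, 0.0)) / t       # increment +t
    left = (norm(-t, 0.0) - norm(0.0, 0.0)) / (-t)    # increment -t
    print(t, right, left)   # right quotients stay at 1, left ones at -1 (up to rounding)
```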

Skills to practice

Determining the $\mathcal C^1$ domain of a function

II.3 Operations on partial derivatives

Once partial derivatives are computed by freezing a variable, the usual one-variable rules (linearity, product, quotient, chain rule) transport verbatim. We state them in one Theorem for both indices simultaneously. The proof is short --- it reduces to applying the one-variable rules to $t \mapsto f(x_0 + t, y_0)$ --- and is not required at this level, so we admit it.

Theorem — Operations on partial derivatives

Let $\Omega \subset \mathbb R^2$ be an open subset, $f, g \in \mathcal C^1(\Omega, \mathbb R)$ and $\lambda, \mu \in \mathbb R$. Then $\lambda f + \mu g$ and $f g$ are of class $\mathcal C^1$ on $\Omega$, and so is $1 / f$ on $\Omega \cap \{f \ne 0\}$. For every $i \in \{1, 2\}$: $$ \partial_i (\lambda f + \mu g) \ = \ \lambda \, \partial_i f + \mu \, \partial_i g, \qquad \partial_i (f g) \ = \ g \, \partial_i f + f \, \partial_i g, \qquad \partial_i \! \left( \frac{1}{f} \right) \ = \ - \frac{\partial_i f}{f^2}. $$ Moreover, for every open interval $I \subset \mathbb R$ with $f(\Omega) \subset I$ and every $\varphi \in \mathcal C^1(I, \mathbb R)$, the composite $\varphi \circ f$ is of class $\mathcal C^1$ on $\Omega$ with $\partial_i (\varphi \circ f) = \partial_i f \times (\varphi' \circ f)$.
Proof admitted: each formula follows from the corresponding one-variable rule applied to the function obtained by freezing the other variable; continuity of the partial derivatives then follows from the Proposition on operations on continuous functions from the Limit and continuity subsection.

Example — $\partial_x e^{x y^2}$ revisited via composition

The composition rule of the operations Theorem applied to $f(x, y) = x y^2$ (a $\mathcal C^1$ polynomial on $\mathbb R^2$) and $\varphi(t) = e^t$ (a $\mathcal C^1$ function on $\mathbb R$) gives directly: $$ \partial_x (\varphi \circ f) \ = \ (\partial_x f) \times (\varphi' \circ f) \ = \ y^2 \times e^{x y^2}, $$ recovering the result of Example $f(x, y) = e^{x y^2}$ above without re-doing the chain-rule computation from scratch.

Method — Build the $\mathcal C^1$ class of a composite function

To show that a function built by elementary operations is $\mathcal C^1$ on a domain $\Omega$:

identify the elementary building blocks (polynomials in two variables, the norm on $\mathbb R^2 \setminus \{(0, 0)\}$, $\sqrt{\cdot}$ on $(0, +\infty)$, $\exp$, $\ln$, $\sin$, $\cos$, $\mathrm{Arctan}$, etc.) and the domains on which each is $\mathcal C^1$;
check that the composition is well-defined throughout $\Omega$ (i.e. each block has its inner function landing in its domain --- in particular, the inner function of a $\sqrt{\cdot}$ must be strictly positive);
conclude by the operations Theorem (linearity, product, quotient, composition with a one-variable $\mathcal C^1$ function).

Skills to practice

Applying the operation rules for partial derivatives

II.4 First-order expansion and tangent plane

Here is the central result of the chapter. For a $\mathcal C^1$ function of two variables, the variation $f(a + v) - f(a)$ is, to first order, a linear function of the increment $v = (h, k)$: the linear combination $\partial_x f(a) \cdot h + \partial_y f(a) \cdot k$, up to a remainder that is negligible compared to $\|v\|$. This is the analogue of the one-variable order-$1$ Taylor expansion, with the pair of partial derivatives playing the role of the derivative. The proof is outside the syllabus at this level: we state the Theorem, comment briefly on its meaning, and admit it. Every chain rule of the next section will be a direct consequence.

Theorem — DL$_1$ for $\mathcal C^1$ functions

Let $\Omega \subset \mathbb R^2$ be an open subset, $f \in \mathcal C^1(\Omega, \mathbb R)$ and $a = (x_0, y_0) \in \Omega$. Then $f$ admits an order-$1$ expansion at $a$: as $(h, k) \to (0, 0)$, $$ f(x_0 + h, y_0 + k) \ = \ f(x_0, y_0) + \frac{\partial f}{\partial x}(x_0, y_0) \, h + \frac{\partial f}{\partial y}(x_0, y_0) \, k + o\bigl(\|(h, k)\|\bigr). $$ Equivalently, in vector form: for $v = (h, k)$, $$ f(a + v) \ = \ f(a) + \frac{\partial f}{\partial x}(a) \, h + \frac{\partial f}{\partial y}(a) \, k + o(\|v\|) \quad \text{as } v \to (0, 0). $$

Proof is out of scope; differential is too

We admit this Theorem (its proof is outside the syllabus at this level). For the curious reader, the idea is to go from $(x_0, y_0)$ to $(x_0 + h, y_0 + k)$ along the two-segment path $(x_0, y_0) \to (x_0 + h, y_0) \to (x_0 + h, y_0 + k)$, integrating $\partial_x f$ along the first segment and $\partial_y f$ along the second, then to use the continuity of $\partial_x f$ and $\partial_y f$ at $a$ to bound the discrepancy. The more general notion of differentiable function (with the associated differential $df_a$, a linear form $\mathbb R^2 \to \mathbb R$) is also outside the syllabus at this level; it will be introduced in second year. Here we use only partial derivatives, the DL$_1$ above, and the gradient.
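To see the DL$_1$ at work numerically, the Python sketch below (a hypothetical check, not part of the course; the base point $a = (0.3, -0.2)$ and the direction $(3, -4)$ are arbitrary choices) takes $f(x, y) = \sin(x + 2y)$, whose partial derivatives are computed exactly in the tangent-plane Example, and evaluates the ratio of the remainder to $\|v\|$ for increments $v$ shrinking to $(0, 0)$. The Theorem predicts that this ratio tends to $0$; here it shrinks roughly in proportion to $\|v\|$, as a second-order term would suggest.

```python
import math

def f(x, y):
    return math.sin(x + 2 * y)

def grad_f(x, y):
    """Exact partial derivatives of f(x, y) = sin(x + 2y)."""
    c = math.cos(x + 2 * y)
    return (c, 2 * c)

def remainder_ratio(a, v):
    """(f(a + v) - f(a) - <grad f(a), v>) / ||v||, which the DL_1 says tends to 0."""
    (x0, y0), (h, k) = a, v
    gx, gy = grad_f(x0, y0)
    rem = f(x0 + h, y0 + k) - f(x0, y0) - (gx * h + gy * k)
    return rem / math.hypot(h, k)

a = (0.3, -0.2)
for s in (1e-1, 1e-2, 1e-3, 1e-4):
    # increments v = s * (3, -4) shrinking toward (0, 0) along a fixed direction
    print(s, remainder_ratio(a, (3 * s, -4 * s)))
```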

Definition — Tangent plane

Let $\Omega \subset \mathbb R^2$ be an open subset, $f \in \mathcal C^1(\Omega, \mathbb R)$ and $a = (x_0, y_0) \in \Omega$. Set $z_0 = f(x_0, y_0)$. The tangent plane to the graph of $f$ at $a$ is the plane in $\mathbb R^3$ of equation $$ z - z_0 \ = \ \frac{\partial f}{\partial x}(x_0, y_0) \, (x - x_0) + \frac{\partial f}{\partial y}(x_0, y_0) \, (y - y_0). $$ By the DL$_1$ Theorem, this is the unique plane in $\mathbb R^3$ that approximates the surface $z = f(x, y)$ to first order near $(x_0, y_0, z_0)$.

Example — Tangent plane of $f(x, y) = \sin(x + 2y)$ at $(0, 0)$

Compute the tangent plane to the graph of $f(x, y) = \sin(x + 2 y)$ at the point $(0, 0)$.

Answer

The function $f$ is $\mathcal C^1$ on $\mathbb R^2$ (composition of the $\mathcal C^1$ polynomial $(x, y) \mapsto x + 2 y$ with the $\mathcal C^1$ function $\sin$). Its partial derivatives are $\partial_x f(x, y) = \cos(x + 2 y)$ and $\partial_y f(x, y) = 2 \cos(x + 2 y)$. At $(0, 0)$, $f(0, 0) = 0$, $\partial_x f(0, 0) = 1$ and $\partial_y f(0, 0) = 2$. The tangent plane has equation $$ z - 0 \ = \ 1 \cdot (x - 0) + 2 \cdot (y - 0), \qquad \text{i.e.} \qquad z \ = \ x + 2 y. $$

Example — Tangent plane of $f(x, y) = x^2 + y^2$ at $(1, 1)$

Compute the tangent plane to the paraboloid $z = x^2 + y^2$ at the point $(1, 1)$.

Answer

The function $f(x, y) = x^2 + y^2$ is a $\mathcal C^1$ polynomial on $\mathbb R^2$. At $(1, 1)$, $f(1, 1) = 2$, $\partial_x f(1, 1) = 2$, $\partial_y f(1, 1) = 2$. The tangent plane has equation $$ z - 2 \ = \ 2 (x - 1) + 2 (y - 1), \qquad \text{i.e.} \qquad z \ = \ 2 x + 2 y - 2. $$
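The recipe of the two Examples can be sketched in Python, assuming the partial derivatives are supplied exactly by a `grad` function (all names here, including `wave` for $\sin(x + 2y)$, are hypothetical). Expanding $z - z_0 = p (x - x_0) + q (y - y_0)$ gives the plane in the form $z = p x + q y + c$.

```python
import math

def tangent_plane(f, grad, x0, y0):
    """Coefficients (p, q, c) of the tangent plane z = p x + q y + c to the graph of f
    at (x0, y0), obtained by expanding z - z0 = p (x - x0) + q (y - y0)."""
    z0 = f(x0, y0)
    p, q = grad(x0, y0)
    return p, q, z0 - p * x0 - q * y0

def paraboloid(x, y):
    return x**2 + y**2

def grad_paraboloid(x, y):
    return 2 * x, 2 * y

def wave(x, y):
    return math.sin(x + 2 * y)

def grad_wave(x, y):
    c = math.cos(x + 2 * y)
    return c, 2 * c

print(tangent_plane(paraboloid, grad_paraboloid, 1, 1))  # (2, 2, -2): z = 2x + 2y - 2
print(tangent_plane(wave, grad_wave, 0, 0))              # (1.0, 2.0, 0.0): z = x + 2y
```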

Skills to practice

Computing tangent planes and normal vectors

II.5 Gradient

The two partial derivatives answer the same question (how fast does $f$ change?) in the two coordinate directions. It is convenient to package them into a single vector, the gradient, so that the DL$_1$ can be written compactly with the Euclidean inner product, and so that every geometric statement of the next section (steepest ascent, orthogonality to level lines, chain rule) becomes a one-line gradient identity.

Definition — Gradient

Let $\Omega \subset \mathbb R^2$ be an open subset, $f \in \mathcal C^1(\Omega, \mathbb R)$ and $a \in \Omega$. The gradient of $f$ at $a$ is the vector of $\mathbb R^2$ $$ \nabla f(a) \ = \ \bigl( \, \partial_x f(a), \, \partial_y f(a) \, \bigr). $$ The symbol $\nabla$ is read "nabla". Using $\nabla f$, the first-order expansion Theorem rewrites as $$ f(a + v) \ = \ f(a) + \langle \nabla f(a), v \rangle + o(\|v\|), \qquad v \to (0, 0), $$ where $\langle \cdot, \cdot \rangle$ is the canonical inner product on $\mathbb R^2$.

Example — Gradient of $f(x\virgule y) \equal x^2 + y^2$

For $f(x, y) = x^2 + y^2$ on $\mathbb R^2$, $\partial_x f = 2 x$ and $\partial_y f = 2 y$, hence $\nabla f(x, y) = (2 x, 2 y) = 2 (x, y)$. At every point $a \ne (0, 0)$, the gradient is the radial vector $2 a$. At the origin, $\nabla f(0, 0) = (0, 0)$: the origin is a critical point (cf. the Extrema section).

Method — Compute the gradient

Compute the two partial derivatives by freezing each variable in turn (Method of the Partial derivatives at a point subsection), then assemble them into the gradient vector $\nabla f(a) = (\partial_x f(a), \partial_y f(a))$. The order of the components matters: the first is the partial with respect to $x$, the second with respect to $y$.

Skills to practice

Computing a gradient

III Composition and the chain rule

III.1 Directional derivative

A partial derivative measures the slope of the surface $z = f(x, y)$ in one of the two coordinate directions. But why limit ourselves to those two directions? Given any vector $v \in \mathbb R^2$, we can move from $a$ along the line $t \mapsto a + t v$ and measure the rate of change of $f$ along it. This is the directional derivative $\mathrm D_v f(a)$. It depends on the length of $v$ as well as on its direction: for $v \ne (0, 0)$, the geometric slope in the direction of $v$ is $\mathrm D_{v / \|v\|} f(a)$, and $\mathrm D_v f(a) = \|v\| \, \mathrm D_{v / \|v\|} f(a)$. For a $\mathcal C^1$ function, the DL$_1$ shows that $\mathrm D_v f(a)$ is a simple linear combination of the two partial derivatives: the inner product of the gradient with $v$.

Definition — Directional derivative

Let $\Omega \subset \mathbb R^2$ be an open subset, $f : \Omega \to \mathbb R$, $a \in \Omega$ and a direction vector $v \in \mathbb R^2$. We say that $f$ admits a directional derivative at $a$ in the direction vector $v$ if the one-variable function $t \mapsto f(a + t v)$ (defined on a neighborhood of $0$ since $\Omega$ is open) is differentiable at $0$. In that case, the directional derivative of $f$ at $a$ in the direction vector $v$ is the real number $$ \mathrm D_v f(a) \ = \ \lim_{t \to 0} \ \frac{f(a + t v) - f(a)}{t}. $$ No normalization is imposed here: $v$ is an arbitrary vector of $\mathbb R^2$, not necessarily of unit norm. For $v = e_1 = (1, 0)$, $\mathrm D_{e_1} f(a) = \partial_x f(a)$; for $v = e_2 = (0, 1)$, $\mathrm D_{e_2} f(a) = \partial_y f(a)$. The partial derivatives are the directional derivatives in the two coordinate directions.

Theorem — Directional derivative formula

Let $\Omega \subset \mathbb R^2$ be an open subset, $f \in \mathcal C^1(\Omega, \mathbb R)$, $a \in \Omega$ and $v = (h, k) \in \mathbb R^2$. Then $f$ admits a directional derivative at $a$ in the direction $v$ and $$ \mathrm D_v f(a) \ = \ \langle \nabla f(a), v \rangle \ = \ \frac{\partial f}{\partial x}(a) \, h + \frac{\partial f}{\partial y}(a) \, k. $$

Proof

If $v = (0, 0)$, then $f(a + t v) = f(a)$ for every $t$, so $\bigl(f(a + tv) - f(a)\bigr) / t = 0$ for $t \ne 0$ and $\mathrm D_v f(a) = 0 = \langle \nabla f(a), v\rangle$.
Assume now $v \ne (0, 0)$. Apply the DL$_1$ Theorem of the First-order expansion and tangent plane subsection at $a$ with increment $t v$: as $t \to 0$, $t v \to (0, 0)$, hence $$ f(a + t v) \ = \ f(a) + \langle \nabla f(a), t v \rangle + o(\|t v\|) \ = \ f(a) + t \, \langle \nabla f(a), v \rangle + o(|t|), $$ since $\|t v\| = |t| \cdot \|v\|$ and $\|v\| > 0$ is a constant. Subtracting $f(a)$ and dividing by $t \ne 0$ gives $\bigl(f(a + t v) - f(a)\bigr) / t \to \langle \nabla f(a), v \rangle$ as $t \to 0$.

Method — Compute a directional derivative

For $f \in \mathcal C^1$, compute the gradient $\nabla f(a)$ (Method of the Gradient subsection), then take its inner product with the direction vector $v = (h, k)$: $\mathrm D_v f(a) = \partial_x f(a) h + \partial_y f(a) k$. No limit needs to be computed directly.

Example — Directional derivative of $x^2 + y^2$ at $(1, 2)$ in the direction vector $(1, 1)$

Compute the directional derivative $\mathrm D_{(1, 1)} f$ at $a = (1, 2)$ for $f(x, y) = x^2 + y^2$, in the direction vector $(1, 1)$ (no normalization).

Answer

We have $\nabla f(x, y) = (2 x, 2 y)$; at $a = (1, 2)$, $\nabla f(a) = (2, 4)$. Hence, for the direction vector $(1, 1)$, $$ \mathrm D_{(1, 1)} f(a) \ = \ \langle (2, 4), (1, 1) \rangle \ = \ 2 \cdot 1 + 4 \cdot 1 \ = \ 6. $$
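The formula and the defining limit can be compared side by side. In the Python sketch below (hypothetical helper names), `directional_derivative` applies $\mathrm D_v f(a) = \langle \nabla f(a), v \rangle$, while `difference_quotient` evaluates $\bigl(f(a + t v) - f(a)\bigr) / t$; here the quotient equals $6 + 2t$ exactly, so it approaches $6$ as $t \to 0$. Doubling $v$ doubles the result, a reminder that no normalization is applied.

```python
def f(x, y):
    return x**2 + y**2

def grad_f(x, y):
    """Gradient of f(x, y) = x^2 + y^2."""
    return (2 * x, 2 * y)

def directional_derivative(grad, a, v):
    """D_v f(a) = <grad f(a), v>, valid for f of class C^1 (v is not normalized)."""
    gx, gy = grad(*a)
    return gx * v[0] + gy * v[1]

def difference_quotient(f, a, v, t):
    """(f(a + t v) - f(a)) / t, whose limit as t -> 0 defines D_v f(a)."""
    return (f(a[0] + t * v[0], a[1] + t * v[1]) - f(*a)) / t

a, v = (1, 2), (1, 1)
print(directional_derivative(grad_f, a, v))   # 6
for t in (1e-2, 1e-4, 1e-6):
    print(t, difference_quotient(f, a, v, t))  # approaches 6
```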

Skills to practice

Computing a directional derivative

III.2 Chain rule along a parameterized arc

Now we let the "direction" itself vary in time. Pick a curve $\gamma(t) = (x(t), y(t))$ in $\Omega$, where $x$ and $y$ are $\mathcal C^1$ functions of $t$. Composing with $f$ gives a one-variable function $F(t) = f(x(t), y(t))$, whose derivative is given by the chain rule: the inner product of the gradient of $f$ at $\gamma(t)$ with the velocity vector $\gamma'(t) = (x'(t), y'(t))$. This is the most important computational result of the chapter; its proof is required at this level.

Theorem — Chain rule along a parameterized arc

Let $\Omega \subset \mathbb R^2$ be an open subset, $I \subset \mathbb R$ an open interval, $f \in \mathcal C^1(\Omega, \mathbb R)$, and $x, y \in \mathcal C^1(I, \mathbb R)$ such that $(x(t), y(t)) \in \Omega$ for every $t \in I$. Then the function $F : I \to \mathbb R$ defined by $F(t) = f(x(t), y(t))$ is of class $\mathcal C^1$ on $I$ and for every $t \in I$: $$ F'(t) \ = \ x'(t) \, \frac{\partial f}{\partial x}\bigl(x(t), y(t)\bigr) + y'(t) \, \frac{\partial f}{\partial y}\bigl(x(t), y(t)\bigr) \ = \ \bigl\langle \nabla f(\gamma(t)), \gamma'(t) \bigr\rangle, $$ where $\gamma(t) = (x(t), y(t))$ and $\gamma'(t) = (x'(t), y'(t))$.

Proof

Fix $t \in I$. By the DL$_1$ of $f$ at $\gamma(t) \in \Omega$ (Theorem of the First-order expansion and tangent plane subsection), for every increment $w \in \mathbb R^2$ small enough to keep $\gamma(t) + w$ in $\Omega$: $$ f(\gamma(t) + w) \ = \ f(\gamma(t)) + \bigl\langle \nabla f(\gamma(t)), w \bigr\rangle + o(\|w\|) \qquad (w \to (0, 0)). $$ For $h \ne 0$ with $t + h \in I$, choose $w = \gamma(t + h) - \gamma(t)$; then $\gamma(t) + w = \gamma(t + h) \in \Omega$. Since $\gamma \in \mathcal C^1$, $\gamma(t + h) = \gamma(t) + h \gamma'(t) + o(\lvert h\rvert)$ by the one-variable DL$_1$ applied componentwise to $x$ and $y$. So $w = h \gamma'(t) + o(\lvert h\rvert)$; in particular $w \to (0, 0)$ and $\|w\| = O(\lvert h\rvert)$, hence $o(\|w\|) = o(\lvert h\rvert)$. Substituting, and using $\bigl\langle \nabla f(\gamma(t)), o(\lvert h\rvert) \bigr\rangle = o(\lvert h\rvert)$ (Cauchy-Schwarz): $$ F(t + h) - F(t) \ = \ f(\gamma(t + h)) - f(\gamma(t)) \ = \ \bigl\langle \nabla f(\gamma(t)), h \gamma'(t) + o(\lvert h\rvert) \bigr\rangle + o(\lvert h\rvert) \ = \ h \, \bigl\langle \nabla f(\gamma(t)), \gamma'(t) \bigr\rangle + o(\lvert h\rvert). $$ Dividing by $h \ne 0$ and letting $h \to 0$ yields $F'(t) = \langle \nabla f(\gamma(t)), \gamma'(t) \rangle$, which is the announced formula. As $\nabla f \circ \gamma$ and $\gamma'$ are continuous, $F'$ is continuous, so $F \in \mathcal C^1(I, \mathbb R)$.

Method — Differentiate a composite $f(x(t), y(t))$

Identify the "outer" function $f$ of two variables and the two "inner" functions $x$ and $y$ of one variable. Check: $f \in \mathcal C^1(\Omega, \mathbb R)$ and $x, y \in \mathcal C^1(I, \mathbb R)$, with $(x(t), y(t)) \in \Omega$ for $t \in I$. Then apply the chain rule formula: $F'(t) = x'(t) \partial_x f(x(t), y(t)) + y'(t) \partial_y f(x(t), y(t))$. The crucial point is that the partial derivatives of $f$ are evaluated at the moving point $(x(t), y(t))$, not at a fixed point.

Example — $F(t) = f(t^2, \sin t)$

Let $f \in \mathcal C^1(\mathbb R^2, \mathbb R)$ and $F : \mathbb R \to \mathbb R$ defined by $F(t) = f(t^2, \sin t)$. Compute $F'(t)$.

Answer

Set $x(t) = t^2$ and $y(t) = \sin t$; both are $\mathcal C^1$ on $\mathbb R$, with $x'(t) = 2 t$ and $y'(t) = \cos t$. The point $(x(t), y(t)) = (t^2, \sin t)$ stays in $\Omega = \mathbb R^2$. By the chain rule: $$ F'(t) \ = \ 2 t \, \frac{\partial f}{\partial x}(t^2, \sin t) + \cos t \, \frac{\partial f}{\partial y}(t^2, \sin t). $$
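Since $f$ is arbitrary in this example, a numerical check needs a concrete choice. The sketch below assumes the hypothetical $f(x, y) = x y^2$ (so $\partial_x f = y^2$, $\partial_y f = 2 x y$), evaluates the chain-rule formula above, and compares it with a central difference quotient of $F(t) = f(t^2, \sin t)$. It illustrates the formula; it does not prove it.

```python
import math

# Hypothetical choice of f, for illustration only: f(x, y) = x * y^2.
def f(x, y):
    return x * y**2

def df_dx(x, y):
    return y**2

def df_dy(x, y):
    return 2 * x * y

def F(t):
    return f(t**2, math.sin(t))

def F_prime_chain(t):
    # Chain rule: x'(t) * df/dx + y'(t) * df/dy, with the partial derivatives
    # evaluated at the moving point (x(t), y(t)) = (t^2, sin t).
    x, y = t**2, math.sin(t)
    return 2 * t * df_dx(x, y) + math.cos(t) * df_dy(x, y)

def central_diff(g, t, h=1e-6):
    # Symmetric difference quotient approximating g'(t).
    return (g(t + h) - g(t - h)) / (2 * h)

for t in (0.3, 1.0, 2.5):
    print(t, F_prime_chain(t), central_diff(F, t))
```

The two columns agree up to the discretization error of the difference quotient.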

Skills to practice

Differentiating composites along an arc

III.3 Gradient and steepest ascent

Among all unit directions $v \in \mathbb R^2$, in which direction is the slope of the surface steepest? The directional derivative formula $\mathrm D_v f(a) = \langle \nabla f(a), v\rangle$ combined with Cauchy-Schwarz gives an immediate answer: when the gradient is non-zero, the direction of steepest ascent is the gradient itself (normalized to unit length), and the corresponding slope is the norm of the gradient. When the gradient vanishes, all directions give the same slope $0$ : there is no distinguished direction of ascent.

Theorem — Steepest-ascent direction

Let $\Omega \subset \mathbb R^2$ be an open subset, $f \in \mathcal C^1(\Omega, \mathbb R)$ and $a \in \Omega$.

If $\nabla f(a) \ne (0, 0)$, then among unit vectors $v \in \mathbb R^2$, the directional derivative $\mathrm D_v f(a)$ attains its maximum value $\|\nabla f(a)\|$ exactly for $v = \nabla f(a) / \|\nabla f(a)\|$, and its minimum value $-\|\nabla f(a)\|$ exactly for $v = -\nabla f(a) / \|\nabla f(a)\|$.
If $\nabla f(a) = (0, 0)$, then $\mathrm D_v f(a) = 0$ for every unit vector $v$: every direction gives the same slope $0$, so there is no preferred direction of steepest ascent.

Proof

Assume $\nabla f(a) \ne (0, 0)$. By the directional derivative formula and Cauchy-Schwarz (see Pre-Hilbert real spaces), for every unit vector $v$: $$ \bigl\lvert \mathrm D_v f(a) \bigr\rvert \ = \ \bigl\lvert \langle \nabla f(a), v \rangle \bigr\rvert \ \le \ \|\nabla f(a)\| \cdot \|v\| \ = \ \|\nabla f(a)\|. $$ Equality $\langle \nabla f(a), v \rangle = \|\nabla f(a)\| \cdot \|v\|$ holds exactly when $v = \mu \, \nabla f(a)$ for some $\mu \ge 0$; combined with $\|v\| = 1$, this gives $\mu = 1 / \|\nabla f(a)\|$, i.e. $v = \nabla f(a) / \|\nabla f(a)\|$. Symmetrically, $\langle \nabla f(a), v \rangle = -\|\nabla f(a)\|$ means $\langle \nabla f(a), -v \rangle = \|\nabla f(a)\|$, so the minimum $-\|\nabla f(a)\|$ is attained exactly for $v = -\nabla f(a) / \|\nabla f(a)\|$.
Assume $\nabla f(a) = (0, 0)$. Then for every $v$, $\mathrm D_v f(a) = \langle (0, 0), v \rangle = 0$.

Method — Find the direction of steepest ascent

Compute the gradient $\nabla f(a)$. If $\nabla f(a) \ne (0, 0)$, normalize: the direction of steepest ascent is $v = \nabla f(a) / \|\nabla f(a)\|$, and the corresponding (maximal) slope is $\|\nabla f(a)\|$. If $\nabla f(a) = (0, 0)$, there is no distinguished steepest-ascent direction --- every direction has slope $0$ ; $a$ is a critical point (cf. the Extrema section).
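A minimal Python sketch of this Method, assuming the gradient has already been computed by hand; `steepest_ascent` is a hypothetical helper name. To illustrate the Theorem numerically, the sketch also brute-forces the maximum of $\langle \nabla f(a), v\rangle$ over a fine grid of unit vectors, using $\nabla f(a) = (2, 4)$ from the directional-derivative example.

```python
import math

def steepest_ascent(grad):
    """Unit steepest-ascent direction and maximal slope, or None at a critical point."""
    gx, gy = grad
    norm = math.hypot(gx, gy)            # Euclidean norm of the gradient
    if norm == 0:
        return None                      # grad = (0, 0): every direction has slope 0
    return (gx / norm, gy / norm), norm

grad = (2.0, 4.0)                        # gradient of x^2 + y^2 at (1, 2)
direction, slope = steepest_ascent(grad)
print(direction, slope)                  # slope = sqrt(20) = ||grad||

# Brute force: scan unit vectors v = (cos phi, sin phi) and keep the largest <grad, v>.
n = 100_000
best_value, best_phi = max(
    (grad[0] * math.cos(phi) + grad[1] * math.sin(phi), phi)
    for phi in (2 * math.pi * i / n for i in range(n))
)
print(best_value, (math.cos(best_phi), math.sin(best_phi)))
```

The scanned maximum matches $\|\nabla f(a)\| = \sqrt{20}$ and is reached near $\nabla f(a) / \|\nabla f(a)\|$, as the Theorem predicts.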

Skills to practice

Finding the direction of steepest ascent

III.4 Gradient orthogonal to level lines

At a point $a$ on the level line $\mathscr C_\lambda$, the value of $f$ does not change as we move along the line. If $\gamma$ is an arc tracing the line, the chain rule says that the rate of change of $f$ along it is $\langle \nabla f(\gamma(t)), \gamma'(t)\rangle$, and this rate is zero because $f \circ \gamma$ is constant. Geometrically, the gradient is therefore orthogonal to the direction of the level line. This is the geometric counterpart of the steepest-ascent result: at a regular point, where $\nabla f(a) \ne (0, 0)$, the gradient points perpendicularly off the level line, towards higher values of $f$.

Theorem — Gradient orthogonal to level lines

Let $\Omega \subset \mathbb R^2$ be an open subset, $f \in \mathcal C^1(\Omega, \mathbb R)$ and $\lambda \in \mathbb R$. Let $I \subset \mathbb R$ be an open interval and $\gamma : I \to \Omega$ a $\mathcal C^1$ arc tracing the level line $\mathscr C_\lambda$, i.e. $f(\gamma(t)) = \lambda$ for every $t \in I$. Then for every $t \in I$ : $$ \bigl\langle \nabla f(\gamma(t)), \gamma'(t) \bigr\rangle \ = \ 0. $$

Proof

The composite $F(t) = f(\gamma(t))$ is constant equal to $\lambda$ on $I$. By the chain rule along a parameterized arc (Theorem of the Chain rule along a parameterized arc subsection), $F$ is of class $\mathcal C^1$ on $I$ and $F'(t) = \langle \nabla f(\gamma(t)), \gamma'(t)\rangle$. Since $F$ is constant, $F'(t) = 0$ for every $t \in I$, hence $\langle \nabla f(\gamma(t)), \gamma'(t)\rangle = 0$.

Regularity caveat: when does "normal direction" make sense?

The Theorem above is the algebraic identity $\langle \nabla f(\gamma(t)), \gamma'(t)\rangle = 0$, which holds regardless of any degeneracy. To read it geometrically as "the gradient is normal to the level line", one needs the level set to be, near the point, a regular curve (which is the case near any point where $\nabla f \ne (0, 0)$), together with two non-degeneracy conditions:

$\gamma'(t) \ne (0, 0)$, so that the arc actually defines a tangent direction at $\gamma(t)$ ;
$\nabla f(\gamma(t)) \ne (0, 0)$, so that the gradient is a genuine non-zero vector.

At a critical point ($\nabla f = (0, 0)$), or where the arc degenerates ($\gamma' = (0, 0)$), the geometric statement "gradient normal to the level line" loses its meaning, even though the identity continues to hold trivially.

Method — Use the gradient to find the tangent to a level line

At a regular point $a = (x_0, y_0)$ of a level line $\mathscr C_\lambda$ (i.e. $\nabla f(a) \ne (0, 0)$), the tangent direction to $\mathscr C_\lambda$ is orthogonal to $\nabla f(a)$: if $\nabla f(a) = (A, B)$, the tangent line to $\mathscr C_\lambda$ at $a$ has equation $A (x - x_0) + B (y - y_0) = 0$. Equivalently, $(-B, A)$ is a tangent vector. The gradient points across the level line, towards higher values of $f$.

Example — Gradient of $x^2 + y^2$ at $(1, 1)$, normal to the circle

Verify that for $f(x, y) = x^2 + y^2$ on $\mathbb R^2$ and the point $a = (1, 1)$, the gradient $\nabla f(a)$ is normal to the level line of $f$ passing through $a$.

Answer

We have $\nabla f(x, y) = (2 x, 2 y)$. At $a = (1, 1)$ : $f(a) = 2$, so $a$ lies on the level line $\mathscr C_2 = \{(x, y) \mid x^2 + y^2 = 2\}$, which is the circle of center $(0, 0)$ and radius $\sqrt 2$. A parameterization of this circle near $a$ is $\gamma(t) = (\sqrt 2 \cos t, \sqrt 2 \sin t)$ ; the point $a = (1, 1)$ corresponds to $t = \pi / 4$. The velocity $\gamma'(\pi / 4) = (-\sqrt 2 \sin(\pi / 4), \sqrt 2 \cos(\pi/4)) = (-1, 1)$. The gradient is $\nabla f(a) = (2, 2)$. Their inner product is $\langle (2, 2), (-1, 1)\rangle = -2 + 2 = 0$, confirming orthogonality. Geometrically, $\nabla f(a) = (2, 2)$ is the radial vector from the origin to $a$, scaled by $2$ : it is indeed normal to the circle at $a$, and points outward (towards larger circles, i.e. larger values of $f$).
The figure below shows the four level lines $\lambda = 1, 2, 3, 4$, together with the gradient at $(1, 1)$ (radial arrow, pointing outward) and the tangent direction at $(1, 1)$ (dotted).
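The Method "Use the gradient to find the tangent to a level line" can be sketched in a few lines of Python. The helper `tangent_line` is hypothetical; it assumes the gradient $(A, B)$ at a regular point $(x_0, y_0)$ is already known, and returns the coefficients of $A x + B y = C$ (with $C = A x_0 + B y_0$) and the tangent vector $(-B, A)$. On the example above, it recovers the tangent line $x + y = 2$ and checks the orthogonality computed by hand.

```python
import math

def tangent_line(grad, point):
    """Tangent line A x + B y = C at a regular point, and the tangent vector (-B, A)."""
    A, B = grad
    x0, y0 = point
    if A == 0 and B == 0:
        raise ValueError("critical point: the gradient gives no tangent direction")
    return (A, B, A * x0 + B * y0), (-B, A)

a = (1.0, 1.0)
grad = (2 * a[0], 2 * a[1])            # gradient of x^2 + y^2 at a = (1, 1)
(A, B, C), tangent = tangent_line(grad, a)
print(A, B, C, tangent)                # 2x + 2y = 4, i.e. x + y = 2; tangent vector (-2, 2)

# Velocity of gamma(t) = (sqrt2 cos t, sqrt2 sin t) at t = pi/4, the parameter of a.
t = math.pi / 4
velocity = (-math.sqrt(2) * math.sin(t), math.sqrt(2) * math.cos(t))
print(grad[0] * velocity[0] + grad[1] * velocity[1])   # ~ 0, up to rounding
```

Using $(-B, A)$ rather than a parameterization avoids having to find an arc through $a$: only the gradient is needed.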

Skills to practice

Computing the tangent line to a level curve

III.5 Chain rule for a change of variables

The chain rule along a parameterized arc composes $f$ with a one-variable arc. The same principle extends to composing $f$ with a change of variables $(u, v) \mapsto (x(u, v), y(u, v))$: the result $F(u, v) = f(x(u, v), y(u, v))$ is a new function of two variables, and each of its partial derivatives is a linear combination of the partial derivatives of $f$ (evaluated at the image point), with coefficients the partial derivatives of $x$ and $y$ (evaluated at $(u, v)$). This is the key tool for solving some partial differential equations by a well-chosen substitution (typically, polar coordinates).

Theorem — Chain rule for a change of variables

Let $U$ and $\Omega$ be open subsets of $\mathbb R^2$, $x, y \in \mathcal C^1(U, \mathbb R)$ such that $(x(u, v), y(u, v)) \in \Omega$ for every $(u, v) \in U$, and $f \in \mathcal C^1(\Omega, \mathbb R)$. Then the function $F : U \to \mathbb R$ defined by $F(u, v) = f(x(u, v), y(u, v))$ is of class $\mathcal C^1$ on $U$ and for every $(u, v) \in U$ : $$ \frac{\partial F}{\partial u}(u, v) \ = \ \frac{\partial x}{\partial u}(u, v) \, \frac{\partial f}{\partial x}\bigl(x(u, v), y(u, v)\bigr) + \frac{\partial y}{\partial u}(u, v) \, \frac{\partial f}{\partial y}\bigl(x(u, v), y(u, v)\bigr), $$ $$ \frac{\partial F}{\partial v}(u, v) \ = \ \frac{\partial x}{\partial v}(u, v) \, \frac{\partial f}{\partial x}\bigl(x(u, v), y(u, v)\bigr) + \frac{\partial y}{\partial v}(u, v) \, \frac{\partial f}{\partial y}\bigl(x(u, v), y(u, v)\bigr). $$ The evaluation points are crucial: partial derivatives of $f$ are taken at the image point $(x(u, v), y(u, v))$, while partial derivatives of $x$ and $y$ are taken at $(u, v)$.

Proof

Fix $(u_0, v_0) \in U$. Since $U$ is open, there exists an open interval $J \subset \mathbb R$ containing $u_0$ such that $(u, v_0) \in U$ for every $u \in J$. Consider the partial function $\Phi_{v_0} : u \mapsto F(u, v_0) = f(x(u, v_0), y(u, v_0))$ on $J$. This is the composite of $f$ with the $\mathcal C^1$ arc $\gamma_{v_0} : J \to \Omega$ defined by $\gamma_{v_0}(u) = (x(u, v_0), y(u, v_0))$, whose velocity is $\gamma_{v_0}'(u) = (\partial_u x(u, v_0), \partial_u y(u, v_0))$. Applying the chain rule along a parameterized arc (Theorem of the Chain rule along a parameterized arc subsection) at $u = u_0$, and recalling that $\Phi_{v_0}'(u_0) = \frac{\partial F}{\partial u}(u_0, v_0)$ by definition of a partial derivative: $$ \frac{\partial F}{\partial u}(u_0, v_0) \ = \ \frac{\partial x}{\partial u}(u_0, v_0) \, \frac{\partial f}{\partial x}(\gamma_{v_0}(u_0)) + \frac{\partial y}{\partial u}(u_0, v_0) \, \frac{\partial f}{\partial y}(\gamma_{v_0}(u_0)), $$ which is the announced formula evaluated at $(u_0, v_0)$. Symmetrically, fixing $u = u_0$ and considering the arc $v \mapsto (x(u_0, v), y(u_0, v))$ on an open interval containing $v_0$ gives the formula for $\partial F / \partial v$ at $(u_0, v_0)$. As $(u_0, v_0)$ ranges over $U$, the formulas hold throughout $U$. Since $(x, y) : U \to \Omega$ is continuous and $\partial_x f, \partial_y f$ are continuous on $\Omega$, the composites $\partial_x f \circ (x, y)$ and $\partial_y f \circ (x, y)$ are continuous. The formulas express $\partial F / \partial u$ and $\partial F / \partial v$ as sums of products of continuous functions ($\partial_u x$, $\partial_u y$, $\partial_x f \circ (x, y)$, $\partial_y f \circ (x, y)$, etc.), so they are continuous on $U$ by the operations Proposition on continuous functions. Hence $F \in \mathcal C^1(U, \mathbb R)$.

Method — Change of variables in a PDE

To solve a PDE on $\Omega$ for an unknown $f \in \mathcal C^1(\Omega, \mathbb R)$ by a change of variables $(u, v) \mapsto (x, y) = (x(u, v), y(u, v))$ :

introduce $F(u, v) = f(x(u, v), y(u, v))$ ;
compute $\partial_u F$, $\partial_v F$ using the change-of-variables chain rule;
re-express the PDE in $\Omega$ as a PDE on $F$ in the new variables $(u, v)$ ;
solve in $(u, v)$ ;
come back to $f$ by expressing the new variables in terms of the old ones, at least on the domain under consideration (a global inverse is not always available: polar coordinates, for instance, are only locally injective).

The change of variables is well chosen when the transformed PDE is simpler (e.g. it becomes an ODE in $u$ with $v$ as a parameter, or one of its terms disappears).

Example — Polar substitution

Let $\Omega_{\mathrm{pol}} = (0, +\infty) \times \mathbb R$, and consider the change of variables $(r, \theta) \mapsto (x, y) = (r \cos \theta, r \sin \theta)$ from $\Omega_{\mathrm{pol}}$ to $\Omega_{\mathrm{cart}} = \mathbb R^2 \setminus \{(0, 0)\}$. For $f \in \mathcal C^1(\Omega_{\mathrm{cart}}, \mathbb R)$, set $F(r, \theta) = f(r \cos \theta, r \sin \theta)$. Verify that $$ r \, \frac{\partial F}{\partial r}(r, \theta) \ = \ x \, \frac{\partial f}{\partial x}(x, y) + y \, \frac{\partial f}{\partial y}(x, y), \qquad \frac{\partial F}{\partial \theta}(r, \theta) \ = \ - y \, \frac{\partial f}{\partial x}(x, y) + x \, \frac{\partial f}{\partial y}(x, y), $$ where $(x, y) = (r \cos \theta, r \sin \theta)$.

Answer

The change of variables maps $(r, \theta) \in \Omega_{\mathrm{pol}}$ to $(x, y) = (r \cos \theta, r \sin \theta)$, which is in $\Omega_{\mathrm{cart}}$ since $r > 0$. Both $x$ and $y$ are $\mathcal C^1$ on $\Omega_{\mathrm{pol}}$ with $\partial_r x = \cos \theta$, $\partial_\theta x = -r \sin \theta$, $\partial_r y = \sin \theta$, $\partial_\theta y = r \cos \theta$. By the change-of-variables chain rule: $$ \begin{aligned} \frac{\partial F}{\partial r}(r, \theta) &= \cos \theta \cdot \frac{\partial f}{\partial x}(x, y) + \sin \theta \cdot \frac{\partial f}{\partial y}(x, y) \\ \frac{\partial F}{\partial \theta}(r, \theta) &= - r \sin \theta \cdot \frac{\partial f}{\partial x}(x, y) + r \cos \theta \cdot \frac{\partial f}{\partial y}(x, y). \end{aligned} $$ Multiplying the first equation by $r$ and using $r \cos \theta = x$, $r \sin \theta = y$ : $r \partial_r F = x \partial_x f + y \partial_y f$. The second equation directly reads $\partial_\theta F = -y \partial_x f + x \partial_y f$. Both formulas hold for $(x, y) = (r \cos \theta, r \sin \theta)$.
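To check the two polar formulas numerically, the sketch below assumes a hypothetical test function $f(x, y) = x^2 y + \sin y$ (any $\mathcal C^1$ function on $\Omega_{\mathrm{cart}}$ would do), approximates $\partial_r F$ and $\partial_\theta F$ by central differences, and compares $r \, \partial_r F$ and $\partial_\theta F$ with the right-hand sides. It is a sanity check under these assumptions, not a proof.

```python
import math

# Hypothetical test function and its hand-computed partial derivatives.
def f(x, y):
    return x**2 * y + math.sin(y)

def fx(x, y):
    return 2 * x * y

def fy(x, y):
    return x**2 + math.cos(y)

def F(r, theta):
    # F(r, theta) = f(r cos theta, r sin theta)
    return f(r * math.cos(theta), r * math.sin(theta))

def check(r, theta, h=1e-6):
    """Return (r dF/dr, dF/dtheta) by central differences and the predicted values."""
    x, y = r * math.cos(theta), r * math.sin(theta)
    dF_dr = (F(r + h, theta) - F(r - h, theta)) / (2 * h)
    dF_dtheta = (F(r, theta + h) - F(r, theta - h)) / (2 * h)
    lhs = (r * dF_dr, dF_dtheta)
    rhs = (x * fx(x, y) + y * fy(x, y), -y * fx(x, y) + x * fy(x, y))
    return lhs, rhs

for r, theta in ((0.5, 0.2), (1.0, 2.0), (2.0, -1.0)):
    print(check(r, theta))
```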

Skills to practice

Differentiating composites with two-variable inner functions
Solving a PDE by passing to polar coordinates

IV Extrema

IV.1 Local extrema: definitions

The notions of local maximum and local minimum carry over from one variable to two without any conceptual change: at a local maximum $a$, the value $f(a)$ is greater than or equal to every nearby value; symmetrically for a minimum. The only adjustment is that "nearby" now means "inside some open ball around $a$" instead of "inside some interval".

Definition — Local extremum

Let $E \subset \mathbb R^2$, $f : E \to \mathbb R$ and $a \in E$. We say that $f$ admits a local maximum at $a$ if there exists $r > 0$ such that $f(p) \le f(a)$ for every $p \in E \cap \mathrm B(a, r)$. We say that $f$ admits a local minimum at $a$ if there exists $r > 0$ such that $f(p) \ge f(a)$ for every $p \in E \cap \mathrm B(a, r)$. A local extremum at $a$ is either a local maximum or a local minimum.

Example — $x^2 + y^2$ has a global minimum at the origin

The function $f(x, y) = x^2 + y^2$ on $\mathbb R^2$ has a global minimum at $(0, 0)$ : for every $(x, y) \in \mathbb R^2$, $f(x, y) \ge 0 = f(0, 0)$. In particular, $(0, 0)$ is also a local minimum. The function has no local maximum: from any point $a \in \mathbb R^2$, moving a little farther away from the origin strictly increases $x^2 + y^2$, so no value $f(a)$ dominates its neighborhood.

Skills to practice

Checking a local extremum from the definition

IV.2 Necessary condition: critical points

The one-dimensional Fermat theorem says: at an interior local extremum, the derivative vanishes. In two dimensions, the same idea applies in every direction. By the directional derivative formula, this forces the gradient to vanish. A point where the gradient vanishes is called a critical point. The condition "extremum $\Rightarrow$ critical point" is necessary but not sufficient (the next Remark shows a counter-example).

Definition — Critical point

Let $\Omega \subset \mathbb R^2$ be an open subset and $f \in \mathcal C^1(\Omega, \mathbb R)$. A point $a \in \Omega$ is called a critical point of $f$ if $\nabla f(a) = (0, 0)$, i.e. $\partial_x f(a) = \partial_y f(a) = 0$.

Theorem — Extremum implies critical point

Let $\Omega \subset \mathbb R^2$ be an open subset, $f \in \mathcal C^1(\Omega, \mathbb R)$ and $a \in \Omega$. If $f$ has a local extremum at $a$, then $a$ is a critical point of $f$ : $\nabla f(a) = (0, 0)$.

Proof

Fix an arbitrary $v \in \mathbb R^2 \setminus \{(0, 0)\}$. Since $\Omega$ is open, the one-variable function $g_v(t) = f(a + t v)$ is defined on an open interval containing $0$. By the chain rule along a parameterized arc (applied to the affine arc $t \mapsto a + tv$), $g_v$ is of class $\mathcal C^1$ on this interval, and $g_v'(0) = \mathrm D_v f(a) = \langle \nabla f(a), v\rangle$ by the directional derivative formula (Directional derivative subsection). If $f$ has a local extremum at $a$, say on $\mathrm B(a, r)$, then $g_v$ has a local extremum at $0$ (on $|t| < r / \|v\|$), and $0$ is interior to the interval of definition, hence $g_v'(0) = 0$ by the one-variable Fermat theorem. Therefore $\langle \nabla f(a), v\rangle = 0$ for every $v \ne (0, 0)$; taking $v = e_1 = (1, 0)$ gives $\partial_x f(a) = 0$, and $v = e_2 = (0, 1)$ gives $\partial_y f(a) = 0$. Hence $\nabla f(a) = (0, 0)$.

Method — Find candidate extrema

The Theorem reduces the search for local extrema to solving the system $\partial_x f(a) = 0$ and $\partial_y f(a) = 0$ simultaneously. Its solutions, the critical points, are the candidate points for extrema. To determine which candidates are actual extrema (and of what kind), one then performs the analysis-synthesis study described in the Study of critical points by analysis-synthesis subsection.
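For a polynomial of degree at most $2$, the gradient is affine, so the system $\nabla f = (0, 0)$ is linear. The hypothetical helper below (not part of the chapter) assumes $f(x, y) = \alpha x^2 + \beta x y + \gamma y^2 + \delta x + \varepsilon y$ and solves the system exactly by Cramer's rule, using Python's standard `fractions` module. It returns `None` when the determinant $4 \alpha \gamma - \beta^2$ vanishes, a case where there is no critical point or infinitely many.

```python
from fractions import Fraction

def critical_point_quadratic(alpha, beta, gamma, delta, eps):
    """Critical point of f(x, y) = alpha x^2 + beta x y + gamma y^2 + delta x + eps y.

    grad f = (2 alpha x + beta y + delta, beta x + 2 gamma y + eps), so grad f = (0, 0)
    is a 2x2 linear system, solved here exactly by Cramer's rule.
    """
    a11, a12, a21, a22 = 2 * alpha, beta, beta, 2 * gamma
    b1, b2 = -delta, -eps
    det = a11 * a22 - a12 * a21          # equals 4 alpha gamma - beta^2
    if det == 0:
        return None                      # degenerate case: no unique critical point
    x = Fraction(b1 * a22 - a12 * b2, det)
    y = Fraction(a11 * b2 - a21 * b1, det)
    return (x, y)

# f(x, y) = x^2 - 3x + x y + y^2 from the first example below
print(critical_point_quadratic(1, 1, 1, -3, 0))
# g(x, y) = x^2 - x - x y - y^2/2 + 2y from the second example below
print(critical_point_quadratic(1, -1, Fraction(-1, 2), -1, 2))
```

This only performs the Analysis step; whether each candidate is an extremum is decided by the Synthesis step.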

The converse is false

A critical point is not always an extremum. Counter-example: $f(x, y) = x^3$ on $\mathbb R^2$. The partial derivatives are $\partial_x f = 3 x^2$ and $\partial_y f = 0$, both vanishing at $(0, 0)$. So $(0, 0)$ is a critical point. Yet $f$ takes positive values to the right of the $y$-axis ($x > 0 \Rightarrow f > 0$) and negative values to the left ($x < 0 \Rightarrow f < 0$), so $f(0, 0) = 0$ is neither a local maximum nor a local minimum. The phenomenon already happens in one variable ($x \mapsto x^3$ at $0$).

Skills to practice

Finding the critical points

IV.3 Study of critical points by analysis-synthesis

The standard method for studying critical points is analysis-synthesis: (analysis) find the critical points by solving $\nabla f = (0, 0)$; (synthesis) for each critical point $(a, b)$, study the sign of the exact expression $f(a + h, b + k) - f(a, b)$ by algebraic rewriting, that is, completing the square, factoring, or testing the sign along well-chosen paths $k = c h$. The Hessian-based second-derivative test is outside the syllabus.

Method — Find and classify extrema by analysis-synthesis

Two steps:

Analysis. Solve the system $\partial_x f = 0$ and $\partial_y f = 0$ simultaneously to find the critical points $(a, b)$ of $f$. By the Theorem "Extremum implies critical point" (Necessary condition: critical points subsection), every local extremum is among them.
Synthesis. For each critical point $(a, b)$, write down the exact expression $\Delta(h, k) = f(a + h, b + k) - f(a, b)$, with no $o$-remainder (for a polynomial $f$, it is obtained by direct expansion). Study the sign of $\Delta(h, k)$ on a neighborhood of $(0, 0)$ by algebraic rewriting:
- if $\Delta \ge 0$ near $(0, 0)$, then $(a, b)$ is a local minimum; if $\Delta \le 0$ near $(0, 0)$, it is a local maximum. If equality $\Delta = 0$ holds only at $(0, 0)$, the extremum is strict;
- if $\Delta(h, k)$ changes sign on every neighborhood of $(0, 0)$ (e.g. tested along two directions $k = c_1 h$ and $k = c_2 h$ giving opposite signs), $(a, b)$ is not an extremum.

The method works because, for a polynomial $f$, the exact expression $\Delta(h, k)$ is itself a polynomial in $(h, k)$ whose sign can be analyzed by hand, without any general order-$2$ Taylor theorem. For a non-polynomial $f$, one looks instead for an exact rewriting or factoring of $\Delta$; for the chapter's examples, such a rewriting is usually available.
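Before doing the algebra, it can help to guess the answer numerically. The Python sketch below is a heuristic only, with a hypothetical helper `classify_numerically`: it samples $\Delta(h, k)$ at points of one small circle around $(0, 0)$ and reports what the signs suggest. Sampling proves nothing in general. When $\Delta$ is homogeneous, as for the quadratic forms of the two examples below or $h^3$ for $x^3$ at the origin, a sign change found on one circle does persist on every smaller circle (scaling $(h, k)$ by $t > 0$ multiplies $\Delta$ by $t^d > 0$), which rules out an extremum; a constant sign at the sampled points, however, still has to be proved algebraically.

```python
import math

def classify_numerically(delta, rho=1e-3, n=360):
    """Heuristic sign test of delta(h, k) on the circle of radius rho (not a proof).

    Samples n points (rho cos phi, rho sin phi) and reports what the signs suggest.
    """
    values = [delta(rho * math.cos(2 * math.pi * i / n), rho * math.sin(2 * math.pi * i / n))
              for i in range(n)]
    lo, hi = min(values), max(values)
    if lo > 0:
        return "suggests strict local minimum"
    if hi < 0:
        return "suggests strict local maximum"
    if lo < 0 < hi:
        return "sign change"
    return "inconclusive"

print(classify_numerically(lambda h, k: h**2 + h * k + k**2))      # Delta of the first example below
print(classify_numerically(lambda h, k: h**2 - h * k - k**2 / 2))  # Delta of the second example below
print(classify_numerically(lambda h, k: h**3))                     # Delta of x^3 at (0, 0)
```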

Example — Local minimum of a quadratic in two variables

Find and classify the local extrema of $f(x, y) = x^2 - 3 x + x y + y^2$ on $\mathbb R^2$.

Answer

Take $f(x, y) = x^2 - 3 x + x y + y^2$ on $\mathbb R^2$; $f$ is a polynomial, hence $\mathcal C^1$ on $\mathbb R^2$.

Analysis. $\partial_x f = 2 x - 3 + y$ and $\partial_y f = x + 2 y$. The system $\nabla f = (0, 0)$ reads $$ 2 x + y = 3, \qquad x + 2 y = 0. $$ Computing $2 \times (2 x + y) - (x + 2 y)$ gives $3 x = 6$, hence $x = 2$. The second equation then gives $y = -x / 2 = -1$. The unique critical point is $(2, -1)$, with $f(2, -1) = 4 - 6 - 2 + 1 = -3$.
Synthesis. Compute the exact difference $\Delta(h, k) = f(2 + h, -1 + k) - f(2, -1)$ for $(h, k) \in \mathbb R^2$. First expand each of the four pieces of $f(2 + h, -1 + k)$: $$ \begin{aligned} (2 + h)^2 &= 4 + 4 h + h^2 && \text{(square expansion)}, \\ -3 (2 + h) &= -6 - 3 h && \text{(distribute $-3$)}, \\ (2 + h)(-1 + k) &= -2 + 2 k - h + h k && \text{(product expansion)}, \\ (-1 + k)^2 &= 1 - 2 k + k^2 && \text{(square expansion)}. \end{aligned} $$ Adding the four lines, group by degree: $$ \begin{aligned} \text{constants:} &\quad 4 - 6 - 2 + 1 = -3 = f(2, -1) && \text{(constants give $f(2, -1)$)}, \\ \text{terms in $h$:} &\quad 4 h - 3 h - h = 0 && \text{(linear-$h$ cancel: critical point)}, \\ \text{terms in $k$:} &\quad 2 k - 2 k = 0 && \text{(linear-$k$ cancel: critical point)}, \\ \text{degree $2$:} &\quad h^2 + h k + k^2 && \text{(collected quadratic part)}. \end{aligned} $$ Subtracting $f(2, -1) = -3$ gives $\Delta(h, k) = h^2 + h k + k^2$. The quadratic form is put in canonical form: $$ h^2 + h k + k^2 \ = \ \left( h + \frac{k}{2} \right)^2 + \frac{3 k^2}{4} \ \ge \ 0, $$ with equality iff $h = 0$ and $k = 0$. Therefore $\Delta(h, k) \ge 0$ with strict inequality away from $(0, 0)$: $(2, -1)$ is a (strict) local minimum of $f$. Since $h^2 + h k + k^2 \ge 0$ for all $(h, k) \in \mathbb R^2$, the inequality $\Delta(h, k) \ge 0$ holds on the whole plane, so the minimum is in fact global.
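The algebra of this Synthesis step can be double-checked with exact rational arithmetic (Python's standard `fractions` module). The sketch below verifies that $(2, -1)$ is critical, that $f(2, -1) = -3$, and that $\Delta(h, k)$ agrees with $h^2 + h k + k^2$ and with its completed-square form on a grid of rational increments. Because both sides are polynomials of degree at most $2$ in each of $h$ and $k$, agreement on a grid with at least $3$ values of each variable actually forces the identity; the inequality $\Delta \ge 0$, however, is only checked at the grid points here.

```python
from fractions import Fraction as Q

def f(x, y):
    return x**2 - 3 * x + x * y + y**2

def grad_f(x, y):
    return (2 * x - 3 + y, x + 2 * y)

a, b = Q(2), Q(-1)
assert grad_f(a, b) == (0, 0)      # (2, -1) is a critical point
assert f(a, b) == -3               # value at the critical point

grid = [Q(n, 7) for n in range(-10, 11)]    # 21 rational increments in [-10/7, 10/7]
for h in grid:
    for k in grid:
        delta = f(a + h, b + k) - f(a, b)                # exact difference, no rounding
        assert delta == h**2 + h * k + k**2               # expanded form
        assert delta == (h + k / 2)**2 + Q(3, 4) * k**2   # completed-square form
        assert delta >= 0
print("checked on", len(grid) ** 2, "points")
```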

Example — Critical point with no extremum

Show that $g(x, y) = x^2 - x - x y - y^2 / 2 + 2 y$ on $\mathbb R^2$ has a unique critical point, but that this point is not a local extremum.

Answer

Take $g(x, y) = x^2 - x - x y - y^2 / 2 + 2 y$ on $\mathbb R^2$; $g$ is a polynomial, hence $\mathcal C^1$ on $\mathbb R^2$.

Analysis. $\partial_x g = 2 x - 1 - y$ and $\partial_y g = - x - y + 2$. The system $\nabla g = (0, 0)$ reads $$ 2 x - y = 1, \qquad x + y = 2. $$ Adding the two equations gives $3 x = 3$, hence $x = 1$; substituting back gives $y = 1$. The unique critical point is $(1, 1)$, with $g(1, 1) = 1 - 1 - 1 - 1/2 + 2 = 1/2$.
Synthesis. Compute $\Delta(h, k) = g(1 + h, 1 + k) - g(1, 1)$ exactly: after expanding (linear terms in $h$ and $k$ cancel because $(1, 1)$ is a critical point), $$ \Delta(h, k) \ = \ h^2 - h k - \frac{k^2}{2}. $$ Two sign tests on this quadratic:
- Along the path $k = 0$: $\Delta(h, 0) = h^2 > 0$ for every $h \ne 0$, so $g(1 + h, 1) > g(1, 1)$.
- Along the path $h = 0$: $\Delta(0, k) = - k^2 / 2 < 0$ for every $k \ne 0$, so $g(1, 1 + k) < g(1, 1)$.
On every open ball around $(1, 1)$, $\Delta$ takes both strictly positive and strictly negative values. Hence $(1, 1)$ is not a local extremum.
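The same exact-arithmetic check applies to this example. The sketch below confirms the critical point and the value $g(1, 1) = 1/2$, compares $\Delta$ with $h^2 - h k - k^2 / 2$ on a rational grid (enough to force the identity, as both sides have degree at most $2$ in each variable), and evaluates $\Delta$ along the two paths $k = 0$ and $h = 0$ for increments shrinking towards $0$. The helper `delta` is an illustrative name.

```python
from fractions import Fraction as Q

def g(x, y):
    return x**2 - x - x * y - y**2 / 2 + 2 * y

def grad_g(x, y):
    return (2 * x - 1 - y, -x - y + 2)

a, b = Q(1), Q(1)
assert grad_g(a, b) == (0, 0) and g(a, b) == Q(1, 2)   # critical point, value 1/2

def delta(h, k):
    """Exact difference g(1 + h, 1 + k) - g(1, 1)."""
    return g(a + h, b + k) - g(a, b)

# Closed form from the text: Delta(h, k) = h^2 - h k - k^2 / 2, checked on a rational grid.
grid = [Q(n, 5) for n in range(-5, 6)]
assert all(delta(h, k) == h**2 - h * k - k**2 / 2 for h in grid for k in grid)

# Sign tests along the two paths, for increments shrinking towards 0.
for n in range(1, 8):
    eps = Q(1, 10**n)
    assert delta(eps, 0) > 0      # along k = 0: Delta = eps^2
    assert delta(0, eps) < 0      # along h = 0: Delta = -eps^2 / 2
print("sign change confirmed along k = 0 and h = 0")
```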

Skills to practice

Finding local extrema by analysis-synthesis
