Table of Contents
- 19.1 INTRODUCTION
- 19.2 LECTURE
- 19.2.1 Finding Optima with Gradients
- 19.2.2 Unveiling Critical Points
- 19.2.3 The Second Derivative Test Steps In
- 19.2.4 Positive and Negative Definite Matrices
- 19.2.5 Unveiling the Role of Positive Definite Hessians
- 19.2.6 Classifying Extrema in Two Dimensions
- 19.2.7 Morse Functions and the Second Derivative Test
- 19.2.8 From Hessian to Gauss Curvature
- 19.2.9 The Morse Lemma
- 19.3 EXAMPLES
- EXERCISES
19.1 INTRODUCTION
19.1.1 Exploring Learning as an Optimization Process
Learning is an optimization process whose goal is to increase knowledge, skills and creative power. This applies to education as well as to machine learning. In order to track the learning process, we need a function which measures progress. An old-fashioned metric is the GPA, which averages grades in an educational system; another is an IQ score measured by taking tests. Another example of a metric, in a research setting, is a social network score such as the number of citations or the h-index. For a car driving autonomously it could be the

19.1.2 Will AI Conquer Every Domain?
Once the framework and the function
19.1.3 Machine Learning’s Advantage in Gradient-Based Optimization
Once a machine knows the function
19.1.4 Using Gradients to Find the Direction of Improvement
Let us first look at the rate of change of a function along a direction: if v is a unit vector, the directional derivative D_v f(x) = ∇f(x) · v measures how fast f changes at x when moving in the direction v. By the Cauchy–Schwarz inequality, this rate is largest when v points in the direction of the gradient ∇f(x), which is why gradient methods move along ∇f(x) in order to improve fastest.
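As a quick numerical illustration (the sample function f(x, y) = x² + 2y² and the finite-difference helper `grad` are my own choices, not from the text): the directional derivative along the normalized gradient equals the length of the gradient, the maximal possible rate of change.

```python
import math

def f(x, y):
    # sample function; its gradient at (1, 1) is (2, 4)
    return x**2 + 2*y**2

def grad(f, x, y, h=1e-6):
    # central-difference approximation of the gradient
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return fx, fy

gx, gy = grad(f, 1.0, 1.0)
norm = math.hypot(gx, gy)
v = (gx / norm, gy / norm)        # unit vector of steepest ascent

# directional derivative along v; it equals |grad f|, the maximal rate of change
d_v = gx * v[0] + gy * v[1]
print(v, d_v)
```

Moving along v increases f fastest; moving along -v decreases it fastest, which is the idea behind gradient descent.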
19.2 LECTURE
19.2.1 Finding Optima with Gradients
All functions are assumed here to be smooth, meaning that all partial derivatives exist and are continuous.
Theorem 1. If f has a local maximum or a local minimum at a point x₀, then ∇f(x₀) = 0.
Proof. We prove this by contradiction. Assume ∇f(x₀) ≠ 0. Then the directional derivative of f in the direction v = ∇f(x₀)/|∇f(x₀)| is positive, so f increases along v and decreases along −v near x₀. Hence x₀ is neither a local maximum nor a local minimum, a contradiction.
19.2.2 Unveiling Critical Points
A point x₀ is called a critical point of f if ∇f(x₀) = 0. By Theorem 1, every local maximum or minimum of f is a critical point.
19.2.3 The Second Derivative Test Steps In
As in one dimension, having a critical point does not assure that a point is a local maximum or minimum; think of f(x) = x³ at x = 0. The second derivative test in single-variable calculus assures that if f′(x₀) = 0 and f″(x₀) > 0, then x₀ is a local minimum, and if f′(x₀) = 0 and f″(x₀) < 0, then x₀ is a local maximum.
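A small numerical sketch of the one-dimensional test (the sample function f(x) = x³ − 3x, with critical points at x = ±1, is my own choice):

```python
def f(x):
    # f(x) = x^3 - 3x has critical points at x = -1 and x = 1
    return x**3 - 3*x

def second_derivative(f, x, h=1e-4):
    # central-difference approximation of f''(x)
    return (f(x + h) - 2*f(x) + f(x - h)) / h**2

for c in (-1.0, 1.0):
    fpp = second_derivative(f, c)
    kind = "local minimum" if fpp > 0 else "local maximum" if fpp < 0 else "inconclusive"
    print(c, kind)
```

Since f″(x) = 6x, the test reports a local maximum at x = −1 and a local minimum at x = 1.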
19.2.4 Positive and Negative Definite Matrices
A matrix A is called positive definite if v · Av > 0 for every nonzero vector v, and negative definite if v · Av < 0 for every nonzero vector v. A symmetric matrix is positive definite if and only if all of its eigenvalues are positive.
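For symmetric 2×2 matrices there is a quick check, Sylvester's criterion: [[a, b], [b, c]] is positive definite if and only if a > 0 and the determinant ac − b² is positive. A minimal sketch (the helper names are my own):

```python
def is_positive_definite(a, b, c):
    """Sylvester's criterion for the symmetric 2x2 matrix [[a, b], [b, c]]."""
    return a > 0 and a * c - b * b > 0

def is_negative_definite(a, b, c):
    # A is negative definite exactly when -A is positive definite
    return is_positive_definite(-a, -b, -c)

print(is_positive_definite(2, 0, 3))   # True: eigenvalues 2 and 3
print(is_positive_definite(1, 2, 1))   # False: determinant is -3, indefinite
```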
19.2.5 Unveiling the Role of Positive Definite Hessians
We say H(x) is the Hessian of f at x: the symmetric matrix of second partial derivatives with entries H(x)_ij = ∂²f/∂x_i∂x_j.
Theorem 2. Assume x₀ is a critical point of f. If the Hessian H(x₀) is positive definite, then x₀ is a local minimum; if H(x₀) is negative definite, then x₀ is a local maximum.
Proof. As H(x₀) is positive definite, there is a constant c > 0 such that v · H(x₀)v ≥ c|v|² for all v. The second-order Taylor expansion at the critical point gives f(x₀ + v) = f(x₀) + (1/2) v · H(x₀)v + o(|v|²), so f(x₀ + v) > f(x₀) for all sufficiently small v ≠ 0. The negative definite case follows by applying this to −f.
19.2.6 Classifying Extrema in Two Dimensions
Let us look at the case where f is a function of two variables x and y.
In this two-dimensional case, we can classify the critical points if the determinant D = f_xx f_yy − f_xy² of the Hessian is nonzero: if D > 0 and f_xx > 0, the point is a local minimum; if D > 0 and f_xx < 0, it is a local maximum; if D < 0, it is a saddle point.
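The classification rule can be written as a small function (the helper name and the two sample functions are my own choices):

```python
def classify(fxx, fyy, fxy):
    """Second derivative test in two dimensions, using D = fxx*fyy - fxy**2."""
    D = fxx * fyy - fxy**2
    if D < 0:
        return "saddle"
    if D > 0:
        return "minimum" if fxx > 0 else "maximum"
    return "inconclusive"

# f(x, y) = x**2 + y**2 at (0, 0): fxx = 2, fyy = 2, fxy = 0
print(classify(2, 2, 0))   # minimum
# f(x, y) = x**2 - y**2 at (0, 0): fxx = 2, fyy = -2, fxy = 0
print(classify(2, -2, 0))  # saddle
```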
19.2.7 Morse Functions and the Second Derivative Test
We say f is a Morse function if every critical point of f is nondegenerate, meaning that the determinant of the Hessian is nonzero there.
Theorem 3. Assume (x₀, y₀) is a critical point of f(x, y) and let D = f_xx f_yy − f_xy² be evaluated there.
- If D > 0 and f_xx > 0, then (x₀, y₀) is a local minimum.
- If D > 0 and f_xx < 0, then (x₀, y₀) is a local maximum.
- If D < 0, then (x₀, y₀) is a hyperbolic saddle.
Proof. After a translation we can assume the critical point is (0, 0) and f(0, 0) = 0.
19.2.8 From Hessian to Gauss Curvature
One can ask why the determinant D = f_xx f_yy − f_xy² of the Hessian should play such a role. One reason is geometric: at a critical point, D is the Gauss curvature K of the graph surface z = f(x, y). In general, K = (f_xx f_yy − f_xy²)/(1 + f_x² + f_y²)², which reduces to D when f_x = f_y = 0.
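The curvature formula is easy to evaluate numerically; a minimal sketch (the helper name is my own choice) confirms that K coincides with D at a critical point:

```python
def gauss_curvature(fx, fy, fxx, fyy, fxy):
    # Gauss curvature of the graph z = f(x, y), from first and second partials
    D = fxx * fyy - fxy**2
    return D / (1 + fx**2 + fy**2)**2

# At a critical point fx = fy = 0, so K reduces to the discriminant D:
print(gauss_curvature(0, 0, 2, 2, 0))  # 4.0, the same as D
# Away from a critical point the gradient term shrinks the curvature:
print(gauss_curvature(1, 0, 2, 2, 0))  # 1.0
```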
19.2.9 The Morse Lemma
In higher dimensions, the situation is described by the Morse lemma. It tells that near a nondegenerate critical point there is a coordinate change after which the function becomes a quadratic form f(x₀) − x₁² − … − x_m² + x_{m+1}² + … + x_n². The number m of minus signs is called the Morse index of the critical point.
Theorem 4. Near a Morse critical point x₀ there is a coordinate change in which f(x) = f(x₀) ± x₁² ± x₂² ± … ± x_n².
Proof. We use induction with respect to the dimension n.
- Induction foundation: For n = 1, the result tells that for a Morse critical point, the function looks like f(0) + y² or f(0) − y². First show that if f(0) = 0, f′(0) = 0 and f″(0) ≠ 0, then f(x) = g(x) x² or f(x) = −g(x) x² for some positive function g. Proof: by a linear coordinate change we assume f(0) = 0 and f″(0) = ±2. There exists then g such that f(x) = g(x) x²: it is g(x) = f(x)/x² for x ≠ 0 and g(0) is in the limit the value of f″(0)/2. We can define g for x ≠ 0 and take the limit x → 0, because by applying Hôpital twice, the limit of f(x)/x² is f″(0)/2. The coordinate change is now given by a function y = φ(x) satisfying f(x) = ±φ(x)². Implicit differentiation gives φ′(0) = √g(0) ≠ 0 so that by the implicit function theorem the inverse of φ exists near 0.
- Induction step n − 1 → n: we first note that Taylor expansion for f with remainder term implies that f(x) = Σ_{i,j} x_i x_j h_{ij}(x) with some continuous functions h_{ij}. Furthermore, the function values 2 h_{ij}(0) are the coordinates of the Hessian. Apply first a rotation so that h_{11}(0) ≠ 0. Now look at the variable x₁ and keep the other coordinates constant. As in the induction foundation, find a coordinate change y₁ = φ(x) such that f = ±y₁² + g(x₂, …, x_n), where g inherits the properties of f but is of one dimension less. By the induction assumption, there is a second coordinate change ψ bringing g into the form ±y₂² ± … ± y_n². Combining φ and ψ produces the Morse normal form.
◻
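The Morse index is the number of negative eigenvalues of the Hessian at the critical point. For a symmetric 2×2 Hessian the eigenvalues follow from the trace and determinant, so the index can be computed directly (a sketch; the helper name and test matrices are my own choices):

```python
import math

def morse_index_2x2(a, b, c):
    """Morse index of the symmetric Hessian [[a, b], [b, c]]:
    the number of negative eigenvalues (assumed nondegenerate)."""
    t, d = a + c, a * c - b * b          # trace and determinant
    disc = math.sqrt(t * t - 4 * d)      # eigenvalues of a symmetric matrix are real
    eigs = ((t - disc) / 2, (t + disc) / 2)
    return sum(1 for lam in eigs if lam < 0)

# index 0: local minimum, index 1: saddle, index 2: local maximum
print(morse_index_2x2(2, 0, 2))    # 0
print(morse_index_2x2(2, 0, -2))   # 1
print(morse_index_2x2(-2, 0, -2))  # 2
```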
19.3 EXAMPLES
Example 1. Q: Classify the critical points of
A: As
EXERCISES
Exercise 1.
- Classify the critical points of the function (maxima, minima or saddle points).
- Now do the same for and find the Morse index at each critical point.
Exercise 2. Find all critical points of the
P.S. Area
Exercise 3. Where on the parametrized surface
Exercise 4. Find all the critical points of the function
Exercise 5.
- Find a function with maxima and saddle points and one minimum.
- You see below a contour map of a function of two variables. How many critical points are there? Is the function a Morse function?

- There could be resistance: humans might decide not to cite scientific breakthroughs by machines. On the other hand, who would not want to learn a "theory of everything" even if it is discovered by a machine?