2507.13804

Gradient descent avoids strict saddles with a simple line-search method too

Andreea-Alexandra Mușat, Nicolas Boumal

correct (medium confidence)
Category: Not specified
Journal tier: Strong Field
Processed: Sep 28, 2025, 12:56 AM

Audit review

The paper proves that gradient descent with stabilized Armijo backtracking (including the Riemannian case with an analytic retraction) avoids strict saddles for almost all initial step sizes. The argument has three parts: (i) it shows that the step sizes stabilize along convergent orbits; (ii) it applies a center-stable manifold argument that only requires invertibility almost everywhere; and (iii) it globalizes the result via the Luzin N⁻¹ property of each fixed-stepsize map g(i), obtained by a Fubini–Tonelli slicing argument (see Algorithm 1 and Theorem 4.1, with its proof roadmap and its use of Theorem 3.11 and Theorem 1.3).

By contrast, the model's proof hinges on representing the stabilized Armijo method as a time-invariant, piecewise-smooth map T: M → M and then invoking a preimage-of-null-sets lemma for T. This is not valid: the state of the stabilized algorithm includes the current step size α_t, so no single autonomous map on M captures the dynamics (a difficulty the paper notes explicitly). The model also incorrectly asserts that the "determinant-zero" set has measure zero because it is the zero set of a C^1 function; this inference fails, since zero sets of C^1 (even C^∞) functions can have positive measure. The paper instead establishes the Luzin N⁻¹ property via finiteness of parameter slices and Fubini-type reasoning (lemmas and theorems in Section 3). Finally, the model uses the diffeomorphism version of the center-stable manifold theorem and treats the required genericity exclusions informally, whereas the paper uses a version (Hirsch–Pugh–Shub) that does not require invertibility at specific points. Hence the paper's argument is sound and complete for its claims, while the model's argument contains key logical gaps.
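To make concrete why the step size belongs to the algorithm's state, here is a minimal Python sketch of gradient descent with a stabilized Armijo backtracking line search. It assumes (our reading, not a transcription of the paper's Algorithm 1) that each search starts from the previously accepted step size and only shrinks it by a factor tau, so accepted step sizes are non-increasing and lie in {alpha_bar * tau**i}. The function name stabilized_armijo_gd, the parameters alpha_bar, tau and c, and the test function f(x, y) = x²/2 + y⁴/4 − y²/2 (strict saddle at the origin, minimizers at (0, ±1)) are hypothetical choices for illustration.

```python
import numpy as np


def stabilized_armijo_gd(f, grad, x0, alpha_bar=1.0, tau=0.5, c=1e-4,
                         max_iter=500, tol=1e-10):
    """Gradient descent with a 'stabilized' Armijo backtracking line search.

    Illustrative assumption, not the paper's Algorithm 1 verbatim: each
    backtracking search starts from the previously accepted step size instead
    of resetting to alpha_bar, so accepted step sizes never increase and all
    lie in {alpha_bar * tau**i : i = 0, 1, 2, ...}.
    """
    x = np.asarray(x0, dtype=float)
    alpha = alpha_bar   # carried across iterations: part of the state, like x
    alphas = []         # accepted step size at each iteration
    for _ in range(max_iter):
        g = grad(x)
        gnorm2 = float(g @ g)
        if gnorm2 < tol ** 2:   # (approximately) critical: stop
            break
        fx = f(x)
        # Backtrack from the *current* alpha until sufficient decrease holds.
        while f(x - alpha * g) > fx - c * alpha * gnorm2:
            alpha *= tau
        x = x - alpha * g
        alphas.append(alpha)
    return x, alphas


# Hypothetical test function: strict saddle at (0, 0) with Hessian
# diag(1, -1), and minimizers at (0, 1) and (0, -1).
def f(z):
    return 0.5 * z[0] ** 2 + 0.25 * z[1] ** 4 - 0.5 * z[1] ** 2


def grad(z):
    return np.array([z[0], z[1] ** 3 - z[1]])


x_final, alphas = stabilized_armijo_gd(f, grad, [1.0, 1e-3])
print("final iterate:", x_final)
print("last step sizes:", alphas[-3:])
```

Started at (1, 1e-3), close to the saddle's stable manifold {y = 0}, the iterates should escape toward the minimizer (0, 1) while the step size drops and then stays fixed. Started exactly on {y = 0}, they land on the saddle, which is the kind of measure-zero exceptional set the theorem allows. The update acts on pairs (x, alpha) rather than on x alone, which is the obstruction to modeling the method as a single autonomous map on M.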

Referee report (LaTeX)

\textbf{Recommendation:} minor revisions

\textbf{Journal Tier:} strong field

\textbf{Justification:}

The paper closes a meaningful gap by proving strict-saddle avoidance for a stabilized Armijo backtracking scheme without global Lipschitz-gradient or small-stepsize assumptions, and it extends the result to analytic Riemannian settings. The proof cleanly combines a Luzin $N^{-1}$ slicing argument with a center-stable manifold theorem (CSMT) applicable to $C^1$ maps. The exposition is clear, though in a few places additional intuition and more self-contained statements (e.g., of the CSMT variant used) would benefit readers.
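
For intuition, the slicing step can be sketched in the Euclidean special case under our own simplifying assumptions (not the paper's general Riemannian statement): $f\colon \mathbb{R}^n \to \mathbb{R}$ is $C^2$, the fixed-stepsize map is $g_\alpha(x) = x - \alpha \nabla f(x)$, and $\lambda_k$ denotes Lebesgue measure on $\mathbb{R}^k$. Then
\[
  Dg_\alpha(x) = I_n - \alpha \nabla^2 f(x),
  \qquad
  p_x(\alpha) := \det\bigl(I_n - \alpha \nabla^2 f(x)\bigr).
\]
For each fixed $x$, $p_x$ is a polynomial in $\alpha$ of degree at most $n$ with $p_x(0) = 1$, so its zero set in $\alpha$ has at most $n$ points. The set $Z = \{(x,\alpha) : p_x(\alpha) = 0\}$ is closed, and by Fubini--Tonelli
\[
  \lambda_{n+1}(Z) = \int_{\mathbb{R}^n} \lambda_1\bigl(\{\alpha : p_x(\alpha) = 0\}\bigr)\,dx = 0
  \quad\Longrightarrow\quad
  \lambda_n\bigl(\{x : \det Dg_\alpha(x) = 0\}\bigr) = 0 \ \text{for almost every } \alpha.
\]
For such $\alpha$, $g_\alpha$ is a local $C^1$ diffeomorphism off a null set and therefore has the Luzin $N^{-1}$ property. If $B$ is the null set of bad $\alpha$ and $\tau \in (0,1)$ is the backtracking factor (our notation), then $\bigcup_{i \ge 0} \tau^{-i} B$ is also null, so for almost every initial step size $\bar\alpha$ every step size $\bar\alpha \tau^i$ avoids $B$.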