2407.20209

Characterizing Dynamical Stability of Stochastic Gradient Descent in Overparameterized Learning

Dennis Chemnitz, Maximilian Engel

correct, medium confidence
Category
math.DS
Journal tier
Strong Field
Processed
Sep 28, 2025, 12:56 AM

Audit review

The paper states and proves that, under H1–H3, the sign of µ(x*) determines whether x* lies in the support of the GD limit (Theorem A) and that, for regular x*, the sign of λ(x*) does the same for SGD (Theorem B). It defines µ and λ via the linearization that respects the tangent/normal splitting at M, and builds local coordinates in which M is a linear subspace. The GD-stable case uses a contraction argument in the normal coordinates; the GD-unstable case uses a center-stable manifold theorem and a null-set preimage lemma (H3) to exclude x* from the support.

For SGD, λ is defined via Kingman's subadditive ergodic theorem, and the stable case is handled using upper semi-continuity and local bounds, without uniform hyperbolicity assumptions. Crucially, for SGD instability the paper explicitly avoids invoking a random center-stable foliation over an open subset of M: it notes that Pesin theory only gives manifolds at single points and that a foliation over a whole subset is not available. Instead it constructs an annealed Lyapunov function and uses a projective semigroup/spectral-gap argument (Le Page) to prove non-support.

By contrast, the model's solution asserts the existence of a C^1 random lamination for SGD near M and uses a random stable manifold theorem to conclude that basins are Lebesgue-null when λ(x*)>0. This step is not justified under H1–H3, and the paper's own discussion indicates that this approach is not available in this setting. The model's GD analysis and its SGD-stable case align in spirit with the paper, but its SGD-unstable argument relies on unproven structural assumptions. Therefore, the paper is correct, while the model's proof is incomplete/incorrect in the SGD-unstable case.
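
To make the distinction between the two indicators concrete, here is a minimal toy sketch; it is not the paper's construction. It assumes a single normal direction, in which linearized full-batch GD multiplies the normal coordinate by 1 - lr * mean(h_i) and single-sample SGD with uniform i.i.d. sampling multiplies it by 1 - lr * h_i. Under that assumption the GD indicator reduces to log|1 - lr * mean(h_i)| and the SGD Lyapunov exponent reduces, by the law of large numbers, to the average of log|1 - lr * h_i|. The function names, the per-sample curvatures `h_i`, and the learning rate `lr` are all hypothetical choices made for illustration.

```python
import math
import random


def gd_exponent(curvatures, lr):
    """Toy analogue of mu: log of the GD contraction factor in a 1-D normal direction.

    Assumes lr * mean(curvatures) != 1, so the logarithm is finite.
    """
    mean_h = sum(curvatures) / len(curvatures)
    return math.log(abs(1.0 - lr * mean_h))


def sgd_exponent_exact(curvatures, lr):
    """Toy analogue of lambda: E log|1 - lr * h_i| under uniform single-sample SGD.

    Assumes lr * h != 1 for every sample curvature h.
    """
    return sum(math.log(abs(1.0 - lr * h)) for h in curvatures) / len(curvatures)


def sgd_exponent_monte_carlo(curvatures, lr, steps=50_000, seed=0):
    """Estimate the same exponent as the time-averaged log growth along one random run."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(steps):
        h = rng.choice(curvatures)  # draw one sample uniformly at random
        total += math.log(abs(1.0 - lr * h))
    return total / steps


if __name__ == "__main__":
    # Hypothetical per-sample curvatures and learning rate (assumptions).
    curvatures = [0.2, 3.0]
    lr = 1.0
    print("GD  exponent:", gd_exponent(curvatures, lr))
    print("SGD exponent (exact):", sgd_exponent_exact(curvatures, lr))
    print("SGD exponent (Monte Carlo):", sgd_exponent_monte_carlo(curvatures, lr))
```

With these hypothetical values the GD exponent is negative while the SGD exponent is positive, so in this toy model the same minimum is linearly stable for GD but unstable for SGD. In the paper's setting the normal direction is generally higher-dimensional, and λ is a top Lyapunov exponent of a random matrix product rather than a simple average.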

Referee report (LaTeX)

\textbf{Recommendation:} minor revisions

\textbf{Journal Tier:} strong field

\textbf{Justification:}

The paper rigorously characterizes which global minima can be attained with positive probability by GD and SGD in overparameterized settings via linear stability indicators $\mu$ and $\lambda$. The analysis is technically careful, especially for SGD where the authors avoid unavailable random foliation results by an annealed Lyapunov-function approach. The work meaningfully clarifies the qualitative implicit bias of GD/SGD. Minor clarifications and examples would enhance accessibility.