2407.20209
Characterizing Dynamical Stability of Stochastic Gradient Descent in Overparameterized Learning
Dennis Chemnitz, Maximilian Engel
correct, medium confidence
- Category: math.DS
- Journal tier: Strong Field
- Processed: Sep 28, 2025, 12:56 AM
Audit review
The paper states and proves that, under H1–H3, the sign of µ(x*) determines whether x* lies in the support of the GD limit (Theorem A) and, for regular x*, the sign of λ(x*) does the same for SGD (Theorem B). It defines µ and λ via the linearization that respects the tangent/normal splitting at M, and builds local coordinates in which M is a linear subspace.

For GD, the stable case uses a contraction argument in the normal coordinates; the unstable case uses a center-stable manifold theorem and a null-set preimage lemma (H3) to exclude x* from the support. For SGD, λ is defined via Kingman's subadditive ergodic theorem, and the stable case is handled carefully using upper semi-continuity and local bounds, avoiding uniform hyperbolicity assumptions.

Crucially, for SGD instability the paper explicitly avoids invoking a random center-stable foliation over an open subset of M, noting that Pesin theory only gives manifolds at single points and that a foliation over a whole subset is not available. Instead it constructs an annealed Lyapunov function and uses a projective semigroup/spectral-gap argument (Le Page) to prove non-support.

By contrast, the model's solution asserts the existence of a C^1 random lamination for SGD near M and uses a random stable manifold theorem to conclude that basins are Lebesgue-null when λ(x*) > 0. This step is not justified under H1–H3, and the paper's own discussion indicates that this approach is not available in this setting. The model's GD analysis and its SGD-stable case align in spirit with the paper, but its SGD-unstable argument relies on unproven structural assumptions. Therefore, the paper is correct, while the model's proof is incomplete/incorrect in the SGD-unstable case.
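To make the two stability indicators concrete, here is a minimal numerical sketch in Python. It is an illustration under assumptions, not the paper's construction: the function names `gd_exponent` and `sgd_exponent` are hypothetical, the per-sample Hessians are passed in as full matrices standing in for the normal-direction blocks at x*, and the toy formulas (log spectral radius of I − η·mean(H_i) for GD, growth rate of random products of I − η·H_i for SGD) are the standard linearized quantities rather than the paper's exact definitions of µ and λ.

```python
import numpy as np


def gd_exponent(hessians, eta):
    """Toy stand-in for mu(x*): log spectral radius of I - eta * mean(H_i).

    Assumption: `hessians` are per-sample Hessians at x*, already restricted
    to the directions normal to the manifold of minima.
    """
    mean_h = np.mean(np.asarray(hessians, dtype=float), axis=0)
    step = np.eye(mean_h.shape[0]) - eta * mean_h
    return float(np.log(np.max(np.abs(np.linalg.eigvals(step)))))


def sgd_exponent(hessians, eta, n_steps=10_000, seed=0):
    """Toy stand-in for lambda(x*): Monte Carlo estimate of the top Lyapunov
    exponent of the random product of matrices I - eta * H_i, with i drawn
    uniformly and independently at each step.
    """
    rng = np.random.default_rng(seed)
    hessians = np.asarray(hessians, dtype=float)
    dim = hessians.shape[1]

    # Random unit start vector; generically it picks up the top exponent.
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)

    log_growth = 0.0
    for _ in range(n_steps):
        i = rng.integers(len(hessians))
        v = v - eta * (hessians[i] @ v)  # apply the linearized SGD step
        norm = np.linalg.norm(v)
        if norm == 0.0:
            return float("-inf")  # vector annihilated exactly
        log_growth += np.log(norm)
        v /= norm  # renormalize to avoid overflow/underflow
    return log_growth / n_steps
```

In this toy setting a negative value corresponds to linear stability of x* and a positive value to instability. The two indicators need not share a sign for the same step size, since one averages the Hessians before taking the logarithm and the other averages the logarithmic growth along sampled products.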
Referee report (LaTeX)
\textbf{Recommendation:} minor revisions

\textbf{Journal Tier:} strong field

\textbf{Justification:} The paper rigorously characterizes which global minima can be attained with positive probability by GD and SGD in overparameterized settings via linear stability indicators $\mu$ and $\lambda$. The analysis is technically careful, especially for SGD where the authors avoid unavailable random foliation results by an annealed Lyapunov-function approach. The work meaningfully clarifies the qualitative implicit bias of GD/SGD. Minor clarifications and examples would enhance accessibility.