2505.07244

The Influence of the Memory Capacity of Neural DDEs on the Universal Approximation Property

Christian Kuehn, Sara-Viola Kuntz

correct (medium confidence)
Category: Not specified
Journal tier: Specialist/Solid
Processed: Sep 28, 2025, 12:56 AM

Audit review

The paper’s main theorem (Theorem 5.4) states and proves that non-augmented neural DDEs with sufficiently small memory capacity Kτ cannot uniformly approximate a map whose i-th component has a non-degenerate local extremum. The proof (Sections 5.3–5.5) rests on a small-delay decomposition of y(·) via G on [0,βτ] and H on [βτ,T], with quantitative bounds: G is within O(τ) of the identity, and solutions are attracted exponentially toward special solutions, with δ_{3,τ} = C_2 e^{−T/τ}. A Jordan–Brouwer-based level-set separation argument then yields a hard geometric contradiction.

By contrast, the candidate solution hinges on a global “closeness-to-identity” estimate for S_T (Step 2), claiming ∥S_T(c1)−S_T(c2)−(c1−c2)∥ ≤ η∥c1−c2∥ with η = (Kτe)/(1−Kτe). This estimate is not established in the paper and is generally false: it contradicts the τ→0 ODE limit, where ∥S_T(c1)−S_T(c2)∥ can scale like e^{KT}∥c1−c2∥ rather than like ∥c1−c2∥, while η→0. The paper instead relies on existence, uniqueness, and exponential attraction of special solutions for Kτe<1 (Theorem 4.5) and on a Lipschitz estimate for I, not on a global near-identity property of S_T. Because Step 2 is the linchpin of the candidate’s extremum and Lipschitz arguments, the candidate proof collapses, whereas the paper’s argument remains coherent and complete.
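The objection to Step 2 can be checked numerically on a toy problem. The sketch below is an illustration added here, not taken from the paper or from the candidate solution: it assumes that S_T can be read as the time-T endpoint map on constant initial histories, and that the scalar linear field x'(t) = K·x(t−τ) is an admissible test case. The names `time_T_map`, `contraction_bound`, and `relative_deviation` are hypothetical, and explicit Euler is used only for simplicity.

```python
import math


def time_T_map(c, K=1.0, tau=0.01, T=1.0, steps_per_delay=50):
    """Explicit Euler for x'(t) = K * x(t - tau) with history x = c on [-tau, 0].

    Assumption: this scalar linear DDE stands in for the flow map S_T.
    """
    h = tau / steps_per_delay
    n_steps = round(T / h)
    # xs[j] stores x at time (j - steps_per_delay) * h; the first
    # steps_per_delay + 1 entries are the constant initial history.
    xs = [c] * (steps_per_delay + 1)
    for _ in range(n_steps):
        delayed = xs[-1 - steps_per_delay]  # value one delay tau in the past
        xs.append(xs[-1] + h * K * delayed)
    return xs[-1]


def contraction_bound(K, tau):
    """eta = K*tau*e / (1 - K*tau*e), the constant claimed in Step 2."""
    q = K * tau * math.e
    if q >= 1.0:
        raise ValueError("requires K * tau * e < 1")
    return q / (1.0 - q)


def relative_deviation(c1, c2, K=1.0, tau=0.01, T=1.0):
    """|S_T(c1) - S_T(c2) - (c1 - c2)| / |c1 - c2| for the toy problem."""
    diff = time_T_map(c1, K, tau, T) - time_T_map(c2, K, tau, T)
    return abs(diff - (c1 - c2)) / abs(c1 - c2)


if __name__ == "__main__":
    K, tau, T = 1.0, 0.01, 1.0
    print("eta       =", contraction_bound(K, tau))
    print("deviation =", relative_deviation(1.0, 0.0, K, tau, T))
```

For this toy problem the solution started from a positive constant is nondecreasing, so the relative deviation is at least K·T, whereas η tends to 0 as τ→0. Under the stated assumptions, the claimed inequality therefore fails once τ is small enough that η < K·T.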

Referee report (LaTeX)

\textbf{Recommendation:} minor revisions

\textbf{Journal Tier:} specialist/solid

\textbf{Justification:}

The manuscript establishes a sharp and conceptually clear barrier for universal approximation by non-augmented neural DDEs at small memory capacity. The proof is well organized: it combines a local Morse normal form with a careful small-delay decomposition (an initial $O(\tau)$ drift and an exponentially small tail), existence and uniqueness of special solutions for $K\tau e<1$, and a Jordan--Brouwer-based separation argument to force a contradiction. The result cleanly complements the positive universal embedding/approximation regimes and clarifies the role of memory. I suggest minor clarifications and a tightening of the constants and Lipschitz estimates, but I find no fundamental correctness issues.