arXiv:2402.19047
Theoretical Foundations of Deep Selective State-Space Models
Nicola Muca Cirone, Antonio Orvieto, Benjamin Walker, Cristopher Salvi, Terry Lyons
Correct (medium confidence)
- Category: Not specified
- Journal tier: Strong field
- Processed: Sep 28, 2025, 12:56 AM
Audit review
The paper proves that dense Linear CDEs uniformly approximate functionals of the form $\Psi(\omega^X_{[0,t]}) + \int_0^t \Phi(\omega^X_{[s,t]}) \cdot d\xi^X_s$, and that the outputs of dense Linear CDEs are themselves of this form. It also characterizes the diagonal case and gives a probabilistic (random-weights) universality result. These statements and their proofs (via signature expansions, Stone–Weierstrass, and careful tail control) are consistent and well supported: Theorem 4.1 and Proposition B.11 for the closure, Proposition 4.6 for the expansion, Theorem 4.2 for the random-weights result, and Theorem 4.3/B.15 for the diagonal case (with B.18 for exponentials).

The candidate solution reaches the same core conclusions by a different, constructive route: (i) a path-algebra density argument on the compact sets $K_0$ and $K_\Delta$ via Stone–Weierstrass, using the time channel $t$ to separate points; and (ii) an explicit truncated tensor-algebra/shift-matrix realization that matches finite signature polynomials exactly, with a uniform error bounded by the total variation of $\xi$. This agrees with the paper's closure characterization and with its decomposition of the diagonal (commuting) case, and it is consistent with the paper's intuition that $Y_t$ itself belongs to the class (7) by the variation-of-constants/signature expansion.

Differences:
- (1) The paper assumes $\omega^1 = t$ and $\omega^2 = t^2$ to guarantee separation of subpath restrictions (Lemma A.8), whereas the candidate argues, via integration by parts, that $\omega^1 = t$ alone suffices on $K_0$ and $K_\Delta$. This is a reasonable alternative separation route on these compact sets, although the paper's $t^2$ augmentation handles path-equivalence issues uniformly.
- (2) The candidate's probabilistic variant omits tail control for words outside the finite set $W$ when the $A_i$ are random. The paper's Theorem 4.2/B.13 handles this explicitly with quantitative bounds, so the candidate's probabilistic argument is incomplete; this does not affect the main deterministic expressivity result.
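The shift-matrix idea in (ii) can be sketched for the homogeneous part of a Linear CDE. This sketch assumes $dY_t = \sum_i A_i Y_t\, d\omega^i_t$ with no $\xi$ input term, takes $\omega$ piecewise linear, and lets each $A_i$ act on the tensor algebra truncated at level $N$ as right multiplication by the basis letter $e_i$. Because every $A_i$ raises word length by one, $\sum_i A_i \Delta^i$ is nilpotent, so its matrix exponential is a finite sum and each linear piece is solved exactly; the state then equals the truncated signature, and any finite signature polynomial is a fixed linear readout of it. The helper names (`word_index`, `shift_matrices`, `nilpotent_expm`, `linear_cde_state`) are hypothetical, and this illustrates the general technique rather than the candidate's or the paper's exact construction.

```python
import itertools

import numpy as np


def word_index(d, N):
    """Map each word over letters {0, ..., d-1} of length <= N to a basis index."""
    words = [()]
    for k in range(1, N + 1):
        words.extend(itertools.product(range(d), repeat=k))
    return {w: n for n, w in enumerate(words)}


def shift_matrices(d, N):
    """A_i = right multiplication by e_i on the level-N truncated tensor algebra.

    The coefficient of word w is moved to word w + (i,); words of length N
    are sent to zero by the truncation.
    """
    idx = word_index(d, N)
    D = len(idx)
    mats = []
    for i in range(d):
        A = np.zeros((D, D))
        for w, n in idx.items():
            if len(w) < N:
                A[idx[w + (i,)], n] = 1.0
        mats.append(A)
    return idx, mats


def nilpotent_expm(M, N):
    """exp(M) for M with M^(N+1) = 0: the Taylor series terminates, so this is exact."""
    out = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, N + 1):
        term = term @ M / k  # term = M^k / k!
        out = out + term
    return out


def linear_cde_state(points, N):
    """Solve dY = sum_i A_i Y dX^i along a piecewise-linear path, with Y_0 = empty word."""
    points = np.asarray(points, dtype=float)
    d = points.shape[1]
    idx, mats = shift_matrices(d, N)
    Y = np.zeros(len(idx))
    Y[idx[()]] = 1.0
    for delta in np.diff(points, axis=0):
        M = sum(delta[i] * mats[i] for i in range(d))
        Y = nilpotent_expm(M, N) @ Y  # exact solution on this linear piece
    return idx, Y


# Usage: the Levy area is a linear readout of the level-2 state.
idx, Y = linear_cde_state([(0, 0), (1, 0), (1, 1)], N=2)
levy_area = 0.5 * (Y[idx[(0, 1)]] - Y[idx[(1, 0)]])
print(levy_area)  # prints 0.5
```

Letters are 0-indexed in the code, so `(0, 1)` is the word $e_1 e_2$ in the usual notation. Higher-level signature terms are read off the same way by raising `N`, at the cost of a state of dimension $\sum_{k=0}^{N} d^k$.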
Referee report (LaTeX)
\textbf{Recommendation:} minor revisions.

\textbf{Journal Tier:} strong field.

\textbf{Justification:} The paper gives a rigorous and unified treatment of Linear CDE expressivity, including a precise closure characterization in both the dense and diagonal settings and a novel random-weights universality statement. The methods are solid, draw appropriately on rough path theory, and illuminate the role of gating. Minor clarifications (e.g., the role of $t^2$ versus alternative separation routes, or a brief constructive example) would further improve accessibility.
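The report's request for a brief constructive example could be met, for instance, by contrasting the diagonal and dense cases. A minimal sketch, for the homogeneous part only (no $\xi$ input term): with diagonal $A_i = \mathrm{diag}(a_i)$ the matrices commute, so $Y_t = \exp\big(\sum_i a_i(\omega^i_t - \omega^i_0)\big) \odot Y_0$ depends on the driving path only through its total increment. Two paths with the same endpoints therefore give the same state even when an order-2 signature feature such as the Lévy area differs, whereas a dense Linear CDE can carry that feature in its state. The function names (`diagonal_cde_state`, `levy_area`) and the random diagonals are hypothetical choices for illustration.

```python
import numpy as np


def diagonal_cde_state(points, a, y0):
    """Solve dY = sum_i diag(a[i]) Y dX^i along a piecewise-linear path X.

    a has shape (d, m): row i holds the diagonal of the (hypothetical) matrix A_i.
    Diagonal matrices commute, so each linear piece with increment delta
    multiplies Y elementwise by exp(sum_i a[i] * delta[i]).
    """
    points = np.asarray(points, dtype=float)
    y = np.asarray(y0, dtype=float).copy()
    for delta in np.diff(points, axis=0):
        y = y * np.exp(delta @ a)
    return y


def levy_area(points):
    """Levy area of a 2-D piecewise-linear path, measured from its start point.

    On a linear piece starting at p (relative to the start) with increment delta,
    0.5 * (X^1 dX^2 - X^2 dX^1) integrates exactly to 0.5 * (p1*delta2 - p2*delta1).
    """
    points = np.asarray(points, dtype=float)
    rel = points - points[0]
    area = 0.0
    for p, delta in zip(rel[:-1], np.diff(points, axis=0)):
        area += 0.5 * (p[0] * delta[1] - p[1] * delta[0])
    return area


rng = np.random.default_rng(0)
a = rng.normal(size=(2, 4))  # 2 input channels, 4-dimensional hidden state
y0 = np.ones(4)

right_then_up = [(0, 0), (1, 0), (1, 1)]
up_then_right = [(0, 0), (0, 1), (1, 1)]

# Same endpoints, so the diagonal state cannot tell the two paths apart ...
print(np.allclose(diagonal_cde_state(right_then_up, a, y0),
                  diagonal_cde_state(up_then_right, a, y0)))  # prints True
# ... even though their Levy areas differ.
print(levy_area(right_then_up), levy_area(up_then_right))  # prints 0.5 -0.5
```

The design choice here is to keep the example at the level of the state itself; the inhomogeneous $\int \Phi \, d\xi$ term treated in the paper's diagonal result is not illustrated.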