2412.02682
THE ASYMPTOTIC BEHAVIOR OF ATTENTION IN TRANSFORMERS
Á. RODRÍGUEZ ABELLA, J.P. SILVESTRE, P. TABUADA
Audit verdict: correct (medium confidence)
- Category: Not specified
- Journal tier: Strong Field
- Processed: Sep 28, 2025, 12:56 AM
- arXiv links: Abstract, PDF
Audit review
The paper proves asymptotic stability of consensus for causal (auto-regressive) attention with identity value matrix and bounded time-varying P(t), using an ISS-Lyapunov cascade argument (Theorem 5.1). The proof relies on the triangular structure, positivity and lower bounds on the attention weights, and an auxiliary scalar Lyapunov function V(a) (see the sections and equations around the auto-regressive model and Theorem 5.1). The candidate solution establishes the same result via a different route: a direct differential inequality in cosine coordinates, uniform lower bounds on aggregate weights, Young's inequality, and a cascade comparison for sup norms. It matches the paper's hypotheses (bounded P(t), identity U, lower-triangular attention, positivity and normalization of the weights) and reaches the same conclusion; it also supplies explicit exponential rates that the paper does not emphasize. No material contradictions with the paper's model or assumptions were found; for example, the softmax normalization by √(n+1) and the positivity/lower bounds for α are consistent with Lemma 4.1.
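The cascade mechanism described above can be illustrated numerically. The sketch below is a simplified discrete-time version, not the paper's exact model: P is taken constant (the paper allows bounded time-varying P(t)), the dynamics are integrated with small explicit-Euler steps, and the √(n+1) softmax normalization is assumed here to use the state dimension. Token 1 attends only to itself under causal masking, so it is stationary, and the lower-triangular structure pulls every later token toward it.

```python
# Hedged sketch of causal-attention consensus dynamics with identity value
# matrix (assumptions: constant P = I, explicit Euler, n = state dimension):
#   dx_i/dt = sum_{j<=i} alpha_ij (x_j - x_i),
#   alpha_i = softmax_{j<=i}( <P x_i, x_j> / sqrt(n+1) ).
import numpy as np

def euler_step(X, P, dt):
    n = X.shape[1]
    X_new = X.copy()
    for i in range(X.shape[0]):
        # Causal scores: token i only sees tokens j <= i.
        scores = X[: i + 1] @ (P @ X[i]) / np.sqrt(n + 1)
        w = np.exp(scores - scores.max())
        w /= w.sum()                      # softmax row, sums to 1
        # Convex combination: states stay in the initial convex hull.
        X_new[i] = (1.0 - dt) * X[i] + dt * (w @ X[: i + 1])
    return X_new

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))           # 6 tokens in R^3
P = np.eye(3)                             # bounded parameter matrix (constant here)
dist0 = np.linalg.norm(X - X[0], axis=1).max()
for _ in range(4000):
    X = euler_step(X, P, dt=0.05)
dist = np.linalg.norm(X - X[0], axis=1).max()
print(f"sup distance to leading token: {dist0:.3f} -> {dist:.2e}")
```

The sup distance to the leading token shrinks toward zero, consistent with the exponential-rate cascade comparison for sup norms mentioned in the audit; the update is a convex combination for dt ≤ 1, which keeps the discretization stable.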
Referee report (LaTeX)
\textbf{Recommendation:} minor revisions

\textbf{Journal Tier:} strong field

\textbf{Justification:} A rigorous and timely analysis of attention dynamics using control-theoretic tools. The main auto-regressive consensus result is demonstrated clearly and aligns with observed collapse phenomena. The proofs are sound and self-contained; a few expository refinements (explicit regularity assumptions, constants, and rate remarks) would increase accessibility and facilitate comparison with alternative derivations.