arXiv:2412.02682

THE ASYMPTOTIC BEHAVIOR OF ATTENTION IN TRANSFORMERS

Á. RODRÍGUEZ ABELLA, J.P. SILVESTRE, P. TABUADA

correct (medium confidence)
Category: Not specified
Journal tier: Strong Field
Processed: Sep 28, 2025, 12:56 AM

Audit review

The paper proves asymptotic stability of consensus for causal (auto-regressive) attention with identity value matrix and bounded time-varying P(t), using an ISS-Lyapunov cascade argument (Theorem 5.1) that relies on the triangular structure, positivity/lower bounds on the attention weights, and an auxiliary scalar Lyapunov function V(a) (see the sections and equations around the auto-regressive model and Theorem 5.1). The candidate solution establishes the same result by a different route: a direct differential inequality in cosine coordinates, uniform lower bounds on aggregate weights, Young's inequality, and a cascade comparison for sup norms. It matches the paper's hypotheses (bounded P(t), identity U, lower-triangular attention, positivity and normalization of the weights) and reaches the same conclusion; it also supplies explicit exponential rates that the paper does not emphasize. No material contradictions with the paper's model or assumptions were found; for example, the softmax normalization with √(n+1) and the positivity/lower bounds on α are consistent with Lemma 4.1.
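For readers comparing the two arguments, a minimal sketch of the triangular cascade that both routes exploit; the notation $x_i$, $\alpha_{ij}$, $e_i$ and the constant $c$ are illustrative assumptions, not taken verbatim from the paper:

% Sketch under assumed notation: causal attention with identity value matrix.
\begin{align*}
  \dot{x}_i &= \sum_{j \le i} \alpha_{ij}(t)\,\bigl(x_j - x_i\bigr),
  \qquad \alpha_{ij}(t) > 0, \quad \sum_{j \le i} \alpha_{ij}(t) = 1,\\
  e_i := x_i - x_1 \;&\Longrightarrow\;
  \frac{d}{dt}\,\lVert e_i \rVert \;\le\;
  -c\,\lVert e_i \rVert \;+\; \sum_{j < i} \alpha_{ij}(t)\,\lVert e_j \rVert,
  \qquad c > 0.
\end{align*}

In this sketch $c$ comes from a uniform lower bound on the attention weights (available because P(t) is bounded), and the first token is stationary under causal attention, so $e_1 \equiv 0$; induction over $i$, equivalently an ISS cascade/comparison argument, then gives exponential convergence of every token to the first one, i.e., consensus.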

Referee report (LaTeX)

\textbf{Recommendation:} minor revisions

\textbf{Journal Tier:} strong field

\textbf{Justification:}

A rigorous and timely analysis of attention dynamics using control-theoretic tools. The main auto-regressive consensus result is clearly established and aligns with observed collapse phenomena. The proofs are sound and self-contained; a few expository refinements (explicit regularity assumptions, explicit constants, and remarks on convergence rates) would improve accessibility and facilitate comparison with alternative derivations.