Expressing Multivariate Time Series as Graphs with Time Series Attention Transformer

William T. Ng, K. Siu, Albert C. Cheung, Michael K. Ng

incomplete, medium confidence

Category: Not specified
Journal tier: Specialist/Solid
Processed: Sep 28, 2025, 12:56 AM

Audit review

The paper defines the edge tensor entries e_{ijk} as a cosine-type correlation between IMFs, so their symmetry and the bound |e_{ijk}| ≤ 1 follow immediately. It builds the adjacency A_t by thresholding symmetric residual correlations, and introduces the TSAT head A_i = (α_0 σ(Q_i K_i^T/√d_k) + Σ_k α_k σ(D_imf_k) + α_{K+1} A) V_i, where the α's are trainable. It does not, however, provide formal proofs of these basic properties, of permutation equivariance, or of its strict generalization claims. Equation (3) and the residual correlation (4) justify the symmetry of e_{ijk} and A_t by construction. Equation (9) shows that the head reduces to standard self-attention when α_1 = … = α_{K+1} = 0 and α_0 = 1, and that σ is applied to the added matrices as an activation (the appendix also notes 'exp'/'softmax'/'none' variants).

The candidate solution supplies the missing derivations: (i) symmetry/PSD and range bounds, (ii) exact reduction to standard attention, (iii) permutation equivariance under node reindexing, and (iv) a clear, generic argument (plus example) for strict generalization beyond standard attention. We found no logical errors in the candidate solution's reasoning. Hence, the paper's exposition is incomplete on these points, while the candidate solution is correct.
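The reduction, equivariance, and generalization properties discussed above can be checked numerically on a toy instance. The following is a minimal sketch, not the authors' implementation: it assumes σ is a row-wise softmax, and the helper names (`tsat_head`, `standard_attention`, `permute_both`, and so on) and all numeric values are invented for illustration.

```python
import math

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def softmax_rows(M):
    # Row-wise softmax (assumed form of sigma), shifted by the row max for stability.
    out = []
    for row in M:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def scaled_scores(Q, K):
    d_k = len(Q[0])
    return [[s / math.sqrt(d_k) for s in row] for row in matmul(Q, transpose(K))]

def standard_attention(Q, K, V):
    return matmul(softmax_rows(scaled_scores(Q, K)), V)

def tsat_head(Q, K, V, D_imfs, A, alphas):
    # alphas = [alpha_0, alpha_1, ..., alpha_K, alpha_{K+1}]
    n = len(Q)
    terms = [softmax_rows(scaled_scores(Q, K))]
    terms += [softmax_rows(D) for D in D_imfs]
    terms.append(A)  # the adjacency enters without an activation
    assert len(alphas) == len(terms)
    mix = [[sum(a * T[i][j] for a, T in zip(alphas, terms)) for j in range(n)]
           for i in range(n)]
    return matmul(mix, V)

def permute_rows(X, p):
    return [X[i] for i in p]

def permute_both(M, p):
    # Reindex rows and columns of a node-by-node matrix with the same permutation.
    return [[M[i][j] for j in p] for i in p]

def close(X, Y, tol=1e-9):
    return all(abs(x - y) <= tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

# Toy data: 3 nodes, d_k = 2, K = 2 IMF matrices (all values are made up).
Q = [[0.2, -0.1], [0.5, 0.3], [-0.4, 0.7]]
K = [[0.1, 0.9], [-0.3, 0.2], [0.6, -0.5]]
V = [[1.0, 1.0], [0.0, 1.0], [2.0, 1.0]]  # second column is constant
D_imfs = [
    [[1.0, 0.3, -0.2], [0.3, 1.0, 0.5], [-0.2, 0.5, 1.0]],
    [[1.0, -0.6, 0.1], [-0.6, 1.0, 0.4], [0.1, 0.4, 1.0]],
]
A = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]

# (ii) alpha_0 = 1 and all other alphas = 0 recovers standard self-attention.
reduces = close(tsat_head(Q, K, V, D_imfs, A, [1.0, 0.0, 0.0, 0.0]),
                standard_attention(Q, K, V))

# (iii) Reindexing the nodes permutes the output rows in the same way.
alphas = [0.5, 0.2, 0.1, 0.3]
p = [2, 0, 1]
out = tsat_head(Q, K, V, D_imfs, A, alphas)
out_p = tsat_head(permute_rows(Q, p), permute_rows(K, p), permute_rows(V, p),
                  [permute_both(D, p) for D in D_imfs], permute_both(A, p), alphas)
equivariant = close(out_p, permute_rows(out, p))

# (iv) With alpha_{K+1} != 0 the mixing weights are no longer row-stochastic,
# so the constant column of V is not reproduced as it would be by softmax weights.
strictly_differs = not close(out, standard_attention(Q, K, V))

print(reduces, equivariant, strictly_differs)
```

The constant second column of `V` is a deliberate choice: any softmax attention leaves a constant column unchanged because its weights sum to one per row, whereas the `α_{K+1} A` term scales that column by a node-dependent row sum. This gives a simple illustration of behaviour that a standard head with the same value matrix cannot reproduce.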

Referee report (LaTeX)

\textbf{Recommendation:} minor revisions

\textbf{Journal Tier:} specialist/solid

\textbf{Justification:}

The work is solid and practically valuable, but several basic theoretical properties of the proposed layer, though obvious in hindsight, are not stated or justified. Adding short statements and proofs (symmetry/PSD, permutation equivariance, reduction to standard attention, and a remark on strict generalization) would materially improve rigor without lengthening the paper much. Empirical results are strong, motivation is clear, and the design aligns well with time-series structure.
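As an indication of how short the requested statements could be, the following LaTeX sketch states and proves the symmetry/PSD, reduction, and permutation-equivariance properties. It rests on assumptions not taken from the paper: the vectors $c_{ik}$ (the $k$-th IMF of series $i$) are hypothetical notation, Eq. (3) is assumed to be a normalized inner product, $\sigma$ is assumed to be a row-wise softmax, and the head index is dropped.

```latex
% Assumed notation (not from the paper): c_{ik} is the k-th IMF of series i.
\textbf{Proposition 1 (edge tensor).} Let
\[
  e_{ijk} = \frac{\langle c_{ik}, c_{jk} \rangle}{\|c_{ik}\| \, \|c_{jk}\|},
  \qquad c_{ik} \neq 0 .
\]
Then $e_{ijk} = e_{jik}$, $|e_{ijk}| \le 1$, and for each fixed $k$ the matrix
$E_k = (e_{ijk})_{i,j}$ is positive semidefinite.

\textit{Proof.} Symmetry follows from the symmetry of the inner product and the
bound from the Cauchy--Schwarz inequality. Writing $u_{ik} = c_{ik}/\|c_{ik}\|$,
$E_k$ is the Gram matrix of $\{u_{ik}\}_i$, so
$x^\top E_k x = \bigl\| \sum_i x_i u_{ik} \bigr\|^2 \ge 0$ for all $x$.

\textbf{Proposition 2 (TSAT head).} Let
\[
  H(Q, K, V, \{D_{\mathrm{imf}_k}\}, A)
  = \Bigl( \alpha_0 \, \sigma\bigl(Q K^\top / \sqrt{d_k}\bigr)
  + \sum_{k=1}^{K} \alpha_k \, \sigma(D_{\mathrm{imf}_k})
  + \alpha_{K+1} A \Bigr) V ,
\]
with $\sigma$ the row-wise softmax.
\begin{enumerate}
  \item If $\alpha_0 = 1$ and $\alpha_1 = \dots = \alpha_{K+1} = 0$, then
        $H = \sigma\bigl(Q K^\top / \sqrt{d_k}\bigr) V$, i.e.\ standard self-attention.
  \item For every permutation matrix $P$,
        \[
          H(PQ, PK, PV, \{P D_{\mathrm{imf}_k} P^\top\}, P A P^\top)
          = P \, H(Q, K, V, \{D_{\mathrm{imf}_k}\}, A) .
        \]
\end{enumerate}

\textit{Proof.} Item 1 is immediate by substitution. For item 2, note that
$(PQ)(PK)^\top = P \, Q K^\top P^\top$ and that the row-wise softmax satisfies
$\sigma(P M P^\top) = P \, \sigma(M) \, P^\top$, because conjugation by $P$ only
reorders the rows and the entries within each row. The bracketed mixing matrix
therefore transforms as $M \mapsto P M P^\top$, and
$P M P^\top P V = P M V$ since $P^\top P = I$.
```

Note that Proposition 1 concerns the unthresholded correlations only; positive semidefiniteness need not survive the thresholding step that produces the adjacency.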