2503.14927

Semi-Gradient SARSA Routing with Theoretical Guarantee on Traffic Stability and Weight Convergence

Yidan Wu, Yu Yu, Jianan Zhang, Li Jin

incomplete, medium confidence
Category: Not specified
Journal tier: Specialist/Solid
Processed: Sep 28, 2025, 12:56 AM

Audit review

The paper’s formal theorem establishes sufficiency (convergence when λ < Σ μn) but does not prove the necessity (“only if”) direction that it repeatedly claims. Key steps in the proof also rely on conditions that are not stated in Theorem 1 and appear only inside lemmas or proofs: a ‘sufficiently small’ softmax temperature ι, used in Lemma 1, and a small Lipschitz constant Lπ for the policy, used in the ODE argument. The theorem statement and surrounding exposition assume λ < Σ μn and then prove convergence, while the text elsewhere asserts an iff claim without a corresponding converse proof. The candidate model gives both directions, but its sufficiency proof depends on unsubstantiated steps (e.g., a uniform Foster–Lyapunov drift for all frozen policies, and identifying the stochastic-approximation (SA) mean field with a gradient under the optimal stationary distribution), so it is also incomplete.
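
To make the gap concrete, the two directions can be written out separately. This is a sketch in assumed notation, not the paper's own statement: N (number of servers), θ_k (the SARSA weight iterate), and the thresholds ι_0 and L_0 are hypothetical placeholders, the latter two standing in for whatever "sufficiently small" means in Lemma 1 and the ODE argument.

```latex
% Sketch only: N, \theta_k, \iota_0, L_0 are assumed notation, not taken from the paper.
% Sufficiency (the direction Theorem 1 proves), with the unstated side conditions made explicit:
\[
  \lambda < \sum_{n=1}^{N} \mu_n, \quad \iota \le \iota_0, \quad L_\pi \le L_0
  \;\Longrightarrow\;
  \text{traffic is stable and } \theta_k \text{ converges.}
\]
% Necessity (the direction asserted in the text but not proved):
\[
  \text{traffic is stable and } \theta_k \text{ converges}
  \;\Longrightarrow\;
  \lambda < \sum_{n=1}^{N} \mu_n .
\]
```

An iff claim needs both implications. As reviewed, only the first has a proof, and that proof holds only under the additional smallness conditions shown on its left-hand side.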

Referee report (LaTeX)

\textbf{Recommendation:} major revisions

\textbf{Journal Tier:} specialist/solid

\textbf{Justification:}

The contribution is promising and addresses an important gap by coupling traffic stability with the convergence of an RL algorithm on an unbounded state space. However, the current presentation overclaims an iff result while only proving sufficiency, and it relies on smallness conditions that are not stated in the main theorem. Some technical steps (ODE normalization, continuity/ergodicity dependencies, basis assumptions) need sharpening. With these issues addressed, the work could be a solid contribution.