2403.09110
SINDy-RL: Interpretable and Efficient Model-Based Reinforcement Learning
Nicholas Zolman, Christian Lagemann, Urban Fasel, J. Nathan Kutz, Steven L. Brunton
incomplete · medium confidence
- Category
- Not specified
- Journal tier
- Strong Field
- Processed
- Sep 28, 2025, 12:56 AM
- arXiv Links
- Abstract ↗ · PDF ↗
Audit review
The paper makes environment-specific, empirical claims of 10–100× sample-efficiency gains (e.g., 100× on swing-up; 14.47× on the 3D airfoil) and describes practical stability heuristics (resetting/projection), but it does not provide a general theorem establishing a universal 100× guarantee. The candidate model offers a conditional theorem with explicit assumptions and bounds linking model error, planning accuracy, and environment interactions; this analysis is logically sound overall, though a small correction is needed in the reward Lipschitz transfer (a missing factor depending on the policy Lipschitz constant). Net: the paper is empirically correct but theoretically incomplete, while the model’s conditional analysis is essentially correct and clarifies when large reductions (e.g., 100×) are attainable.
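To make the flagged correction concrete, here is a minimal sketch of the reward Lipschitz transfer under assumed notation (the symbols $r$, $\pi$, $L_r$, $L_\pi$ are illustrative, not taken from the paper): if the reward $r(s,a)$ is $L_r$-Lipschitz in $(s,a)$ and the policy $\pi$ is $L_\pi$-Lipschitz, then the policy-composed reward $\tilde r(s) = r(s,\pi(s))$ satisfies

\[
|\tilde r(s) - \tilde r(s')| \le L_r\big(\|s - s'\| + \|\pi(s) - \pi(s')\|\big) \le L_r\,(1 + L_\pi)\,\|s - s'\|,
\]

so the transferred Lipschitz constant is $L_r(1 + L_\pi)$ rather than $L_r$ alone; the factor $(1 + L_\pi)$ is the policy-dependent term referenced above.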
Referee report (LaTeX)
\textbf{Recommendation:} minor revisions. \textbf{Journal Tier:} strong field. \textbf{Justification:} The work demonstrates a compelling and practically important integration of sparse dictionary models with Dyna-style RL, yielding large empirical gains and interpretability. Claims of ``10--100$\times$'' efficiency are framed empirically and substantiated across tasks, with thoughtful discussion of stability heuristics and ablations (e.g., over $(N_{\text{collect}}, n_{\text{batch}})$). However, the manuscript would benefit from explicitly stating the absence of a universal guarantee and from augmenting the discussion with conditions under which the observed gains are expected, bridging to formal bounds.
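For context on the method under review, below is a minimal, illustrative Python sketch of the two ingredients the report refers to: a sparse-dictionary dynamics fit via sequentially thresholded least squares, and a bounded surrogate step standing in for the resetting/projection heuristics. This is not the authors' implementation; the function names (dictionary, fit_sparse_dynamics, surrogate_step), the quadratic feature library, and all thresholds are assumptions.

```python
# Illustrative sketch only: a SINDy-style sparse dictionary fit of discrete-time
# dynamics x_{t+1} ~ Theta(x_t, u_t) @ Xi, plus a clipped surrogate step.
# Names, library choices, and thresholds are assumptions, not the paper's code.
import numpy as np


def dictionary(x, u):
    """Assumed feature library: constant, linear, and quadratic terms in (x, u)."""
    z = np.concatenate([x, u])
    quad = np.outer(z, z)[np.triu_indices(z.size)]
    return np.concatenate([np.ones(1), z, quad])


def fit_sparse_dynamics(X, U, X_next, threshold=0.05, n_iters=10):
    """Sequentially thresholded least squares (STLSQ), the usual SINDy regression."""
    Theta = np.stack([dictionary(x, u) for x, u in zip(X, U)])
    Xi, *_ = np.linalg.lstsq(Theta, X_next, rcond=None)
    for _ in range(n_iters):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for j in range(X_next.shape[1]):  # refit surviving terms per state dimension
            big = ~small[:, j]
            if big.any():
                Xi[big, j] = np.linalg.lstsq(Theta[:, big], X_next[:, j], rcond=None)[0]
    return Xi


def surrogate_step(Xi, x, u, x_low, x_high):
    """Surrogate rollout step; clipping is a stand-in for a projection/reset heuristic."""
    return np.clip(dictionary(x, u) @ Xi, x_low, x_high)


if __name__ == "__main__":
    # Toy demo: recover a linear system from random transitions, then take one
    # surrogate step of the kind a Dyna-style agent would train against.
    rng = np.random.default_rng(0)
    A = np.array([[0.9, 0.1], [0.0, 0.95]])
    B = np.array([[0.0], [0.1]])
    X = rng.normal(size=(200, 2))
    U = rng.normal(size=(200, 1))
    X_next = X @ A.T + U @ B.T
    Xi = fit_sparse_dynamics(X, U, X_next)
    print("nonzero terms per state dimension:", (np.abs(Xi) > 0).sum(axis=0))
    print("one surrogate step:", surrogate_step(Xi, X[0], U[0], -5.0, 5.0))
```

In a Dyna-style loop, the fit above would be refreshed every N_collect environment steps and the agent would train on batches of surrogate rollouts in between, which is where the sample-efficiency gains discussed in the report come from.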