2403.09110
SINDy-RL: Interpretable and Efficient Model-Based Reinforcement Learning
Nicholas Zolman, Christian Lagemann, Urban Fasel, J. Nathan Kutz, Steven L. Brunton
incomplete · medium confidence
- Category
- Not specified
- Journal tier
- Strong Field
- Processed
- Sep 28, 2025, 12:56 AM
- arXiv Links
- Abstract ↗ · PDF ↗
Audit review
The paper makes environment-specific, empirical claims of 10–100× sample-efficiency gains (e.g., 100× on swing-up; 14.47× on the 3D airfoil) and describes practical stability heuristics (resetting/projection), but it does not provide a general theorem establishing a universal 100× guarantee. The candidate model offers a conditional theorem with explicit assumptions and bounds linking model error, planning accuracy, and environment interactions; this analysis is logically sound overall, though a small correction is needed in the reward Lipschitz transfer (a missing factor depending on the policy Lipschitz constant). Net: the paper is empirically correct but theoretically incomplete, while the model’s conditional analysis is essentially correct and clarifies when large reductions (e.g., 100×) are attainable.
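To make the flagged correction concrete, here is a minimal sketch of the reward Lipschitz transfer under assumed notation (the symbols $r$, $\pi$, $L_r$, $L_\pi$ are illustrative, not taken from the paper): if the reward $r(s,a)$ is $L_r$-Lipschitz in $(s,a)$ and the policy $\pi$ is $L_\pi$-Lipschitz, then the policy-composed reward $\tilde r(s) = r(s,\pi(s))$ satisfies

\[
|\tilde r(s) - \tilde r(s')| \le L_r\big(\|s - s'\| + \|\pi(s) - \pi(s')\|\big) \le L_r\,(1 + L_\pi)\,\|s - s'\|,
\]

so the transferred Lipschitz constant is $L_r(1 + L_\pi)$ rather than $L_r$ alone; the factor $(1 + L_\pi)$ is the policy-dependent term referenced above.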
Referee report (LaTeX)
\textbf{Recommendation:} minor revisions. \textbf{Journal Tier:} strong field. \textbf{Justification:} The work demonstrates a compelling and practically important integration of sparse dictionary models with Dyna-style RL, yielding large empirical gains and interpretability. Claims of ``10--100$\times$'' efficiency are framed empirically and substantiated across tasks, with thoughtful discussion of stability heuristics and ablations (e.g., over $(N_{\text{collect}}, n_{\text{batch}})$). However, the manuscript would benefit from explicitly stating the absence of a universal guarantee and from augmenting the discussion with conditions under which the observed gains are expected, bridging to formal bounds.
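For context on the method under review, below is a minimal, illustrative Python sketch of the two ingredients the report refers to: a sparse-dictionary dynamics fit via sequentially thresholded least squares, and a bounded surrogate step standing in for the resetting/projection heuristics. This is not the authors' implementation; the function names (dictionary, fit_sparse_dynamics, surrogate_step), the quadratic feature library, and all thresholds are assumptions.

```python
# Illustrative sketch only: a SINDy-style sparse dictionary fit of discrete-time
# dynamics x_{t+1} ~ Theta(x_t, u_t) @ Xi, plus a clipped surrogate step.
# Names, library choices, and thresholds are assumptions, not the paper's code.
import numpy as np


def dictionary(x, u):
    """Assumed feature library: constant, linear, and quadratic terms in (x, u)."""
    z = np.concatenate([x, u])
    quad = np.outer(z, z)[np.triu_indices(z.size)]
    return np.concatenate([np.ones(1), z, quad])


def fit_sparse_dynamics(X, U, X_next, threshold=0.05, n_iters=10):
    """Sequentially thresholded least squares (STLSQ), the usual SINDy regression."""
    Theta = np.stack([dictionary(x, u) for x, u in zip(X, U)])
    Xi, *_ = np.linalg.lstsq(Theta, X_next, rcond=None)
    for _ in range(n_iters):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for j in range(X_next.shape[1]):  # refit surviving terms per state dimension
            big = ~small[:, j]
            if big.any():
                Xi[big, j] = np.linalg.lstsq(Theta[:, big], X_next[:, j], rcond=None)[0]
    return Xi


def surrogate_step(Xi, x, u, x_low, x_high):
    """Surrogate rollout step; clipping is a stand-in for a projection/reset heuristic."""
    return np.clip(dictionary(x, u) @ Xi, x_low, x_high)


if __name__ == "__main__":
    # Toy demo: recover a linear system from random transitions, then take one
    # surrogate step of the kind a Dyna-style agent would train against.
    rng = np.random.default_rng(0)
    A = np.array([[0.9, 0.1], [0.0, 0.95]])
    B = np.array([[0.0], [0.1]])
    X = rng.normal(size=(200, 2))
    U = rng.normal(size=(200, 1))
    X_next = X @ A.T + U @ B.T
    Xi = fit_sparse_dynamics(X, U, X_next)
    print("nonzero terms per state dimension:", (np.abs(Xi) > 0).sum(axis=0))
    print("one surrogate step:", surrogate_step(Xi, X[0], U[0], -5.0, 5.0))
```

In a Dyna-style loop, the fit above would be refreshed every N_collect environment steps and the agent would train on batches of surrogate rollouts in between, which is where the sample-efficiency gains discussed in the report come from.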