arXiv:2201.12240

Mixing Implicit and Explicit Deep Learning with Skip DEQs and Infinite Time Neural ODEs (Continuous DEQs)

Avik Pal, Alan Edelman, Christopher Rackauckas

Verdict
correct (medium confidence)
Category
Not specified
Journal tier
Specialist/Solid
Processed
Sep 28, 2025, 12:56 AM

Audit review

The paper’s three focal claims are present and supported in the PDF: (A) average speedups are stated in Figure 1 (2.55x train, 3.349x predict) and reiterated in the conclusion (≈2.5x, 3.4x), with per-task tables (MNIST Dense/Conv., CIFAR-10, SVHN) reporting the underlying timings used to form those ratios. (B) Robustness and convergence-depth improvements are described both qualitatively and quantitatively (e.g., “best models have a 3.74x reduction in convergence depth”; the continuous forms stabilize depth and improve convergence on MNIST Conv. and SVHN). (C) The continuous DEQ’s gradients use the same implicit differentiation as the discrete DEQ, so no backpropagation through time is required, as stated explicitly in the Continuous DEQ section and consistent with the DEQ gradient identity given earlier in the paper.

By contrast, the candidate solution’s empirical check misreads CIFAR-10 (it cites a nonexistent “training time (s/batch)” column with values that do not appear in Table 3) and leaves the MNIST/SVHN aggregation unfinished even though the necessary numbers are available in the uploaded PDF. Its proof for (C) is correct, but its verification of (A) is numerically wrong and incomplete, and (B) is argued only theoretically, without reconciling the precise reported figures.
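For reference, the gradient identity the audit refers to can be stated as follows (the notation here is ours, not copied verbatim from the paper): a DEQ computes a fixed point z* = f(z*, x; θ), and the implicit function theorem yields the parameter gradient without unrolling the solver:

\frac{\partial z^{*}}{\partial \theta}
  = \left( I - \left.\frac{\partial f}{\partial z}\right|_{z^{*}} \right)^{-1}
    \left.\frac{\partial f}{\partial \theta}\right|_{z^{*}}

The continuous DEQ integrates dz/dt = f(z, x; θ) − z to steady state, and that steady state satisfies f(z*, x; θ) − z* = 0, i.e., the same fixed-point condition. The identical adjoint therefore applies, and no backpropagation through the integration trajectory is needed, which is the substance of claim (C).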

Referee report (LaTeX)

\textbf{Recommendation:} minor revisions

\textbf{Journal Tier:} specialist/solid

\textbf{Justification:}

A solid, practitioner-relevant paper that demonstrates speed and stability improvements for implicit models by mixing an explicit prediction with an implicit correction, and by adopting a continuous DEQ formulation whose gradients avoid backpropagation through time (BPTT). The empirical tables broadly support the claims. Clarifying how the headline averages are computed and tightening unit conventions would improve reproducibility and clarity.
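For readers unfamiliar with the construction, a minimal sketch of the explicit-prediction/implicit-correction pattern follows. It assumes a PyTorch-style API; the module names, the naive fixed-point solver, and the returned skip-loss term are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class SkipDEQ(nn.Module):
    # Sketch of a Skip DEQ layer: an explicit network g predicts an
    # initial guess z0 = g(x), and the implicit layer f is then solved
    # to a fixed point starting from z0. Solver and shapes are
    # illustrative simplifications.
    def __init__(self, f: nn.Module, g: nn.Module,
                 max_iter: int = 50, tol: float = 1e-4):
        super().__init__()
        self.f, self.g = f, g
        self.max_iter, self.tol = max_iter, tol

    def forward(self, x):
        z0 = self.g(x)           # explicit prediction, one cheap pass
        z = z0
        with torch.no_grad():    # solve without building a graph
            for _ in range(self.max_iter):
                z_new = self.f(z, x)
                if (z_new - z).norm() <= self.tol * (z.norm() + 1e-8):
                    z = z_new
                    break
                z = z_new
        # One differentiable call re-attaches gradients at the fixed
        # point; a complete implementation would apply the implicit
        # (IFT) adjoint in backward rather than this one-step proxy.
        z_star = self.f(z, x)
        # z0 is returned so training can add an auxiliary loss pulling
        # the explicit prediction toward the fixed point.
        return z_star, z0

With a well-trained predictor g, the solver starts close to the fixed point, which is the mechanism behind the reported reductions in convergence depth.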