2503.21929

Local Normalization Distortion and the Thermodynamic Formalism of Decoding Strategies for Large Language Models

Tom Kempton, Stuart Burrell

wrong (medium confidence)
Category: math.DS
Journal tier: Specialist/Solid
Processed: Sep 28, 2025, 12:56 AM

Audit review

Section 5 of the paper correctly derives the top-k and nucleus results from the standard Gibbs variational lemma (Lemma 5.1), obtaining an objective for the locally normalized joint that is the sum of entropy, average log-likelihood, and expected log-distortion (Corollaries 5.1–5.2). However, Corollary 5.3 (temperature) states the third term as E_µ[ε_τ(w)] rather than E_µ[log ε_τ(w)]. This is inconsistent with Lemma 5.1 and with the derivation pattern used for top-k and nucleus: the objective must contain the log of the distortion so that the Gibbs maximizer reproduces q_τ ∝ p^{1/τ}·ε_τ rather than p^{1/τ}·e^{ε_τ}. The paper otherwise correctly notes that global normalization removes the distortion term (Corollary 5.4). The candidate solution pinpoints and corrects the temperature misstatement and matches the paper's logic everywhere else, including the telescoping/normalization structure and the global-versus-local contrast.
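
The discrepancy can be checked numerically on a toy example. The sketch below assumes the finite-set form of the Gibbs variational lemma (the maximizer of H(µ) + E_µ[φ] is µ ∝ e^φ, with maximum value log Σ e^φ) and takes the temperature objective to be H(µ) + (1/τ)·E_µ[log p] + E_µ[third term]. The values of `p`, `eps`, and `tau` are hypothetical placeholders, not taken from the paper, and `eps` merely stands in for the distortion ε_τ(w).

```python
import math

def gibbs_maximizer(phi):
    """Maximizer of H(mu) + E_mu[phi] on a finite set: mu proportional to exp(phi)."""
    m = max(phi)  # subtract the max for numerical stability
    w = [math.exp(x - m) for x in phi]
    z = sum(w)
    return [x / z for x in w]

def normalize(weights):
    """Scale positive weights so that they sum to one."""
    z = sum(weights)
    return [x / z for x in weights]

def objective(mu, phi):
    """H(mu) + E_mu[phi], using the convention 0 log 0 = 0."""
    return sum(m * (f - math.log(m)) for m, f in zip(mu, phi) if m > 0)

# Hypothetical toy values: p is a distribution over four sequences,
# eps stands in for the positive distortion eps_tau(w), tau is the temperature.
p = [0.5, 0.25, 0.15, 0.10]
eps = [0.8, 1.3, 0.6, 1.1]
tau = 0.7

# Distribution the corollary is meant to characterize: q_tau ∝ p^(1/tau) * eps.
target = normalize([pi ** (1 / tau) * e for pi, e in zip(p, eps)])

# Potential with the log-distortion term (the corrected form).
phi_log = [math.log(pi) / tau + math.log(e) for pi, e in zip(p, eps)]
# Potential with the raw distortion term (the form as stated in Corollary 5.3).
phi_raw = [math.log(pi) / tau + e for pi, e in zip(p, eps)]

q_log = gibbs_maximizer(phi_log)
q_raw = gibbs_maximizer(phi_raw)

# Largest pointwise deviation from the target distribution.
gap_log = max(abs(a - b) for a, b in zip(q_log, target))
gap_raw = max(abs(a - b) for a, b in zip(q_raw, target))

print("gap with log eps:", gap_log)
print("gap with raw eps:", gap_raw)
```

Under these assumptions, the log-distortion potential recovers the target up to floating-point error, while the raw-distortion potential yields a different distribution, proportional to p^{1/τ}·e^{ε_τ}.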

Referee report (LaTeX)

\textbf{Recommendation:} minor revisions

\textbf{Journal Tier:} specialist/solid

\textbf{Justification:}

The paper develops a clean, unified variational perspective on decoding strategies, exposing a path-dependent distortion due to local normalization and motivating global normalization. The mathematics is accessible and the empirical illustrations are supportive. However, the temperature corollary (Corollary 5.3) is misprinted: it uses $\mathbb{E}_\mu[\varepsilon_\tau(w)]$ instead of $\mathbb{E}_\mu[\log \varepsilon_\tau(w)]$, which directly conflicts with the stated lemma and should be corrected. With this fix and small clarifications of assumptions, the contribution is solid and useful to the community.