
Commit 65f57a3

Restore Bug #4 and #5 fixes: nucleus and smoothing diagnostics
Bug #4 fix: change the nucleus top_p fallback from 1.0 to 0.95 and add [NUCLEUS_DEBUG] diagnostic logging. This ensures nucleus filtering runs even when the config attribute is missing, instead of leaving all 32000 tokens (the full vocabulary) as survivors. Bug #5 fix: add [SMOOTH_DEBUG] diagnostic logging for the smoothing lambda.

These fixes were accidentally removed during the bug #2 draft-anchored rewrite (commit 595a371). Restoring them does not affect bug #2's core algorithm; they only improve fallback behavior and diagnostics.
1 parent 595a371 commit 65f57a3
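
For context, the fallback pattern used by the Bug #4 fix reads the config value with getattr and then coerces falsy values with `or`, so a missing attribute, an explicit None, and 0.0 all resolve to 0.95 and nucleus filtering still runs. A minimal sketch of the idiom, using a stand-in config object rather than the real vLLM config class:

    class FakeConfig:
        pass  # stand-in config; note draft_top_p is deliberately absent

    cfg = FakeConfig()

    # getattr supplies 0.95 when the attribute is missing; `or 0.95`
    # additionally catches an explicit None or 0.0, either of which
    # would otherwise disable nucleus filtering.
    top_p = float(getattr(cfg, "draft_top_p", 0.95) or 0.95)
    assert top_p == 0.95

    cfg.draft_top_p = None  # attribute present but unset
    top_p = float(getattr(cfg, "draft_top_p", 0.95) or 0.95)
    assert top_p == 0.95    # still falls back, so 0.0 < top_p < 1.0 holds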

File tree: 1 file changed, +5 -1 lines

vllm/v1/spec_decode/eagle.py

Lines changed: 5 additions & 1 deletion
@@ -258,7 +258,9 @@ def _sample_draft_tokens(
         x = torch.full_like(x, float("-inf")).scatter(-1, idx, vals)
 
         # Top-p (nucleus) with correct boundary rule
-        top_p = float(getattr(self.opt_config, "draft_top_p", 1.0) or 1.0)
+        top_p = float(getattr(self.opt_config, "draft_top_p", 0.95) or 0.95)
+        print(f"[NUCLEUS_DEBUG] draft_top_p from config: {top_p}, will run nucleus: {0.0 < top_p < 1.0}",
+              file=sys.stderr, flush=True)
         if 0.0 < top_p < 1.0:
             p = torch.softmax(x, dim=-1)
             sp, si = torch.sort(p, dim=-1, descending=True)
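
The hunk ends just after the sort, so the boundary rule itself is not visible here. In most top-p implementations, a "correct boundary rule" means keeping the first token whose cumulative probability crosses top_p rather than dropping it, which guarantees at least one survivor. A self-contained sketch of that convention (an illustration under that assumption, not the actual eagle.py code):

    import torch

    def nucleus_filter(logits: torch.Tensor, top_p: float) -> torch.Tensor:
        # Work on probabilities sorted in descending order.
        p = torch.softmax(logits, dim=-1)
        sp, si = torch.sort(p, dim=-1, descending=True)
        cum = torch.cumsum(sp, dim=-1)
        # Drop a token only if the cumulative mass *before* it already
        # exceeds top_p; the token that crosses the boundary is kept,
        # so the first (highest-probability) token always survives.
        drop_sorted = (cum - sp) > top_p
        # Scatter the sorted-order mask back to vocabulary order.
        drop = torch.zeros_like(drop_sorted).scatter(-1, si, drop_sorted)
        return logits.masked_fill(drop, float("-inf"))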
@@ -272,6 +274,8 @@ def _sample_draft_tokens(
         # Optional smoothing with untempered baseline
         probs_full = torch.softmax(x, dim=-1)
         lam = float(getattr(self.opt_config, "draft_mix_lambda_max", 0.0) or 0.0)
+        print(f"[SMOOTH_DEBUG] lambda_max from config: {lam}, will run smoothing: {lam > 0.0}",
+              file=sys.stderr, flush=True)
         if lam > 0.0:
             base = torch.softmax(logits_f32, dim=-1)  # untempered baseline
             probs_full = (1.0 - lam) * probs_full + lam * base
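
The smoothing step is a convex combination of the tempered draft distribution and the untempered baseline, so the result remains a valid probability distribution whenever both inputs are. A toy check with made-up tensors (not eagle.py internals):

    import torch

    lam = 0.1
    probs_full = torch.tensor([0.7, 0.2, 0.1])  # tempered draft distribution
    base = torch.tensor([0.4, 0.35, 0.25])      # untempered baseline

    mixed = (1.0 - lam) * probs_full + lam * base
    print(mixed)  # tensor([0.6700, 0.2150, 0.1150])
    assert torch.isclose(mixed.sum(), torch.tensor(1.0))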
