[Question] Speculative Decoding: Sampling from the draft probs vector

### Description

The speculative sampling kernel introduced in pull request #3373 ([sgl-kernel/src/sgl-kernel/csrc/speculative_sampling.cuh](https://github.com/sgl-project/sglang/blame/main/sgl-kernel/csrc/speculative/speculative_sampling.cuh#L91)) appears to be incomplete. Comments such as `// FIXME: leverage draft probs` indicate that the implementation does not yet sample from the drafter's probability vector and instead defaults to zeros as placeholder values.

This gap undermines the effectiveness of speculative decoding. Left unfixed, it can noticeably reduce the acceptance rate:

- **Desired behavior (sampling):**  
  `Acceptance rate = ∑ₓ min(p(x), q(x))`  
  where *p(x)* is the target model’s distribution and *q(x)* is the drafter’s.

- **Current behavior (greedy/zeros):**  
  `Acceptance rate = p(x)`  
  where *x* is the single token proposed by the drafter.
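
To make the two formulas concrete, here is a minimal sketch that evaluates both on a toy three-token vocabulary. The function names and the example distributions are hypothetical and are not taken from the sglang kernel. It assumes the standard speculative sampling rule (draft token sampled from *q*, accepted with probability `min(1, p(x)/q(x))`) for the desired case, and a greedy drafter treated as a one-hot distribution for the current case.

```python
from typing import Sequence


def sampling_acceptance_rate(p: Sequence[float], q: Sequence[float]) -> float:
    """Expected acceptance when x ~ q and x is accepted w.p. min(1, p[x] / q[x]).

    Summing q[x] * min(1, p[x] / q[x]) over x simplifies to sum_x min(p[x], q[x]).
    """
    return sum(min(p_x, q_x) for p_x, q_x in zip(p, q))


def greedy_acceptance_rate(p: Sequence[float], q: Sequence[float]) -> float:
    """Acceptance when the drafter proposes argmax(q) and q is treated as one-hot.

    With q[x] == 1 for the proposed token, min(1, p[x] / q[x]) reduces to p[x].
    """
    x = max(range(len(q)), key=lambda i: q[i])  # greedy draft token
    return p[x]


# Hypothetical toy distributions (assumed for illustration only).
p = [0.5, 0.3, 0.2]  # target model
q = [0.6, 0.3, 0.1]  # drafter

sampled = sampling_acceptance_rate(p, q)
greedy = greedy_acceptance_rate(p, q)
print(f"sampling: {sampled:.2f}, greedy: {greedy:.2f}")
```

On this example the sampling rule accepts more often than the greedy rule. The size and direction of the difference depend on how closely *q* tracks *p*, so these numbers should not be read as a measurement of the kernel.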

---

### References

- The kernel file with the FIXME comment:  
  `sgl-kernel/src/sgl-kernel/csrc/speculative_sampling.cuh` as introduced in pull request **#3373**.  
  [PR link](https://github.com/sgl-project/sglang/pull/3373)

- Slack discussion highlighting that `draft_probs` may be set to zeros:  
  [eagle_utils.py#L474](https://github.com/sgl-project/sglang/blob/96149ea8b51983cbe261da92c56e5780040ed593/python/sglang/srt/speculative/eagle_utils.py#L474)

---

### Questions & Action Items

1. **Is the drafter's probability vector intended to be sampled from in the current kernel implementation?**  
2. If not, **is there a timeline or PR planned to update the kernel to properly incorporate sampling from draft probabilities?**

---

This issue blocks [#9539](https://github.com/sgl-project/sglang/pull/9539), and may also block [#8581](https://github.com/sgl-project/sglang/issues/8581) and [#8391](https://github.com/sgl-project/sglang/issues/8391).

[Question] Speculative Decoding: Sampling from the draft probs vector #9877
