Bug #1 (CRITICAL): Add missing begin() and stage() methods to KVWriteRouter
- Flash attention backend calls router.begin() and router.stage()
- KVWriteRouter only had write() and commit() methods
- Added begin() to store slot_mapping and initialize shadow buffer
- Added stage() to extract per-timestep slot and stage KV pairs
- Without these, the backend's calls had no matching methods, so no tokens were staged and the acceptance rate was 0%
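
Illustrative sketch of the begin()/stage() flow described above. This is a toy stand-in, not the actual KVWriteRouter in vllm/v1/kv_cache/write_router.py: the class name ToyKVWriteRouter, the dict-based shadow buffer, the argument names, and the commit(accepted_len) semantics are all assumptions made for illustration.

```python
class ToyKVWriteRouter:
    """Hypothetical stand-in for the router; names and semantics are assumed."""

    def __init__(self):
        self.slot_mapping = None  # per-timestep cache slots, set by begin()
        self.shadow = {}          # staged (key, value) pairs, keyed by slot
        self.cache = {}           # stand-in for the real KV cache

    def begin(self, slot_mapping):
        # Store the slot mapping and start from an empty shadow buffer.
        self.slot_mapping = list(slot_mapping)
        self.shadow = {}

    def stage(self, timestep, key, value):
        # Look up this timestep's slot and stage the KV pair for it.
        slot = self.slot_mapping[timestep]
        self.shadow[slot] = (key, value)

    def commit(self, accepted_len):
        # Assumed semantics: persist only the first accepted_len staged slots.
        for slot in self.slot_mapping[:accepted_len]:
            if slot in self.shadow:
                self.cache[slot] = self.shadow[slot]
        self.shadow = {}


router = ToyKVWriteRouter()
router.begin([7, 8, 9])
router.stage(0, "k0", "v0")
router.stage(1, "k1", "v1")
router.commit(accepted_len=1)
print(router.cache)
```

If begin() and stage() are never called, the shadow buffer stays empty and commit() has nothing to write, which matches the symptom reported for this bug.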
Bug #2 (MODERATE): Fix bonus token counting in accepted_lens
- valid_sampled_token_ids includes [accepted_draft_tokens..., bonus_token]
- Previous: used len(seq) directly, so len([bonus]) = 1 was incorrectly counted as 1 accepted draft token
- Fixed: Use max(0, len(seq) - 1) to exclude bonus token from count
- Now correctly reports 0 accepted when only bonus token is present
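
Minimal sketch of the corrected count, assuming valid_sampled_token_ids is a list of per-sequence token-id lists. The helper name count_accepted_drafts is hypothetical and the surrounding code in gpu_model_runner.py is not reproduced here.

```python
def count_accepted_drafts(valid_sampled_token_ids):
    """Per-sequence accepted draft count, excluding the trailing bonus token."""
    # Each sequence is [accepted_draft_tokens..., bonus_token]; max(0, ...)
    # guards against an empty sequence producing -1.
    return [max(0, len(seq) - 1) for seq in valid_sampled_token_ids]


# Two accepted drafts plus bonus, bonus only, and an empty sequence.
accepted_lens = count_accepted_drafts([[11, 12, 13], [42], []])
print(accepted_lens)  # [2, 0, 0]
```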
Files modified:
- vllm/v1/kv_cache/write_router.py: Added begin() and stage() methods
- vllm/v1/worker/gpu_model_runner.py: Fixed accepted_lens calculation