Conversation

@dsikka dsikka commented Oct 21, 2025

This PR fixes the speculator model integration that enables the simplified vllm serve <speculator-model> command and ensures compatibility with S3 models in CI.

Background

The speculator integration allows users to run speculative decoding without explicitly providing a --speculative-config: speculator models are detected automatically and the configuration is extracted from them (see the sketch after the list below). This integration kept breaking on main because:

  1. Tests were located in tests/speculative_decoding/speculators/, which does not run in CI.
  2. The integration did not properly handle the S3 models used in CI testing.
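
For context, here is a minimal sketch of the two equivalent setups, using the offline LLM API as a stand-in for vllm serve and one of the test models listed later; the verifier model name and the speculative-config keys are illustrative assumptions, not taken from this PR:

# Minimal sketch of the two setups; the verifier name and config keys are
# illustrative assumptions.
from vllm import LLM

# Simplified path: pass the speculator model directly. vLLM detects that it
# is a speculator, swaps in the verifier as the target model, and extracts
# the speculative config from the speculator's own config.
llm = LLM(model="nm-testing/SpeculatorLlama3-1-8B-Eagle3-converted-0717-quantized")

# Explicit path: pass the verifier model and spell the speculative config
# out yourself (this path remains available for S3 models).
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed verifier
    speculative_config={
        "model": "nm-testing/SpeculatorLlama3-1-8B-Eagle3-converted-0717-quantized",
        "num_speculative_tokens": 3,  # illustrative value
    },
)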

Changes

1. Moved Speculator Tests to CI-Monitored Directory

  • Before: tests/speculative_decoding/speculators/test_eagle3.py
  • After: tests/v1/spec_decode/test_speculators_eagle3.py

This ensures the tests run in CI and prevents future breakage.

2. Fixed S3 Model Compatibility

Problem:

  • Speculator detection (maybe_override_with_speculators()) was moved ahead of create_model_config() so that speculators are detected before the model config is created
  • However, HuggingFace's PretrainedConfig.get_config_dict() cannot load configs from S3 URLs (s3://...)
  • This caused failures when running with S3 models in CI (see the sketch below)
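
For illustration, a minimal reproduction sketch of the failure mode (the bucket path is hypothetical):

# Sketch of the failure described above; the bucket path is hypothetical.
from transformers import PretrainedConfig

try:
    # get_config_dict() resolves Hub repo ids or local paths. An s3:// URL
    # is neither, so the call raises instead of returning the config dict.
    PretrainedConfig.get_config_dict("s3://my-ci-bucket/llama-3.1-8b/")
except Exception as exc:
    print(f"cannot load config from S3: {exc!r}")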

Solution:
Skip speculator auto-detection for S3 models:

# Skip speculator detection for S3 models since HuggingFace cannot load
# configs directly from S3 URLs. S3 models can still use speculators with
# explicit --speculative-config.
if not is_s3(self.model):
    (self.model, self.tokenizer, self.speculative_config) = (
        maybe_override_with_speculators(...)
    )

Trade-off: S3 models cannot use automatic speculator detection but can still use speculators via explicit --speculative-config argument.
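
For reference, a minimal sketch of the kind of prefix check is_s3() presumably performs; this is an assumption for illustration, not vLLM's actual implementation:

# Assumed shape of the guard; the real helper lives in vLLM's utilities.
def is_s3(model_or_path: str) -> bool:
    return model_or_path.lower().startswith("s3://")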

3. Added Comprehensive Integration Test

Added test_speculators_model_integration() in tests/v1/e2e/test_spec_decode.py to validate the simplified integration path (a rough sketch follows the test-model list below):

What it tests:

  • ✅ Speculator model auto-detection without explicit config
  • ✅ Speculative config correctly extracted from model
  • ✅ Verifier model properly identified
  • ✅ Draft model set to speculator model
  • ✅ Text generation works with speculative decoding
  • ✅ Output correctness (66% match threshold vs reference)

Test models:

  • nm-testing/SpeculatorLlama3-1-8B-Eagle3-converted-0717-quantized
  • nm-testing/Speculator-Qwen3-8B-Eagle3-converted-071-quantized
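
A rough sketch of what the test exercises, assuming the offline LLM API; the config access path and assertions below are illustrative, not the actual test code:

# Rough sketch of the integration check; config access path and assertions
# are assumptions, not the actual test code.
from vllm import LLM, SamplingParams

def check_speculator_integration(speculator_model: str) -> None:
    # No speculative config is passed: detection must happen automatically.
    llm = LLM(model=speculator_model, max_model_len=2048)

    # The engine should now treat the verifier as the target model and the
    # speculator as the draft model (config access path is assumed).
    spec_cfg = llm.llm_engine.vllm_config.speculative_config
    assert spec_cfg is not None
    assert spec_cfg.draft_model_config.model == speculator_model

    # Generation still works end to end with speculative decoding enabled.
    outputs = llm.generate(["The capital of France is"],
                           SamplingParams(temperature=0, max_tokens=16))
    assert outputs[0].outputs[0].text.strip()

check_speculator_integration(
    "nm-testing/SpeculatorLlama3-1-8B-Eagle3-converted-0717-quantized")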

Compatibility Matrix

Model Type          | Speculator Auto-Detection | Manual --speculative-config
HuggingFace models  | ✅ Yes                    | ✅ Yes
Local models        | ✅ Yes                    | ✅ Yes
S3 models (CI)      | ❌ No                     | ✅ Yes

Testing

Run the new test:

python -m pytest tests/v1/e2e/test_spec_decode.py::test_speculators_model_integration -v

Run specific variant:

python -m pytest tests/v1/e2e/test_spec_decode.py::test_speculators_model_integration[llama3_eagle3_speculator] -v

Files Changed

tests/v1/e2e/test_spec_decode.py | 79 ++++++++++++++++++++++++++
vllm/engine/arg_utils.py         | 22 ++++++--
2 files changed, 92 insertions(+), 9 deletions(-)

Related Issues

Fixes the issue where speculator integration tests were not running in CI, which allowed breaking changes to the vllm serve <speculator-model> integration to go undetected.

@rahul-tuli rahul-tuli force-pushed the fix_speculators branch 2 times, most recently from aca9493 to b39699e Compare October 24, 2025 13:53
@dsikka dsikka marked this pull request as ready for review October 24, 2025 14:38
@dsikka dsikka left a comment

One quick change - sorry!

@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 24, 2025
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) October 27, 2025 11:44

mergify bot commented Oct 27, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @dsikka.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 27, 2025
dsikka and others added 5 commits October 27, 2025 17:37
…lConfig creation

  When using 'vllm serve' with a speculator model path directly
  (e.g., RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3), the tokenizer
  loading was failing because ModelConfig was created with the speculator
  path before maybe_override_with_speculators() could swap it to the
  target model path.

  This fix moves the maybe_override_with_speculators() call to happen
  BEFORE create_model_config() (the resulting call order is sketched below),
  ensuring that:
  1. Speculator models are detected early
  2. The target model path is extracted from the speculators config
  3. ModelConfig is created with the correct target model path
  4. Tokenizer loads successfully from the target model

Signed-off-by: Rahul Tuli <[email protected]>
Signed-off-by: rahul-tuli <[email protected]>
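
For clarity, a simplified sketch of the call order this commit establishes; the surrounding structure of create_engine_config is assumed, and the guarded call mirrors the snippet in the PR description above:

# Simplified/assumed structure of EngineArgs.create_engine_config after the
# fix; only the ordering relevant to this commit is shown.
def create_engine_config(self):
    # 1) Detect speculator models first, so self.model is rewritten to the
    #    target/verifier path before any config or tokenizer is built
    #    (S3 models are skipped, as described in the PR description).
    if not is_s3(self.model):
        (self.model, self.tokenizer, self.speculative_config) = (
            maybe_override_with_speculators(...)
        )

    # 2) Only now build the model config: the tokenizer is loaded from the
    #    (possibly rewritten) target model path, so it no longer fails.
    model_config = self.create_model_config()
    ...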
auto-merge was automatically disabled October 27, 2025 17:39

Head branch was pushed to by a user without write access

@mergify mergify bot removed the needs-rebase label Oct 27, 2025
@robertgshaw2-redhat robertgshaw2-redhat enabled auto-merge (squash) October 28, 2025 16:20
@vllm-bot vllm-bot merged commit 413ef7a into vllm-project:main Oct 29, 2025
46 of 47 checks passed
MatthewBonanni pushed a commit to MatthewBonanni/vllm that referenced this pull request Oct 30, 2025
Signed-off-by: Dipika Sikka <[email protected]>
Signed-off-by: Rahul Tuli <[email protected]>
Signed-off-by: rahul-tuli <[email protected]>
Co-authored-by: Rahul Tuli <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
@bkauf bkauf commented Oct 30, 2025

This PR has broken support for gs:// URLs, which are used by the Run AI model streamer.

ilmarkov pushed a commit to neuralmagic/vllm that referenced this pull request Nov 7, 2025
ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025

Labels: ready (ONLY add when PR is ready to merge/full CI is needed), speculative-decoding, v1
