Conversation

@dsikka dsikka commented Oct 21, 2025

This PR fixes the speculator model integration that enables the simplified vllm serve <speculator-model> command and ensures compatibility with S3 models in CI.

Background

The speculator integration allows users to run speculative decoding without explicitly providing a --speculative-config: speculator models are detected automatically and the configuration is extracted from them (see the sketch after the list below). This integration kept breaking on main because:

  1. Tests were located in tests/speculative_decoding/speculators/, which does not run in CI.
  2. The integration did not properly handle the S3 models used in CI testing.
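
For context, here is a minimal sketch of the two equivalent setups, using the offline LLM API as a stand-in for vllm serve and one of the test models listed later; the verifier model name and the speculative-config keys are illustrative assumptions, not taken from this PR:

# Minimal sketch of the two setups; the verifier name and config keys are
# illustrative assumptions.
from vllm import LLM

# Simplified path: pass the speculator model directly. vLLM detects that it
# is a speculator, swaps in the verifier as the target model, and extracts
# the speculative config from the speculator's own config.
llm = LLM(model="nm-testing/SpeculatorLlama3-1-8B-Eagle3-converted-0717-quantized")

# Explicit path: pass the verifier model and spell the speculative config
# out yourself (this path remains available for S3 models).
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed verifier
    speculative_config={
        "model": "nm-testing/SpeculatorLlama3-1-8B-Eagle3-converted-0717-quantized",
        "num_speculative_tokens": 3,  # illustrative value
    },
)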

Changes

1. Moved Speculator Tests to CI-Monitored Directory

  • Before: tests/speculative_decoding/speculators/test_eagle3.py
  • After: tests/v1/spec_decode/test_speculators_eagle3.py

This ensures the tests run in CI and prevents future breakage.

2. Fixed S3 Model Compatibility

Problem:

  • Speculator detection (maybe_override_with_speculators()) was moved ahead of create_model_config() so that speculators are detected before the model config is created
  • However, HuggingFace's PretrainedConfig.get_config_dict() cannot load configs from S3 URLs (s3://...)
  • This caused failures when running with S3 models in CI (see the sketch below)
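
For illustration, a minimal reproduction sketch of the failure mode (the bucket path is hypothetical):

# Sketch of the failure described above; the bucket path is hypothetical.
from transformers import PretrainedConfig

try:
    # get_config_dict() resolves Hub repo ids or local paths. An s3:// URL
    # is neither, so the call raises instead of returning the config dict.
    PretrainedConfig.get_config_dict("s3://my-ci-bucket/llama-3.1-8b/")
except Exception as exc:
    print(f"cannot load config from S3: {exc!r}")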

Solution:
Skip speculator auto-detection for S3 models:

# Skip speculator detection for S3 models since HuggingFace cannot load
# configs directly from S3 URLs. S3 models can still use speculators with
# explicit --speculative-config.
if not is_s3(self.model):
    (self.model, self.tokenizer, self.speculative_config) = (
        maybe_override_with_speculators(...)
    )

Trade-off: S3 models cannot use automatic speculator detection but can still use speculators via explicit --speculative-config argument.
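
For reference, a minimal sketch of the kind of prefix check is_s3() presumably performs; this is an assumption for illustration, not vLLM's actual implementation:

# Assumed shape of the guard; the real helper lives in vLLM's utilities.
def is_s3(model_or_path: str) -> bool:
    return model_or_path.lower().startswith("s3://")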

3. Added Comprehensive Integration Test

Added test_speculators_model_integration() in tests/v1/e2e/test_spec_decode.py to validate the simplified integration path (a rough sketch follows the test-model list below):

What it tests:

  • ✅ Speculator model auto-detection without explicit config
  • ✅ Speculative config correctly extracted from model
  • ✅ Verifier model properly identified
  • ✅ Draft model set to speculator model
  • ✅ Text generation works with speculative decoding
  • ✅ Output correctness (66% match threshold vs reference)

Test models:

  • nm-testing/SpeculatorLlama3-1-8B-Eagle3-converted-0717-quantized
  • nm-testing/Speculator-Qwen3-8B-Eagle3-converted-071-quantized
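
A rough sketch of what the test exercises, assuming the offline LLM API; the config access path and assertions below are illustrative, not the actual test code:

# Rough sketch of the integration check; config access path and assertions
# are assumptions, not the actual test code.
from vllm import LLM, SamplingParams

def check_speculator_integration(speculator_model: str) -> None:
    # No speculative config is passed: detection must happen automatically.
    llm = LLM(model=speculator_model, max_model_len=2048)

    # The engine should now treat the verifier as the target model and the
    # speculator as the draft model (config access path is assumed).
    spec_cfg = llm.llm_engine.vllm_config.speculative_config
    assert spec_cfg is not None
    assert spec_cfg.draft_model_config.model == speculator_model

    # Generation still works end to end with speculative decoding enabled.
    outputs = llm.generate(["The capital of France is"],
                           SamplingParams(temperature=0, max_tokens=16))
    assert outputs[0].outputs[0].text.strip()

check_speculator_integration(
    "nm-testing/SpeculatorLlama3-1-8B-Eagle3-converted-0717-quantized")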

Compatibility Matrix

Model Type          | Speculator Auto-Detection | Manual --speculative-config
HuggingFace models  | ✅ Yes                    | ✅ Yes
Local models        | ✅ Yes                    | ✅ Yes
S3 models (CI)      | ❌ No                     | ✅ Yes

Testing

Run the new test:

python -m pytest tests/v1/e2e/test_spec_decode.py::test_speculators_model_integration -v

Run specific variant:

python -m pytest tests/v1/e2e/test_spec_decode.py::test_speculators_model_integration[llama3_eagle3_speculator] -v

Files Changed

tests/v1/e2e/test_spec_decode.py | 79 ++++++++++++++++++++++++++
vllm/engine/arg_utils.py         | 22 ++++++--
2 files changed, 92 insertions(+), 9 deletions(-)

Related Issues

Fixes the issue where speculator integration tests were not running in CI, which allowed breaking changes to the vllm serve <speculator-model> integration to go undetected.

@rahul-tuli rahul-tuli force-pushed the fix_speculators branch 2 times, most recently from aca9493 to b39699e Compare October 24, 2025 13:53
@dsikka dsikka marked this pull request as ready for review October 24, 2025 14:38
@dsikka dsikka left a comment

One quick change - sorry!

@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 24, 2025
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) October 27, 2025 11:44

mergify bot commented Oct 27, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @dsikka.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 27, 2025
dsikka and others added 5 commits October 27, 2025 17:37
…lConfig creation

  When using 'vllm serve' with a speculator model path directly
  (e.g., RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3), the tokenizer
  loading was failing because ModelConfig was created with the speculator
  path before maybe_override_with_speculators() could swap it to the
  target model path.

  This fix moves the maybe_override_with_speculators() call to happen
  BEFORE create_model_config() (the resulting call order is sketched below),
  ensuring that:
  1. Speculator models are detected early
  2. The target model path is extracted from the speculators config
  3. ModelConfig is created with the correct target model path
  4. Tokenizer loads successfully from the target model

Signed-off-by: Rahul Tuli <[email protected]>
Signed-off-by: rahul-tuli <[email protected]>
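
For clarity, a simplified sketch of the call order this commit establishes; the surrounding structure of create_engine_config is assumed, and the guarded call mirrors the snippet in the PR description above:

# Simplified/assumed structure of EngineArgs.create_engine_config after the
# fix; only the ordering relevant to this commit is shown.
def create_engine_config(self):
    # 1) Detect speculator models first, so self.model is rewritten to the
    #    target/verifier path before any config or tokenizer is built
    #    (S3 models are skipped, as described in the PR description).
    if not is_s3(self.model):
        (self.model, self.tokenizer, self.speculative_config) = (
            maybe_override_with_speculators(...)
        )

    # 2) Only now build the model config: the tokenizer is loaded from the
    #    (possibly rewritten) target model path, so it no longer fails.
    model_config = self.create_model_config()
    ...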
auto-merge was automatically disabled October 27, 2025 17:39

Head branch was pushed to by a user without write access

@mergify mergify bot removed the needs-rebase label Oct 27, 2025
@robertgshaw2-redhat robertgshaw2-redhat enabled auto-merge (squash) October 28, 2025 16:20
@vllm-bot vllm-bot merged commit 413ef7a into vllm-project:main Oct 29, 2025
46 of 47 checks passed
MatthewBonanni pushed a commit to MatthewBonanni/vllm that referenced this pull request Oct 30, 2025
Signed-off-by: Dipika Sikka <[email protected]>
Signed-off-by: Rahul Tuli <[email protected]>
Signed-off-by: rahul-tuli <[email protected]>
Co-authored-by: Rahul Tuli <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
@bkauf bkauf commented Oct 30, 2025

This PR has broken support for gs:// URLs, which are used by the Run AI model streamer.

ilmarkov pushed a commit to neuralmagic/vllm that referenced this pull request Nov 7, 2025
ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025

Labels: ready (ONLY add when PR is ready to merge/full CI is needed), speculative-decoding, v1
