[Speculators] Move tests + fix integration #27308
Merged · +97 −15

Conversation
Force-pushed from aca9493 to b39699e
aarnphm approved these changes · Oct 24, 2025
dsikka commented · Oct 24, 2025
dsikka (Contributor, Author) left a comment: "One quick change - sorry!"
mgoin approved these changes · Oct 24, 2025
Force-pushed from d0952c6 to 9eeb18a
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Dipika Sikka <[email protected]>
…lConfig creation

When using `vllm serve` with a speculator model path directly (e.g., RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3), the tokenizer loading was failing because ModelConfig was created with the speculator path before maybe_override_with_speculators() could swap it to the target model path.

This fix moves the maybe_override_with_speculators() call to happen BEFORE create_model_config(), ensuring that:

1. Speculator models are detected early
2. The target model path is extracted from the speculators config
3. ModelConfig is created with the correct target model path
4. Tokenizer loads successfully from the target model

Signed-off-by: Rahul Tuli <[email protected]>
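The reordering this commit message describes can be sketched as follows. The helper names come from the commit text, but the bodies are invented stand-ins (a plain dict for the config, a lookup table for speculator detection), not the actual vLLM implementation:

```python
# Stand-in sketch of the reordering described above (NOT the actual vLLM
# code; the dict-based config and the speculator_targets mapping are invented).
def create_engine_config_sketch(model, speculative_config, speculator_targets):
    # 1. Run the speculator override FIRST: detect a speculator model and
    #    swap to the target model path extracted from its config.
    if model in speculator_targets:
        speculative_config = {"model": model}
        model = speculator_targets[model]
    # 2. Only then build the model config, so the tokenizer is loaded
    #    from the correct target model path rather than the speculator path.
    model_config = {"model": model, "tokenizer": model}
    return model_config, speculative_config
```

Before the fix, step 2 effectively ran first, so `tokenizer` pointed at the speculator path and loading failed.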
Signed-off-by: Rahul Tuli <[email protected]>
Signed-off-by: Rahul Tuli <[email protected]>
Signed-off-by: rahul-tuli <[email protected]>
auto-merge was automatically disabled · October 27, 2025 17:39 (head branch was pushed to by a user without write access)
Force-pushed from 73cdfd4 to c899908
mgoin approved these changes · Oct 29, 2025
MatthewBonanni pushed a commit referencing this pull request to MatthewBonanni/vllm · Oct 30, 2025

Signed-off-by: Dipika Sikka <[email protected]>
Signed-off-by: Rahul Tuli <[email protected]>
Signed-off-by: rahul-tuli <[email protected]>
Co-authored-by: Rahul Tuli <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
This PR has broken support for gs:// which is used by the Run AI model streamer.
ilmarkov pushed a commit referencing this pull request to neuralmagic/vllm · Nov 7, 2025
ZhengHongming888 pushed a commit referencing this pull request to ZhengHongming888/vllm · Nov 8, 2025
rtourgeman pushed a commit referencing this pull request to rtourgeman/vllm · Nov 10, 2025
This PR fixes the speculator model integration that enables the simplified `vllm serve <speculator-model>` command and ensures compatibility with S3 models in CI.

## Background

The speculator integration allows users to run speculative decoding without explicitly providing a `--speculative-config`, by automatically detecting speculator models and extracting the configuration. This integration kept breaking on main because its tests lived in `tests/speculative_decoding/speculators/`, which doesn't run in CI.
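As a usage illustration (the speculator model name is taken from a commit message in this PR; the explicit form uses placeholders, not verified flag values):

```shell
# Simplified path fixed by this PR: point vllm serve directly at a
# speculator model and let the target model + speculative config be detected.
vllm serve RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3

# Explicit form (placeholders; still required for S3-hosted models):
vllm serve <target-model> --speculative-config '{"model": "<speculator-model>", ...}'
```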
## Changes

### 1. Moved Speculator Tests to CI-Monitored Directory

`tests/speculative_decoding/speculators/test_eagle3.py` → `tests/v1/spec_decode/test_speculators_eagle3.py`

This ensures the tests run in CI and prevents future breakage.
### 2. Fixed S3 Model Compatibility

**Problem:**

- `maybe_override_with_speculators()` was moved before `create_model_config()` to properly detect speculators before creating the model config
- `PretrainedConfig.get_config_dict()` cannot load configs from S3 URLs (`s3://...`)

**Solution:** skip speculator auto-detection for S3 models.
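The code block that originally illustrated the skip did not survive here. A minimal sketch of such a guard, with an invented signature and a `detect` callback standing in for the real config lookup (this is not the actual vLLM code):

```python
# Hypothetical sketch, not the actual vLLM code: skip speculator auto-detection
# for S3 paths, since PretrainedConfig.get_config_dict() cannot read s3:// URLs.
# Note the guard covers only s3:// -- a later comment on this PR reports that
# gs:// paths (Run AI model streamer) were still broken.
def maybe_override_with_speculators_sketch(model, speculative_config, detect):
    """Return (target_model, speculative_config), leaving S3 models untouched."""
    if model.startswith("s3://"):
        # Auto-detection would need to read the remote config, which
        # get_config_dict() cannot do for S3; rely on --speculative-config.
        return model, speculative_config
    found = detect(model)  # stand-in: inspect the HF config for a speculators section
    if found is None:
        return model, speculative_config
    target_model, spec_config = found
    return target_model, spec_config
```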
**Trade-off:** S3 models cannot use automatic speculator detection, but can still use speculators via an explicit `--speculative-config` argument.

### 3. Added Comprehensive Integration Test
Added `test_speculators_model_integration()` in `tests/v1/e2e/test_spec_decode.py` to validate the simplified integration path.
Test models:
nm-testing/SpeculatorLlama3-1-8B-Eagle3-converted-0717-quantizednm-testing/Speculator-Qwen3-8B-Eagle3-converted-071-quantizedCompatibility Matrix
--speculative-configTesting
Run the new test (or a specific variant):
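The concrete commands were not preserved here; a plausible invocation, with the pytest node IDs assumed rather than verified:

```shell
# Run the whole new test (node ID assumed):
pytest tests/v1/e2e/test_spec_decode.py::test_speculators_model_integration -v

# Run a specific variant by filtering on the parametrized model name:
pytest tests/v1/e2e/test_spec_decode.py -k "speculators_model_integration and Qwen3"
```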
## Files Changed

- `tests/speculative_decoding/speculators/test_eagle3.py` → `tests/v1/spec_decode/test_speculators_eagle3.py` (moved)
- `tests/v1/e2e/test_spec_decode.py` (new integration test)

## Related Issues

Fixes the issue where speculator integration tests weren't running in CI, preventing detection of breaking changes to the `vllm serve <speculator-model>` integration.