-
-
Notifications
You must be signed in to change notification settings - Fork 11k
[KV Connector] Make KVCacheConfig an explicit constructor argument #27887
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors KV connector instantiation to explicitly pass KVCacheConfig, which is a good improvement for separating runtime state from global configuration. The implementation of backward compatibility for external connectors is also well-handled in the factory. However, I've identified a critical issue in MultiConnector where it incorrectly instantiates its sub-connectors, which breaks backward compatibility and fails to pass the kv_cache_config. I've also found a minor issue in a test utility. Please see my comments for details and suggestions.
|
See also #25712 (comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
xref #27811 |
4513b7c to
15ec91b
Compare
Follow on from vllm-project#25712 `VllmConfig` is explicitly designed as a dataclass containing user-provided configuration and model metadata. It is a global configuration object that lives throughout the entire engine lifetime and is meant to be immutable after `__post_init__()`. `KVCacheConfig` is worker-specific, runtime-computed state. It has limited lifetime, and its purpose is limited to initializing the KV Cache in the model runner. Even if we add KV cache hints to model config.json in future, this would be parsed into `ModelConfig`, used as input to the `get_kv_cache_configs()` computation, and the resulting `KVCacheConfig` would still be runtime state. We are currently creating per-worker copies of VllmConfig in order to attach the runtime `KVCacheConfig` state. But instead we should just explicitly pass `KVCacheConfig` to the connector. Make sure to handle backwards compatibility for external connector implementations (loaded via module path) that have the old style constructor signature. Signed-off-by: Mark McLoughlin <[email protected]>
15ec91b to
fdc30ae
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general LGTM. Some nits listed in comments.
| f"Class {connector_name} not found in {connector_module_path}" | ||
| ) from e | ||
| connector_cls = cast(type[KVConnectorBaseType], connector_cls) | ||
| if not supports_kw(connector_cls, "kv_cache_config"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm: this means that we allow connector to include kv_cache_config field as init args even when it does not support hybrid allocator (not a subclass of SupportsHMA)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. Since we we are currently unconditionally attaching KVCacheConfig to VllmConfig, I didn't even consider making it conditional until now
If we think that KVCacheConfig will only be used as part of implementing request_finished_all_groups() we could add init_hma() or init_kv_cache_config() to SupportsHMA ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
KVCacheConfig takes effect on almost all connector methods when HMA is enabled and it is not useful at all when HMA is disabled.
That said, the current implementation looks good to me as it is simple enough.
| @@ -0,0 +1,275 @@ | |||
| # SPDX-License-Identifier: Apache-2.0 | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename this file? Like test_connector_init_with_kv_cache_config or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obviously I don't really mind renaming it, if it helps, but my thinking is that these tests are about testing support for connectors that have not yet been updated to the new signature so it's more like "without_kv_cache_config()"
Basically, because all connectors are expected to support the new signature, we'll soon see these tests as old cruft that we need to keep around for a while
(This is different from the SupportsHMA approach - in that case, maybe only a small subset of connectors would be updated to take KVCacheConfig)
|
Just capturing some of the points from the Slack discussion, since I think it was very useful 👍
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, let's keep compat code under control though
| self, | ||
| vllm_config: "VllmConfig", | ||
| role: KVConnectorRole, | ||
| kv_cache_config: "KVCacheConfig", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@KuntaiDu do you need to change LMCache connector? kv_cache_config is not available in vllm config now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to change LMCache PR correspondingly, but we can merge this PR first and I can change it in LMCache afterwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Leave this PR to @KuntaiDu for compatibility with LMCache + HMA
Thanks for reminder! I will fix LMCache compatibility after this PR is merged (still a lot of work on LMCache side before turning on HMA by default for LMCache). |
…llm-project#27887) Signed-off-by: Mark McLoughlin <[email protected]>
Follow on from #25712
VllmConfigis explicitly designed as a dataclass containing user-provided configuration and model metadata. It is a global configuration object that lives throughout the entire engine lifetime and is meant to be immutable after__post_init__().KVCacheConfigis worker-specific, runtime-computed state. It has limited lifetime, and its purpose is limited to initializing the KV Cache in the model runner.Even if we add KV cache hints to model config.json in future, this would be parsed into
ModelConfig, used as input to theget_kv_cache_configs()computation, and the resultingKVCacheConfigwould still be runtime state.We are currently creating per-worker copies of VllmConfig in order to attach the runtime
KVCacheConfigstate. But instead we should just explicitly passKVCacheConfigto the connector.Make sure to handle backwards compatibility for external connector implementations (loaded via module path) that have the old style constructor signature.