Proposal for Multi-Cluster Inference Pooling #1374

base: main
Conversation
✅ Deploy Preview for gateway-api-inference-extension ready!
Welcome @bexxmodd!
Hi @bexxmodd. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/cc @robscott

/ok-to-test
@bexxmodd can you please remove the .DS_Store files?
## Implementation Details

In the happy path, the only type of endpoint that a Gateway would need to know about is Endpoint Pickers. Ultimately, each Gateway will be sending requests to Endpoint Pickers, and then following the directions of that Endpoint Picker. As long as an Endpoint Picker is available, there’s no need to actually propagate the model server endpoints.
> Ultimately, each Gateway will be sending requests to Endpoint Pickers, and then following the directions of that Endpoint Picker.
I don't really follow this. So the gateway sends requests to many EPPs, and somehow it follows a single EPP to the model server endpoints. How does the gateway pick which EPP to follow?
It will first decide which cluster to send a request to (likely based on some form of out-of-band load reporting/scraping of Endpoint Pickers), and then follow the recommendations of the Endpoint Picker in that cluster.
Yes, that's the part I wonder if we need to spell out in this proposal: how does the "out-of-band" load reporting work? Or do we want to leave it open? My biggest concern is that I don't see anything in the InferencePoolImport that helps with any of that.
> How does the gateway pick which EPP to follow?
With an implementation-specific scheduler. After the IG selects an InferencePool, it asks the referenced EPP to select an endpoint from the pool and routes to the selected endpoint.
> Yes, that's the part I wonder if we need to spell out in this proposal: how does the "out-of-band" load reporting work? Or do we want to leave it open?
Yes, this should remain implementation-specific. Please see the latest version of the proposal for examples.
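For illustration only, here is a minimal Go sketch of that two-step flow: pick a cluster from out-of-band load signals, then defer to that cluster's Endpoint Picker. All names (`ClusterStats`, `pickCluster`, `askEPP`) and the load-selection policy are invented for this sketch and are not part of the proposal or any implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// ClusterStats is a hypothetical snapshot produced by implementation-specific,
// out-of-band load reporting (e.g., scraping each exporting cluster's Endpoint Picker).
type ClusterStats struct {
	Name       string // exporting cluster name, e.g., from InferencePoolImport status
	EPPAddress string // address of that cluster's Endpoint Picker
	QueueDepth int    // example load signal; real signals are implementation-specific
	EPPHealthy bool
}

// pickCluster chooses the least-loaded cluster that has a healthy EPP.
// The policy is purely illustrative.
func pickCluster(stats []ClusterStats) (ClusterStats, error) {
	var best *ClusterStats
	for i := range stats {
		s := &stats[i]
		if !s.EPPHealthy {
			continue
		}
		if best == nil || s.QueueDepth < best.QueueDepth {
			best = s
		}
	}
	if best == nil {
		return ClusterStats{}, errors.New("no cluster with a healthy EPP")
	}
	return *best, nil
}

// askEPP stands in for the call to the selected cluster's Endpoint Picker,
// which returns the model server endpoint the gateway should route to.
// A real implementation would speak the EPP protocol; this is a placeholder.
func askEPP(eppAddress, model string) string {
	return fmt.Sprintf("endpoint-chosen-by-%s-for-%s", eppAddress, model)
}

func main() {
	stats := []ClusterStats{
		{Name: "cluster-a", EPPAddress: "epp.cluster-a.example", QueueDepth: 7, EPPHealthy: true},
		{Name: "cluster-b", EPPAddress: "epp.cluster-b.example", QueueDepth: 2, EPPHealthy: true},
	}
	cluster, err := pickCluster(stats)
	if err != nil {
		panic(err)
	}
	fmt.Printf("route via %s to %s\n", cluster.Name, askEPP(cluster.EPPAddress, "example-model"))
}
```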
Signed-off-by: Daneyon Hansen <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bexxmodd. The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
}

type ImportedCluster struct {
	// Name of the exporting cluster (must be unique within the list).
It would be great if we could tie this to https://multicluster.sigs.k8s.io/concepts/cluster-profile-api.
Thanks for the feedback @ryanzhang-oss. @robscott, this might be a reason for making the value of the export annotation implementation-specific. WDYT?
Similar feedback as #1374 (comment). If an implementation chooses to use cluster-profile-api to populate the cluster list, can't the ClusterProfile namespace be inferred? If not, this could be a use case for adding an optional namespace field. WDYT @robscott?
@danehans how would implementation-specific support of ClusterProfile work? Couldn't we standardize that? In any case, I think that we should have a single value that is broadly understood + portable, likely "All" or "ClusterSet".
@robscott an implementation that uses cluster-profile-api may iterate through instances of ClusterProfile to populate the InferencePoolImport cluster list. I mention the possibility of adding an optional namespace field to `inferencepool.status.clusters[]` since ClusterProfile is a namespaced resource.
In that case would it be preferable to have a `profile` struct inside `inferencepool.status.clusters`? i.e.:

    clusters:
    - name: foo
      profile:
        name: bar
        namespace: baz

If that's a path that we'd be happy with, I think I'd rather punt on including `profile` in status until we have a clear e2e design for how it would work.
Actually, I take that back. ClusterProfile is mostly used in the hub cluster, while InferencePoolImport can be placed in any cluster, so it's not a great fit.
xref notes from today's meeting to discuss the design proposal.
// - Ready: at least one EPP/parent is ready.
//
// +kubebuilder:validation:Optional
Conditions []metav1.Condition `json:"conditions,omitempty"`
The InferencePoolImport itself has no spec, which means the `observedGeneration` in the conditions is not functional either. How can a user know the freshness of the conditions?
I don't think the user needs to know the freshness, but the controller will probably need to be able to check. Is there something (observedGeneration, resourceVersion, etc.) that's incremented on status changes?
IMHO, this may be a reason for using spec instead of purely status. I think it's important from a UX standpoint for the resource to contain at least one status condition that is set by the controller, e.g., `Ready`.
What would fit inside spec? I expect this resource to be managed by a controller and primarily meant as a way to communicate status. I think the MVP for this resource should accomplish the following:
- Exist in a cluster so it can be targeted by Routes + used as a way to distinguish between targeting a cluster-local InferencePool and the entire set of connected InferencePools.
- Provide some basic list of conditions that can be used to communicate if there are any issues with the multi-cluster InferencePool. Given the multi-tenant nature of InferencePool (multiple Gateways/controllers can point to an InferencePool), we'll need to ensure that this status also supports multiple controllers collaborating. I'd recommend starting with only `Accepted` as the condition, mirroring other Gateway API resources.
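For illustration only, a minimal sketch of what a status limited to an `Accepted` condition could look like if each collaborating controller records its own entry. The `controllers`/`controllerName` status shape and the API version are assumptions for discussion, not part of the proposed API.

```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha1   # API version assumed for illustration
kind: InferencePoolImport
metadata:
  name: vllm-llama3        # namespace/name sameness with the exported InferencePool
  namespace: inference
status:
  controllers:             # hypothetical per-controller status, mirroring Gateway API's multi-controller pattern
  - controllerName: example.com/multi-cluster-gateway-controller
    conditions:
    - type: Accepted
      status: "True"
      reason: Accepted
      message: "Exported InferencePool discovered in 2 clusters"
```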
> What would fit inside spec?
@robscott simply moving the cluster list from status to spec so observedGeneration in status conditions is functional.
There's actually a bigger problem here as I think about it more. It's possible that there could be two distinct multi-cluster implementations of InferencePool operating on the same cluster.
So far, we've only required Gateway implementations to support InferencePools that are targeted by a Route attached to a Gateway supported by that controller. This GEP could have the impact of changing that scope to mean that each multi-cluster capable Gateway controller should watch all InferencePools and create InferencePoolImports/multi-cluster plumbing for all exported InferencePools, regardless of whether a Gateway is pointing at that InferencePool.
Some potential solutions here:
- Require users to specify a controller/class name that should handle the multi-cluster portion of the InferencePool
- Require implementations to interoperate when multiple multi-cluster capable inference gateways are present on the same cluster
At first glance, 1) seems simplest but unfortunately rules out some legitimate use cases, such as migrating between multiple clusters or combining different sets of partially overlapping clusters, e.g., a GKE Fleet and an Istio Mesh.
If we want to support 2), I think that would require the following:
A) Everything in InferencePoolImport must exist exclusively in nested status (similar to InferencePool)
B) All multi-cluster capable Gateway implementations must plumb support for all InferencePools that have been exported, even if they don't own a Gateway connected to the InferencePool (users may want to exclusively reference the InferencePoolImport)
C) InferencePoolImports should be upserted by Gateway implementations, adding an entry to status for their Gateway implementation
D) To handle garbage collection, InferencePoolImport controllers should add owner references to their relevant GatewayClasses?
I don't love either option, interested in what others think here.
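A hedged sketch of how (C) and (D) might look on the wire, reusing the assumed per-controller status shape from the earlier sketch; the ownerReference pointing at each implementation's GatewayClass only illustrates the garbage-collection question in (D) and is not a settled design. All names and values are placeholders.

```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha1   # API version assumed for illustration
kind: InferencePoolImport
metadata:
  name: vllm-llama3
  namespace: inference
  ownerReferences:          # (D) GC idea: each implementation ties the import's lifetime to its GatewayClass
  - apiVersion: gateway.networking.k8s.io/v1
    kind: GatewayClass
    name: gke-fleet-inference
    uid: 00000000-0000-0000-0000-000000000000   # placeholder
status:
  controllers:              # (C) each implementation upserts its own status entry
  - controllerName: example.com/gke-fleet-gateway-controller
    conditions:
    - type: Accepted
      status: "True"
  - controllerName: example.com/istio-mesh-gateway-controller
    conditions:
    - type: Accepted
      status: "True"
```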
@robscott thanks for bringing this issue to our attention. I feel that 2B is a show stopper. Requiring an implementation to set up the multi-cluster infra for a non-owned exported pool feels contradictory to the Gateway API ownership model and the "minimize blast radius" principle.
> At first glance, 1) seems simplest but unfortunately rules out some legitimate use cases such as migrating between multiple clusters...
This is similar to one of the use cases behind GatewayClass: allowing users to migrate between implementations, but in a multi-cluster context. To start off simple, we could add a GatewayClass/Gateway annotation that represents a class of InferencePools, where multi-cluster is the first use case. A user would apply this annotation to the multi-cluster GatewayClass or Gateway and to each cluster-local GatewayClass or Gateway. The multi-cluster controller would only export InferencePools that contain the export annotation and a parent with a matching InferencePool class annotation. Thoughts?
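As a hedged illustration of that idea, a sketch of the "InferencePool class" annotation on the multi-cluster GatewayClass; the annotation key and value below are invented for discussion, not proposed names.

```yaml
# Hypothetical class annotation on the multi-cluster GatewayClass. Cluster-local
# GatewayClasses/Gateways would carry the same annotation, and the multi-cluster
# controller would only export InferencePools that (a) carry the export annotation
# and (b) have a parent with a matching class annotation.
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: multi-cluster-inference
  annotations:
    inference.networking.x-k8s.io/pool-class: multi-cluster   # assumed key/value
spec:
  controllerName: example.com/multi-cluster-gateway-controller
```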
btw, just moving the cluster names to the spec won't really solve the freshness problem.
Co-authored-by: Ryan Zhang <[email protected]>
Signed-off-by: Daneyon Hansen <[email protected]>
Implement Sept 15 Meeting Feedback
Signed-off-by: Daneyon Hansen <[email protected]>
Resolves review feedback
Signed-off-by: Daneyon Hansen <[email protected]>
Adds TBD InferencePool status condition
### InferencePoolImport Naming

The exporting controller will create an InferencePoolImport resource using the exported InferencePool namespace and name. A cluster name entry in `inferencepoolimport.status.clusters[]` is added for each cluster that exports an InferencePool with the same ns/name.
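For illustration, a minimal sketch of the naming convention described above with two exporting clusters; the `status.clusters[].name` field follows the draft API in this PR, while the API version and the cluster/pool names are assumptions.

```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha1   # API version assumed for illustration
kind: InferencePoolImport
metadata:
  # Same namespace/name as the exported InferencePool in each exporting cluster.
  name: vllm-llama3
  namespace: inference
status:
  clusters:
  - name: cluster-a   # each cluster that exports an InferencePool named inference/vllm-llama3
  - name: cluster-b
```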
I'm worried that ONLY having the cluster name in status effectively requires the gateway controller to read from remote API servers, whether the gateway controller is operating in endpoint mode or parent mode. Are we sure we want that?
Based on Monday's meeting, it was my understanding that we achieved consensus on starting minimal:
- A list of exporting clusters.
- Implementations are responsible for discovering the exported InferencePool in the cluster.
- InferencePoolImport namespace/name sameness to aid in discovering the exported InferencePool and simplified UX.
Yeah, I'm not saying it's a MUST for this initial implementation to support non-global controllers. I'm just saying we should probably document the requirement for implementors.
@nirrozenbaum @kfswain @robscott @bexxmodd @mikemorris @ryanzhang-oss @keithmattix @srampal @elevran thank you for your involvement with this proposal. I have added a topic to tomorrow's community meeting to discuss this PR. The plan is to use the meeting to resolve the final details so we can get this PR across the finish line shortly after.
1. **Export an InferencePool:** An [Inference Platform Owner](https://gateway-api-inference-extension.sigs.k8s.io/concepts/roles-and-personas/) exports an InferencePool by annotating it.
2. **Exporting Controller:**
   - Watches exported InferencePool resources (must have access to the K8s API server).
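For illustration of step 1, a hedged sketch of an exported InferencePool; the export annotation key is an assumption, and the `ClusterSet` value is one of the candidates discussed earlier in this thread, not a finalized name.

```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha2   # API version assumed for illustration
kind: InferencePool
metadata:
  name: vllm-llama3
  namespace: inference
  annotations:
    # Assumed export annotation key; value candidates discussed above include "All" and "ClusterSet".
    inference.networking.x-k8s.io/export: ClusterSet
spec: {}   # selector and EPP extensionRef omitted for brevity
```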
- If this controller needs to watch all the InferencePool resources, SIG Multicluster has a KEP to avoid storing secrets in the controller: https://github.com/kubernetes/enhancements/tree/master/keps/sig-multicluster/5339-clusterprofile-plugin-credentials
- Most of the multi-cluster manager projects (KubeFleet/OCM, etc.) use pull mode instead of push, so they don't watch any of the resources in the working cluster.
Initial design doc: https://docs.google.com/document/d/1QGvG9ToaJ72vlCBdJe--hmrmLtgOV_ptJi9D58QMD2w/edit?usp=sharing