
Conversation

bexxmodd
Contributor

@bexxmodd bexxmodd commented Aug 14, 2025


netlify bot commented Aug 14, 2025

Deploy Preview for gateway-api-inference-extension ready!

🔨 Latest commit: 98a0c2d
🔍 Latest deploy log: https://app.netlify.com/projects/gateway-api-inference-extension/deploys/68cae79d766ec30008d10184
😎 Deploy Preview: https://deploy-preview-1374--gateway-api-inference-extension.netlify.app

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 14, 2025
@k8s-ci-robot
Contributor

Welcome @bexxmodd!

It looks like this is your first PR to kubernetes-sigs/gateway-api-inference-extension 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/gateway-api-inference-extension has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 14, 2025
@k8s-ci-robot
Contributor

Hi @bexxmodd. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 14, 2025
@bexxmodd
Contributor Author

/cc @robscott

@k8s-ci-robot k8s-ci-robot requested a review from robscott August 14, 2025 00:50
@robscott
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 14, 2025
@nirrozenbaum
Contributor

@bexxmodd can you please remove the .DS_Store files?

@bexxmodd
Contributor Author

bexxmodd commented Aug 14, 2025

@bexxmodd can you please remove the .DS_Store files?

Removed.

Also created a PR to gitignore macOS-generated files: #1378


## Implementation Details

In the happy path, the only type of endpoint that a Gateway would need to know about is Endpoint Pickers. Ultimately, each Gateway will be sending requests to Endpoint Pickers, and then following the directions of that Endpoint Picker. As long as an Endpoint Picker is available, there’s no need to actually propagate the model server endpoints.


Ultimately, each Gateway will be sending requests to Endpoint Pickers, and then following the directions of that Endpoint Picker.

I don't really follow this. So the gateway sends requests to many EPPs and somehow follows a single EPP to the model server endpoints? How does the gateway pick which EPP to follow?

Member


It will first decide which cluster to send a request to (likely based on some form of out-of-band load reporting/scraping of Endpoint Pickers), and then follow the recommendations of the Endpoint Picker in that cluster.


Yes, that's the part I wonder if we need to spell out in this proposal: how does the "out-of-band" load reporting work? Or do we want to leave it open? My biggest concern is that I don't see anything in the InferencePoolImport that helps with any of that.

Contributor


How does the gateway pick which EPP to follow?

With an implementation-specific scheduler. After the IG selects an InferencePool, it asks the referenced EPP to select an endpoint from the pool and routes to the selected endpoint.

Yes, that's the part I wonder if we need to spell out in this proposal: how does the "out-of-band" load reporting work? Or do we want to leave it open?

Yes, this should remain implementation-specific. Please see the latest version of the proposal for examples.
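
For discussion's sake, here is a minimal Go sketch of that two-step flow. The `LoadReporter` and `EndpointPicker` interfaces and every name below are purely hypothetical, since the proposal deliberately leaves both steps implementation-specific:

```go
// Hypothetical interfaces; both steps are implementation-specific per this proposal.
package routing

import "context"

// LoadReporter abstracts whatever out-of-band signal (scraping, load reports, etc.)
// the gateway uses to rank the clusters that export the pool.
type LoadReporter interface {
	LeastLoadedCluster(ctx context.Context, pool string) (string, error)
}

// EndpointPicker abstracts the EPP running alongside the pool in a given cluster.
type EndpointPicker interface {
	PickEndpoint(ctx context.Context, pool string) (string, error)
}

// routeRequest: first pick a cluster's InferencePool, then defer endpoint
// selection to that cluster's EPP and route to whatever it returns.
func routeRequest(ctx context.Context, pool string, lr LoadReporter, epps map[string]EndpointPicker) (string, error) {
	cluster, err := lr.LeastLoadedCluster(ctx, pool)
	if err != nil {
		return "", err
	}
	return epps[cluster].PickEndpoint(ctx, pool)
}
```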

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bexxmodd
Once this PR has been reviewed and has the lgtm label, please assign nirrozenbaum for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 11, 2025
}

type ImportedCluster struct {
// Name of the exporting cluster (must be unique within the list).


It would be great if we could tie this to https://multicluster.sigs.k8s.io/concepts/cluster-profile-api.

Contributor


Thanks for the feedback @ryanzhang-oss. @robscott, this might be a reason for making the value of the export annotation implementation-specific. WDYT?

Contributor


Similar feedback as #1374 (comment). If an implementation chooses to use cluster-profile-api to populate the cluster list, can't the ClusterProfile namespace be inferred? If not, this could be a use case for adding an optional namespace field. WDYT @robscott?

Member


@danehans how would implementation-specific support of ClusterProfile work? Couldn't we standardize that? In any case, I think that we should have a single value that is broadly understood + portable, likely "All" or "ClusterSet".

Contributor


@robscott an implementation that used cluster-profile-api may iterate through instances of ClusterProfile to populate the InferencePoolImport cluster list. I mention the possibility of adding an optional namespace field to inferencepool.status.clusters[] since ClusterProfile is a namespaced resource.

Member


In that case, would it be preferable to have a profile struct inside inferencepool.status.clusters? I.e.:

clusters:
- name: foo
  profile:
    name: bar
    namespace: baz

If that's a path that we'd be happy with, I think I'd rather punt on including profile in status until we have a clear e2e design for how it would work.


@ryanzhang-oss ryanzhang-oss Sep 18, 2025


Actually, I take it back. The ClusterProfile is mostly used in the hub cluster, while the InferencePoolImport can be put in any cluster. Thus, it is not a great fit.

@danehans
Contributor

xref notes from today's meeting to discuss the design proposal.

// - Ready: at least one EPP/parent is ready.
//
// +kubebuilder:validation:Optional
Conditions []metav1.Condition `json:"conditions,omitempty"`


The InferencePoolImport itself has no spec, which means the "observedGeneration" in the conditions is not functional either. How can a user know the freshness of the conditions?

Contributor


I don't think the user needs to know the freshness, but the controller will probably need to be able to check. Is there something (observedGeneration, resourceVersion, etc.) that's incremented on status changes?
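
To make the freshness concern concrete, here is a sketch (not from the proposal) of how a controller would typically stamp such a condition. With no spec to mutate, metadata.generation never moves past its initial value, so ObservedGeneration carries no useful signal; resourceVersion does change on status writes but is opaque:

```go
package sketch

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setReady stamps a Ready condition the way controllers usually do. Because the
// resource has no spec, obj.GetGeneration() stays at 1 for its whole lifetime,
// so ObservedGeneration cannot tell anyone whether the condition is stale.
func setReady(obj metav1.Object, conditions *[]metav1.Condition, ready bool) {
	status, reason := metav1.ConditionFalse, "EndpointPickerNotReady"
	if ready {
		status, reason = metav1.ConditionTrue, "EndpointPickerReady"
	}
	meta.SetStatusCondition(conditions, metav1.Condition{
		Type:               "Ready", // condition name taken from the diff hunk above
		Status:             status,
		Reason:             reason,
		ObservedGeneration: obj.GetGeneration(), // never advances on a status-only resource
	})
}
```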

Contributor


IMHO, this may be a reason for using spec instead of purely status. I think it's important from a UX standpoint for the resource to contain at least 1 status condition that is set by the controller, e.g., Ready.

Member


What would fit inside spec? I expect this resource to be managed by a controller and primarily meant as a way to communicate status. I think the MVP for this resource should accomplish the following:

  1. Exist in a cluster so it can be targeted by Routes + used as a way to distinguish between targeting a cluster-local InferencePool and the entire set of connected InferencePools.
  2. Provide some basic list of conditions that can be used to communicate if there are any issues with the multi-cluster InferencePool. Given the multi-tenant nature of InferencePool (multiple Gateways/controllers can point to an InferencePool), we'll need to ensure that this status also supports multiple controllers collaborating. I'd recommend starting with only Accepted as the condition, mirroring other Gateway API resources.
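
As a strawman only (all names invented here, not part of the proposal), that minimal condition vocabulary could mirror Gateway API's style:

```go
package v1alpha1

// InferencePoolImportConditionType is a type of condition on an InferencePoolImport.
type InferencePoolImportConditionType string

// InferencePoolImportConditionReason is a reason for a condition.
type InferencePoolImportConditionReason string

const (
	// ImportConditionAccepted reports whether at least one multi-cluster capable
	// controller has accepted this import and set up plumbing for it.
	ImportConditionAccepted InferencePoolImportConditionType = "Accepted"

	// ImportReasonAccepted and ImportReasonPending are illustrative reasons only.
	ImportReasonAccepted InferencePoolImportConditionReason = "Accepted"
	ImportReasonPending  InferencePoolImportConditionReason = "Pending"
)
```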

Contributor


What would fit inside spec?

@robscott Simply moving the cluster list from status to spec, so that observedGeneration in status conditions becomes functional.

Member


There's actually a bigger problem here as I think about it more. It's possible that there could be two distinct multi-cluster implementations of InferencePool operating on the same cluster.

So far, we've only required Gateway implementations to support InferencePools that are targeted by a Route attached to a Gateway supported by that controller. This GEP could change that scope to mean that each multi-cluster capable Gateway controller should watch all InferencePools and create InferencePoolImports/multi-cluster plumbing for all exported InferencePools, regardless of whether a Gateway is pointing at that InferencePool.

Some potential solutions here:

  1. Require users to specify a controller/class name that should handle the multi-cluster portion of the InferencePool
  2. Require implementations to interoperate when multiple multi-cluster capable inference gateways are present on the same cluster

At first glance, 1) seems simplest but unfortunately rules out some legitimate use cases such as migrating between multiple clusters or combining different sets of partially overlapping clusters, e.g. a GKE Fleet and an Istio Mesh.

If we want to support 2), I think that would require the following:

A) Everything in InferencePoolImport must exist exclusively in nested status (similar to InferencePool)
B) All multi-cluster capable Gateway implementations must plumb support for all InferencePools that have been exported, even if they don't own a Gateway connected to the InferencePool (users may want to exclusively reference the InferencePoolImport)
C) InferencePoolImports should be upserted by Gateway implementations, adding an entry to status for their Gateway implementation
D) To handle garbage collection, InferencePoolImport controllers should add owner references to their relevant GatewayClasses?
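
To make option 2 concrete, a rough sketch of what (A) and (C) might look like, loosely mirroring InferencePool's parent-scoped status; every type and field name below is a placeholder for discussion, not a settled API:

```go
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// InferencePoolImportStatus would be the only populated section of the resource.
type InferencePoolImportStatus struct {
	// Controllers holds one entry per multi-cluster capable implementation that
	// has upserted this import (requirement C).
	Controllers []ImportControllerStatus `json:"controllers,omitempty"`
}

type ImportControllerStatus struct {
	// ControllerName identifies the implementation, e.g. a GatewayClass controllerName.
	ControllerName string `json:"controllerName"`

	// ExportingClusters lists the clusters this controller has observed exporting
	// the same-named InferencePool.
	ExportingClusters []string `json:"exportingClusters,omitempty"`

	// Conditions communicate per-controller state, e.g. Accepted.
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}
```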

I don't love either option; interested in what others think here.

cc @bexxmodd @danehans @howardjohn @keithmattix

Contributor


@robscott thanks for bringing this issue to our attention. I feel that 2B is a showstopper. Requiring an implementation to set up the multi-cluster infra for a non-owned exported pool feels contradictory to the Gateway API ownership model and the "minimize blast radius" principle.

At first glance, 1) seems simplest but unfortunately rules out some legitimate use cases such as migrating between multiple clusters...

This is similar to one of the use cases behind GatewayClass: allow users to migrate between implementations, but in a multi-cluster context. To start off simple, we could add a GatewayClass/Gateway annotation that represents a class of InferencePools, where multi-cluster is the first use case. A user would apply this annotation to the multi-cluster GatewayClass or Gateway and each cluster-local GatewayClass or Gateway. The multi-cluster controller would only export InferencePools that contain the export annotation and a parent with a matching InferencePool class annotation. Thoughts?
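
Purely to illustrate the idea, a Go sketch of the matching a multi-cluster controller could do; both annotation keys are made up for this example and would need to be defined by the proposal:

```go
package export

const (
	exportAnnotation    = "inference.networking.x-k8s.io/export"     // hypothetical key
	poolClassAnnotation = "inference.networking.x-k8s.io/pool-class" // hypothetical key
)

// shouldExport returns true when the InferencePool has opted in to export and
// one of its parent Gateways (or its GatewayClass) carries a pool-class
// annotation matching the class this multi-cluster controller handles.
func shouldExport(poolAnnotations, parentAnnotations map[string]string, myClass string) bool {
	if _, ok := poolAnnotations[exportAnnotation]; !ok {
		return false
	}
	return parentAnnotations[poolClassAnnotation] == myClass
}
```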


BTW, just moving the cluster names to the spec won't really solve the freshness problem.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 16, 2025
Adds TBD InferencePool status condition

### InferencePoolImport Naming

The exporting controller will create an InferencePoolImport resource using the exported InferencePool's namespace and name. A cluster name entry in
`inferencepoolimport.status.clusters[]` is added for each cluster that exports an InferencePool with the same ns/name.
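
A minimal Go sketch of that rule with placeholder types (not the proposal's final API): the import shares the pool's namespace/name, and each exporting cluster contributes one uniquely named entry.

```go
package export

// ImportedCluster is a placeholder mirroring the cluster entry described above.
type ImportedCluster struct {
	Name string `json:"name"`
}

type InferencePoolImportStatus struct {
	Clusters []ImportedCluster `json:"clusters,omitempty"`
}

// recordExportingCluster adds a cluster entry for an InferencePool exported with
// the same namespace/name, keeping cluster names unique within the list.
func recordExportingCluster(status *InferencePoolImportStatus, cluster string) {
	for _, c := range status.Clusters {
		if c.Name == cluster {
			return
		}
	}
	status.Clusters = append(status.Clusters, ImportedCluster{Name: cluster})
}
```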
Contributor


I'm worried that ONLY having the cluster name in status effectively requires the gateway controller to read from remote API servers, whether it is operating in endpoint mode or parent mode. Are we sure we want that?

Contributor


Based on Monday's meeting, it was my understanding that we achieved consensus on starting minimal:

  1. A list of exporting clusters.
  2. Implementations are responsible for discovering the exported InferencePool in the cluster.
  3. InferencePoolImport namespace/name sameness to aid in discovering the exported InferencePool and simplified UX.

Contributor


Yeah, I'm not saying it's a MUST for this initial implementation to support non-global controllers. I'm just saying we should probably document the requirement for implementors.

@danehans
Contributor

@nirrozenbaum @kfswain @robscott @bexxmodd @mikemorris @ryanzhang-oss @keithmattix @srampal @elevran thank you for your involvement with this proposal. I have added a topic to tomorrow's community meeting to discuss this PR. The plan is to use the meeting to resolve the final details so we can get this PR across the finish line shortly after.

1. **Export an InferencePool:** An [Inference Platform Owner](https://gateway-api-inference-extension.sigs.k8s.io/concepts/roles-and-personas/)
exports an InferencePool by annotating it.
2. **Exporting Controller:**
- Watches exported InferencePool resources (must have access to the K8s API server).
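
As a rough illustration of steps 1–2, assuming a controller-runtime based exporting controller and a hypothetical annotation key (the excerpt above does not fix either):

```go
package export

import (
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

// exportAnnotation is a made-up key; the proposal only says the pool is annotated.
const exportAnnotation = "inference.networking.x-k8s.io/export"

// exportedOnly narrows the exporting controller's InferencePool watch to pools
// the Inference Platform Owner has opted in via the export annotation.
var exportedOnly = predicate.NewPredicateFuncs(func(obj client.Object) bool {
	_, ok := obj.GetAnnotations()[exportAnnotation]
	return ok
})
```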


  1. If this controller needs to watch all the InferencePool resources, SIG Multicluster has a KEP to avoid storing secrets in the controller: https://github.com/kubernetes/enhancements/tree/master/keps/sig-multicluster/5339-clusterprofile-plugin-credentials
  2. Most of the multi-cluster manager projects (KubeFleet/OCM, etc.) use pull mode instead of push, so they don't watch any of the resources in the working cluster.


@bexxmodd changed the title from "Proposal for Multi-Cluster Inference Gateways" to "Proposal for Multi-Cluster Inference Pooling" Sep 18, 2025