
Conversation

bexxmodd
Contributor

@bexxmodd bexxmodd commented Aug 14, 2025


netlify bot commented Aug 14, 2025

Deploy Preview for gateway-api-inference-extension ready!

🔨 Latest commit: 98a0c2d
🔍 Latest deploy log: https://app.netlify.com/projects/gateway-api-inference-extension/deploys/68cae79d766ec30008d10184
😎 Deploy Preview: https://deploy-preview-1374--gateway-api-inference-extension.netlify.app

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 14, 2025
@k8s-ci-robot
Contributor

Welcome @bexxmodd!

It looks like this is your first PR to kubernetes-sigs/gateway-api-inference-extension 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/gateway-api-inference-extension has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 14, 2025
@k8s-ci-robot
Contributor

Hi @bexxmodd. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 14, 2025
@bexxmodd
Contributor Author

/cc @robscott

@k8s-ci-robot k8s-ci-robot requested a review from robscott August 14, 2025 00:50
@robscott
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 14, 2025
@nirrozenbaum
Contributor

@bexxmodd can you please remove the .DS_Store files?

@bexxmodd
Contributor Author

bexxmodd commented Aug 14, 2025

@bexxmodd can you please remove the .DS_Store files?

Removed.

Also created a PR to gitignore macOS-generated files: #1378


## Implementation Details

In the happy path, the only type of endpoint that a Gateway would need to know about is Endpoint Pickers. Ultimately, each Gateway will be sending requests to Endpoint Pickers, and then following the directions of that Endpoint Picker. As long as an Endpoint Picker is available, there’s no need to actually propagate the model server endpoints.


Ultimately, each Gateway will be sending requests to Endpoint Pickers, and then following the directions of that Endpoint Picker.

I don't really follow this. So the gateway sends requests to many EPPs and somehow follows a single EPP to the model server endpoints? How does the gateway pick which EPP to follow?

Member


It will first decide which cluster to send a request to (likely based on some form of out-of-band load reporting/scraping of Endpoint Pickers), and then follow the recommendations of the Endpoint Picker in that cluster.


Yes, that's the part I wonder if we need to spell out in this proposal: how does the "out-of-band" load reporting work? Or do we want to leave it open? My biggest concern is that I don't see anything in the InferencePoolImport that helps with any of that.

Contributor


How does the gateway pick which EPP to follow?

With an implementation-specific scheduler. After the IG selects an InferencePool, it asks the referenced EPP to select an endpoint from the pool and routes to the selected endpoint.

Yes, that's the part I wonder if we need to spell out in this proposal: how does the "out-of-band" load reporting work? Or do we want to leave it open?

Yes, this should remain implementation-specific. Please see the latest version of the proposal for examples.
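
For discussion's sake, here is a minimal Go sketch of that two-step flow. The `LoadReporter` and `EndpointPicker` interfaces and every name below are purely hypothetical, since the proposal deliberately leaves both steps implementation-specific:

```go
// Hypothetical interfaces; both steps are implementation-specific per this proposal.
package routing

import "context"

// LoadReporter abstracts whatever out-of-band signal (scraping, load reports, etc.)
// the gateway uses to rank the clusters that export the pool.
type LoadReporter interface {
	LeastLoadedCluster(ctx context.Context, pool string) (string, error)
}

// EndpointPicker abstracts the EPP running alongside the pool in a given cluster.
type EndpointPicker interface {
	PickEndpoint(ctx context.Context, pool string) (string, error)
}

// routeRequest: first pick a cluster's InferencePool, then defer endpoint
// selection to that cluster's EPP and route to whatever it returns.
func routeRequest(ctx context.Context, pool string, lr LoadReporter, epps map[string]EndpointPicker) (string, error) {
	cluster, err := lr.LeastLoadedCluster(ctx, pool)
	if err != nil {
		return "", err
	}
	return epps[cluster].PickEndpoint(ctx, pool)
}
```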

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bexxmodd
Once this PR has been reviewed and has the lgtm label, please assign nirrozenbaum for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 11, 2025
}

type ImportedCluster struct {
// Name of the exporting cluster (must be unique within the list).


It would be great if we could tie this to https://multicluster.sigs.k8s.io/concepts/cluster-profile-api.

Contributor


Thanks for the feedback @ryanzhang-oss. @robscott, this might be a reason for making the value of the export annotation implementation-specific. WDYT?

Contributor


Similar feedback as #1374 (comment). If an implementation chooses to use cluster-profile-api to populate the cluster list, can't the ClusterProfile namespace be inferred? If not, this could be a use case for adding an optional namespace field. WDYT @robscott?

Member


@danehans how would implementation-specific support of ClusterProfile work? Couldn't we standardize that? In any case, I think that we should have a single value that is broadly understood + portable, likely "All" or "ClusterSet".

Contributor


@robscott an implementation that used cluster-profile-api may iterate through instances of ClusterProfile to populate the InferencePoolImport cluster list. I mention the possibility of adding an optional namespace field to inferencepool.status.clusters[] since ClusterProfile is a namespaced resource.

Member


In that case, would it be preferable to have a profile struct inside inferencepool.status.clusters? I.e.:

clusters:
- name: foo
  profile:
    name: bar
    namespace: baz

If that's a path that we'd be happy with, I think I'd rather punt on including profile in status until we have a clear e2e design for how it would work.


@ryanzhang-oss ryanzhang-oss Sep 18, 2025


Actually, I take it back. The ClusterProfile is mostly used in the hub cluster, while the InferencePoolImport can be put in any cluster. Thus, it is not a great fit.

@danehans
Contributor

xref notes from today's meeting to discuss the design proposal.

// - Ready: at least one EPP/parent is ready.
//
// +kubebuilder:validation:Optional
Conditions []metav1.Condition `json:"conditions,omitempty"`


The InferencePoolImport itself has no spec, which means the "observedGeneration" in the conditions is not functional either. How can a user know the freshness of the conditions?

Contributor


I don't think the user needs to know the freshness, but the controller will probably need to be able to check. Is there something (observedGeneration, resourceVersion, etc.) that's incremented on status changes?
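
To make the freshness concern concrete, here is a sketch (not from the proposal) of how a controller would typically stamp such a condition. With no spec to mutate, metadata.generation never moves past its initial value, so ObservedGeneration carries no useful signal; resourceVersion does change on status writes but is opaque:

```go
package sketch

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setReady stamps a Ready condition the way controllers usually do. Because the
// resource has no spec, obj.GetGeneration() stays at 1 for its whole lifetime,
// so ObservedGeneration cannot tell anyone whether the condition is stale.
func setReady(obj metav1.Object, conditions *[]metav1.Condition, ready bool) {
	status, reason := metav1.ConditionFalse, "EndpointPickerNotReady"
	if ready {
		status, reason = metav1.ConditionTrue, "EndpointPickerReady"
	}
	meta.SetStatusCondition(conditions, metav1.Condition{
		Type:               "Ready", // condition name taken from the diff hunk above
		Status:             status,
		Reason:             reason,
		ObservedGeneration: obj.GetGeneration(), // never advances on a status-only resource
	})
}
```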

Contributor


IMHO, this may be a reason for using spec instead of purely status. I think it's important from a UX standpoint for the resource to contain at least 1 status condition that is set by the controller, e.g., Ready.

Member


What would fit inside spec? I expect this resource to be managed by a controller and primarily meant as a way to communicate status. I think the MVP for this resource should accomplish the following:

  1. Exist in a cluster so it can be targeted by Routes + used as a way to distinguish between targeting a cluster-local InferencePool and the entire set of connected InferencePools.
  2. Provide some basic list of conditions that can be used to communicate if there are any issues with the multi-cluster InferencePool. Given the multi-tenant nature of InferencePool (multiple Gateways/controllers can point to an InferencePool), we'll need to ensure that this status also supports multiple controllers collaborating. I'd recommend starting with only Accepted as the condition, mirroring other Gateway API resources.
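
As a strawman only (all names invented here, not part of the proposal), that minimal condition vocabulary could mirror Gateway API's style:

```go
package v1alpha1

// InferencePoolImportConditionType is a type of condition on an InferencePoolImport.
type InferencePoolImportConditionType string

// InferencePoolImportConditionReason is a reason for a condition.
type InferencePoolImportConditionReason string

const (
	// ImportConditionAccepted reports whether at least one multi-cluster capable
	// controller has accepted this import and set up plumbing for it.
	ImportConditionAccepted InferencePoolImportConditionType = "Accepted"

	// ImportReasonAccepted and ImportReasonPending are illustrative reasons only.
	ImportReasonAccepted InferencePoolImportConditionReason = "Accepted"
	ImportReasonPending  InferencePoolImportConditionReason = "Pending"
)
```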

Contributor


What would fit inside spec?

@robscott Simply moving the cluster list from status to spec, so that observedGeneration in status conditions becomes functional.

Member


There's actually a bigger problem here as I think about it more. It's possible that there could be two distinct multi-cluster implementations of InferencePool operating on the same cluster.

So far, we've only required Gateway implementations to support InferencePools that are targeted by a Route attached to a Gateway supported by that controller. This GEP could change that scope to mean that each multi-cluster capable Gateway controller should watch all InferencePools and create InferencePoolImports/multi-cluster plumbing for all exported InferencePools, regardless of whether a Gateway is pointing at that InferencePool.

Some potential solutions here:

  1. Require users to specify a controller/class name that should handle the multi-cluster portion of the InferencePool
  2. Require implementations to interoperate when multiple multi-cluster capable inference gateways are present on the same cluster

At first glance, 1) seems simplest but unfortunately rules out some legitimate use cases such as migrating between multiple clusters or combining different sets of partially overlapping clusters, e.g. a GKE Fleet and an Istio Mesh.

If we want to support 2), I think that would require the following:

A) Everything in InferencePoolImport must exist exclusively in nested status (similar to InferencePool)
B) All multi-cluster capable Gateway implementations must plumb support for all InferencePools that have been exported, even if they don't own a Gateway connected to the InferencePool (users may want to exclusively reference the InferencePoolImport)
C) InferencePoolImports should be upserted by Gateway implementations, adding an entry to status for their Gateway implementation
D) To handle garbage collection, InferencePoolImport controllers should add owner references to their relevant GatewayClasses?
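
To make option 2 concrete, a rough sketch of what (A) and (C) might look like, loosely mirroring InferencePool's parent-scoped status; every type and field name below is a placeholder for discussion, not a settled API:

```go
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// InferencePoolImportStatus would be the only populated section of the resource.
type InferencePoolImportStatus struct {
	// Controllers holds one entry per multi-cluster capable implementation that
	// has upserted this import (requirement C).
	Controllers []ImportControllerStatus `json:"controllers,omitempty"`
}

type ImportControllerStatus struct {
	// ControllerName identifies the implementation, e.g. a GatewayClass controllerName.
	ControllerName string `json:"controllerName"`

	// ExportingClusters lists the clusters this controller has observed exporting
	// the same-named InferencePool.
	ExportingClusters []string `json:"exportingClusters,omitempty"`

	// Conditions communicate per-controller state, e.g. Accepted.
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}
```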

I don't love either option; interested in what others think here.

cc @bexxmodd @danehans @howardjohn @keithmattix

Contributor


@robscott thanks for bringing this issue to our attention. I feel that 2B is a showstopper. Requiring an implementation to set up the multi-cluster infra for a non-owned exported pool feels contradictory to the Gateway API ownership model and the "minimize blast radius" principle.

At first glance, 1) seems simplest but unfortunately rules out some legitimate use cases such as migrating between multiple clusters...

This is similar to one of the use cases behind GatewayClass: allow users to migrate between implementations, but in a multi-cluster context. To start off simple, we could add a GatewayClass/Gateway annotation that represents a class of InferencePools, where multi-cluster is the first use case. A user would apply this annotation to the multi-cluster GatewayClass or Gateway and each cluster-local GatewayClass or Gateway. The multi-cluster controller would only export InferencePools that contain the export annotation and a parent with a matching InferencePool class annotation. Thoughts?
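
Purely to illustrate the idea, a Go sketch of the matching a multi-cluster controller could do; both annotation keys are made up for this example and would need to be defined by the proposal:

```go
package export

const (
	exportAnnotation    = "inference.networking.x-k8s.io/export"     // hypothetical key
	poolClassAnnotation = "inference.networking.x-k8s.io/pool-class" // hypothetical key
)

// shouldExport returns true when the InferencePool has opted in to export and
// one of its parent Gateways (or its GatewayClass) carries a pool-class
// annotation matching the class this multi-cluster controller handles.
func shouldExport(poolAnnotations, parentAnnotations map[string]string, myClass string) bool {
	if _, ok := poolAnnotations[exportAnnotation]; !ok {
		return false
	}
	return parentAnnotations[poolClassAnnotation] == myClass
}
```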


BTW, just moving the cluster names to the spec won't really solve the freshness problem.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 16, 2025
Adds TBD InferencePool status condition

### InferencePoolImport Naming

The exporting controller will create an InferencePoolImport resource using the exported InferencePool's namespace and name. A cluster name entry in
`inferencepoolimport.status.clusters[]` is added for each cluster that exports an InferencePool with the same ns/name.
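
A minimal Go sketch of that rule with placeholder types (not the proposal's final API): the import shares the pool's namespace/name, and each exporting cluster contributes one uniquely named entry.

```go
package export

// ImportedCluster is a placeholder mirroring the cluster entry described above.
type ImportedCluster struct {
	Name string `json:"name"`
}

type InferencePoolImportStatus struct {
	Clusters []ImportedCluster `json:"clusters,omitempty"`
}

// recordExportingCluster adds a cluster entry for an InferencePool exported with
// the same namespace/name, keeping cluster names unique within the list.
func recordExportingCluster(status *InferencePoolImportStatus, cluster string) {
	for _, c := range status.Clusters {
		if c.Name == cluster {
			return
		}
	}
	status.Clusters = append(status.Clusters, ImportedCluster{Name: cluster})
}
```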
Contributor


I'm worried that ONLY having the cluster name in status effectively requires the gateway controller to read from remote API servers, whether it is operating in endpoint mode or parent mode. Are we sure we want that?

Contributor


Based on Monday's meeting, it was my understanding that we achieved consensus on starting minimal:

  1. A list of exporting clusters.
  2. Implementations are responsible for discovering the exported InferencePool in the cluster.
  3. InferencePoolImport namespace/name sameness to aid in discovering the exported InferencePool and simplified UX.

Contributor


Yeah, I'm not saying it's a MUST for this initial implementation to support non-global controllers. I'm just saying we should probably document the requirement for implementors.

@danehans
Contributor

@nirrozenbaum @kfswain @robscott @bexxmodd @mikemorris @ryanzhang-oss @keithmattix @srampal @elevran thank you for your involvement with this proposal. I have added a topic to tomorrow's community meeting to discuss this PR. The plan is to use the meeting to resolve the final details so we can get this PR across the finish line shortly after.

1. **Export an InferencePool:** An [Inference Platform Owner](https://gateway-api-inference-extension.sigs.k8s.io/concepts/roles-and-personas/)
exports an InferencePool by annotating it.
2. **Exporting Controller:**
- Watches exported InferencePool resources (must have access to the K8s API server).
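
As a rough illustration of steps 1–2, assuming a controller-runtime based exporting controller and a hypothetical annotation key (the excerpt above does not fix either):

```go
package export

import (
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

// exportAnnotation is a made-up key; the proposal only says the pool is annotated.
const exportAnnotation = "inference.networking.x-k8s.io/export"

// exportedOnly narrows the exporting controller's InferencePool watch to pools
// the Inference Platform Owner has opted in via the export annotation.
var exportedOnly = predicate.NewPredicateFuncs(func(obj client.Object) bool {
	_, ok := obj.GetAnnotations()[exportAnnotation]
	return ok
})
```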


  1. If this controller needs to watch all the InferencePool resources, SIG Multicluster has a KEP to avoid storing secrets in the controller: https://github.com/kubernetes/enhancements/tree/master/keps/sig-multicluster/5339-clusterprofile-plugin-credentials
  2. Most of the multi-cluster manager projects (KubeFleet/OCM, etc.) use pull mode instead of push, so they don't watch any of the resources in the working cluster.


@bexxmodd changed the title from "Proposal for Multi-Cluster Inference Gateways" to "Proposal for Multi-Cluster Inference Pooling" Sep 18, 2025