@zetxqx zetxqx commented Sep 16, 2025

What type of PR is this?
/kind feature

What this PR does / why we need it:

Following https://cloud.google.com/stackdriver/docs/managed-prometheus/exporters/inference-optimized-gateway, this PR adds conditional GKE monitoring to the Helm chart.

All GKE-specific monitoring resources (ClusterPodMonitoring, ServiceAccount, Secret, and associated RBAC rules) are wrapped in a conditional block. They are deployed only if inferenceExtension.monitoring.gke.enabled is set to true in values.yaml, which prevents the creation of unnecessary resources when GKE monitoring is not required.
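As a rough illustration of the conditional block described above, a GKE-only template could be guarded as follows. This is a minimal sketch, not the PR's actual template: the values key, the PodMonitoring kind, and the selectorLabels helper name are taken from this conversation, while the abbreviated resource body is an assumption.

```yaml
{{- if .Values.inferenceExtension.monitoring.gke.enabled }}
# Sketch only: rendered when inferenceExtension.monitoring.gke.enabled is true.
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: {{ .Release.Name }}
  namespace: {{ .Release.Namespace }}
spec:
  endpoints:
  - port: metrics
    path: /metrics
  selector:
    matchLabels:
      {{- include "gateway-api-inference-extension.selectorLabels" . | nindent 6 }}
{{- end }}
```

With the flag left at false, the whole block renders to nothing, so no GKE-specific objects are created.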

Tested with the following commands:

export NAMESPACE=inference-demo
export HELM_RELEASE_NAME=infpool-gemma-2b

❯ helm upgrade -i $HELM_RELEASE_NAME \
  config/charts/inferencepool \
  -n $NAMESPACE \
  --create-namespace \
  --set inferencePool.modelServers.matchLabels.app=vllm-gemma2b \
  --set provider.name=gke \
  --set inferenceExtension.monitoring.gke.enabled=true
  
❯ helm status infpool-gemma-2b --show-resources
NAME: infpool-gemma-2b
LAST DEPLOYED: Tue Sep 16 22:07:24 2025
NAMESPACE: inference-demo
STATUS: deployed
REVISION: 12
RESOURCES:
==> v1/InferencePool
NAME               AGE
infpool-gemma-2b   3h21m

==> v1/PodMonitoring
infpool-gemma-2b   3h21m

==> v1/ServiceAccount
NAME                                 SECRETS   AGE
infpool-gemma-2b-metrics-reader-sa   0         3h21m
infpool-gemma-2b-epp                 0         3h21m

==> v1/Secret
NAME                                     TYPE                                  DATA   AGE
infpool-gemma-2b-metrics-reader-secret   kubernetes.io/service-account-token   3      3h21m

==> v1/ConfigMap
NAME                   DATA   AGE
infpool-gemma-2b-epp   1      3h21m

==> v1/ClusterRole
NAME                                             CREATED AT
inference-demo-infpool-gemma-2b-metrics-reader   2025-09-16T22:07:26Z
infpool-gemma-2b-inference-demo-epp              2025-09-16T18:47:18Z

==> v1/ClusterRoleBinding
NAME                                                          ROLE                                                         AGE
inference-demo-infpool-gemma-2b-metrics-reader-role-binding   ClusterRole/inference-demo-infpool-gemma-2b-metrics-reader   74s
infpool-gemma-2b-inference-demo-epp                           ClusterRole/infpool-gemma-2b-inference-demo-epp              3h21m

==> v1/RoleBinding
NAME                                                                              ROLE                                               AGE
gmp-system:collector:inference-demo-infpool-gemma-2b-metrics-reader-secret-read   Role/infpool-gemma-2b-metrics-reader-secret-read   3h21m
infpool-gemma-2b-epp                                                              Role/infpool-gemma-2b-epp                          3h21m

==> v1/Service
NAME                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
infpool-gemma-2b-epp   ClusterIP   34.118.236.102   <none>        9002/TCP,9090/TCP   3h21m

==> v1/Deployment
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
infpool-gemma-2b-epp   1/1     1            1           3h21m

==> v1/Role
NAME                                          CREATED AT
infpool-gemma-2b-metrics-reader-secret-read   2025-09-16T18:47:19Z
infpool-gemma-2b-epp                          2025-09-16T18:47:19Z

==> v1/Pod(related)
NAME                                    READY   STATUS    RESTARTS   AGE
infpool-gemma-2b-epp-868c7675c6-rbvw9   1/1     Running   0          3h21m

==> v1/HealthCheckPolicy
NAME               AGE
infpool-gemma-2b   3h21m


TEST SUITE: None
NOTES:
InferencePool infpool-gemma-2b deployed.
  

Which issue(s) this PR fixes:

Fixes #1452

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 16, 2025
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 16, 2025

netlify bot commented Sep 16, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit 39a943b
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/68ccb0e217aa5e0008df9ea5
😎 Deploy Preview https://deploy-preview-1600--gateway-api-inference-extension.netlify.app

zetxqx commented Sep 16, 2025

/assign @JeffLuoo @liu-cong @ahg-g

Configured the helm chart for gke monitoring, could you take a look?

  name: {{ $saName }}
  namespace: {{ .Release.Namespace }}
roleRef:
  kind: ClusterRole
Contributor

@liu-cong liu-cong Sep 16, 2025

Let's not create new ClusterRoles; #1393 asks to convert the existing cluster-scoped RBAC to namespace-scoped.

Contributor

@JeffLuoo JeffLuoo Sep 16, 2025

If we use namespace-scoped RBAC, make sure all namespaces align: the EPP, the Secret, and the namespace in the ClusterPodMonitoring.

Also, since we are using namespace-scoped objects, we can consider using PodMonitoring instead of ClusterPodMonitoring, since PodMonitoring is also namespace-scoped.
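To make the alignment point concrete, here is a minimal sketch of a namespace-scoped Role and RoleBinding that let the GMP collector read the token Secret. The object names mirror the helm status output in the PR description; the `collector` service account and the `gmp-system` namespace are assumptions inferred from the RoleBinding name shown there, and the verb list is illustrative.

```yaml
# Sketch: every namespaced object uses the release namespace, so the EPP,
# the token Secret, and the Role that exposes it stay aligned.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: {{ .Release.Name }}-metrics-reader-secret-read
  namespace: {{ .Release.Namespace }}
rules:
- apiGroups: [""]
  resources: ["secrets"]
  resourceNames: ["{{ .Release.Name }}-metrics-reader-secret"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: gmp-system:collector:{{ .Release.Namespace }}-{{ .Release.Name }}-metrics-reader-secret-read
  namespace: {{ .Release.Namespace }}
subjects:
- kind: ServiceAccount
  name: collector        # assumed name of the GMP collector service account
  namespace: gmp-system  # assumed GMP namespace; may differ per cluster type
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: {{ .Release.Name }}-metrics-reader-secret-read
```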

Contributor

@JeffLuoo does PodMonitoring exist outside of GKE?
I think it would be good if we can find a namespace scoped solution, but one that works for all deployment options.

Contributor

PodMonitoring is available by default on GKE. But people can install it on any K8s distribution by deploying it manually using https://github.com/GoogleCloudPlatform/prometheus-engine.

@sallyom adds an option using the prometheus-operator: https://github.com/kubernetes-sigs/gateway-api-inference-extension/pull/1425/files and I believe it is namespace scoped.

Contributor

or did you mean PodMonitor from Prometheus operator?

Contributor Author

I made a new commit (2d9a5e5) to make most of the cluster-scoped resources namespaced. However, the following two resources are kept so that GMP can scrape the metrics:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: {{ $roleName }}
rules:
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: {{ $roleBindingName }}
subjects:
- kind: ServiceAccount
  name: {{ $saName }}
  namespace: {{ .Release.Namespace }}
roleRef:
  kind: ClusterRole
  name: {{ $roleName }}
  apiGroup: rbac.authorization.k8s.io
  1. If I change ClusterRole to Role, it does not let me set nonResourceURLs:
Error: UPGRADE FAILED: failed to create resource: Role.rbac.authorization.k8s.io "infpool-gemma-2b-metrics-reader" is invalid: rules[0].nonResourceURLs: Invalid value: []string{"/metrics"}: namespaced rules cannot apply to non-resource URLs
  2. If I remove those two completely, GMP cannot scrape the metrics because of a permission issue. I thought the following PodMonitoring and secret-read role binding would work, but they didn't. Is this expected? @JeffLuoo
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
 name: {{ .Release.Name }}
 namespace: {{ .Release.Namespace }}
 labels:
   {{- include "gateway-api-inference-extension.labels" . | nindent 4 }}
spec:
 endpoints:
 - port: metrics
   scheme: http
   interval: {{ .Values.inferenceExtension.monitoring.interval }}
   path: /metrics
   authorization:
     type: Bearer
     credentials:
       secret:
         name: {{ $secretName }}
         key: token
 selector:
   matchLabels:
     {{- include "gateway-api-inference-extension.selectorLabels" . | nindent 8 }}

Contributor

Yes. The nonResourceURLs rule for the metrics URL is required, and nonResourceURLs is cluster-scoped: https://github.com/kubernetes/kubernetes/blob/f42b497cf25548aa0f327c675e11c57240bfab4b/staging/src/k8s.io/api/rbac/v1/types.go#L68-L69. Can you try keeping the two roles you removed and see if PodMonitoring works then?

Contributor Author

Which two roles are you referring to? The current commit is a working version; it just has the ClusterRole and ClusterRoleBinding for metrics read.

Contributor

I attempted a summary of this in #1393 (comment). TL;DR: if cluster RBAC is unavoidable, we can give the cluster RBAC resources unique names to avoid collisions.

Contributor Author

@zetxqx zetxqx Sep 16, 2025

SG, thanks. I amended the commit; now I only keep the ClusterRole and ClusterRoleBinding for GKE metrics read, and I changed their names to include the namespace. I also updated the PR description to reflect what we have now in the Helm chart.
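The naming approach can be sketched as below. This is an illustration under assumptions, not the committed template: the `$roleName` variable and the rule body appear earlier in this thread, while the exact `printf` format string is inferred from the ClusterRole name in the helm status output (`inference-demo-infpool-gemma-2b-metrics-reader`).

```yaml
{{- /* Sketch: prefix cluster-scoped names with the release namespace so two
       releases with the same name in different namespaces do not collide. */ -}}
{{- $roleName := printf "%s-%s-metrics-reader" .Release.Namespace .Release.Name }}
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: {{ $roleName }}
rules:
- nonResourceURLs:
  - /metrics
  verbs:
  - get
```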

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: zetxqx
Once this PR has been reviewed and has the lgtm label, please ask for approval from ahg-g. For more information see the Code Review Process.

zetxqx commented Sep 16, 2025

For review, you can check the helm status output in the PR description to see whether the resources installed by Helm are as expected: #1600 (comment)

@zetxqx zetxqx mentioned this pull request Sep 18, 2025

zetxqx commented Sep 18, 2025

@liu-cong @JeffLuoo can you take another look? We'll have another patch release, #1616. It would be great if this one could make it in.

Contributor

@liu-cong liu-cong left a comment

a few nits, otherwise lgtm

  gke:
    enabled: false
    # Set to true if the cluster is an Autopilot cluster.
    autopilot: false
Contributor

To be future-proof, let's use provider.gke.autopilot, in case we need to parameterize other things for Autopilot.

Contributor Author

After giving it some thought: the provider field in values.yaml only has a name field, so there are a few options:

Option 1: Keep As Is (Nested under monitoring)

This is the current approach, where the GKE-specific setting is nested directly under the feature it affects.

values.yaml Snippet
# ...
monitoring:
  interval: "10s"
  # ...
  gke:
    enabled: false
    # Set to true if the cluster is an Autopilot cluster.
    autopilot: false
# ...

Option 2: Centralized Provider with the current name field

This builds on the existing name field under provider.

values.yaml Snippet
provider:
  # The name of the provider. Supported values: "gke", "none".
  name: gke

  # GKE-specific configuration.
  # This block is only used if name is "gke".
  gke:
    # Set to true if the cluster is an Autopilot cluster.
    autopilot: false

Option 3: Exclusive Provider Block

However, this may not be backward compatible; we would need to change the upstream values in llm-d as well.

values.yaml Snippet

The user enables a provider by uncommenting its block. Only one block should be active.

# Cloud provider specific configuration.
# You MUST enable exactly ONE provider.
provider:
  # Google Kubernetes Engine (GKE) specific configuration
  gke:
    # Set to true if the cluster is an Autopilot cluster.
    # This is optional and defaults to false if not set.
    autopilot: false

  # Generic provider for non-cloud-specific setups (would be commented out)
  # none: {}

Given the three options above, I feel that keeping it as is may be the simplest, since autopilot is currently only needed for monitoring. If a new feature comes in, we can refactor the values structure at that time?

Contributor

I am OK with this; we probably don't need to treat Helm values as strictly as APIs.

Contributor

fwiw, I like the second option

Contributor Author

Updated to use the second option, please take a look
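For readers wondering how a template might consume the second option, here is a hedged sketch. It assumes values.yaml defines defaults for provider.name and provider.gke.autopilot, and that the flag selects the namespace of the GMP collector; both namespace names and the `$gmpNamespace` variable are assumptions for illustration, not taken from this PR.

```yaml
{{- /* Sketch: pick the collector namespace from provider.gke.autopilot.
       Both namespace names are assumptions, not taken from this PR. */ -}}
{{- $gmpNamespace := "gmp-system" }}
{{- if and (eq .Values.provider.name "gke") .Values.provider.gke.autopilot }}
{{- $gmpNamespace = "gke-gmp-system" }}
{{- end }}
subjects:
- kind: ServiceAccount
  name: collector
  namespace: {{ $gmpNamespace }}
```

Keeping the switch under provider.gke means any future Autopilot-specific setting can read the same flag without another values.yaml restructuring.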

@liu-cong
Contributor

/lgtm

@JeffLuoo can you do another pass?

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Sep 18, 2025
@k8s-ci-robot
Contributor

New changes are detected. LGTM label has been removed.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 19, 2025

JeffLuoo commented Sep 19, 2025

/lgtm

Thanks for adding it!

Successfully merging this pull request may close these issues.

Add GKE monitoring config to the helm chart