Merged
6 changes: 5 additions & 1 deletion config/charts/inferencepool/README.md
@@ -133,7 +133,9 @@ inferenceExtension:
 
 **Note:** Prometheus monitoring requires the Prometheus Operator and ServiceMonitor CRD to be installed in the cluster.
 
-For GKE environments, monitoring is automatically configured when `provider.name` is set to `gke`.
+For GKE environments, monitoring is enabled by setting `provider.name` to `gke` and `inferenceExtension.monitoring.gke.enabled` to `true`. This will create the necessary `PodMonitoring` and RBAC resources for metrics collection.
 
+If you are using a GKE Autopilot cluster, you also need to set `provider.gke.autopilot` to `true`.
+
 Then apply it with:
 
@@ -174,8 +176,10 @@ The following table list the configurable parameters of the chart.
 | `inferenceExtension.monitoring.interval` | Metrics scraping interval for monitoring. Defaults to `10s`. |
 | `inferenceExtension.monitoring.secret.name` | Name of the service account token secret for metrics authentication. Defaults to `inference-gateway-sa-metrics-reader-secret`. |
 | `inferenceExtension.monitoring.prometheus.enabled` | Enable Prometheus ServiceMonitor creation for EPP metrics collection. Defaults to `false`. |
+| `inferenceExtension.monitoring.gke.enabled` | Enable GKE monitoring resources (`PodMonitoring` and RBAC). Defaults to `false`. |
 | `inferenceExtension.pluginsCustomConfig` | Custom config that is passed to EPP as inline yaml. |
 | `provider.name` | Name of the Inference Gateway implementation being used. Possible values: `gke`. Defaults to `none`. |
+| `provider.gke.autopilot` | Set to `true` if the cluster is a GKE Autopilot cluster. This is only used if `provider.name` is `gke`. Defaults to `false`. |
 
 ## Notes
 
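The README change above describes enabling GKE monitoring through chart values. The following is a rough illustration of a values override that does this. It is a sketch that assumes a GKE Autopilot cluster, and it uses only the keys listed in the parameter table. The file name `gke-monitoring-values.yaml` is hypothetical.

```yaml
# gke-monitoring-values.yaml (hypothetical file name)
provider:
  name: gke          # selects the GKE-specific templates
  gke:
    autopilot: true  # only for GKE Autopilot; the template then targets gke-gmp-system instead of gmp-system
inferenceExtension:
  monitoring:
    gke:
      enabled: true  # creates the PodMonitoring plus the metrics-reader ServiceAccount, Secret and RBAC
```

You would pass such a file to `helm install` or `helm upgrade` with `-f`. On a Standard (non-Autopilot) cluster, leave `provider.gke.autopilot` at its default of `false`.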
@@ -1,4 +1,4 @@
-{{- if or .Values.inferenceExtension.monitoring.prometheus.enabled .Values.inferenceExtension.monitoring.gke.enabled }}
+{{- if .Values.inferenceExtension.monitoring.prometheus.enabled }}
 apiVersion: v1
 kind: Secret
 metadata:
89 changes: 85 additions & 4 deletions config/charts/inferencepool/templates/gke.yaml
@@ -35,11 +35,44 @@ spec:
     timeoutSec: 300 # 5-minute timeout (adjust as needed)
     logging:
       enabled: true # log all requests by default
+{{- if .Values.inferenceExtension.monitoring.gke.enabled }}
+{{- $metricsReadSA := printf "%s-metrics-reader-sa" .Release.Name -}}
+{{- $metricsReadSecretName := printf "%s-metrics-reader-secret" .Release.Name -}}
+{{- $metricsReadRoleName := printf "%s-%s-metrics-reader" .Release.Namespace .Release.Name -}}
+{{- $metricsReadRoleBindingName := printf "%s-%s-metrics-reader-role-binding" .Release.Namespace .Release.Name -}}
+{{- $secretReadRoleName := printf "%s-metrics-reader-secret-read" .Release.Name -}}
+{{- $gmpNamespace := "gmp-system" -}}
+{{- $isAutopilot := false -}}
+{{- with .Values.provider.gke }}
+{{- $isAutopilot = .autopilot | default false -}}
+{{- end }}
+{{- if $isAutopilot -}}
+{{- $gmpNamespace = "gke-gmp-system" -}}
+{{- end -}}
+{{- $gmpCollectorRoleBindingName := printf "%s:collector:%s-%s-metrics-reader-secret-read" $gmpNamespace .Release.Namespace .Release.Name -}}
+---
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: {{ $metricsReadSA }}
+  namespace: {{ .Release.Namespace }}
+---
+apiVersion: v1
+kind: Secret
+metadata:
+  name: {{ $metricsReadSecretName }}
+  namespace: {{ .Release.Namespace }}
+  labels:
+    {{- include "gateway-api-inference-extension.labels" . | nindent 4 }}
+  annotations:
+    kubernetes.io/service-account.name: {{ $metricsReadSA }}
+type: kubernetes.io/service-account-token
 ---
 apiVersion: monitoring.googleapis.com/v1
-kind: ClusterPodMonitoring
+kind: PodMonitoring
 metadata:
-  name: {{ .Release.Namespace }}-{{ .Release.Name }}
+  name: {{ .Release.Name }}
+  namespace: {{ .Release.Namespace }}
   labels:
     {{- include "gateway-api-inference-extension.labels" . | nindent 4 }}
 spec:
@@ -52,10 +85,58 @@ spec:
       type: Bearer
       credentials:
         secret:
-          name: {{ .Values.inferenceExtension.monitoring.secret.name }}
+          name: {{ $metricsReadSecretName }}
           key: token
-          namespace: {{ .Release.Namespace }}
   selector:
     matchLabels:
       {{- include "gateway-api-inference-extension.selectorLabels" . | nindent 8 }}
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: {{ $metricsReadRoleName }}
+rules:
+- nonResourceURLs:
+  - /metrics
+  verbs:
+  - get
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRoleBinding
+metadata:
+  name: {{ $metricsReadRoleBindingName }}
+subjects:
+- kind: ServiceAccount
+  name: {{ $metricsReadSA }}
+  namespace: {{ .Release.Namespace }}
+roleRef:
+  kind: ClusterRole

liu-cong (Contributor), Sep 16, 2025:

Let's not create new ClusterRoles; #1393 asks to convert the existing cluster-scoped RBAC to namespace-scoped RBAC.

JeffLuoo (Contributor), Sep 16, 2025:

If we use namespace-scoped RBAC, make sure all the namespaces align: the EPP, the secret, and the namespace in the ClusterPodMonitoring.

Also, since we are using namespace-scoped objects, we could use PodMonitoring instead of ClusterPodMonitoring, because PodMonitoring is also namespace-scoped.

Contributor:

@JeffLuoo, does PodMonitoring exist outside of GKE?
I think it would be good to find a namespace-scoped solution, but one that works for all deployment options.

Contributor:

PodMonitoring is available by default on GKE, but it can be installed on any Kubernetes distribution by deploying https://github.com/GoogleCloudPlatform/prometheus-engine manually.

@sallyom's PR adds an option that uses the Prometheus Operator, and I believe it is namespace-scoped: https://github.com/kubernetes-sigs/gateway-api-inference-extension/pull/1425/files

Contributor:

Or did you mean PodMonitor from the Prometheus Operator?

Contributor (PR author):

I made a new commit (2d9a5e5) that converts most of the cluster-scoped resources to namespaced ones. However, the following two resources are kept so that GMP can scrape the metrics:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: {{ $roleName }}
rules:
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: {{ $roleBindingName }}
subjects:
- kind: ServiceAccount
  name: {{ $saName }}
  namespace: {{ .Release.Namespace }}
roleRef:
  kind: ClusterRole
  name: {{ $roleName }}
  apiGroup: rbac.authorization.k8s.io
1. If I change the ClusterRole to a Role, it does not let me set nonResourceURLs:
   Error: UPGRADE FAILED: failed to create resource: Role.rbac.authorization.k8s.io "infpool-gemma-2b-metrics-reader" is invalid: rules[0].nonResourceURLs: Invalid value: []string{"/metrics"}: namespaced rules cannot apply to non-resource URLs
2. If I remove those two completely, GMP cannot scrape the metrics because of a permission issue. I thought the following PodMonitoring and secret-read RoleBinding would work, but they didn't. Is this expected? @JeffLuoo
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: {{ .Release.Name }}
  namespace: {{ .Release.Namespace }}
  labels:
    {{- include "gateway-api-inference-extension.labels" . | nindent 4 }}
spec:
  endpoints:
  - port: metrics
    scheme: http
    interval: {{ .Values.inferenceExtension.monitoring.interval }}
    path: /metrics
    authorization:
      type: Bearer
      credentials:
        secret:
          name: {{ $secretName }}
          key: token
  selector:
    matchLabels:
      {{- include "gateway-api-inference-extension.selectorLabels" . | nindent 8 }}

Contributor:

Yes. The nonResourceURLs rule for the metrics URL is required, and nonResourceURLs are cluster-scoped: https://github.com/kubernetes/kubernetes/blob/f42b497cf25548aa0f327c675e11c57240bfab4b/staging/src/k8s.io/api/rbac/v1/types.go#L68-L69. Can you try keeping the two roles you removed and see whether PodMonitoring works then?

Contributor (PR author):

Which two roles are you referring to? The current commit is a working version. It has just the ClusterRole and ClusterRoleBinding for metrics read.

Contributor:

I attempted a summary of this in #1393 (comment). TL;DR: if cluster-scoped RBAC is unavoidable, we can give the cluster-scoped RBAC objects unique names to avoid collisions.

zetxqx (Contributor, PR author), Sep 16, 2025:

SG, thanks. I amended the commit: now only the ClusterRole and ClusterRoleBinding for GKE metrics read remain, and their names include the namespace. I also updated the PR description to reflect what is now in the Helm chart.

+  name: {{ $metricsReadRoleName }}
+  apiGroup: rbac.authorization.k8s.io
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: Role
+metadata:
+  name: {{ $secretReadRoleName }}
+rules:
+- resources:
+  - secrets
+  apiGroups: [""]
+  verbs: ["get", "list", "watch"]
+  resourceNames: [{{ $metricsReadSecretName | quote }}]
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: RoleBinding
+metadata:
+  name: {{ $gmpCollectorRoleBindingName }}
+  namespace: {{ .Release.Namespace }}
+roleRef:
+  name: {{ $secretReadRoleName }}
+  kind: Role
+  apiGroup: rbac.authorization.k8s.io
+subjects:
+- name: collector
+  namespace: {{ $gmpNamespace }}
+  kind: ServiceAccount
+{{- end }}
 {{- end }}
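The following example makes the naming scheme concrete. It shows roughly what the two bindings would render to for a hypothetical release `infpool` installed in a hypothetical namespace `team-a` on an Autopilot cluster. The values were worked out by hand from the `printf` calls at the top of the template, not captured from a real `helm template` run. Embedding the namespace in the cluster-scoped names is what keeps two releases with the same name in different namespaces from colliding.

```yaml
# ClusterRoleBinding: cluster-scoped, so its name embeds "<namespace>-<release>"
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: team-a-infpool-metrics-reader-role-binding
subjects:
- kind: ServiceAccount
  name: infpool-metrics-reader-sa
  namespace: team-a
roleRef:
  kind: ClusterRole
  name: team-a-infpool-metrics-reader   # grants GET on the /metrics non-resource URL
  apiGroup: rbac.authorization.k8s.io
---
# RoleBinding: lets the GMP collector read the token Secret in the release namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: gke-gmp-system:collector:team-a-infpool-metrics-reader-secret-read
  namespace: team-a
roleRef:
  name: infpool-metrics-reader-secret-read
  kind: Role
  apiGroup: rbac.authorization.k8s.io
subjects:
- name: collector
  namespace: gke-gmp-system   # would be gmp-system with provider.gke.autopilot: false
  kind: ServiceAccount
```

With `autopilot: false`, only the collector namespace and the `gmp-system:` prefix of the RoleBinding name change. The ClusterRole and ClusterRoleBinding names are the same in both cases.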
8 changes: 8 additions & 0 deletions config/charts/inferencepool/values.yaml
@@ -50,6 +50,9 @@ inferenceExtension:
     # Prometheus ServiceMonitor will be created when enabled for EPP metrics collection
     prometheus:
       enabled: false
+
+    gke:
+      enabled: false
 
 inferencePool:
   targetPorts:
@@ -67,3 +70,8 @@ inferencePool:
 provider:
   name: none
 
+  # GKE-specific configuration.
+  # This block is only used if name is "gke".
+  gke:
+    # Set to true if the cluster is an Autopilot cluster.
+    autopilot: false