Merged
27 changes: 27 additions & 0 deletions config/charts/inferencepool/README.md
@@ -117,6 +117,30 @@ Then apply it with:
helm install vllm-llama3-8b-instruct ./config/charts/inferencepool -f values.yaml
```

### Install with Monitoring

To enable metrics collection and monitoring for the EndpointPicker, you can configure Prometheus ServiceMonitor creation:

```yaml
inferenceExtension:
  monitoring:
    interval: "10s"
    prometheus:
      enabled: true
    secret:
      name: inference-gateway-sa-metrics-reader-secret
```

**Note:** Prometheus monitoring requires the Prometheus Operator and ServiceMonitor CRD to be installed in the cluster.

For GKE environments, monitoring is automatically configured when `provider.name` is set to `gke`.

Then apply it with:

```txt
helm install vllm-llama3-8b-instruct ./config/charts/inferencepool -f values.yaml
```

## Uninstall

Run the following command to uninstall the chart:
@@ -146,6 +170,9 @@ The following table list the configurable parameters of the chart.
| `inferenceExtension.affinity` | Affinity for the endpoint picker. Defaults to `{}`. |
| `inferenceExtension.tolerations` | Tolerations for the endpoint picker. Defaults to `[]`. |
| `inferenceExtension.flags.has-enable-leader-election` | Enable leader election for high availability. When enabled, only one EPP pod (the leader) will be ready to serve traffic. |
| `inferenceExtension.monitoring.interval` | Metrics scraping interval for monitoring. Defaults to `10s`. |
| `inferenceExtension.monitoring.secret.name` | Name of the service account token secret for metrics authentication. Defaults to `inference-gateway-sa-metrics-reader-secret`. |
| `inferenceExtension.monitoring.prometheus.enabled` | Enable Prometheus ServiceMonitor creation for EPP metrics collection. Defaults to `false`. |
| `inferenceExtension.pluginsCustomConfig` | Custom config that is passed to EPP as inline yaml. |
| `provider.name` | Name of the Inference Gateway implementation being used. Possible values: `gke`. Defaults to `none`. |

12 changes: 12 additions & 0 deletions config/charts/inferencepool/templates/epp-sa-token-secret.yaml
@@ -0,0 +1,12 @@
{{- if or .Values.inferenceExtension.monitoring.prometheus.enabled .Values.inferenceExtension.monitoring.gke.enabled }}
apiVersion: v1
kind: Secret
metadata:
  name: {{ .Values.inferenceExtension.monitoring.secret.name }}
  namespace: {{ .Release.Namespace }}
  labels:
    {{- include "gateway-api-inference-extension.labels" . | nindent 4 }}
  annotations:
    kubernetes.io/service-account.name: {{ include "gateway-api-inference-extension.name" . }}
type: kubernetes.io/service-account-token
{{- end }}
25 changes: 25 additions & 0 deletions config/charts/inferencepool/templates/epp-servicemonitor.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
{{- if .Values.inferenceExtension.monitoring.prometheus.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: {{ include "gateway-api-inference-extension.name" . }}-monitor
Contributor

same comment, add namespace (I assume this is namespace scoped?)

Contributor Author

added

  namespace: {{ .Release.Namespace }}
  labels:
    {{- include "gateway-api-inference-extension.labels" . | nindent 4 }}
spec:
  endpoints:
  - interval: {{ .Values.inferenceExtension.monitoring.interval }}
    port: "http-metrics"
    path: "/metrics"
    authorization:
Contributor

If ServiceMonitor is namespace-scoped, does the secret need to reside in the same namespace of the CR?

Contributor Author

AFAIK - but I've never run with the secret in another ns so not 100% sure - definitely I'd say best practice though.

Contributor Author

(namespace is included in the secret template)

Contributor Author

AFAICT there is no option to add namespace in that authorization.credentials section.

Contributor

I see that both the CRD and secret use the namespace template .Release.Namespace so it should be good. Closing this comment.
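
To illustrate the conclusion of this thread, here is a rough sketch of how the ServiceMonitor might render. The namespace `inference`, the name `example-epp`, and the `app` label are hypothetical placeholders, not values from this PR; the real ones come from `.Release.Namespace` and the chart's `gateway-api-inference-extension.*` helpers. The point it shows is that `authorization.credentials` carries only a Secret `name` and `key`, so the Secret is resolved in the ServiceMonitor's own namespace.

```yaml
# Hypothetical rendering; `inference` and `example-epp` are placeholder names.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-epp-monitor
  namespace: inference
spec:
  endpoints:
  - interval: 10s
    port: http-metrics
    path: /metrics
    authorization:
      credentials:
        # No namespace field here: the Secret is looked up in the
        # ServiceMonitor's namespace (`inference` in this sketch).
        key: token
        name: inference-gateway-sa-metrics-reader-secret
  namespaceSelector:
    matchNames:
    - inference
  selector:
    matchLabels:
      app: example-epp  # placeholder for the chart's label set
```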

      credentials:
        key: token
        name: {{ .Values.inferenceExtension.monitoring.secret.name }}
  jobLabel: {{ include "gateway-api-inference-extension.name" . }}
  namespaceSelector:
    matchNames:
    - {{ .Release.Namespace }}
  selector:
    matchLabels:
      {{- include "gateway-api-inference-extension.labels" . | nindent 6 }}
{{- end }}
6 changes: 3 additions & 3 deletions config/charts/inferencepool/templates/gke.yaml
@@ -46,15 +46,15 @@ spec:
   endpoints:
   - port: metrics
     scheme: http
-    interval: 5s
+    interval: {{ .Values.inferenceExtension.monitoring.interval }}
     path: /metrics
     authorization:
       type: Bearer
       credentials:
         secret:
-          name: {{ .Values.gke.monitoringSecret.name }}
+          name: {{ .Values.inferenceExtension.monitoring.secret.name }}
           key: token
-          namespace: {{ .Values.gke.monitoringSecret.namespace }}
+          namespace: {{ .Release.Namespace }}
Contributor

@JeffLuoo this means we need to change the GKE docs since we now have to create a secret under the pool's namespace and we need to create one for each epp deployment.

Contributor

The update in this PR includes a new file config/charts/inferencepool/templates/epp-sa-token-secret.yaml that is the secret required for scraping the metric. The namespace uses the same template {{ .Release.Namespace }}.
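
As a sketch of what that template could produce, assume a release installed into a hypothetical namespace `inference` whose name helper renders to `example-epp` (both are placeholders, not values from this PR). Kubernetes fills in the `token` key of a `kubernetes.io/service-account-token` Secret for the service account named in the annotation, which is the key the monitoring resources read.

```yaml
# Hypothetical rendering of epp-sa-token-secret.yaml; names are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: inference-gateway-sa-metrics-reader-secret
  namespace: inference  # same namespace as the pool and its EPP
  annotations:
    # Ties the token to the EPP's service account.
    kubernetes.io/service-account.name: example-epp
type: kubernetes.io/service-account-token
```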

   selector:
     matchLabels:
       {{- include "gateway-api-inference-extension.selectorLabels" . | nindent 8 }}
6 changes: 6 additions & 0 deletions config/charts/inferencepool/templates/rbac.yaml
@@ -17,6 +17,12 @@ rules:
  - subjectaccessreviews
  verbs:
  - create
{{- if .Values.inferenceExtension.monitoring.prometheus.enabled }}
- nonResourceURLs:
  - "/metrics"
  verbs:
  - get
{{- end }}
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
15 changes: 11 additions & 4 deletions config/charts/inferencepool/values.yaml
@@ -40,6 +40,17 @@ inferenceExtension:

  tolerations: []

  # Monitoring configuration for EPP
  monitoring:
    interval: "10s"
Contributor

nit: Make the default scraping interval 15s instead of 10s.

According to https://grafana.com/blog/2020/09/28/new-in-grafana-7.2-__rate_interval-for-prometheus-rate-queries-that-just-work, a rate query should use a range four times the scrape interval. The most common range in queries is 1m, so setting the interval to 15s would be a better fit.

Contributor Author

I came across this, which is why I set to 10s - says to set to less than 15s - wdyt?
https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/tools/dashboards#troubleshooting

Contributor

Ah thanks for catching this. I checked the dashboard and we are all using $__rate_interval. Should be good here to keep it as 10s.
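
For anyone who still prefers a different scrape interval, a minimal sketch of overriding the default in a user-supplied values file, using only the keys this PR adds. The `15s` value is just the one floated in this thread, not a recommendation.

```yaml
# values.yaml override (sketch); keys match the chart's new monitoring block.
inferenceExtension:
  monitoring:
    interval: "15s"   # chart default is "10s"
    prometheus:
      enabled: true   # creates the ServiceMonitor; requires the Prometheus Operator CRDs
```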

    # Service account token secret for authentication
    secret:
      name: inference-gateway-sa-metrics-reader-secret

    # Prometheus ServiceMonitor will be created when enabled for EPP metrics collection
    prometheus:
      enabled: false

inferencePool:
  targetPorts:
    - number: 8000
@@ -56,7 +67,3 @@ inferencePool:
 provider:
   name: none
-
-gke:
-  monitoringSecret:
-    name: inference-gateway-sa-metrics-reader-secret
-    namespace: default