30 changes: 27 additions & 3 deletions config/charts/inferencepool/README.md
@@ -16,7 +16,7 @@ To install via the latest published chart in staging (--version v0 indicates la
```txt
$ helm install vllm-llama3-8b-instruct \
--set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
-  --set provider.name=[none|gke] \
+  --set provider.name=[none|gke|istio] \
oci://us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/charts/inferencepool --version v0
```

@@ -95,7 +95,7 @@ Use `--set inferencePool.modelServerType=triton-tensorrt-llm` to install for Tri
$ helm install triton-llama3-8b-instruct \
--set inferencePool.modelServers.matchLabels.app=triton-llama3-8b-instruct \
--set inferencePool.modelServerType=triton-tensorrt-llm \
-  --set provider.name=[none|gke] \
+  --set provider.name=[none|gke|istio] \
oci://us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/charts/inferencepool --version v0
```

@@ -148,7 +148,31 @@ The following table list the configurable parameters of the chart.
| `inferenceExtension.tolerations` | Tolerations for the endpoint picker. Defaults to `[]`. |
| `inferenceExtension.flags.has-enable-leader-election` | Enable leader election for high availability. When enabled, only one EPP pod (the leader) will be ready to serve traffic. |
| `inferenceExtension.pluginsCustomConfig` | Custom config that is passed to EPP as inline yaml. |
| `provider.name` | Name of the Inference Gateway implementation being used. Possible values: `gke`. Defaults to `none`. |
| `provider.name` | Name of the Inference Gateway implementation being used. Possible values: `none`, `gke`, or `istio`. Defaults to `none`. |

### Provider Specific Configuration

This section documents the values that are specific to each Gateway provider.

#### GKE

These are the options available to you with `provider.name` set to `gke`:

| **Parameter Name** | **Description** |
|---------------------------------------------|------------------------------------------------------------------------------------------------------------------------|
| `gke.monitoringSecret.name` | The name of the monitoring secret to be used. Defaults to `inference-gateway-sa-metrics-reader-secret`. |
| `gke.monitoringSecret.namespace` | The namespace that the monitoring secret lives in. Defaults to `default`. |
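
As a minimal sketch of how the two GKE values fit together, a values override could look like the following. The values shown are simply the documented defaults written out explicitly, and the file name `values-gke.yaml` is hypothetical.

```yaml
# values-gke.yaml (hypothetical file name)
# Selects the GKE provider and spells out the documented defaults.
provider:
  name: gke

gke:
  monitoringSecret:
    # Name of the monitoring secret (documented default).
    name: inference-gateway-sa-metrics-reader-secret
    # Namespace the monitoring secret lives in (documented default).
    namespace: default
```

Such a file would typically be passed with `-f values-gke.yaml` in place of the individual `--set` flags.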


#### Istio

These are the options available to you with `provider.name` set to `istio`:

| **Parameter Name** | **Description** |
|---------------------------------------------|------------------------------------------------------------------------------------------------------------------------|
| `istio.destinationRule.host` | Custom host value for the destination rule. If not set, the default is derived from the EPP service name and the release namespace to generate a valid service address. |
| `istio.destinationRule.trafficPolicy.connectionPool` | Configures the `connectionPool` settings of the traffic policy. |
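
A minimal sketch of a values override for the Istio provider, assuming you want to keep the derived host and tune only the connection pool. The `maxRequestsPerConnection` value is the one shown as a commented-out example in the chart's `values.yaml`, not a recommendation, and the file name `values-istio.yaml` is hypothetical.

```yaml
# values-istio.yaml (hypothetical file name)
provider:
  name: istio

istio:
  destinationRule:
    # Leave empty to use the host derived from the EPP service name
    # and the release namespace.
    host: ""
    trafficPolicy:
      # Copied verbatim into the DestinationRule's trafficPolicy.connectionPool.
      connectionPool:
        http:
          maxRequestsPerConnection: 256000
```

Omitting `connectionPool` entirely leaves that block out of the rendered DestinationRule.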


## Notes

16 changes: 16 additions & 0 deletions config/charts/inferencepool/templates/istio.yaml
@@ -0,0 +1,16 @@
{{- if eq .Values.provider.name "istio" }}
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
Comment on lines +2 to +3
Contributor
Do we have to use Istio-specific APIs (DestinationRule), or can standard Gateway API resources be used instead?
For example, BackendTrafficPolicy and BackendTLSPolicy:
https://gateway-api.sigs.k8s.io/api-types/backendtrafficpolicy/
https://gateway-api.sigs.k8s.io/api-types/backendtlspolicy/?h=backendtlspolicy

Contributor Author
Definitely open to it if you want to contribute an alternative. It is just beyond the scope of this initial implementation, which targets integration with llm-d.

metadata:
  name: {{ include "gateway-api-inference-extension.name" . }}
spec:
  host: {{ .Values.istio.destinationRule.host | default (printf "%s.%s.svc.cluster.local" (include "gateway-api-inference-extension.name" .) .Release.Namespace) }}
  trafficPolicy:
    tls:
      mode: SIMPLE
      insecureSkipVerify: true
    {{- if .Values.istio.destinationRule.trafficPolicy.connectionPool }}
    connectionPool:
      {{- .Values.istio.destinationRule.trafficPolicy.connectionPool | toYaml | nindent 6 }}
    {{- end }}
{{- end }}
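
For orientation, here is a sketch of what this template might render to. It assumes, hypothetically, that the `gateway-api-inference-extension.name` helper resolves to `vllm-llama3-8b-instruct-epp`, that the release is installed in the `default` namespace, that `istio.destinationRule.host` is left empty, and that `connectionPool.http.maxRequestsPerConnection` is set to `256000`. The actual name depends on the chart's helper and is not shown in this diff.

```yaml
# Illustrative rendering only; the resource name and namespace are assumptions.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: vllm-llama3-8b-instruct-epp
spec:
  # Derived default: <name>.<release namespace>.svc.cluster.local
  host: vllm-llama3-8b-instruct-epp.default.svc.cluster.local
  trafficPolicy:
    tls:
      mode: SIMPLE
      # Hard-coded in the template: the server certificate is not verified.
      insecureSkipVerify: true
    # Present only when connectionPool is set in the values.
    connectionPool:
      http:
        maxRequestsPerConnection: 256000
```

Running `helm template` with `--set provider.name=istio` is one way to check the real output for a given release.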
12 changes: 12 additions & 0 deletions config/charts/inferencepool/values.yaml
@@ -53,10 +53,22 @@ inferencePool:
  # This will soon be deprecated when upstream GW providers support v1, just doing something simple for now.
  targetPortNumber: 8000

# Options: ["gke", "istio", "none"]
provider:
  name: none

gke:
  monitoringSecret:
    name: inference-gateway-sa-metrics-reader-secret
    namespace: default

istio:
  destinationRule:
    # Provide a way to override the default calculated host
    host: ""
    # Optional: Enables customization of the traffic policy
    trafficPolicy: {}
      # connectionPool:
      #   http:
      #     maxRequestsPerConnection: 256000