@@ -10,13 +10,15 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
## **Prerequisites**

A cluster with:
- - Support for services of type `LoadBalancer`. For kind clusters, follow [this guide](https://kind.sigs.k8s.io/docs/user/loadbalancer)
+
+ - Support for services of type `LoadBalancer`. For kind clusters, follow [this guide](https://kind.sigs.k8s.io/docs/user/loadbalancer)
to get services of type LoadBalancer working.
- - Support for [sidecar containers](https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/) (enabled by default since Kubernetes v1.29)
+ - Support for [sidecar containers](https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/) (enabled by default since Kubernetes v1.29)
to run the model server deployment.

Tooling:
- - [Helm](https://helm.sh/docs/intro/install/) installed
+
+ - [Helm](https://helm.sh/docs/intro/install/) installed.
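
The sidecar-container prerequisite above can be checked before starting. The following is a minimal sketch and not part of the guide: `version_ge` is a hypothetical helper name, and the commented usage line assumes `jq` is installed, which the guide does not require.

```bash
#!/usr/bin/env bash
# version_ge A B: succeeds when version A >= version B.
# Only MAJOR.MINOR is compared; a leading "v" and any suffix after the
# minor number (for example ".4-gke.100" or "+") are ignored.
version_ge() {
  local a="${1#v}" b="${2#v}"
  local a_major="${a%%.*}" b_major="${b%%.*}"
  local a_rest="${a#*.}" b_rest="${b#*.}"
  local a_minor="${a_rest%%[!0-9]*}" b_minor="${b_rest%%[!0-9]*}"

  if [ "$a_major" -ne "$b_major" ]; then
    [ "$a_major" -gt "$b_major" ]
    return
  fi
  [ "$a_minor" -ge "$b_minor" ]
}

# Example usage against a live cluster (assumes jq is available):
#   server=$(kubectl version -o json | jq -r '.serverVersion.gitVersion')
#   version_ge "$server" 1.29 || echo "sidecar containers may not be enabled by default" >&2
```

On clusters older than v1.29, sidecar containers may still be usable if the corresponding feature gate has been enabled, so a failed check is a prompt to look closer rather than a hard stop.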

## **Steps**

@@ -44,7 +46,7 @@ Tooling:

```bash
kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to the set of Llama models
- kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/gpu-deployment.yaml
+ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v1.0.0/config/manifests/vllm/gpu-deployment.yaml
```

=== "CPU-Based Model Server"
@@ -63,7 +65,7 @@ Tooling:
Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.

```bash
- kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/cpu-deployment.yaml
+ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v1.0.0/config/manifests/vllm/cpu-deployment.yaml
```

=== "vLLM Simulator Model Server"
@@ -74,14 +76,14 @@ Tooling:
To deploy the vLLM simulator, run the following command.

```bash
- kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/sim-deployment.yaml
+ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v1.0.0/config/manifests/vllm/sim-deployment.yaml
```

### Install the Inference Extension CRDs

- ```bash
- kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/latest/download/manifests.yaml
- ```
+ ```bash
+ kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v1.0.0/manifests.yaml
+ ```

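The commands above all pin manifests to the `v1.0.0` tag. When following along by hand, it may help to keep the tag in a single variable so that a later version bump touches one line. This is a sketch under stated assumptions: `IGW_TAG`, `IGW_RAW_BASE`, and `manifest_url` are hypothetical names introduced here, not part of the guide.

```bash
#!/usr/bin/env bash
# Hypothetical variables: the release tag to pin and the raw-content base URL.
IGW_TAG="${IGW_TAG:-v1.0.0}"
IGW_RAW_BASE="https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags"

# manifest_url PATH: prints the pinned raw URL for a repository-relative path.
# A leading "/" on PATH is tolerated and removed.
manifest_url() {
  printf '%s/%s/%s\n' "$IGW_RAW_BASE" "$IGW_TAG" "${1#/}"
}

# Example usage:
#   kubectl apply -f "$(manifest_url config/manifests/vllm/sim-deployment.yaml)"
```

The CRD bundle is published as a release asset under `releases/download/<tag>/manifests.yaml`, not as a file in the repository tree, so it would need its own URL pattern and is left out of this helper.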
### Deploy the InferencePool and Endpoint Picker Extension

@@ -144,7 +146,7 @@ Tooling:
2. Deploy Inference Gateway:

```bash
- kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/gateway.yaml
+ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v1.0.0/config/manifests/gateway/gke/gateway.yaml
```

Confirm that the Gateway was assigned an IP address and reports a `Programmed=True` status:
@@ -157,15 +159,15 @@ Tooling:
3. Deploy the HTTPRoute

```bash
- kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/httproute.yaml
+ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v1.0.0/config/manifests/gateway/gke/httproute.yaml
```

4. Confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True`:

```bash
kubectl get httproute llm-route -o yaml
```
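
Reading the full YAML works, but the two conditions can also be checked with a JSONPath query. The following is a rough sketch, not something the guide prescribes: `all_true` and `check_route` are hypothetical helper names, and the query assumes the conditions are reported under `.status.parents[*].conditions`, as the Gateway API defines for HTTPRoute status.

```bash
#!/usr/bin/env bash
# all_true WORD...: succeeds only when at least one word is given
# and every word is exactly "True".
all_true() {
  [ "$#" -gt 0 ] || return 1
  local word
  for word in "$@"; do
    [ "$word" = "True" ] || return 1
  done
}

# check_route ROUTE: reads the Accepted and ResolvedRefs statuses from every
# parent listed in the HTTPRoute status and fails if any of them is not "True".
check_route() {
  local route="$1" type statuses
  for type in Accepted ResolvedRefs; do
    statuses=$(kubectl get httproute "$route" \
      -o jsonpath="{.status.parents[*].conditions[?(@.type==\"${type}\")].status}") || return 1
    # Left unquoted on purpose: one status per parent becomes one argument.
    # shellcheck disable=SC2086
    all_true $statuses || { echo "${type} is not True for ${route}" >&2; return 1; }
  done
}

# Example usage:
#   check_route llm-route && echo "llm-route is accepted and its references resolve"
```

An empty result, for example when the controller has not yet written any status, is treated as a failure so that a route with no reported conditions is not mistaken for a healthy one.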
-
+
=== "Istio"

Please note that this feature is currently in an experimental phase and is not intended for production use.
@@ -195,13 +197,13 @@ Tooling:
3. If you run the Endpoint Picker (EPP) with the `--secure-serving` flag set to `true` (the default mode), it is currently using a self-signed certificate. As a security measure, Istio does not trust self-signed certificates by default. As a temporary workaround, you can apply the destination rule to bypass TLS verification for EPP. A more secure TLS implementation in EPP is being discussed in [Issue 582](https://github.com/kubernetes-sigs/gateway-api-inference-extension/issues/582).

```bash
- kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/istio/destination-rule.yaml
+ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v1.0.0/config/manifests/gateway/istio/destination-rule.yaml
```

4. Deploy Gateway

```bash
- kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/istio/gateway.yaml
+ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v1.0.0/config/manifests/gateway/istio/gateway.yaml
```

Confirm that the Gateway was assigned an IP address and reports a `Programmed=True` status:
@@ -211,13 +213,13 @@ Tooling:
inference-gateway inference-gateway <MY_ADDRESS> True 22s
```

- 6. Deploy the HTTPRoute
+ 5. Deploy the HTTPRoute

```bash
- kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/istio/httproute.yaml
+ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v1.0.0/config/manifests/gateway/istio/httproute.yaml
```

- 7. Confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True`:
+ 6. Confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True`:

```bash
kubectl get httproute llm-route -o yaml
@@ -250,7 +252,7 @@ Tooling:
4. Deploy the Gateway

```bash
- kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/kgateway/gateway.yaml
+ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v1.0.0/config/manifests/gateway/kgateway/gateway.yaml
```

Confirm that the Gateway was assigned an IP address and reports a `Programmed=True` status:
@@ -263,7 +265,7 @@ Tooling:
5. Deploy the HTTPRoute

```bash
- kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/kgateway/httproute.yaml
+ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v1.0.0/config/manifests/gateway/kgateway/httproute.yaml
```

6. Confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True`:
@@ -297,7 +299,7 @@ Tooling:
4. Deploy the Gateway

```bash
- kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/agentgateway/gateway.yaml
+ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v1.0.0/config/manifests/gateway/agentgateway/gateway.yaml
```

Confirm that the Gateway was assigned an IP address and reports a `Programmed=True` status:
@@ -310,7 +312,7 @@ Tooling:
5. Deploy the HTTPRoute

```bash
- kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/agentgateway/httproute.yaml
+ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v1.0.0/config/manifests/gateway/agentgateway/httproute.yaml
```

6. Confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True`:
@@ -328,10 +330,9 @@ Tooling:
Deploy the sample InferenceObjective which allows you to specify priority of requests.

```bash
- kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferenceobjective.yaml
+ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v1.0.0/config/manifests/inferenceobjective.yaml
```

-
### Try it out

Wait until the gateway is ready.
@@ -357,36 +358,36 @@ Tooling:

```bash
helm uninstall vllm-llama3-8b-instruct
- kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferenceobjective.yaml --ignore-not-found
- kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/cpu-deployment.yaml --ignore-not-found
- kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/gpu-deployment.yaml --ignore-not-found
- kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/sim-deployment.yaml --ignore-not-found
+ kubectl delete -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v1.0.0/config/manifests/inferenceobjective.yaml --ignore-not-found
+ kubectl delete -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v1.0.0/config/manifests/vllm/cpu-deployment.yaml --ignore-not-found
+ kubectl delete -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v1.0.0/config/manifests/vllm/gpu-deployment.yaml --ignore-not-found
+ kubectl delete -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v1.0.0/config/manifests/vllm/sim-deployment.yaml --ignore-not-found
kubectl delete secret hf-token --ignore-not-found
```

1. Uninstall the Gateway API Inference Extension CRDs

```bash
- kubectl delete -k https://github.com/kubernetes-sigs/gateway-api-inference-extension/config/crd --ignore-not-found
+ kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v1.0.0/manifests.yaml --ignore-not-found
```

1. Choose one of the following options to clean up the Inference Gateway.

=== "GKE"

```bash
- kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/gateway.yaml --ignore-not-found
- kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/healthcheck.yaml --ignore-not-found
- kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/gcp-backend-policy.yaml --ignore-not-found
- kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/httproute.yaml --ignore-not-found
+ kubectl delete -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v1.0.0/config/manifests/gateway/gke/gateway.yaml --ignore-not-found
+ kubectl delete -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v1.0.0/config/manifests/gateway/gke/healthcheck.yaml --ignore-not-found
+ kubectl delete -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v1.0.0/config/manifests/gateway/gke/gcp-backend-policy.yaml --ignore-not-found
+ kubectl delete -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v1.0.0/config/manifests/gateway/gke/httproute.yaml --ignore-not-found
```

=== "Istio"

```bash
- kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/istio/gateway.yaml --ignore-not-found
- kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/istio/destination-rule.yaml --ignore-not-found
- kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/istio/httproute.yaml --ignore-not-found
+ kubectl delete -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v1.0.0/config/manifests/gateway/istio/gateway.yaml --ignore-not-found
+ kubectl delete -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v1.0.0/config/manifests/gateway/istio/destination-rule.yaml --ignore-not-found
+ kubectl delete -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v1.0.0/config/manifests/gateway/istio/httproute.yaml --ignore-not-found
```

The following steps assume you would like to clean up ALL Istio resources that were created in this quickstart guide.
@@ -397,7 +398,7 @@ Tooling:
istioctl uninstall -y --purge
```

- 1. Remove the Istio namespace
+ 2. Remove the Istio namespace

```bash
kubectl delete ns istio-system
@@ -406,8 +407,8 @@ Tooling:
=== "Kgateway"

```bash
- kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/kgateway/gateway.yaml --ignore-not-found
- kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/kgateway/httproute.yaml --ignore-not-found
+ kubectl delete -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v1.0.0/config/manifests/gateway/kgateway/gateway.yaml --ignore-not-found
+ kubectl delete -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v1.0.0/config/manifests/gateway/kgateway/httproute.yaml --ignore-not-found
```

The following steps assume you would like to clean up ALL Kgateway resources that were created in this quickstart guide.
@@ -418,13 +419,13 @@ Tooling:
helm uninstall kgateway -n kgateway-system
```

- 1. Uninstall the Kgateway CRDs.
+ 2. Uninstall the Kgateway CRDs.

```bash
helm uninstall kgateway-crds -n kgateway-system
```

- 1. Remove the Kgateway namespace.
+ 3. Remove the Kgateway namespace.

```bash
kubectl delete ns kgateway-system
@@ -433,8 +434,8 @@ Tooling:
=== "Agentgateway"

```bash
- kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/kgateway/gateway.yaml --ignore-not-found
- kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/kgateway/httproute.yaml --ignore-not-found
+ kubectl delete -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v1.0.0/config/manifests/gateway/agentgateway/gateway.yaml --ignore-not-found
+ kubectl delete -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v1.0.0/config/manifests/gateway/agentgateway/httproute.yaml --ignore-not-found
```

The following steps assume you would like to clean up ALL Kgateway resources that were created in this quickstart guide.