Commit c38b48d

Merge pull request #1 from mikemorris/gep-3388-retry-budget-api-design
Add API design for GEP-3388 Retry Budgets
2 parents 78667d6 + db8bded commit c38b48d

2 files changed

+257
-5
lines changed

geps/gep-3388/index.md

Lines changed: 256 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -1,7 +1,7 @@
11
# GEP-3388: Retry Budgets
22

33
* Issue: [#3388](https://github.com/kubernetes-sigs/gateway-api/issues/3388)
4-
* Status: Provisional
4+
* Status: Implementable
55

66
(See status definitions [here](/geps/overview/#gep-states).)
77

@@ -29,7 +29,7 @@ Multiple data plane proxies offer optional configuration for budgeted retries, i
2929

3030
Configuring a limit for client retries is an important factor in building a resilient system, allowing requests to be successfully retried during periods of intermittent failure. But too many client-side retries can also exacerbate consistent failures and slow down recovery, quickly overwhelming a failing system and leading to cascading failures such as retry storms. Configuring a sane limit for max client-side retries is often challenging in complex systems. Allowing an application developer (Ana) to configure a dynamic "retry budget" reduces the risk of a high number of retries across clients. It allows a service to perform as expected in both times of high & low request load, as well as both during periods of intermittent & consistent failures.
3131

32-
While retry budget configuration has been a frequently discussed feature within the community, differences in the semantics between data plane implementations create a challenge for reaching consensus on the correct location for the configuration. This proposal aims to determine where retry budgets should be defined within the Gateway API, and whether data plane proxies may need to be altered to accommodate the specification.
32+
While retry budget configuration has been a frequently discussed feature within the community, differences in the semantics between data plane implementations create a challenge for reaching consensus on the correct location for the configuration. This proposal aims to determine where retry budgets should be defined within the Gateway API, and whether data plane proxies may need to be altered to accommodate the specification.
3333

3434
### Background on implementations
3535

@@ -79,13 +79,265 @@ The implementation of a version of Linkerd's `ttl` parameter within Envoy might
7979

8080
## API
8181

82+
Two possible API designs are provided below; likely only one should be selected for implementation.
83+
8284
### Go
8385

84-
TODO
86+
```golang
87+
type RetryPolicy struct {
88+
// RetryPolicy defines the configuration for when to retry a request to a target backend.
89+
// Implementations SHOULD retry on connection errors (disconnect, reset, timeout,
90+
// TCP failure) if a retry stanza is configured.
91+
//
92+
// Support: Extended
93+
//
94+
// +optional
95+
// <gateway:experimental>
96+
//
97+
// Note: there is no Override or Default policy configuration.
98+
99+
metav1.TypeMeta `json:",inline"`
100+
metav1.ObjectMeta `json:"metadata,omitempty"`
101+
102+
// Spec defines the desired state of RetryPolicy.
103+
Spec RetryPolicySpec `json:"spec"`
104+
105+
// Status defines the current state of RetryPolicy.
106+
Status PolicyStatus `json:"status,omitempty"`
107+
}
108+
109+
type RetryPolicySpec struct {
110+
// TargetRefs identifies the API objects to apply policy to.
111+
// Currently, Backends (i.e. Service, ServiceImport, or any
112+
// implementation-specific backendRef) are the only valid API
113+
// target references.
114+
// +listType=map
115+
// +listMapKey=group
116+
// +listMapKey=kind
117+
// +listMapKey=name
118+
// +kubebuilder:validation:MinItems=1
119+
// +kubebuilder:validation:MaxItems=16
120+
TargetRefs []LocalPolicyTargetReference `json:"targetRefs"`
121+
122+
// TODO: This captures the basic idea, but should likely be a new type.
123+
From []ReferenceGrantFrom `json:"from,omitempty"`
124+
125+
CommonRetryPolicy `json:",inline"`
126+
}
127+
128+
type BackendTrafficPolicy struct {
129+
// BackendTrafficPolicy defines the configuration for how traffic to a target backend should be handled.
130+
//
131+
// Support: Extended
132+
//
133+
// +optional
134+
// <gateway:experimental>
135+
//
136+
// Note: there is no Override or Default policy configuration.
137+
138+
metav1.TypeMeta `json:",inline"`
139+
metav1.ObjectMeta `json:"metadata,omitempty"`
140+
141+
// Spec defines the desired state of BackendTrafficPolicy.
142+
Spec BackendTrafficPolicySpec `json:"spec"`
143+
144+
// Status defines the current state of BackendTrafficPolicy.
145+
Status PolicyStatus `json:"status,omitempty"`
146+
}
147+
148+
type BackendTrafficPolicySpec struct {
149+
// TargetRefs identifies the API objects to apply policy to.
150+
// Currently, Backends (i.e. Service, ServiceImport, or any
151+
// implementation-specific backendRef) are the only valid API
152+
// target references.
153+
// +listType=map
154+
// +listMapKey=group
155+
// +listMapKey=kind
156+
// +listMapKey=name
157+
// +kubebuilder:validation:MinItems=1
158+
// +kubebuilder:validation:MaxItems=16
159+
TargetRefs []LocalPolicyTargetReference `json:"targetRefs"`
160+
161+
// TODO: This captures the basic idea, but should likely be a new type.
162+
From []ReferenceGrantFrom `json:"from,omitempty"`
163+
164+
// Retry defines the configuration for when to retry a request to a target backend.
165+
//
166+
// Implementations SHOULD retry on connection errors (disconnect, reset, timeout,
167+
// TCP failure) if a retry stanza is configured.
168+
//
169+
// Support: Extended
170+
//
171+
// +optional
172+
// <gateway:experimental>
173+
Retry *CommonRetryPolicy `json:"retry,omitempty"`
174+
175+
// SessionPersistence defines and configures session persistence
176+
// for the backend.
177+
//
178+
// Support: Extended
179+
//
180+
// +optional
181+
SessionPersistence *SessionPersistence `json:"sessionPersistence,omitempty"`
182+
}
183+
184+
// CommonRetryPolicy defines the configuration for when to retry a request.
185+
//
186+
type CommonRetryPolicy struct {
187+
// TODO: Does it make sense to include this configuration in the policy or not?
188+
//
189+
// Support: Extended
190+
//
191+
// +optional
192+
HTTP *HTTPRouteRetry `json:"http,omitempty"`
193+
194+
// Support: Extended
195+
//
196+
// +optional
197+
BudgetPercent *int `json:"budgetPercent,omitempty"`
198+
199+
// Support: Extended
200+
//
201+
// +optional
202+
BudgetInterval *Duration `json:"budgetInterval,omitempty"`
203+
204+
// Support: Extended
205+
//
206+
// +optional
207+
MinRetryRate *RequestRate `json:"minRetryRate,omitempty"`
208+
}
209+
210+
// RequestRate expresses a rate of requests over a given period of time.
211+
//
212+
type RequestRate struct {
213+
// Support: Extended
214+
//
215+
// +optional
216+
Count *int `json:"count,omitempty"`
217+
218+
// Support: Extended
219+
//
220+
// +optional
221+
Interval *Duration `json:"interval,omitempty"`
222+
}
223+
224+
// Duration is a string value representing a duration in time. The format is
225+
// as specified in GEP-2257, a strict subset of the syntax parsed by Golang
226+
// time.ParseDuration.
227+
//
228+
// +kubebuilder:validation:Pattern=`^([0-9]{1,5}(h|m|s|ms)){1,4}$`
229+
type Duration string
```
85230

86231
### YAML
87232

88-
TODO
233+
```yaml
234+
apiVersion: gateway.networking.x-k8s.io/v1alpha1
235+
kind: RetryPolicy
236+
metadata:
237+
name: retry-policy-example
238+
spec:
239+
targetRefs:
240+
- group: ""
241+
kind: Service
242+
name: foo
243+
from:
244+
- kind: Mesh
245+
namespace: istio-system
246+
name: istio
247+
- kind: Gateway
248+
name: foo-ingress
249+
http:
250+
codes:
251+
- 500
252+
- 502
253+
- 503
254+
- 504
255+
attempts: 2
256+
backoff: 100ms
257+
budgetPercent: 20
258+
budgetInterval: 10s
259+
minRetryRate:
260+
count: 3
261+
interval: 1s
262+
status:
263+
ancestors:
264+
- ancestorRef:
265+
kind: Mesh
266+
namespace: istio-system
267+
name: istio
268+
controllerName: "istio.io/mesh-controller"
269+
conditions:
270+
- type: "Accepted"
271+
status: "True"
272+
reason: "Accepted"
273+
- ancestorRef:
274+
kind: Gateway
275+
namespace: foo-ns
276+
name: foo-ingress
277+
controllerName: "istio.io/mesh-controller"
278+
conditions:
279+
- type: "Accepted"
280+
status: "False"
281+
reason: "Invalid"
282+
message: "RetryPolicy fields budgetPercent, budgetInterval and minRetryRate are not supported for Istio ingress gateways."
283+
```
284+
285+
```yaml
286+
apiVersion: gateway.networking.x-k8s.io/v1alpha1
287+
kind: BackendTrafficPolicy
288+
metadata:
289+
name: traffic-policy-example
290+
spec:
291+
targetRefs:
292+
- group: ""
293+
kind: Service
294+
name: foo
295+
from:
296+
- kind: Mesh
297+
namespace: istio-system
298+
name: istio
299+
- kind: Gateway
300+
name: foo-ingress
301+
retry:
302+
http:
303+
codes:
304+
- 500
305+
- 502
306+
- 503
307+
- 504
308+
attempts: 2
309+
backoff: 100ms
310+
budgetPercent: 20
311+
budgetInterval: 10s
312+
minRetryRate:
313+
count: 3
314+
interval: 1s
315+
sessionPersistence:
316+
...
317+
status:
318+
ancestors:
319+
- ancestorRef:
320+
kind: Mesh
321+
namespace: istio-system
322+
name: istio
323+
controllerName: "istio.io/mesh-controller"
324+
conditions:
325+
- type: "Accepted"
326+
status: "False"
327+
reason: "Invalid"
328+
message: "BackendTrafficPolicy field sessionPersistence is not supported for Istio mesh traffic."
329+
- ancestorRef:
330+
kind: Gateway
331+
namespace: foo-ns
332+
name: foo-ingress
333+
controllerName: "istio.io/mesh-controller"
334+
conditions:
335+
- type: "Accepted"
336+
status: "False"
337+
reason: "Invalid"
338+
message: "BackendTrafficPolicy fields retry.budgetPercent, retry.budgetInterval and retry.minRetryRate are not supported for Istio ingress gateways."
339+
...
340+
```
89341

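The validation pattern on the `Duration` type can be exercised with Go's standard `regexp` package. The sketch below is only an illustration: `IsValidDuration` is a hypothetical helper, not part of the proposed API, and it checks only the quoted kubebuilder pattern rather than the full GEP-2257 rules.

```go
package main

import (
	"fmt"
	"regexp"
)

// durationPattern is the kubebuilder validation pattern quoted on the
// Duration type in the Go section.
var durationPattern = regexp.MustCompile(`^([0-9]{1,5}(h|m|s|ms)){1,4}$`)

// IsValidDuration is a hypothetical helper that reports whether s matches
// the pattern. It does not parse the value or apply any other rule.
func IsValidDuration(s string) bool {
	return durationPattern.MatchString(s)
}

func main() {
	// The first three values appear in the YAML examples; the rest are
	// extra cases chosen for this sketch.
	for _, s := range []string{"100ms", "10s", "1s", "1h30m", "1.5s", "10", "1d"} {
		fmt.Printf("%q valid=%t\n", s, IsValidDuration(s))
	}
}
```

Values such as `1.5s`, `10`, and `1d` are rejected because the pattern allows only whole numbers followed by one of the units `h`, `m`, `s`, or `ms`.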
90342
## Conformance Details
91343

geps/gep-3388/metadata.yaml

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -2,7 +2,7 @@ apiVersion: internal.gateway.networking.k8s.io/v1alpha1
22
kind: GEPDetails
33
number: 3388
44
name: Retry Budgets
5-
status: Provisional
5+
status: Implementable
66
# Any authors who contribute to the GEP in any way should be listed here using
77
# their Github handle.
88
authors:
