Commit 6914a45

committed
RFC-0075: initial draft
1 parent 064ad6a commit 6914a45

2 files changed

+316
-1
lines changed

README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -43,6 +43,7 @@ Coding happens all the time and is encouraged. We just recognize there is a poin
4343
| 61 | [SDK3 Diagnostics](rfc/0061-sdk3-diagnostics.md) | Michael N. | ACCEPTED |
4444
| 64 | [SDK3 Field-Level Encryption](rfc/0064-sdk3-field-level-encryption.md) | David N. | ACCEPTED |
4545
| 69 | [KV Error Map V2](rfc/0069-kv-error-map-v2.md) | Brett L. | ACCEPTED |
46+
| 75 | [Faster Failover and Configuration Push](rfc/0075-faster-failover-and-configuration-push.md) | Sergey | ACCEPTED |
4647

4748
### Draft & Review RFCs
4849

@@ -65,7 +66,6 @@ Coding happens all the time and is encouraged. We just recognize there is a poin
6566
| 72 | Queues And Topics [\[doc\]](https://docs.google.com/document/d/1x-wn--F1Qg6y342pBerLLfpWnAGud2HQRA0YpEVMkqU) | Michael N. | DRAFT
6667
| 73 | KV Range Scan [\[doc\]](https://docs.google.com/document/d/1ir4E9XRvVOncReuR_QgohyompgoIvnZ0De1ik0WkrYs) | David N. & Michael N. | DRAFT
6768
| 74 | Configuration Profiles [\[doc\]](https://docs.google.com/document/d/1LNCYgV2Eqymp3pGmA8WKPQOLSpcRyv0P7NpMYHVcUM0/) | Mike R. | DRAFT
68-
| 75 | [Faster Failover and Configuration Push](https://github.com/couchbaselabs/sdk-rfcs/pull/123) | Sergey | DRAFT |
6969

7070
### Identified RFCs
7171

Lines changed: 315 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,315 @@
1+
# Meta
2+
3+
| Field | Value |
4+
|----------------|----------------------------------------|
5+
| RFC Name | Faster Failover and Configuration Push |
6+
| RFC ID | 75 |
7+
| Start Date | 2023-06-14 |
8+
| Owner | Sergey Avseyev |
9+
| Current Status | DRAFT |
10+
| Revision | #1 |
11+
12+
# Summary
13+
14+
TBD
15+
16+
# Motivation
17+
18+
TBD
19+
20+
# Relation to Other RFCs
21+
22+
This RFC relates to the following documents:
23+
24+
* [RFC-0005][rfc-0005]: VBucket Retry Logic.
25+
26+
* [RFC-0024][rfc-0024]: Fast-Failover SDK.
27+
28+
29+
# High-Level Design
30+
31+
TBD
32+
33+
# User-Facing API
34+
35+
TBD
36+
37+
# Implementation Details
38+
39+
## Protocol Changes
40+
41+
[https://issues.couchbase.com/browse/MB-57311]: #
42+
43+
### Get Cluster Config with Known Version
44+
45+
[https://review.couchbase.org/c/kv_engine/+/192301]: #
46+
47+
The KV engine introduces a new HELLO flag called `GetClusterConfigWithKnownVersion` with a value of `0x1d`. This flag
48+
does not change the behavior of the server but allows determining if the node supports epoch-revision fields for the
49+
`GetClusterConfig` (`0xb5`) operation. If the node acknowledges `GetClusterConfigWithKnownVersion`, then the SDK can use
50+
the new version of the command.
51+
52+
Epoch and revision are signed 64-bit integers encoded in network (big-endian) order.
53+
54+
55+
Byte/ 0 | 1 | 2 | 3 |
56+
/ | | | |
57+
|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|
58+
+---------------+---------------+---------------+---------------+
59+
0| 0x80 | 0xb5 | 0x00 | 0x00 |
60+
+---------------+---------------+---------------+---------------+
61+
4| 0x00 | 0x00 | 0x00 | 0x00 |
62+
+---------------+---------------+---------------+---------------+
63+
8| 0x00 | 0x00 | 0x00 | 0x10 |
64+
+---------------+---------------+---------------+---------------+
65+
12| 0xde | 0xad | 0xbe | 0xef |
66+
+---------------+---------------+---------------+---------------+
67+
16| 0x00 | 0x00 | 0x00 | 0x00 |
68+
+---------------+---------------+---------------+---------------+
69+
20| 0x00 | 0x00 | 0x00 | 0x00 |
70+
+---------------+---------------+---------------+---------------+
71+
24| 0x00 | 0x00 | 0x00 | 0x00 |
72+
+---------------+---------------+---------------+---------------+
73+
28| 0x00 | 0x00 | 0x00 | 0x42 |
74+
+---------------+---------------+---------------+---------------+
75+
32| 0x01 | 0x02 | 0x03 | 0x04 |
76+
+---------------+---------------+---------------+---------------+
77+
36| 0x05 | 0x06 | 0x07 | 0x08 |
78+
+---------------+---------------+---------------+---------------+
81+
GET_CLUSTER_CONFIG command
82+
Field (offset) (value)
83+
Magic (0) : 0x80 (client request, SDK -> kv_engine)
84+
Opcode (1) : 0xb5
85+
Key length (2,3) : 0x0000
86+
Extra length (4) : 0x00
87+
Data type (5) : 0x00 (RAW)
88+
Vbucket (6,7) : 0x0000
89+
Total body (8-11) : 0x00000010 (16 bytes)
90+
Opaque (12-15): 0xdeadbeef
91+
CAS (16-23): 0x0000000000000000
92+
Epoch (24-31): 0x0000000000000042 (66 in base-10)
93+
Revision (32-39): 0x0102030405060708 (72623859790382856 in base-10)
94+
95+
If the node has a cluster configuration newer than the version specified in the request, the response will include the new
96+
configuration in the body with the data type set to `JSON` (`0x01`). Otherwise, the response will have an empty body
97+
with the data type `RAW` (`0x00`).
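
The request above can be sketched in Python as follows. This is a minimal illustration rather than any SDK's implementation: the function name `encode_get_cluster_config` and the constant names are hypothetical, and the header fields (including an extras length of zero) simply follow the field listing in the example.

```python
import struct

# Memcached binary protocol header (24 bytes, big-endian):
# magic, opcode, key length, extras length, data type, vbucket,
# total body length, opaque, CAS.
HEADER = struct.Struct(">BBHBBHIIQ")
# Epoch and revision: two signed 64-bit integers in network order.
VERSION = struct.Struct(">qq")

MAGIC_CLIENT_REQUEST = 0x80
OPCODE_GET_CLUSTER_CONFIG = 0xB5


def encode_get_cluster_config(opaque: int, epoch: int, revision: int) -> bytes:
    """Encode GetClusterConfig carrying the version the SDK already knows.

    Hypothetical helper; field values mirror the example in this section.
    """
    payload = VERSION.pack(epoch, revision)
    header = HEADER.pack(
        MAGIC_CLIENT_REQUEST,
        OPCODE_GET_CLUSTER_CONFIG,
        0,             # key length
        0,             # extras length, as listed in the example
        0x00,          # data type: RAW
        0,             # vbucket
        len(payload),  # total body: 16 bytes
        opaque,
        0,             # CAS
    )
    return header + payload


packet = encode_get_cluster_config(0xDEADBEEF, 0x42, 0x0102030405060708)
print(packet.hex())
```

Because both integers are packed in network order, the least significant byte of the epoch (`0x42`) ends up at offset 31 and the revision occupies offsets 32 to 39.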
98+
99+
### Deduplicate Cluster Configuration for `NotMyVbucket` Responses
100+
101+
[https://review.couchbase.org/c/kv_engine/+/190899]: #
102+
103+
The KV engine introduces a new HELLO flag called `DedupeNotMyVbucketClustermap` with a value of `0x1e`. Once this flag
104+
is negotiated, the node might send an empty body with `NotMyVbucket` (`0x07`) status codes. The KV engine tracks the
105+
revision that has been sent to the SDK over the socket connection, so a response with a `NotMyVbucket` status will only
106+
have a body if the pushed version is older than the active configuration.
107+
108+
The KV engine updates the pushed configuration version in the following cases:
109+
* Configuration sent to the SDK in response to a `GetClusterConfig` (`0xb5`) request.
110+
* Configuration pushed to the SDK that enabled the HELLO flag `ClustermapChangeNotification` (`0x0d`).
111+
112+
Note that `DedupeNotMyVbucketClustermap` affects the `ClustermapChangeNotification` and `ClustermapChangeNotificationBrief`
113+
features, which are described below. In other words, if deduplication is enabled, the cluster configuration will be announced for
114+
the socket connection only once.
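
The per-connection bookkeeping described above could look roughly like the sketch below. It is illustrative only and is not the kv_engine data structure: `ConnectionState` and its methods are hypothetical names, and the sketch assumes that versions are ordered by epoch first and revision second.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Version = Tuple[int, int]  # (epoch, revision); assumed to compare epoch first


@dataclass
class ConnectionState:
    """Hypothetical per-socket state tracking the newest pushed version."""

    pushed: Optional[Version] = None

    def body_for_not_my_vbucket(self, active: Version, config: bytes) -> bytes:
        """Return the NotMyVbucket body: the config only if the peer lacks it."""
        if self.pushed is not None and self.pushed >= active:
            return b""  # already delivered over this socket, so deduplicate
        self.pushed = active
        return config

    def record_sent(self, version: Version) -> None:
        """Call after a GetClusterConfig response or a push notification."""
        if self.pushed is None or version > self.pushed:
            self.pushed = version


conn = ConnectionState()
first = conn.body_for_not_my_vbucket((1, 11), b'{"rev":11}')
second = conn.body_for_not_my_vbucket((1, 11), b'{"rev":11}')
print(len(first), len(second))
```

The second `NotMyVbucket` response for the same version carries an empty body, which is the behavior the SDK has to tolerate once the flag is negotiated.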
115+
116+
### Enforcing Snappy Compression for Cluster Configuration Payloads
117+
118+
[https://review.couchbase.org/c/kv_engine/+/192152]: #
119+
[https://review.couchbase.org/c/kv_engine/+/192316]: #
120+
121+
The KV engine introduces a new HELLO flag called `SnappyEverywhere` with a value of `0x13`. Once this flag is
122+
negotiated, the node will always use the compressed version of the cluster configuration and data type flags will be set
123+
to `JSON | SNAPPY` (`0x03`).
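
On the SDK side, handling such payloads amounts to checking the data type bits before parsing. The following is a small sketch under stated assumptions: `extract_config_json` is a hypothetical helper, and the Snappy decompressor is injected as a callable because Snappy is not part of the Python standard library.

```python
from typing import Callable, Optional

DATATYPE_JSON = 0x01
DATATYPE_SNAPPY = 0x02


def extract_config_json(
    datatype: int,
    body: bytes,
    snappy_decompress: Callable[[bytes], bytes],
) -> Optional[bytes]:
    """Return the JSON bytes of a configuration payload, or None if absent.

    `snappy_decompress` is supplied by the caller (hypothetical parameter).
    """
    if not body:
        return None  # empty body: no configuration attached
    if datatype & DATATYPE_SNAPPY:
        body = snappy_decompress(body)
    if not datatype & DATATYPE_JSON:
        raise ValueError(f"unexpected data type for configuration: {datatype:#04x}")
    return body
```

Keeping the decompressor as a parameter lets the same function serve `JSON` (`0x01`) and `JSON | SNAPPY` (`0x03`) payloads without committing to a particular Snappy binding.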
124+
125+
### `GetClusterConfig` and Out-of-Order Execution
126+
127+
[https://issues.couchbase.com/browse/MB-56885]: #
128+
129+
The HELLO flag `UnorderedExecution` (`0x0e`) enables Out-of-Order (OoO) execution, which allows the KV engine to
130+
reorder operations. [kv\_engine/docs/UnorderedExecution.md][kv-unordered-execution] provides more details on this
131+
feature.
132+
133+
The `GetClusterConfig` (`0xb5`) command is explicitly marked as compatible with OoO execution, allowing it to be served
134+
without waiting for the completion of in-flight operations. Specifically, `GetClusterConfig` will not wait for long
135+
operations such as mutations with SyncDurability requirements. All current SDKs are expected to be compatible with the
136+
OoO execution mode, so no changes are expected.
137+
138+
### Cluster Configuration Notification Changes
139+
140+
Prior to server version 7.6, the KV engine had an opt-in feature to push configuration updates to SDKs. This feature
141+
could be enabled using the HELLO flag `ClustermapChangeNotification` (`0x0d`), which depends on `Duplex` (`0x0c`). More
142+
details about `Duplex` can be found in [kv\_engine/docs/Duplex.md][kv-duplex]. When both flags are negotiated, the
143+
server will send unsolicited configuration updates to the SDK without expecting any acknowledgement. While
144+
this approach proves to have better responsiveness compared to [RFC-0024: Fast Failover][rfc-0024], it also has its own
145+
drawbacks, such as:
146+
147+
1. The SDK subscribes all connections using HELLO, and during rebalance, all connections will receive all notifications.
148+
2. In a Lambda scenario, if failover occurs while the SDK process is paused, upon resuming, the SDK must process all
149+
updates on all sockets. This process takes unnecessary time, unlike when the SDK polls every 2.5 seconds.
150+
151+
Starting with version 7.6, the KV engine provides the HELLO flag `ClustermapChangeNotificationBrief` (`0x1f`). This flag
152+
instructs the KV engine to exclude the cluster configuration content from the notification. In this case, the data type
153+
will be `RAW` (`0x00`). Below is the typical structure of the notification when the brief mode is enabled:
154+
155+
156+
Byte/ 0 | 1 | 2 | 3 |
157+
/ | | | |
158+
|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|
159+
+---------------+---------------+---------------+---------------+
160+
0| 0x82 | 0x01 | 0x00 | 0x00 |
161+
+---------------+---------------+---------------+---------------+
162+
4| 0x10 | 0x00 | 0x00 | 0x00 |
163+
+---------------+---------------+---------------+---------------+
164+
8| 0x00 | 0x00 | 0x00 | 0x10 |
165+
+---------------+---------------+---------------+---------------+
166+
12| 0x00 | 0x00 | 0x00 | 0x00 |
167+
+---------------+---------------+---------------+---------------+
168+
16| 0x00 | 0x00 | 0x00 | 0x00 |
169+
+---------------+---------------+---------------+---------------+
170+
20| 0x00 | 0x00 | 0x00 | 0x00 |
171+
+---------------+---------------+---------------+---------------+
172+
24| 0x00 | 0x00 | 0x00 | 0x00 |
173+
+---------------+---------------+---------------+---------------+
174+
28| 0x00 | 0x00 | 0x00 | 0x42 |
175+
+---------------+---------------+---------------+---------------+
176+
32| 0x01 | 0x02 | 0x03 | 0x04 |
177+
+---------------+---------------+---------------+---------------+
178+
36| 0x05 | 0x06 | 0x07 | 0x08 |
179+
+---------------+---------------+---------------+---------------+
182+
CLUSTERMAP_CHANGE_NOTIFICATION command
183+
Field (offset) (value)
184+
Magic (0) : 0x82 (server request, kv_engine -> SDK)
185+
Opcode (1) : 0x01
186+
Key length (2,3) : 0x0000
187+
Extra length (4) : 0x10 (two int64_t fields in extras)
188+
Data type (5) : 0x00 (RAW)
189+
Vbucket (6,7) : 0x0000
190+
Total body (8-11) : 0x00000010 (16 bytes)
191+
Opaque (12-15): 0x00000000
192+
CAS (16-23): 0x0000000000000000
193+
Epoch (24-31): 0x0000000000000042 (66 in base-10)
194+
Revision (32-39): 0x0102030405060708 (72623859790382856 in base-10)
195+
199+
Note that the magic value for this notification is `ServerRequest` (`0x82`), which is enabled by the `Duplex` (`0x0c`)
200+
HELLO flag. Additionally, similar to the regular cluster configuration notification, the epoch and revision fields are
201+
sent as extras.
202+
203+
Once the brief cluster configuration notification is received, it is up to the SDK to decide whether to send a
204+
`GetClusterConfig` (`0xb5`) request to retrieve the actual configuration body.
205+
206+
In essence, the `ClustermapChangeNotificationBrief` feature only saves network traffic. If
207+
`DedupeNotMyVbucketClustermap` is not enabled, the number of notifications will be the same as before. However, this
208+
feature can still be used as a building block to implement a debouncing mechanism. When properly configured, it can help
209+
reduce the number of requests. Further details on this topic will be covered in the "Library Changes" section.
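
To make the packet layout concrete, the brief notification can be decoded along these lines. This is a sketch, not an SDK API: `parse_brief_notification` and `BriefNotification` are hypothetical names, and the sample bytes are built from the field listing in the example (extras length `0x10`, total body `0x00000010`).

```python
import struct
from typing import NamedTuple

HEADER = struct.Struct(">BBHBBHIIQ")  # 24-byte binary protocol header
VERSION = struct.Struct(">qq")        # epoch, revision (signed, network order)

MAGIC_SERVER_REQUEST = 0x82
OPCODE_CLUSTERMAP_CHANGE_NOTIFICATION = 0x01


class BriefNotification(NamedTuple):
    epoch: int
    revision: int


def parse_brief_notification(packet: bytes) -> BriefNotification:
    """Decode a brief ClustermapChangeNotification or raise ValueError."""
    if len(packet) < HEADER.size:
        raise ValueError("short packet")
    magic, opcode, _key_len, ext_len, _datatype, _vbucket, _body_len, _opaque, _cas = (
        HEADER.unpack_from(packet)
    )
    if magic != MAGIC_SERVER_REQUEST or opcode != OPCODE_CLUSTERMAP_CHANGE_NOTIFICATION:
        raise ValueError("not a clustermap change notification")
    if ext_len != VERSION.size or len(packet) < HEADER.size + ext_len:
        raise ValueError("missing epoch/revision extras")
    # Extras immediately follow the header.
    epoch, revision = VERSION.unpack_from(packet, HEADER.size)
    return BriefNotification(epoch, revision)


example = bytes.fromhex(
    "82010000" "10000000" "00000010" "00000000"
    "0000000000000000"
    "0000000000000042" "0102030405060708"
)
print(parse_brief_notification(example))
```

The SDK can compare the decoded pair with the version it already holds and skip the `GetClusterConfig` request when nothing newer is announced.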
210+
211+
## Library Changes
212+
213+
### Configuration Push
214+
215+
The previously mentioned `ClustermapChangeNotificationBrief` feature enables the SDK to subscribe all connections for
216+
configuration updates. These notifications are lightweight and can be deduplicated by the server when the
217+
`DedupeNotMyVbucketClustermap` option is negotiated.
218+
219+
#### Mixed Clusters
220+
221+
In clusters where there is a mix of nodes with older server versions, meaning that some nodes do not acknowledge
222+
`ClustermapChangeNotificationBrief`, the respective connection should notify the configuration monitor about its lack of
223+
support for configuration pushes from the server. As a result, the monitor should utilize the old polling mechanism for
224+
this particular node instead.
225+
226+
### Enhancements in Handling the `NotMyVbucket` Status
227+
228+
The combination of `DedupeNotMyVbucketClustermap` and `ClustermapChangeNotificationBrief` saves traffic: the server does
229+
not resend a configuration if the SDK has already seen the same revision, and notifications carry only the
230+
`Epoch`/`Revision` pair. It is therefore up to the SDK to initiate a configuration update once a non-empty payload is returned along with the `NotMyVbucket` status code.
231+
232+
Several modifications are required in the SDK:
233+
1. The retry orchestrator should be able to retry an operation based on configuration updates rather than the timer signal.
234+
2. The configuration monitor should have the ability to throttle configuration requests for the following reasons:
235+
1. During rebalance, multiple operations may return a `NotMyVbucket` status, triggering a configuration refresh.
236+
2. Since `ClustermapChangeNotificationBrief` will cause all connections to subscribe to updates and receive them, it is
237+
necessary to account for potential high volumes of updates.
238+
239+
Below is a diagram that illustrates an example of the SDK workflow, where the GET request is waiting for the arrival of
240+
a new configuration.
241+
242+
```mermaid
243+
sequenceDiagram
244+
autonumber
245+
conn_1->>+kv_node_1: get("foo", vb=115)
246+
kv_node_1->>-conn_1: NotMyVbucket(epoch=1, rev=11)
247+
conn_1-->>+retry_orchestrator: pending(get, "foo", epoch=1, rev=11)
248+
retry_orchestrator-->retry_orchestrator: put operation into waiting queue
249+
conn_1-->>+config_monitor: refresh configuration
250+
config_monitor-->config_monitor: wait to throttle config requests
251+
config_monitor->>+conn_2: get_config()
252+
conn_2->>+kv_node_2: get_config()
253+
kv_node_2->>-conn_2: configuration(epoch=1, rev=11)
254+
conn_2->>-config_monitor: apply new configuration
255+
config_monitor->>retry_orchestrator: purge waiting queue(epoch=1, rev=11)
256+
retry_orchestrator->>conn_1: retry get("foo")
257+
conn_1->>+kv_node_2: get("foo", vb=115)
258+
kv_node_2->>-conn_1: Success()
259+
```
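
The interaction between the two components in the diagram can be sketched as below. This is a simplified, single-threaded illustration under several assumptions: the class and method names are hypothetical, `fetch` stands in for sending `GetClusterConfig` over some connection, versions are compared epoch first, and only the "one request in flight" rule is modeled (time-based throttling is left out).

```python
from typing import Callable, List, Optional, Tuple

Version = Tuple[int, int]  # (epoch, revision); assumed to compare epoch first
Retry = Callable[[], None]


class RetryOrchestrator:
    """Parks operations until a suitable configuration has been applied."""

    def __init__(self) -> None:
        self._waiting: List[Tuple[Version, Retry]] = []

    def park(self, awaited: Version, retry: Retry) -> None:
        self._waiting.append((awaited, retry))

    def on_config_applied(self, applied: Version) -> None:
        # Release operations whose awaited version is now available.
        ready = [retry for awaited, retry in self._waiting if awaited <= applied]
        self._waiting = [(v, r) for v, r in self._waiting if v > applied]
        for retry in ready:
            retry()


class ConfigMonitor:
    """Coalesces refresh signals so at most one request is in flight."""

    def __init__(
        self,
        fetch: Callable[[Callable[[Version], None]], None],
        orchestrator: RetryOrchestrator,
    ) -> None:
        self._fetch = fetch  # hypothetical: issues GetClusterConfig, calls back
        self._orchestrator = orchestrator
        self._in_flight = False
        self.current: Optional[Version] = None

    def request_refresh(self) -> bool:
        """Return True if a request was sent, False if one is already pending."""
        if self._in_flight:
            return False
        self._in_flight = True
        self._fetch(self._on_config)
        return True

    def _on_config(self, version: Version) -> None:
        self._in_flight = False
        if self.current is None or version > self.current:
            self.current = version
        self._orchestrator.on_config_applied(self.current)
```

In this sketch, any number of `NotMyVbucket` responses or push notifications may call `request_refresh()`, but only the first one triggers a request until the response arrives; parked operations are retried from the configuration callback rather than from a timer.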
260+
261+
# Language Specifics
262+
263+
## Feature Checklist
264+
265+
1. `GetClusterConfigWithKnownVersion` (`0x1d`). The SDK should always supply the current configuration version if the
266+
connection has acknowledged the feature flag.
267+
268+
2. `DedupeNotMyVbucketClustermap` (`0x1e`). The SDK should expect that the KV engine will not repeat the configuration
269+
payload if it has already been sent over the socket by any means (`NotMyVbucket` status, `ClustermapChangeNotification`,
270+
`GetClusterConfig`).
271+
272+
3. Out-of-Order Execution. The `Duplex` (`0x0c`) feature should always be negotiated in HELLO.
273+
274+
4. `ClustermapChangeNotificationBrief` (`0x1f`). The SDK should always subscribe to configuration notifications if the
275+
server supports them, and fall back to polling if it does not.
276+
277+
5. The SDK should not emit a configuration refresh request if one is already in flight. This should be independent of
278+
the source of the signal, as it might come from all the nodes during rebalance when the configuration push is
279+
enabled, or from `NotMyVbucket` responses.
280+
281+
6. [OPTIONAL] `SnappyEverywhere` (`0x13`). The SDK should expect that the KV engine might send a Snappy-compressed payload with any
282+
of the response types (including push notifications), and should check the data type for the `SNAPPY` (`0x02`) flag.
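
As a rough illustration of the checklist, the flags named in this RFC can be collected and encoded for HELLO as 16-bit unsigned integers in network order. The enum and helper names are hypothetical, the full HELLO framing is omitted, and the rule that push requires both `Duplex` and `ClustermapChangeNotificationBrief` is an assumption drawn from the dependency described earlier.

```python
import enum
import struct
from typing import Iterable, Set


class Feature(enum.IntEnum):
    """HELLO feature codes mentioned in this RFC."""

    DUPLEX = 0x0C
    CLUSTERMAP_CHANGE_NOTIFICATION = 0x0D
    UNORDERED_EXECUTION = 0x0E
    SNAPPY_EVERYWHERE = 0x13
    GET_CLUSTER_CONFIG_WITH_KNOWN_VERSION = 0x1D
    DEDUPE_NOT_MY_VBUCKET_CLUSTERMAP = 0x1E
    CLUSTERMAP_CHANGE_NOTIFICATION_BRIEF = 0x1F


def encode_features(features: Iterable[Feature]) -> bytes:
    """Encode feature codes as 16-bit unsigned integers in network order."""
    return b"".join(struct.pack(">H", feature) for feature in features)


def decode_features(value: bytes) -> Set[int]:
    """Decode the list of features acknowledged by the node."""
    return {code for (code,) in struct.iter_unpack(">H", value)}


def uses_config_push(acknowledged: Set[int]) -> bool:
    """Assumed rule: push needs Duplex and the brief flag; otherwise poll."""
    required = {Feature.DUPLEX, Feature.CLUSTERMAP_CHANGE_NOTIFICATION_BRIEF}
    return required <= acknowledged


requested = encode_features(Feature)
acked = decode_features(bytes.fromhex("000c000e001d"))  # an older node's reply
print(uses_config_push(acked))
```

A connection whose acknowledged set lacks the brief flag, as in the sample reply, would report itself to the configuration monitor as poll-only.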
283+
284+
# Open Questions
285+
286+
1. Behaviour in mixed clusters. Upgrade, when new nodes can push config, while old nodes cannot. Downgrade, when new
287+
nodes cannot push configuration (should we even consider downgrade?).
288+
289+
2. TBD
290+
291+
3. TBD
292+
293+
# Revisions
294+
295+
* Revision #1 (2023-XX-YY; Sergey Avseyev)
296+
* Completed initial draft.
297+
298+
# Signoff
299+
300+
| Language | Team Member | Signoff Date | Revision |
301+
|-------------|----------------|--------------|----------|
302+
| .NET | Jeffry Morris | | |
303+
| C/C++ | Sergey Avseyev | | |
304+
| Go | Charles Dixon | | |
305+
| Java/Kotlin | David Nault | | |
306+
| Node.js | Jared Casey | | |
307+
| PHP | Sergey Avseyev | | |
308+
| Python | Jared Casey | | |
309+
| Ruby | Sergey Avseyev | | |
310+
| Scala | Graham Pople | | |
311+
312+
[kv-unordered-execution]: https://github.com/couchbase/kv_engine/blob/master/docs/UnorderedExecution.md
313+
[kv-duplex]: https://github.com/couchbase/kv_engine/blob/master/docs/Duplex.md
314+
[rfc-0005]: rfc/0005-vbucket-retries.md
315+
[rfc-0024]: rfc/0024-fast-failover.md
