
Conversation

stockholmux
Member

Description

Initial version of the blog for the 9.0 release.

I ran into a few issues that I could use some help with:

  1. The atomic slot migration paragraphs are my write-up of what @PingXie said at Keyspace. They could use a pretty complete review to make sure I captured the story completely and accurately.
  2. I based the 'much, much, more' section on a slide from Keyspace. I tried to hunt down the related PRs, but there were a few that I couldn't pin down ('1000+ node cluster' and 'Zero copy responses'). A specific PR link would make this stronger.

Issues Resolved

#312

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the BSD-3-Clause License.

Signed-off-by: Kyle J. Davis <[email protected]>
@stockholmux requested a review from madolson as a code owner October 12, 2025 06:07
Member

@madolson left a comment


The technical details look pretty good

* **Zero copy responses**: Large requests avoid internal memory copying, yielding up to 20% higher throughput.
* **[Pipeline memory prefetch](https://github.com/valkey-io/valkey/pull/2092)**: Memory prefetching when pipelining, yielding up to 40% higher throughput.
* **[Multipath TCP](https://github.com/valkey-io/valkey/pull/1811)**: Adds Multipath TCP support, which can reduce latency by 25%.
* **1000+ node cluster**: Stability improvements for very large clusters.
Member


I would cut this as there will be a dedicated blog. We didn't actually update much for this release.
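To make the pipelining improvement listed above concrete, here is a minimal sketch of the client-side access pattern it accelerates, assuming the valkey-py client and a local server on the default port; batching commands into one round trip is what gives the 9.0 server a window to prefetch memory for the queued commands.

```python
# Minimal sketch: client-side pipelining, the access pattern that the
# "pipeline memory prefetch" improvement speeds up on the server.
# Assumes the valkey-py client and a server on localhost:6379.
import valkey

client = valkey.Valkey(host="localhost", port=6379)

# Queue many commands locally, then flush them in one round trip.
pipe = client.pipeline(transaction=False)
for i in range(1000):
    pipe.set(f"key:{i}", i)
results = pipe.execute()
```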

This approach works for most situations, but corner cases can lead to degraded performance, operational headaches, and, at worst, blocked node migrations and lost data.

Key-by-key migration uses a move-then-delete sequence.
Performance issues arise when Valkey tries to access a key during a partially migrated state: if the migration hasn’t completed, the client may not know whether the key resides on the original node or the new node, leading to a condition that requires more network hops and additional processing.
Member


Suggested change
Performance issues arise when Valkey tries to access a key during a partially migrated state: if the migration hasn’t completed, the client may not know whether the key resides on the original node or the new node, leading to a condition that requires more network hops and additional processing.
Performance issues arise when a client tries to access a key during a partially migrated state: if the migration hasn’t completed, the client may not know whether the key resides on the original node or the new node, leading to a condition that requires more network hops and additional processing.
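For readers unfamiliar with the redirect mechanics behind those extra hops, here is a minimal sketch of the MOVED/ASK handling that cluster-aware clients perform internally. The `node_for_key` and `node_from_reply` helpers are hypothetical stand-ins for a client's routing table; `ASKING` is the real one-shot cluster command sent before retrying on the migration target.

```python
# Minimal sketch of the redirect dance a cluster client performs while
# a slot is partially migrated (hypothetical helper names; the real
# logic lives inside cluster-aware client libraries).
import valkey
from valkey.exceptions import ResponseError

def get_with_redirects(node_pool, key, max_hops=5):
    # node_pool is a hypothetical routing table mapping keys to
    # valkey.Valkey connections; real cluster clients maintain one.
    node = node_pool.node_for_key(key)                 # hypothetical helper
    for _ in range(max_hops):
        try:
            return node.get(key)
        except ResponseError as err:
            msg = str(err)
            if msg.startswith("MOVED"):
                # Slot ownership changed for good: reroute and retry.
                node = node_pool.node_from_reply(msg)  # hypothetical helper
            elif msg.startswith("ASK"):
                # Key already moved mid-migration: one extra hop, plus
                # an ASKING command before the retry on the target.
                node = node_pool.node_from_reply(msg)  # hypothetical helper
                node.execute_command("ASKING")
            else:
                raise
    raise RuntimeError("too many redirects")
```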

Performance issues arise when Valkey tries to access a key during a partially migrated state: if the migration hasn’t completed, the client may not know whether the key resides on the original node or the new node, leading to a condition that requires more network hops and additional processing.
Worse, in a multi-key operation, if one key resides on the original node and another on the new node, Valkey cannot properly execute the command, so it requires the client to retry the request until the data resides on a single node. This leads to a mini-outage where Valkey still has the data but it is inaccessible until the migration is complete for the affected data.
Finally, in a situation where Valkey is attempting to migrate a very large key (such as collections in data types like sorted sets, sets, or lists) from one node to another, the entire key may be too large to be accepted by the target node’s input buffer, leading to a blocked migration that needs manual intervention.
To unblock the migration, you end up losing data, either by forcing the slot assignment or by deleting the key.
Member


You can increase the buffer limit so this goes through, you don't need to lose data.

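As a concrete illustration of both workarounds, here is a minimal sketch assuming the valkey-py client and placeholder host names: raising the target's `client-query-buffer-limit` so an oversized payload fits, and retrying a multi-key command that fails with `-TRYAGAIN` while its slot is mid-migration.

```python
# Minimal sketch of the workarounds discussed above, assuming the
# valkey-py client; values and hosts are illustrative placeholders.
import time
import valkey
from valkey.exceptions import ResponseError

target = valkey.Valkey(host="target-node", port=6379)  # placeholder host

# Workaround 1: raise the target's query (input) buffer so a single
# oversized migration payload fits; the value here is illustrative.
target.config_set("client-query-buffer-limit", 4 * 1024**3)  # 4 GiB

# Workaround 2: multi-key commands on a slot that is mid-migration can
# fail with -TRYAGAIN; clients are expected to back off and retry.
def mget_with_retry(client, keys, attempts=10):
    for _ in range(attempts):
        try:
            return client.mget(keys)
        except ResponseError as err:
            if not str(err).startswith("TRYAGAIN"):
                raise
            time.sleep(0.05)  # brief back-off while the migration settles
    raise RuntimeError("slot migration did not settle in time")
```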

In Valkey, the keyspace is not really flat, but instead keys are bundled into one of 16,384 ‘slots’ and each node takes one or more slots.
Member


Suggested change
In Valkey, the keyspace is not really flat, but instead keys are bundled into one of 16,384 ‘slots’ and each node takes one or more slots.
In Valkey, keys are bundled into one of 16,384 ‘slots’ and each node takes one or more slots.

Is the flat comment needed? It seems like it may be confusing, since we do call the keyspace flat.
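For context, a key's slot is computed as CRC16(key) mod 16384 using the XMODEM polynomial from the cluster specification; the sketch below is an illustrative Python rendering of that rule, including the hash-tag exception.

```python
# Minimal sketch of how a key maps to one of the 16,384 slots:
# HASH_SLOT = CRC16(key) mod 16384, with the CRC16-CCITT (XMODEM)
# polynomial from the cluster specification.
def crc16_xmodem(data: bytes) -> int:
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: bytes) -> int:
    # Hash tags: only the substring inside the first {...} is hashed,
    # so {user1000}.following and {user1000}.followers share a slot.
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384

print(key_slot(b"user:1000"))  # some value in 0..16383
```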


In Valkey, the keyspace is not really flat, but instead keys are bundled into one of 16,384 ‘slots’ and each node takes one or more slots.
In Valkey 9.0, instead of moving data key-by-key, Valkey migrates entire slots at a time, atomically moving each slot from one node to another using the AOF format.
AOF does not work on a key-by-key basis; instead, AOF plays back all the operations that make up the data, so the input buffer only has to deal with individual items in collections, and those will never exceed the size of the input buffer, avoiding large-key migration issues.
Member


The AOF format is also key-by-key. The important information is that we copy all the data from the source to the primary before we delete any data on the source.

The second half of this sentence is valid.
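To illustrate why replaying operations keeps payloads small, here is a hypothetical sketch (not Valkey's internal code) that streams a large sorted set as many bounded ZADD commands and deletes the source copy only after the target has everything, assuming the valkey-py client and placeholder hosts.

```python
# Hypothetical sketch of copy-then-delete with small payloads: a huge
# sorted set becomes many bounded ZADD commands instead of one giant
# serialized blob. Assumes the valkey-py client; hosts are placeholders.
import valkey

source = valkey.Valkey(host="source-node", port=6379)
target = valkey.Valkey(host="target-node", port=6379)

def copy_then_delete_zset(name: str, batch: int = 512):
    cursor = 0
    while True:
        # Stream members in small batches; each ZADD payload stays far
        # below any input-buffer limit on the target.
        cursor, items = source.zscan(name, cursor=cursor, count=batch)
        if items:
            target.zadd(name, {member: score for member, score in items})
        if cursor == 0:
            break
    source.delete(name)  # delete only after the full copy succeeded
```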
