Skip to content

Conversation

@iskettaneh
Copy link
Contributor

@iskettaneh iskettaneh commented Oct 17, 2025

Backport 1/1 commits from #155063.

/cc @cockroachdb/release


This commit fixes a bug where if the server controller is being drained, the drain command would return a response without populating the IsDraining. The CLI interprets this as if the draining is completed.

server: return draining progress if the server controller is draining

This commit fixes a bug where if the server controller is being drained, the drain command would return a response without populating the IsDraining field. The CLI interprets this as if the draining is completed.

Output example when running:

./cockroach node drain 3 --insecure --url {pgurl:3} --logtostderr=INFO

I251008 15:54:42.613200 15 2@rpc/peer.go:613  [rnode=?,raddr=10.142.1.228:26257,class=system,rpc] 1  connection is now healthy
node is draining... remaining: 3
I251008 15:54:42.622586 1 cli/rpc_node_shutdown.go:184  [-] 2  drain details: tenant servers: 3
node is draining... remaining: 3
I251008 15:54:42.824526 1 cli/rpc_node_shutdown.go:184  [-] 3  drain details: tenant servers: 3
node is draining... remaining: 1
I251008 15:54:43.026405 1 cli/rpc_node_shutdown.go:184  [-] 4  drain details: tenant servers: 1
node is draining... remaining: 1
I251008 15:54:43.228596 1 cli/rpc_node_shutdown.go:184  [-] 5  drain details: tenant servers: 1
node is draining... remaining: 243
I251008 15:54:44.580413 1 cli/rpc_node_shutdown.go:184  [-] 6  drain details: liveness record: 1, range lease iterations: 175, descriptor leases: 67
node is draining... remaining: 0 (complete)
drain ok

Release note (bug fix): fixed a bug in the drain command where draining a node using virtual clusters (such as clusters running Physical Cluster Replication) could return before the drain was complete, possibly resulting in shutting down the node while it still had active SQL clients and ranges leases.

Epic: None

Release justification: need to backport a bug fix to draining where we used to not fully drain nodes when using virtual clustters.

This commit fixes a bug where if the server controller is being drained,
the drain command would return a response without populating the
IsDraining field. The CLI interprets this as if the draining is
completed.

Output example when running:

```
./cockroach node drain 3 --insecure --url {pgurl:3} --logtostderr=INFO

I251008 15:54:42.613200 15 2@rpc/peer.go:613  [rnode=?,raddr=10.142.1.228:26257,class=system,rpc] 1  connection is now healthy
node is draining... remaining: 3
I251008 15:54:42.622586 1 cli/rpc_node_shutdown.go:184  [-] 2  drain details: tenant servers: 3
node is draining... remaining: 3
I251008 15:54:42.824526 1 cli/rpc_node_shutdown.go:184  [-] 3  drain details: tenant servers: 3
node is draining... remaining: 1
I251008 15:54:43.026405 1 cli/rpc_node_shutdown.go:184  [-] 4  drain details: tenant servers: 1
node is draining... remaining: 1
I251008 15:54:43.228596 1 cli/rpc_node_shutdown.go:184  [-] 5  drain details: tenant servers: 1
node is draining... remaining: 243
I251008 15:54:44.580413 1 cli/rpc_node_shutdown.go:184  [-] 6  drain details: liveness record: 1, range lease iterations: 175, descriptor leases: 67
node is draining... remaining: 0 (complete)
drain ok
```

Release note (bug fix): fixed a bug in the drain command where draining
a node using virtual clusters (such as clusters running Physical
Cluster Replication) could return before the drain was complete,
possibly resulting in shutting down the node while it still had active
SQL clients and ranges leases.

Epic: None
@iskettaneh iskettaneh requested review from a team as code owners October 17, 2025 17:56
@blathers-crl
Copy link

blathers-crl bot commented Oct 17, 2025

Thanks for opening a backport.

Before merging, please confirm that it falls into one of the following categories (select one):

  • Non-production code changes. Includes test-only changes, build system changes, etc.
  • Fixes for serious issues. Defined in the policy as correctness, stability, or security issues, data corruption/loss, significant performance regressions, breaking working and widely used functionality, or an inability to detect and debug production issues.
  • Other approved changes. These changes must be gated behind a disabled-by-default feature flag unless there is a strong justification not to.

Add a brief release justification to the PR description explaining your selection.

Also, confirm that the change does not break backward compatibility and complies with all aspects of the backport policy.

All backports must be reviewed by the TL and EM for the owning area.

@blathers-crl blathers-crl bot added backport Label PR's that are backports to older release branches T-kv KV Team labels Oct 17, 2025
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@blathers-crl
Copy link

blathers-crl bot commented Oct 17, 2025

❌ PR #155642 does not comply with backport policy

Confidence: medium
Explanation: This PR modifies production files, specifically pkg/server/drain.go and pkg/server/server_controller.go. It addresses a problem where the drain command might not properly indicate ongoing drain processes, potentially leading to premature node shutdowns. Though it appears to fix a significant issue that could affect node stability, it does not strictly meet the criteria for a critical bug fix under the provided policy. There is no explicit mention of this fix being tied to a disabled feature flag, nor are there new cluster or session settings introduced to control the behavior dynamically. The change could potentially lead to improved stability, addressing a significant operational pitfall, but without direct evidence of incidents, it falls short of the critical bug criteria directly related to stability or data loss.
Recommendation: Reconsider the backport, or gate the fix behind a feature flag or provide an explicit release justification that confirms the urgency and necessity of this change as per the backporting policy criteria.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@iskettaneh iskettaneh requested a review from arulajmani October 17, 2025 19:55
@iskettaneh iskettaneh merged commit 6a592b2 into cockroachdb:release-24.3 Oct 22, 2025
15 of 16 checks passed
@celiala
Copy link
Collaborator

celiala commented Oct 27, 2025

blathers backport staging-v24.3.22

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport Label PR's that are backports to older release branches T-kv KV Team target-release-24.3.23

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants