@ZhouXing19
Collaborator

@ZhouXing19 ZhouXing19 commented Oct 28, 2025

Informs: #150015

Rebased from #156307

Release note: TBD

This commit introduces a new table-level storage parameter
`canary_window` that can be specified in CREATE TABLE or ALTER TABLE
statements. The canary window specifies how long newly collected
statistics will remain in "canary" state before being promoted to
stable, enabling gradual rollout of new statistics to mitigate
performance regressions.

When set to a non-zero duration, this parameter enables canary
statistics rollout for the table. During the canary window, the cluster
setting sql.stats.canary_fraction (introduced in the next commit)
determines what percentage of queries use the new canary statistics
versus the previous stable statistics, providing a buffer period for
observation and intervention.

The window is capped at 48 hours to prevent an excessively long canary
window.

Note that this commit adds only syntax support and storage for the
parameter. The actual canary statistics selection logic will be
implemented in subsequent commits.

Release note (sql change): A new table storage parameter `canary_window`
has been introduced to enable gradual rollout of newly collected table
statistics. It takes a duration string as the value, with a maximum
allowed duration of 48 hours. When set to a non-zero duration, the
new statistics remain in a "canary" state for the specified duration
before being promoted to stable. This allows for controlled exposure
and intervention opportunities before statistics are fully deployed
across all queries.

See the release note for details. Note that this cluster setting does not
apply to internal queries.

Release note (sql change): introduce the cluster setting `sql.stats.canary_fraction`,
which takes a float in the range [0, 1]. Its value determines
what fraction of queries use "canary statistics" (newly collected
stats within their canary window) versus "stable statistics"
(previously proven stats). For example, a value of 0.2 means 20% of
queries use canary stats while 80% use stable stats. The
selection is atomic per query: if a query is chosen for canary
evaluation, it uses canary statistics for ALL tables it references
(where available). A query never uses a mix of canary and stable
statistics.
…indow

This commit implements the core logic for canary statistics rollout,
allowing gradual deployment of newly collected full statistics.
Previously, all queries would immediately use the most recent full
statistics, which could cause performance regressions if the new full
statistics were inaccurate.

The implementation adds a `CanaryWindowSize` field in table descriptors
and catalog interfaces to define the canary period, along with logic in
the statistics builder to skip "canary" statistics (the latest stats
within the canary window) when not using the canary path. The cluster
setting `sql.stats.canary_fraction` controls what percentage of queries
use canary statistics.

Release note (sql change): implement the core logic for canary rollout of
full statistics, which is configurable via the table-level storage parameter
(`canary_window`) and the cluster setting
`sql.stats.canary_fraction`.
… selection

This commit adds a new session variable `stats_as_of` that allows
controlling statistics selection based on a specific timestamp rather
than the current time. Previously, statistics selection was always
relative to the current wall clock time, making it difficult to get
consistent query plans for historical analysis or testing.

This feature is intended only for debugging and troubleshooting and should
not be used in production.

The implementation is also integrated into the existing canary
statistics logic to respect the as-of timestamp when determining canary
window boundaries.

Release note (sql change): adds a new session variable `stats_as_of`
that allows controlling statistics selection based on a specific
timestamp rather than the current time.
@github-actions

Potential Bug(s) Detected

The three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation.

Next Steps:
Please review the detailed findings in the workflow run.

Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary.

After you review the findings, please tag the issue as follows:

  • If the detected issue is real or was helpful in any way, please tag the issue with O-AI-Review-Real-Issue-Found
  • If the detected issue was not helpful in any way, please tag the issue with O-AI-Review-Not-Helpful

@github-actions github-actions bot added the o-AI-Review-Potential-Issue-Detected AI reviewer found potential issue. Never assign manually—auto-applied by GH action only. label Oct 30, 2025