Skip to content

Conversation

@dhartunian
Copy link
Collaborator

@dhartunian dhartunian commented Oct 27, 2025

Backport 1/1 commits from #148542.

/cc @cockroachdb/release


Previously, tracing transactions in CRDB could be enabled using a single cluster setting: sql.trace.txn.enable_threshold, which, if it was non-zero, would enable tracing on all transactions and only keep and emit the traces for transactions which exceeded the execution time threshold. This has proven to be difficult to use as it reuquires paying tracing cost on the entire workload.

This change modifies the transaction tracing capabilities of CRDB to incorporate a sample rate. A new cluster setting is introduced, named sql.trace.txn.sample_rate which is zero by default. When set to a positive number between 0 and 1, this will set the probability of a transaction having tracing enabled.

For tracing to emit a trace for a transaction, both settings will now need to be set and will need to both trigger for successful output to the SQL_EXEC logging channel. Transactions first pass through the sample rate filter which governs whether a trace is captured, then they will be evaluated through the enable_threshold which governs whether the trace is emitted to the logs.

Epic: None
Resolves: CRDB-51662

Release note (ops change, sql change): In order to selectively capture traces for transactions running in an active workload without haing to capture them via statement diagnostic bundles, customers can now use the sql.trace.txn.sample_rate cluster setting to enable tracing for a fraction of their workload. The sql.trace.txn.enable_threshold will still need to be set to a positive value to provide a filter for how slow a transaction needs to be after being sampled to merit emitting a trace. Traces are emitted to the SQL_EXEC logging channel.


Release justification: critical observability need that has been approved. disabled by default.

@blathers-crl
Copy link

blathers-crl bot commented Oct 27, 2025

Thanks for opening a backport.

Before merging, please confirm that it falls into one of the following categories (select one):

  • Non-production code changes. Includes test-only changes, build system changes, etc.
  • Fixes for serious issues. Defined in the policy as correctness, stability, or security issues, data corruption/loss, significant performance regressions, breaking working and widely used functionality, or an inability to detect and debug production issues.
  • Other approved changes. These changes must be gated behind a disabled-by-default feature flag unless there is a strong justification not to.

Add a brief release justification to the PR description explaining your selection.

Also, confirm that the change does not break backward compatibility and complies with all aspects of the backport policy.

All backports must be reviewed by the TL and EM for the owning area.

@blathers-crl blathers-crl bot added backport Label PR's that are backports to older release branches T-observability labels Oct 27, 2025
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@blathers-crl
Copy link

blathers-crl bot commented Oct 27, 2025

✅ PR #156282 is compliant with backport policy

Confidence: high
Feature flag detected: Yes
Backward compatible: true
Explanation: The pull request is compliant with the backport policy. It introduces new cluster settings that control the behavior of probabilistic transaction tracing, which are disabled by default. The new settings sql.trace.txn.sample_rate and the modifications to sql.trace.txn.enable_threshold allow these features to be enabled or disabled at runtime, satisfying the feature flag requirement for backport policy compliance. Moreover, the Release justification: line in the PR states a critical observability need, providing further justification for the backport. The changes made to the code base primarily revolve around the addition and use of these settings to control the tracing functionality, which supports backward compatibility and ensures that existing functionalities are not negatively impacted.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Copy link
Contributor

@angles-n-daemons angles-n-daemons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, thanks!

angles-n-daemons and others added 2 commits October 28, 2025 15:19
So that tests can verify unstructured logs as part of their
functionality, we introduce a new, generic logspy, to complement the
common structured logspy.

Fixes: none
Epic: none

Release note: none
Previously, tracing transactions in CRDB could be enabled using a
single cluster setting: `sql.trace.txn.enable_threshold`, which, if
it was non-zero, would enable tracing on all transactions and only
keep and emit the traces for transactions which exceeded the execution
time threshold. This has proven to be difficult to use as it reuquires
paying tracing cost on the entire workload.

This change modifies the transaction tracing capabilities of CRDB to
incorporate a sample rate. A new cluster setting is introduced, named
`sql.trace.txn.sample_rate` which is zero by default. When set to a
positive number between 0 and 1, this will set the probability of a
transaction having tracing enabled.

For tracing to emit a trace for a transaction, both settings will now
need to be set and will need to both trigger for successful output to
the `SQL_EXEC` logging channel. Transactions first pass through the
sample rate filter which governs whether a trace is captured, then
they will be evaluated through the `enable_threshold` which governs
whether the trace is emitted to the logs.

Epic: None
Resolves: CRDB-51662

Release note (ops change, sql change): In order to selectively capture
traces for transactions running in an active workload without haing
to capture them via statement diagnostic bundles, customers can now
use the `sql.trace.txn.sample_rate` cluster setting to enable tracing
for a fraction of their workload. The `sql.trace.txn.enable_threshold`
will still need to be set to a positive value to provide a filter
for how slow a transaction needs to be after being sampled to merit
emitting a trace. Traces are emitted to the `SQL_EXEC` logging
channel.
@dhartunian dhartunian requested review from a team as code owners October 28, 2025 19:20
@dhartunian dhartunian requested review from Abhinav1299, aa-joshi, arjunmahishi, kyle-a-wong and nkodali and removed request for a team October 28, 2025 19:20
@dhartunian dhartunian merged commit 43e5c88 into cockroachdb:release-25.2 Oct 28, 2025
15 of 16 checks passed
@celiala
Copy link
Collaborator

celiala commented Oct 28, 2025

blathers backport staging-v25.2.8

@blathers-crl
Copy link

blathers-crl bot commented Oct 28, 2025

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error updating backport branch to sha a75f51012faea901ee6e28a869bb4b79a6f1fccf: PATCH https://api.github.com/repos/dhartunian/cockroach/git/refs/heads/blathers/backport-staging-v25.2.8-156282: 422 Reference cannot be updated []

Backport to branch staging-v25.2.8 failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@nkodali
Copy link
Collaborator

nkodali commented Oct 28, 2025

Noting that I've approved and reviewed this backport.

@celiala
Copy link
Collaborator

celiala commented Oct 28, 2025

blathers backport staging-v25.2.8

@blathers-crl
Copy link

blathers-crl bot commented Oct 28, 2025

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating backport branch refs/heads/blathers/backport-staging-v25.2.8-156282: POST https://api.github.com/repos/dhartunian/cockroach/git/refs: 422 Reference already exists []

Backport to branch staging-v25.2.8 failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport Label PR's that are backports to older release branches backport-failed T-observability target-release-25.2.9

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants