Skip to content

Conversation

@celiala
Copy link
Collaborator

@celiala celiala commented Oct 28, 2025

Backport 2/2 commits from #156282.

/cc @cockroachdb/release


Backport 1/1 commits from #148542.

/cc @cockroachdb/release


Previously, tracing transactions in CRDB could be enabled using a single cluster setting: sql.trace.txn.enable_threshold, which, if it was non-zero, would enable tracing on all transactions and only keep and emit the traces for transactions which exceeded the execution time threshold. This has proven to be difficult to use as it reuquires paying tracing cost on the entire workload.

This change modifies the transaction tracing capabilities of CRDB to incorporate a sample rate. A new cluster setting is introduced, named sql.trace.txn.sample_rate which is zero by default. When set to a positive number between 0 and 1, this will set the probability of a transaction having tracing enabled.

For tracing to emit a trace for a transaction, both settings will now need to be set and will need to both trigger for successful output to the SQL_EXEC logging channel. Transactions first pass through the sample rate filter which governs whether a trace is captured, then they will be evaluated through the enable_threshold which governs whether the trace is emitted to the logs.

Epic: None
Resolves: CRDB-51662

Release note (ops change, sql change): In order to selectively capture traces for transactions running in an active workload without haing to capture them via statement diagnostic bundles, customers can now use the sql.trace.txn.sample_rate cluster setting to enable tracing for a fraction of their workload. The sql.trace.txn.enable_threshold will still need to be set to a positive value to provide a filter for how slow a transaction needs to be after being sampled to merit emitting a trace. Traces are emitted to the SQL_EXEC logging channel.


Release justification: critical observability need that has been approved. disabled by default.

angles-n-daemons and others added 2 commits October 28, 2025 16:12
So that tests can verify unstructured logs as part of their
functionality, we introduce a new, generic logspy, to complement the
common structured logspy.

Fixes: none
Epic: none

Release note: none
Previously, tracing transactions in CRDB could be enabled using a
single cluster setting: `sql.trace.txn.enable_threshold`, which, if
it was non-zero, would enable tracing on all transactions and only
keep and emit the traces for transactions which exceeded the execution
time threshold. This has proven to be difficult to use as it reuquires
paying tracing cost on the entire workload.

This change modifies the transaction tracing capabilities of CRDB to
incorporate a sample rate. A new cluster setting is introduced, named
`sql.trace.txn.sample_rate` which is zero by default. When set to a
positive number between 0 and 1, this will set the probability of a
transaction having tracing enabled.

For tracing to emit a trace for a transaction, both settings will now
need to be set and will need to both trigger for successful output to
the `SQL_EXEC` logging channel. Transactions first pass through the
sample rate filter which governs whether a trace is captured, then
they will be evaluated through the `enable_threshold` which governs
whether the trace is emitted to the logs.

Epic: None
Resolves: CRDB-51662

Release note (ops change, sql change): In order to selectively capture
traces for transactions running in an active workload without haing
to capture them via statement diagnostic bundles, customers can now
use the `sql.trace.txn.sample_rate` cluster setting to enable tracing
for a fraction of their workload. The `sql.trace.txn.enable_threshold`
will still need to be set to a positive value to provide a filter
for how slow a transaction needs to be after being sampled to merit
emitting a trace. Traces are emitted to the `SQL_EXEC` logging
channel.
@celiala celiala requested review from a team as code owners October 28, 2025 23:15
@celiala celiala requested review from Abhinav1299, aa-joshi, arjunmahishi and dhartunian and removed request for a team October 28, 2025 23:15
@blathers-crl
Copy link

blathers-crl bot commented Oct 28, 2025

Thanks for opening a backport.

Before merging, please confirm that it falls into one of the following categories (select one):

  • Non-production code changes. Includes test-only changes, build system changes, etc.
  • Fixes for serious issues. Defined in the policy as correctness, stability, or security issues, data corruption/loss, significant performance regressions, breaking working and widely used functionality, or an inability to detect and debug production issues.
  • Other approved changes. These changes must be gated behind a disabled-by-default feature flag unless there is a strong justification not to.

Add a brief release justification to the PR description explaining your selection.

Also, confirm that the change does not break backward compatibility and complies with all aspects of the backport policy.

All backports must be reviewed by the TL and EM for the owning area.

@blathers-crl blathers-crl bot added backport Label PR's that are backports to older release branches T-code-systems labels Oct 28, 2025
@blathers-crl
Copy link

blathers-crl bot commented Oct 28, 2025

✅ PR #156409 is compliant with backport policy

Confidence: high
Feature flag detected: Yes
Backward compatible: true
Explanation: The PR complies with the CockroachDB backport policy due to the inclusion of an approved release justification within the PR body. The PR’s body explicitly mentions a 'critical observability need', and it is stated that this change is 'disabled by default', which aligns with the feature flag requirements of the policy. Moreover, the files affected are primarily production code files, but there is a clear indication that the feature introduced by this PR is behind a disabled-by-default feature flag (sql.trace.txn.sample_rate). The PR does not seem to introduce any backward compatibility issues nor does it remove any version gates, maintaining the existing functionality and compatibility. Therefore, all key policy areas are satisfactorily addressed.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yuzefovich reviewed 2 of 2 files at r1, 11 of 11 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aa-joshi, @Abhinav1299, and @arjunmahishi)

@celiala celiala merged commit 74dbb7b into cockroachdb:staging-v25.2.8 Oct 29, 2025
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport Label PR's that are backports to older release branches T-code-systems v25.2.8

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants