fix(684): Reset max value metrics #709

joshua-spacetime · 2024-01-09T00:32:02Z

Fixes #684.

Prometheus does not reset metrics.
It only updates them with new values that it collects. If it does not receive a new value for a metric, it will keep the old value.

Therefore it is important to periodically reset certain metrics. Namely gauge metrics that only receive updates occasionally.

Maximum reducer duration is one such example,
since some reducers run much more infrequently than others.

Description of Changes

Please describe your change, mention any related tickets, and so on here.

API and ABI breaking changes

If this is an API or ABI breaking change, please apply the
corresponding GitHub label.

Expected complexity level and risk

How complicated do you think these changes are? Grade on a scale from 1 to 5,
where 1 is a trivial change, and 5 is a deep-reaching and complex change.

This complexity rating applies not only to the complexity apparent in the diff,
but also to its interactions with existing and future code.

If you answered more than a 2, explain what is complex about the PR,
and what other components it interacts with in potentially concerning ways.

Fixes #684. Prometheus does not reset metrics. It only updates them with new values that it collects. If it does not receive a new value for a metric, it will keep the old value. Therefore it is important to periodically reset certain metrics. Namely gauge metrics that only receive updates occasionally. Maximum reducer duration is one such example, since some reducers run much more infrequently than others.

joshua-spacetime · 2024-01-09T00:35:07Z

crates/core/src/db/datastore/locking_tx_datastore/mod.rs

-            .entry(hash(workload, db, reducer))
+            .entry((*db, *workload, reducer.to_owned()))


Note: This change is not related to the main fix. This was an optimization to avoid extra allocations when recording metrics. But I'd rather take the allocations than deal with a potential hash collision when debugging a performance issue.

cloutiertyler

LGTM

Max value metrics are currently reset to zero after each collection period. However this won't be reflected in the metrics store immediately. Once the database records new values for said metrics, only then will the reset be reflected in the metrics store. This subtlety is imperceptible for metrics that are frequently updated. But if a metric is not frequently updated, or for periods of reduced load in general, the metrics store may present stale max value metrics. This anomaly was thought to have been fixed by #709. However it only fixed it for DB_METRICS. This patch removes the anomaly for WORKER_METRICS as well.

joshua-spacetime commented Jan 9, 2024

View reviewed changes

joshua-spacetime enabled auto-merge January 9, 2024 00:46

cloutiertyler approved these changes Jan 9, 2024

View reviewed changes

joshua-spacetime added this pull request to the merge queue Jan 9, 2024

Merged via the queue into master with commit 906803f Jan 9, 2024

joshua-spacetime deleted the joshua/fix/684/reset-metrics-for-max-values branch January 9, 2024 04:56

joshua-spacetime mentioned this pull request Jan 16, 2024

fix: Reset max queue length and reducer wait time metrics #725

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(684): Reset max value metrics #709

fix(684): Reset max value metrics #709

Uh oh!

joshua-spacetime commented Jan 9, 2024

Uh oh!

joshua-spacetime Jan 9, 2024 •

edited

Loading

Uh oh!

cloutiertyler left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		.entry(hash(workload, db, reducer))
		.entry((db, workload, reducer.to_owned()))

fix(684): Reset max value metrics #709

fix(684): Reset max value metrics #709

Uh oh!

Conversation

joshua-spacetime commented Jan 9, 2024

Description of Changes

API and ABI breaking changes

Expected complexity level and risk

Uh oh!

joshua-spacetime Jan 9, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cloutiertyler left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

joshua-spacetime Jan 9, 2024 •

edited

Loading