Skip to content

Conversation

@joshua-spacetime
Copy link
Collaborator

Fixes #684.

Prometheus does not reset metrics.
It only updates them with new values that it collects. If it does not receive a new value for a metric, it will keep the old value.

Therefore it is important to periodically reset certain metrics. Namely gauge metrics that only receive updates occasionally.

Maximum reducer duration is one such example,
since some reducers run much more infrequently than others.

Description of Changes

Please describe your change, mention any related tickets, and so on here.

API and ABI breaking changes

If this is an API or ABI breaking change, please apply the
corresponding GitHub label.

Expected complexity level and risk

How complicated do you think these changes are? Grade on a scale from 1 to 5,
where 1 is a trivial change, and 5 is a deep-reaching and complex change.

This complexity rating applies not only to the complexity apparent in the diff,
but also to its interactions with existing and future code.

If you answered more than a 2, explain what is complex about the PR,
and what other components it interacts with in potentially concerning ways.

Fixes #684.

Prometheus does not reset metrics.
It only updates them with new values that it collects.
If it does not receive a new value for a metric, it will keep the old value.

Therefore it is important to periodically reset certain metrics.
Namely gauge metrics that only receive updates occasionally.

Maximum reducer duration is one such example,
since some reducers run much more infrequently than others.
Comment on lines -2413 to +2402
.entry(hash(workload, db, reducer))
.entry((*db, *workload, reducer.to_owned()))
Copy link
Collaborator Author

@joshua-spacetime joshua-spacetime Jan 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This change is not related to the main fix. This was an optimization to avoid extra allocations when recording metrics. But I'd rather take the allocations than deal with a potential hash collision when debugging a performance issue.

Copy link
Contributor

@cloutiertyler cloutiertyler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@joshua-spacetime joshua-spacetime added this pull request to the merge queue Jan 9, 2024
Merged via the queue into master with commit 906803f Jan 9, 2024
@joshua-spacetime joshua-spacetime deleted the joshua/fix/684/reset-metrics-for-max-values branch January 9, 2024 04:56
joshua-spacetime added a commit that referenced this pull request Jan 16, 2024
Max value metrics are currently reset to zero after each collection period.
However this won't be reflected in the metrics store immediately.

Once the database records new values for said metrics,
only then will the reset be reflected in the metrics store.

This subtlety is imperceptible for metrics that are frequently updated.
But if a metric is not frequently updated,
or for periods of reduced load in general,
the metrics store may present stale max value metrics.

This anomaly was thought to have been fixed by #709.
However it only fixed it for DB_METRICS.
This patch removes the anomaly for WORKER_METRICS as well.
github-merge-queue bot pushed a commit that referenced this pull request Jan 17, 2024
Max value metrics are currently reset to zero after each collection period.
However this won't be reflected in the metrics store immediately.

Once the database records new values for said metrics,
only then will the reset be reflected in the metrics store.

This subtlety is imperceptible for metrics that are frequently updated.
But if a metric is not frequently updated,
or for periods of reduced load in general,
the metrics store may present stale max value metrics.

This anomaly was thought to have been fixed by #709.
However it only fixed it for DB_METRICS.
This patch removes the anomaly for WORKER_METRICS as well.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Max cpu time is not resetting

3 participants