feat: adds distributor lag counter to push.go #18012

aarogoss · 2025-06-09T15:29:39Z

What this PR does / why we need it:
Rather than relying on recording rules to monitor distributor lag metrics, this PR creates a new Prometheus counter in the push.go module.

This counter allows us to track the difference between the time a distributor receives a log push request and the most recent log timestamp in the ingestion payload.

This difference represents how long ago the newest log in the payload was captured, giving us insight into distributor "lag". If these values remain steady or increase over time, we know the ingestion agents are falling behind and will eventually start dropping logs.

This counter metric has an additional label, "userAgent". This value is extracted from the HTTP request and shows which ingestion agents a particular tenant is using. Should we see incoming log ingestion start to fall behind, we can use this label to give customers instructions for adjusting the configuration of that specific agent.

Checklist

Reviewed the CONTRIBUTING.md guide (required)
Documentation added
Tests updated
Title matches the required conventional commits format, see here
- Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR



aarogoss added 2 commits June 9, 2025 08:10

Merge branch 'main' into agoss/add-distributor-lag-metric

a6d6f6a

pull-request-size bot added the size/S label Jun 9, 2025

aarogoss added 4 commits June 9, 2025 09:29

Merge branch 'main' into agoss/add-distributor-lag-metric

ab80850

chore: minor adjustment to distributor lag metric description

baac6e3

Merge branch 'agoss/add-distributor-lag-metric' of github.com:grafana/loki into agoss/add-distributor-lag-metric

3f0c697

chore: one more change to distributor lag metric description

e2cc83c

aarogoss added the component/distributor label Jun 9, 2025

aarogoss self-assigned this Jun 9, 2025

aarogoss and others added 5 commits June 9, 2025 10:34

Merge branch 'main' into agoss/add-distributor-lag-metric

7057d52

chore: filter out 1B deltas from distributor lag metric

83cc176

The OTLP endpoint often shows a large amount of time skew in the distributor mostRecentLagMs calculation. This filters out any values greater than 1B (1,000,000,000 ms, about 11.6 days).
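The filter described in this commit can be sketched as a simple threshold check. This is a hypothetical illustration: the `maxLagMs` constant and `shouldRecordLag` function are not taken from the PR, and "1B" is read here as 1,000,000,000 milliseconds because the filtered values come from `mostRecentLagMs`.

```go
package main

import "fmt"

// maxLagMs is the cut-off from the commit message: lag values above 1B
// (read here as 1,000,000,000 ms, about 11.6 days) are treated as clock
// skew and not recorded. The constant name is hypothetical.
const maxLagMs int64 = 1_000_000_000

// shouldRecordLag reports whether a computed lag value is at or below the
// cut-off and should therefore be recorded.
func shouldRecordLag(lagMs int64) bool {
	return lagMs <= maxLagMs
}

func main() {
	for _, lag := range []int64{30_000, 1_000_000_000, 1_700_000_000_000} {
		fmt.Println(lag, shouldRecordLag(lag))
	}
	// 30000 true
	// 1000000000 true
	// 1700000000000 false
}
```

The boundary value itself is kept, since the commit only mentions dropping values greater than 1B.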

chore: remove distributor lag metric log statement

1f9d974

Merge branch 'main' into agoss/add-distributor-lag-metric

0236830

Merge branch 'main' into agoss/add-distributor-lag-metric

5cc8e4a

aarogoss marked this pull request as ready for review June 9, 2025 22:31

aarogoss requested a review from a team as a code owner June 9, 2025 22:31

aarogoss added 3 commits June 10, 2025 07:30

Merge branch 'main' into agoss/add-distributor-lag-metric

a1c9077

Merge branch 'main' into agoss/add-distributor-lag-metric

b478fdf

Merge branch 'main' into agoss/add-distributor-lag-metric

af07c92

paul1r approved these changes Jun 10, 2025


aarogoss enabled auto-merge (squash) June 10, 2025 15:22

aarogoss disabled auto-merge June 10, 2025 15:22

aarogoss merged commit 6495be0 into main Jun 10, 2025
65 checks passed

aarogoss deleted the agoss/add-distributor-lag-metric branch June 10, 2025 15:22

paul1r pushed a commit that referenced this pull request Jun 10, 2025

feat: adds distributor lag counter to push.go (#18012)

7311482

This was referenced Aug 11, 2025

chore(k267): release 3.5.0 #18796

Closed

chore(k268): release 3.5.0 #18891

Closed

loki-gh-app bot mentioned this pull request Sep 1, 2025

chore(k270): release 3.5.0 #19075

Closed

loki-gh-app bot mentioned this pull request Sep 8, 2025

chore(k271): release 3.5.0 #19126

Closed

loki-gh-app bot mentioned this pull request Sep 15, 2025

chore(k272): release 3.5.0 #19194

Closed

loki-gh-app bot mentioned this pull request Sep 22, 2025

chore(k273): release 3.5.0 #19248

Closed
