@meinenec
Contributor

What this PR does / why we need it:

When running with zone-aware ingesters, the chart creates three headless services named loki-testing-ingester-zone-<zone a-c>-headless. However, when rollout-operator attempts to call /ingester/prepare-downscale, it reads serviceName: loki-testing-ingester-zone-a from the StatefulSet. That name lacks the -headless suffix, so the call fails and the ingesters do not scale down properly.
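
The mismatch described above can be checked on a live cluster by comparing each StatefulSet's `spec.serviceName` with the headless Service name. This is a minimal sketch, not part of the PR: the release name `loki-testing` comes from the description, while the namespace `loki`, the helper names, and the `RUN_CHECK` switch are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: NAMESPACE is a placeholder; RELEASE follows the PR description.
RELEASE="${RELEASE:-loki-testing}"
NAMESPACE="${NAMESPACE:-loki}"

# Name of the headless Service the chart creates for a zone (per the PR description).
headless_service_name() {
  printf '%s-ingester-zone-%s-headless\n' "$1" "$2"
}

# Compare one zone's StatefulSet serviceName with the headless Service name.
check_zone() {
  zone="$1"
  sts="${RELEASE}-ingester-zone-${zone}"
  want="$(headless_service_name "$RELEASE" "$zone")"
  have="$(kubectl -n "$NAMESPACE" get statefulset "$sts" -o jsonpath='{.spec.serviceName}')"
  if [ "$have" = "$want" ]; then
    echo "zone ${zone}: ok (${have})"
  else
    echo "zone ${zone}: mismatch (serviceName=${have}, headless service=${want})"
  fi
}

# Only query the cluster when explicitly requested with RUN_CHECK=1.
if [ "${RUN_CHECK:-0}" = "1" ]; then
  for zone in a b c; do
    check_zone "$zone"
  done
fi
```

Saved as a script, it could be run with `RUN_CHECK=1 NAMESPACE=<your-namespace> sh <script>`; before this fix, each zone would be expected to report a mismatch.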

Which issue(s) this PR fixes:
Fixes #18174

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@meinenec meinenec requested a review from a team as a code owner July 23, 2025 15:17
Contributor

@jkroepke jkroepke left a comment

LGTM, just one nit.

@meinenec
Contributor Author

LGTM, just one nit.

Apologies for missing that, and thank you for reviewing!

Contributor

@jkroepke jkroepke left a comment

LGTM

@meinenec meinenec force-pushed the helm-ingester-service branch from cce47c2 to b125910 Compare July 23, 2025 20:58
@meinenec meinenec force-pushed the helm-ingester-service branch from 2585c2c to b61ac2d Compare July 24, 2025 15:19
@meinenec
Contributor Author

@jkroepke I introduced some conflicts while rebasing to update my fork. I'll stop doing that, and I have addressed the new changelog issues. Apologies for the noise on this PR.

Contributor

@jkroepke jkroepke left a comment

Thank you!

LGTM

@Jayclifford345 Jayclifford345 merged commit 2706302 into grafana:main Jul 29, 2025
74 checks passed
jkroepke added a commit to jkroepke/loki that referenced this pull request Jul 29, 2025
@jkroepke
Contributor

For the record: this PR breaks existing setups:

Error: UPGRADE FAILED: cannot patch "loki-ingester-zone-a" with kind StatefulSet: StatefulSet.apps "loki-ingester-zone-a" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'ordinals', 'template', 'updateStrategy', 'revisionHistoryLimit', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden

We are currently discussing how to handle this (an extensive note in the CHANGELOG or a revert of the PR).

@abenbachir
Contributor

abenbachir commented Jul 30, 2025

We picked up the latest chart, 6.34.0, and hit the same bug. Is it possible to provide a fix in 6.35.0? We need to use some features in the latest chart.

@jkroepke
Contributor

@abenbachir A quick workaround is to delete the StatefulSet manually before running helm upgrade.
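
A minimal sketch of that workaround follows. The StatefulSet names are inferred from the error message above. The release name `loki`, namespace `loki`, chart reference `grafana/loki`, and `values.yaml` file are placeholders. The `--cascade=orphan` flag, which removes the StatefulSet object while leaving its pods running, is an assumption and is not stated in this thread. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only: RELEASE, NAMESPACE, CHART, and values.yaml are placeholders.
RELEASE="${RELEASE:-loki}"
NAMESPACE="${NAMESPACE:-loki}"
CHART="${CHART:-grafana/loki}"
DRY_RUN="${DRY_RUN:-1}"

# Print the zone-aware ingester StatefulSet names for a release, one per line.
ingester_statefulsets() {
  for zone in a b c; do
    printf '%s-ingester-zone-%s\n' "$1" "$zone"
  done
}

# Echo a command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Delete the StatefulSet objects. --cascade=orphan (an assumption here) keeps
# the pods running; by default, deleting a StatefulSet does not delete its
# PersistentVolumeClaims.
for sts in $(ingester_statefulsets "$RELEASE"); do
  run kubectl -n "$NAMESPACE" delete statefulset "$sts" --cascade=orphan
done

# Re-create the StatefulSets with the corrected serviceName.
run helm upgrade "$RELEASE" "$CHART" -n "$NAMESPACE" --values values.yaml
```

Reviewing the printed commands in a staging environment before setting `DRY_RUN=0` keeps the destructive step explicit.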

@Jayclifford345
Contributor

Hey all, so we have two options here, and we are happy to take a vote on which you would prefer:
👍 - We keep the PR, and those using zone-aware ingesters manually delete their ingester StatefulSets and then perform the upgrade. No data will be lost, because it lives inside the PersistentVolumeClaim, not on the StatefulSet object. We would update the changelog to reflect this known upgrade issue.

👎 - We revert this PR, which fixes the serviceName in the zone-aware ingesters, and consider fixing it when we move to Loki 4.0 helm chart in the future.

@abenbachir
Contributor

abenbachir commented Jul 30, 2025

I can do the workaround, but how do we know you won't roll back this change in the next release, leaving us to do another workaround?

Also, can you provide the exact kubectl commands to delete the ingesters? We would like to try it in our staging environment.

@jkroepke
Contributor

The mentioned workaround is now part of the official upgrading documentation: https://grafana.com/docs/loki/next/setup/upgrade/upgrade-to-6x/#breaking-zone-aware-ingester-statefulset-servicename-fix-6340

Development

Successfully merging this pull request may close these issues.

Rollout-operator cannot call /ingester/prepare-downscale for clean scale down