@Deezzir Deezzir commented Sep 27, 2025

Issue

Right now, it is almost impossible to track the status of backups: there is no alert to tell an operator that a backup has failed. The only option is to search the logs for entries that mark a backup as failed.

Grafana Dashboard rev6 was used.

Related issues:

Solution

The changes enable the pgbackrest_exporter as a service, add a Grafana Dashboard for the exported metrics, and introduce a new set of alert rules to track failed backups.

In addition, the PR adds a CI job that validates the alert rules and unit tests them.
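The PR description does not say which tool the CI job uses. As a minimal sketch, assuming the alert rules are Prometheus rules checked with `promtool` (its `check rules` and `test rules` subcommands), the validation step could look roughly like this. The function names and file paths are hypothetical and not taken from the PR.

```python
import shutil
import subprocess
from pathlib import Path


def promtool_commands(rules_file: Path, tests_file: Path) -> list[list[str]]:
    """Build the two promtool invocations: a syntax check, then the rule unit tests."""
    return [
        ["promtool", "check", "rules", str(rules_file)],
        ["promtool", "test", "rules", str(tests_file)],
    ]


def run_checks(rules_file: Path, tests_file: Path) -> bool:
    """Run both commands and return True only if each one exits with status 0."""
    if shutil.which("promtool") is None:
        raise RuntimeError("promtool is not installed or not on PATH")
    return all(
        subprocess.run(cmd, check=False).returncode == 0
        for cmd in promtool_commands(rules_file, tests_file)
    )
```

A CI step would call `run_checks(...)` with the repository's real rule and test files and fail the job when it returns `False`.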

Checklist

  • I have added or updated any relevant documentation.
  • I have cleaned any remaining cloud resources from my accounts.

return {
    "override": "replace",
    "summary": "pgbackrest metrics exporter",
    "command": "/usr/bin/pgbackrest_exporter",

@Deezzir Deezzir Sep 27, 2025

We cannot use /start-pgbackrest-exporter.sh at the moment: the current script is not prepared to run in a non-snapped environment (reference), the way the start-exporter.sh script is.
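For context, the diff excerpt above is part of a Pebble service definition that calls the exporter binary directly rather than a wrapper script. A minimal sketch of how such an entry could sit inside a complete layer follows. Only `override`, `summary`, and `command` come from the diff; the function names, the service name `pgbackrest-exporter`, the layer summary, and the `startup` value are assumptions for illustration.

```python
from typing import Any


def pgbackrest_exporter_service(
    command: str = "/usr/bin/pgbackrest_exporter",
) -> dict[str, Any]:
    """Return one Pebble service entry for the exporter."""
    return {
        "override": "replace",  # from the diff
        "summary": "pgbackrest metrics exporter",  # from the diff
        "command": command,  # from the diff: the binary, not a wrapper script
        "startup": "enabled",  # assumption: start the service with the workload
    }


def exporter_layer() -> dict[str, Any]:
    """Wrap the service entry in a layer dict (service name is hypothetical)."""
    return {
        "summary": "pgbackrest exporter layer",
        "services": {"pgbackrest-exporter": pgbackrest_exporter_service()},
    }
```

Keeping the command as a parameter would make it easy to switch to the wrapper script later, once it can run outside a snap.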

Member

Thanks for pointing this out. Could you create an issue in the repo for us to address this later? Thanks.

codecov bot commented Sep 27, 2025

Codecov Report

❌ Patch coverage is 66.66667% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 73.34%. Comparing base (28f4fce) to head (ea9a459).
⚠️ Report is 7 commits behind head on main.

Files with missing lines    Patch %    Lines
src/charm.py                50.00%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1109      +/-   ##
==========================================
+ Coverage   73.22%   73.34%   +0.12%     
==========================================
  Files          15       15              
  Lines        3929     3913      -16     
  Branches      577      573       -4     
==========================================
- Hits         2877     2870       -7     
+ Misses        836      829       -7     
+ Partials      216      214       -2     

@dragomirp dragomirp requested review from a team, dragomirp, marceloneppel and taurus-forever and removed request for a team September 28, 2025 21:33

@marceloneppel marceloneppel left a comment

Hi, @Deezzir! Apologies for the delay in reviewing this PR, and thank you so much for the contribution. I'm starting the review now.

@marceloneppel marceloneppel self-requested a review October 9, 2025 12:06

Deezzir commented Oct 14, 2025

@marceloneppel any progress on the review?

@marceloneppel marceloneppel left a comment

Thanks a lot for this great work, @Deezzir! I left one minor suggestion.

@taurus-forever
Contributor

@marceloneppel it looks like test_restart is trying to tell us something. Should we ask @Deezzir to check it?

@marceloneppel
Member

> @marceloneppel it looks like test_restart is trying to tell us something. Should we ask @Deezzir to check it?

I believe the test is failing due to the recent promotion of the test app. I'm currently testing a fix for that.

taurus-forever commented Oct 15, 2025

> I believe the test is failing due to the recent promotion of the test app. I'm currently testing a fix for that.

Oh, apologies. Dragomir suspected it would affect PG K8s. :-)

Deezzir commented Oct 15, 2025

@taurus-forever, looking at the logs, I see that postgresql-test-app fails, not the main charm.

@marceloneppel
Member

@Deezzir, adding a line like the following one right before https://github.com/Deezzir/postgresql-k8s-operator/blob/db8f13a4c07b513554faebac4a896fe29160011f/tests/integration/ha_tests/test_restart.py#L41 should make the failing tests pass:

await ops_test.model.relate(DATABASE_APP_NAME, f"{APPLICATION_NAME}:database")
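To show where that line would sit, here is a minimal sketch of a helper an integration test could call. The `relate` call is the one suggested above; the helper name, the two constant values, and the follow-up `wait_for_idle` call are assumptions for illustration, not code from the repository.

```python
# Placeholder values; the real constants are defined in the test suite.
DATABASE_APP_NAME = "postgresql-k8s"
APPLICATION_NAME = "postgresql-test-app"


async def relate_test_app(ops_test) -> None:
    """Relate the test app to the database, then wait for both to settle."""
    # The line suggested in the review comment above.
    await ops_test.model.relate(DATABASE_APP_NAME, f"{APPLICATION_NAME}:database")
    # Assumption: block until both applications report active status.
    await ops_test.model.wait_for_idle(
        apps=[DATABASE_APP_NAME, APPLICATION_NAME], status="active"
    )
```

In a pytest-operator test, `ops_test` is the fixture the framework provides, so the helper would be awaited from inside the async test function.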

Deezzir commented Oct 15, 2025

@marceloneppel Added in ea9a459

@taurus-forever taurus-forever left a comment

LGTM. Thank you!
