PgBackrest Dashboard and alert rules #1109
base: main
return {
    "override": "replace",
    "summary": "pgbackrest metrics exporter",
    "command": "/usr/bin/pgbackrest_exporter",
We cannot use /start-pgbackrest-exporter.sh at the moment because the current script is not prepared to be run in a non-snapped environment (reference), like it is done for the start-exporter.sh script.
Thanks for pointing this out. Could you create an issue in the repo for us to address this later? Thanks.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1109      +/-   ##
==========================================
+ Coverage   73.22%   73.34%   +0.12%
==========================================
  Files          15       15
  Lines        3929     3913      -16
  Branches      577      573       -4
==========================================
- Hits         2877     2870       -7
+ Misses        836      829       -7
+ Partials      216      214       -2
Hi, @Deezzir! Apologies for the delay in reviewing this PR. Thank you so much for that. I'm starting to review it.
@marceloneppel any progress on the review?
Thanks a lot for this great work, @Deezzir! I left one minor suggestion.
@marceloneppel it looks like test_restart is trying to tell us something. Should we ask @Deezzir to check it?
I believe the test is failing due to the recent promotion of the test app. I'm currently testing a fix for that.
Oh, apologies. Dragomir suspected it would affect PG K8s. :-)
@taurus-forever, looking at logs, I see that
@Deezzir, adding a line like the following one right before https://github.com/Deezzir/postgresql-k8s-operator/blob/db8f13a4c07b513554faebac4a896fe29160011f/tests/integration/ha_tests/test_restart.py#L41 should make the failing tests pass:
    await ops_test.model.relate(DATABASE_APP_NAME, f"{APPLICATION_NAME}:database")
@marceloneppel Added in ea9a459
Issue
Right now, it is almost impossible to track the status of backups. There is no alert to tell an operator that a backup failed. The only possible solution is to look for logs that mark a backup as failed.
Grafana Dashboard rev6 was used.
Related issues:
Solution
The changes enable the pgbackrest_exporter as a service, add a Grafana Dashboard for the exported metrics, and introduce a new set of alert rules to track failed backups. In addition, the PR adds a CI job that validates the alert rules and unit tests them.
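For illustration only, here is a minimal sketch of how the exporter could be declared as a Pebble service. The "override", "summary", and "command" values come from the diff discussed in this PR. The function name, the service key "pgbackrest-exporter", the layer summary, and the "startup" value are assumptions made for the example and may not match the charm's actual code.

```python
from typing import Any


def pgbackrest_exporter_layer(
    service_name: str = "pgbackrest-exporter",  # hypothetical service key
    startup: str = "enabled",  # assumption: start the exporter with the workload
) -> dict[str, Any]:
    """Build a Pebble layer dict that runs the exporter as a service."""
    return {
        "summary": "pgbackrest exporter layer",  # hypothetical layer summary
        "services": {
            service_name: {
                # These three values are taken from the PR diff.
                "override": "replace",
                "summary": "pgbackrest metrics exporter",
                "command": "/usr/bin/pgbackrest_exporter",
                # Standard Pebble service field; the value is an assumption.
                "startup": startup,
            }
        },
    }


layer = pgbackrest_exporter_layer()
print(layer["services"]["pgbackrest-exporter"]["command"])
```

Keeping the service definition in a small function that returns a plain dict makes it easy to unit test without a running container.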
Checklist