feat: Append existing resource labels to all related metrics #2129
Conversation
Welcome @alen-z!
/assign @dgrisonnet
Hi @dashpole, thank you for taking this one further. Hey @dgrisonnet, nice to meet you. Currently there's support for Pod and Service resources. I'll be extending the code to all resources, but until then please let me know if there's anything else that needs to be done or done differently to comply with your expectations. New commits soon...
Planning to allocate some time to finish this PR soon...
Hi @alen-z, sorry for the delay, I somehow missed this PR. One thing I am not sure I understand properly is how different your proposal is from the existing `--metric-labels-allowlist` option.
Hi @dgrisonnet, yes. With the current state, as far as I understand, we can not get arbitrary/custom Prometheus metric labels into all metrics that kube-state-metrics provides; we want to include labels in metrics from, e.g., existing Pod or Service labels. Having custom labels in only one kube-state-metrics metric (e.g. `kube_pod_labels`) is not enough for us.

The specific use case is: we have a certain set of metadata related to our product (a kind of service directory metadata) that we deploy along with our Kubernetes resources in the form of resource manifest labels. Those manifest labels contain information about the teams that own the product, the communication channel, etc. Now, we want to route native Prometheus alerts based on this custom owner label to the proper teams! This is only possible if, among other metrics, kube-state-metrics has the owner label in its metrics.

This is only one use case; there might be others related to visualization of Prometheus data based on filters, or to pulling out some statistics grouped by a custom label... Please let me know if there is a different approach to our use case that I might have missed.
That's totally intentional, because having these labels in all the metrics would increase the cardinality of the metrics tremendously and would cause kube-state-metrics to generate even more timeseries than it currently does, leading to higher memory consumption in Prometheus and potentially higher cost as well for users of a SaaS monitoring solution. Your request is very similar to these two that were opened some time ago:
And both your use case and their requests can be solved today by using joins in the PromQL queries of your alerts on the `kube_<resource>_labels` metrics (for example `kube_pod_labels`).
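As a minimal sketch of such a join (assuming `--metric-labels-allowlist=pods=[team]` is configured so that `kube_pod_labels` carries a `label_team` label; the metric and label names here are illustrative):

```promql
# Copy the owning team from kube_pod_labels onto a restart metric so that
# the resulting alert carries label_team and Alertmanager can route on it.
sum by (namespace, pod, label_team) (
    kube_pod_container_status_restarts_total
  * on (namespace, pod) group_left (label_team)
    kube_pod_labels
)
```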
PR needs rebase.
We have a very similar use case. The problem is we don't use Prometheus for queries, although we do still collect metrics via Prometheus in certain instances (like kube-state-metrics). Really it's via the otel-collector Prometheus receiver. So unfortunately the solution of joining metrics at query time does not currently work for us.
@DanTulovsky you could perhaps use the Prometheus agent to aggregate the metrics in the way you want and then remote-write them to your otel collector?
IMO your needs are outside of KSM's scope and need a middleware, such as the Prometheus agent, to be achieved.
I understand your concerns, but since this is an off-by-default feature, it would not break any existing users. People who choose to use it will need to understand the ramifications. For us, it would make things much simpler. The real solution here is for something like a kube-state-metrics receiver in the otel-collector :)
Still, having something like that is out of scope for kube-state-metrics. The only purpose of kube-state-metrics is to expose metrics about Kubernetes objects with a 1:1 mapping. Aggregation and relabeling should be done by other tools.
This is not a valid reason to add a feature to a project. At the end of the day, we would have to maintain that new functionality and most likely extend it for future needs. Not extending the project past its original purpose is intentional, to keep it maintainable and to avoid making it too complex for users to use.
I guess I don't understand how this is different from the existing `--metric-labels-allowlist` option.
Hi @dgrisonnet 👋
I would let cluster operators decide what they want to do and provide them with the mechanisms to do it. This goes along with what @DanTulovsky is saying with "People who choose to use this will need to understand the ramifications". Nothing would change by default for existing users.
I'd say that in theory you are right, but the reality is that many alert definitions are used from here and are not practical to change (to use joins).
Which is not a very convenient place for us, and I'd argue for a number of other people and use cases too.
It'd still expose information about objects with a 1:1 mapping, only describing them better with their own labels that already exist? Aggregation without it is hard; not impossible, but hard in reality.

Ultimately, @dgrisonnet and @rexagod, I'm interested in whether we should keep this PR open and continue the work here, or close it and fork it privately to support our use case without impacting this project. I suspected this might create some controversy, to be honest, so I'm not really surprised to meet the skepticism :) Good discussion. Looking forward to hearing from you.
In the current case, we add this information only once per resource. That's already very expensive if you expose all the labels of all the resources in your clusters, hence the warning, but if you were to do that for all the metrics exported by kube-state-metrics, the additional cost would be tremendous.
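To illustrate the trade-off with made-up series (assuming `--metric-labels-allowlist=pods=[team]`): today a resource label is stored on a single series per pod, whereas the proposal would repeat it on every series exposed for that pod.

```
# Today: the label appears once, on the dedicated labels metric.
kube_pod_labels{namespace="default", pod="checkout-7d9f", label_team="payments"} 1
kube_pod_info{namespace="default", pod="checkout-7d9f"} 1

# With the proposal: the label would be duplicated on every pod series.
kube_pod_info{namespace="default", pod="checkout-7d9f", team="payments"} 1
kube_pod_status_phase{namespace="default", pod="checkout-7d9f", phase="Running", team="payments"} 1
```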
Ultimately that's the goal with kube-state-metrics: empowering cluster operators to tune the project to match their needs without needing any knowledge about the codebase. That's why we've released support for CRDs and more recently have been discussing making kube-state-metrics configuration-based: #2165.

But, on top of the additional resource consumption from storing the label information multiple times, adding labels to all the metrics would go against the current state of kube-state-metrics, where we have metrics about specific parts of Kubernetes objects. Each one of them has one unique purpose, and that's how we currently keep kube-state-metrics simple to use and consume. Because of that, it would be wrong to add unrelated information to the various metrics.

So with the current state of kube-state-metrics, I don't think it is wise to add a mechanism that goes against the current design. Additionally, if we commit to this feature that goes against the best practices we've put in place, we will never be able to move away from it without breaking users. However, I am not against making that possible once we fully empower users to tweak kube-state-metrics however they want.
Charts are hard to extend, which is why https://github.com/prometheus-operator/kube-prometheus does not provide them but instead provides raw manifests generated from Jsonnet, which are way easier to extend. I would recommend looking into it if you want to update the rules to match your needs.
So then why make an exception for labels and not also add annotations, containers, statuses, ...? This is a far-fetched example, and I know why you specifically want labels, but that would break a boundary that we've set between the information.
I totally hear that, but I don't see a way we could improve that today without breaking the current UX of kube-state-metrics.
As you can probably tell, I am not a fan of adding this feature to ksm as it would be considered out of scope right now, and I'd prefer leaving any kind of aggregation to a third party.
I'd prefer much stronger arguments, but closing the PR.
I mean, the arguments are pretty strong to me:
Although I said that this feature might be reconsidered in the future if we empower our users even more, I would still discourage anyone from doing that, since duplicating data is a waste of resources.
What this PR does / why we need it:
KSM is great, but to make it even better (and fit our use case), here's a proposal for how to expand resource-related metrics with custom labels. Labels are expected to already be defined as part of the resource.
Currently supported resources: Pod, Service.
The issue is that Prometheus relabelings can't be used for KSM, because Prometheus scrapes only the KSM container, which provides metrics for many resources scattered around the cluster.

Ultimately, this should allow us to get custom labels into Alertmanager without any default alert rule changes in Prometheus. Having custom labels can significantly improve alert routing based on more detailed labels.
Example:
Set `--metric-labels-append=regex=meta.example.com/(.*),product` and get:
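(A purely hypothetical sketch of the resulting series, assuming the flag appends any resource label whose key matches the regex; here a pod label `meta.example.com/product: checkout` would be exposed as a `product` label on that pod's metrics.)

```
# Hypothetical output; series names and values are illustrative.
kube_pod_info{namespace="default", pod="checkout-7d9f", product="checkout"} 1
kube_pod_status_phase{namespace="default", pod="checkout-7d9f", phase="Running", product="checkout"} 1
```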
Proposed phases
- `replacement` mechanism when `regex` is used, similar to Prometheus relabelings. This should allow us to rename matched labels.

How does this change affect the cardinality of KSM: (increases, decreases or does not change cardinality)
Increases. Adding labels to existing metrics.
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Fixes prometheus-operator/kube-prometheus#887
Fixes #536