Search before asking
- I had searched in the issues and found no similar feature requirement.
Description
Collect metrics from the operator.
The operator already defines several Prometheus counters, as the code here shows:
kuberay/ray-operator/controllers/ray/common/metrics.go
Lines 9 to 39 in 7374e2c
// Define all the prometheus counters for all clusters
var (
	clustersCreatedCount = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "ray_operator_clusters_created_total",
			Help: "Counts number of clusters created",
		},
		[]string{"namespace"},
	)
	clustersDeletedCount = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "ray_operator_clusters_deleted_total",
			Help: "Counts number of clusters deleted",
		},
		[]string{"namespace"},
	)
	clustersSuccessfulCount = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "ray_operator_clusters_successful_total",
			Help: "Counts number of clusters successful",
		},
		[]string{"namespace"},
	)
	clustersFailedCount = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "ray_operator_clusters_failed_total",
			Help: "Counts number of clusters failed",
		},
		[]string{"namespace"},
	)
)
According to the kubebuilder doc, it is possible to collect both the metrics above (e.g. ray_operator_clusters_created_total) and the default metrics created by controller-runtime.
How to collect the metrics from the operator:
kind create cluster
helm install kuberay-operator kuberay/kuberay-operator --version 0.4.0
helm install raycluster kuberay/ray-cluster --version 0.4.0
kubectl port-forward svc/kuberay-operator 8080:8080
# Then open http://localhost:8080/metrics to see the result.
# It lists both the metrics KubeRay defines and the default metrics created by controller-runtime.
Later, these metrics can be collected by Prometheus.
Reason to also run helm install raycluster kuberay/ray-cluster --version 0.4.0:
According to the code here:
kuberay/ray-operator/controllers/ray/raycluster_controller.go
Lines 390 to 398 in 4714892
if len(headPods.Items) == 0 || headPods.Items == nil {
	// create head pod
	r.Log.Info("reconcilePods ", "creating head pod for cluster", instance.Name)
	common.CreatedClustersCounterInc(instance.Namespace)
	if err := r.createHeadPod(*instance); err != nil {
		common.FailedClustersCounterInc(instance.Namespace)
		return err
	}
	common.SuccessfulClustersCounterInc(instance.Namespace)
Creating a RayCluster increments some of these counters, so the increased values should show up at http://localhost:8080/metrics.
Some background I collected:
- kubernetes-sigs/controller-runtime#132 ("Add prometheus metrics to internal controller") added Prometheus metrics to the internal controller.
- KubeRay binds the metrics endpoint to port 8080:
Line 52 in 4714892
flag.StringVar(&metricsAddr, "metrics-addr", ":8080", "The address the metric endpoint binds to.")
- So the operator's metrics can be collected from port 8080.
- The following describe how to publish additional metrics from the controller:
  - kubebuilder doc: publishing additional metrics
  - Examples: how controller-runtime adds additional metrics from the controller.
Use case
Collecting metrics from the operator is useful. It helps to debug/benchmark/visualize the operator.
Related issues
No response
Are you willing to submit a PR?
- Yes I am willing to submit a PR!