[Feature] Add tutorial to explain how to collect metrics from operator for Prometheus  #921

@Yicheng-Lu-llll

Search before asking

  • I searched the existing issues and found no similar feature request.

Description

Metrics from the operator:

The operator defines the following counters in the code here:

// Define all the prometheus counters for all clusters
var (
	clustersCreatedCount = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "ray_operator_clusters_created_total",
			Help: "Counts number of clusters created",
		},
		[]string{"namespace"},
	)
	clustersDeletedCount = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "ray_operator_clusters_deleted_total",
			Help: "Counts number of clusters deleted",
		},
		[]string{"namespace"},
	)
	clustersSuccessfulCount = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "ray_operator_clusters_successful_total",
			Help: "Counts number of clusters successful",
		},
		[]string{"namespace"},
	)
	clustersFailedCount = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "ray_operator_clusters_failed_total",
			Help: "Counts number of clusters failed",
		},
		[]string{"namespace"},
	)
)

According to the kubebuilder documentation, it is possible to collect the above metrics (e.g. ray_operator_clusters_created_total) as well as the default metrics provided by controller-runtime.
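To make it concrete what a counter with a "namespace" label looks like once it is exposed, here is a minimal sketch that uses only the Go standard library. The counterVec type and its inc and render methods are hypothetical stand-ins written for illustration, not the client_golang API; the sketch only mimics the Prometheus text exposition format for one of the counters quoted above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// counterVec is a hypothetical, simplified stand-in for a Prometheus
// CounterVec with a single "namespace" label. It is NOT the client_golang type.
type counterVec struct {
	name   string
	help   string
	values map[string]float64 // namespace -> current count
}

// inc adds one to the counter for the given namespace.
func (c *counterVec) inc(namespace string) {
	c.values[namespace]++
}

// render returns the counter in the Prometheus text exposition format:
// a HELP line, a TYPE line, then one sample line per label value.
func (c *counterVec) render() string {
	var b strings.Builder
	fmt.Fprintf(&b, "# HELP %s %s\n", c.name, c.help)
	fmt.Fprintf(&b, "# TYPE %s counter\n", c.name)

	namespaces := make([]string, 0, len(c.values))
	for ns := range c.values {
		namespaces = append(namespaces, ns)
	}
	sort.Strings(namespaces) // keep the output deterministic

	for _, ns := range namespaces {
		fmt.Fprintf(&b, "%s{namespace=%q} %g\n", c.name, ns, c.values[ns])
	}
	return b.String()
}

func main() {
	created := &counterVec{
		name:   "ray_operator_clusters_created_total",
		help:   "Counts number of clusters created",
		values: map[string]float64{},
	}
	created.inc("default")
	fmt.Print(created.render())
}
```

In the real operator the client library does this rendering, so the sketch is only meant to show the shape of the lines to look for in the /metrics output.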

How to collect the metrics from the operator:

# Assumes the kuberay Helm repository has already been added with helm repo add.
kind create cluster
helm install kuberay-operator kuberay/kuberay-operator --version 0.4.0
helm install raycluster kuberay/ray-cluster --version 0.4.0
kubectl port-forward svc/kuberay-operator 8080:8080
# Then view the result at http://localhost:8080/metrics.
# It should list both the metrics KubeRay defines and the default metrics provided by controller-runtime.
Prometheus can later scrape these metrics from the same endpoint.
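For a quick check without a browser, the endpoint can also be read programmatically. The following is a rough sketch, assuming the port-forward above is still running and that the operator's own metrics all share the ray_operator_ prefix seen in the quoted code. The filterOperatorMetrics helper is a hypothetical name introduced here; it keeps only the sample lines and drops the # HELP and # TYPE comments.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// operatorPrefix is the common prefix of the counter names quoted in this
// issue. Treating it as the prefix of every operator metric is an assumption.
const operatorPrefix = "ray_operator_"

// filterOperatorMetrics returns the sample lines of a /metrics response whose
// metric name starts with operatorPrefix. Comment lines (# HELP, # TYPE) and
// metrics from other sources are skipped.
func filterOperatorMetrics(body string) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, operatorPrefix) {
			out = append(out, line)
		}
	}
	return out
}

func main() {
	// URL taken from the port-forward command above.
	const url = "http://localhost:8080/metrics"

	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not reach the operator (is the port-forward running?):", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not read the response body:", err)
		return
	}

	for _, line := range filterOperatorMetrics(string(body)) {
		fmt.Println(line)
	}
}
```

If no RayCluster has been created yet, the filtered output may be empty, because a labeled counter typically only appears once it has been incremented for some namespace.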

Reason for also running helm install raycluster kuberay/ray-cluster --version 0.4.0:

The counters are only incremented during reconciliation, as the code here shows:

if len(headPods.Items) == 0 || headPods.Items == nil {
	// create head pod
	r.Log.Info("reconcilePods ", "creating head pod for cluster", instance.Name)
	common.CreatedClustersCounterInc(instance.Namespace)
	if err := r.createHeadPod(*instance); err != nil {
		common.FailedClustersCounterInc(instance.Namespace)
		return err
	}
	common.SuccessfulClustersCounterInc(instance.Namespace)

Creating a RayCluster increments some of these counters, so the increased values should then be visible at http://localhost:8080/metrics.
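The order of the counter updates in the snippet can be sketched in isolation: the created counter is incremented before the head pod is created, and then either the failed or the successful counter follows. The sketch below uses plain maps instead of Prometheus counters; the counters type, reconcileHead, and errSimulated are hypothetical names introduced for illustration and are not part of KubeRay.

```go
package main

import (
	"errors"
	"fmt"
)

// errSimulated is a hypothetical error used to exercise the failure path.
var errSimulated = errors.New("simulated head pod creation failure")

// counters is a hypothetical stand-in for the three per-namespace counters
// touched by the quoted reconcile snippet.
type counters struct {
	created    map[string]int
	successful map[string]int
	failed     map[string]int
}

func newCounters() *counters {
	return &counters{
		created:    map[string]int{},
		successful: map[string]int{},
		failed:     map[string]int{},
	}
}

// reconcileHead mirrors the order of updates in the quoted snippet:
// "created" is incremented before the attempt, then "failed" or
// "successful" depending on what createHeadPod returns.
func (c *counters) reconcileHead(namespace string, createHeadPod func() error) error {
	c.created[namespace]++
	if err := createHeadPod(); err != nil {
		c.failed[namespace]++
		return err
	}
	c.successful[namespace]++
	return nil
}

func main() {
	c := newCounters()

	// One attempt that succeeds and one that fails, both in "default".
	_ = c.reconcileHead("default", func() error { return nil })
	_ = c.reconcileHead("default", func() error { return errSimulated })

	fmt.Printf("created=%d successful=%d failed=%d\n",
		c.created["default"], c.successful["default"], c.failed["default"])
	// prints: created=2 successful=1 failed=1
}
```

Under this reading of the snippet, the created counter counts creation attempts, so for this code path it equals the sum of the successful and failed counters.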

Some background I collected:

Use case

Collecting metrics from the operator helps with debugging, benchmarking, and visualizing the operator's behavior.

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!


Labels: enhancement
