@solsson solsson commented Nov 10, 2017

The best answer I've found to #80, on paper :) Remains to learn how to use it.

Adding the monitoring label because the combination of readiness alerts for key health indicators such as under-replicated partitions (https://github.com/Yolean/kubernetes-kafka/pull/95/files#diff-f8da94a0c2daaa5e09e08330d1ed122a) and end-to-end testing with a tool like kafka-monitor may pay off better than internal metrics.

In actual troubleshooting scenarios you'll probably still want to connect some JMX tool (allowed since #96) to really dig into the state of things.

solsson commented Nov 10, 2017

The UI works. The problem was that I used kubectl port-forward and only forwarded port 8000. Once the UI is loaded you can switch to forwarding 8778, and you'll get fancy graphs :)

Metrics also work:

curl localhost:8778/jolokia/read/kmf.services:type=produce-service,name=*/records-produced-rate | jq '.'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   268  100   268    0     0   2876      0 --:--:-- --:--:-- --:--:--  2913
{
  "request": {
    "mbean": "kmf.services:name=*,type=produce-service",
    "attribute": "records-produced-rate",
    "type": "read"
  },
  "value": {
    "kmf.services:name=single-cluster-monitor,type=produce-service": {
      "records-produced-rate": 54.47272973797498
    }
  },
  "timestamp": 1510324868,
  "status": 200
}
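
For scripting against this endpoint, the response above can be unpacked with a few lines of Python. This is only a minimal sketch: the function name `extract_attribute` is made up for illustration, and the sample is the JSON from the curl output above. It assumes a pattern read (with `name=*`), where `value` is keyed by MBean name and then by attribute.

```python
import json

# Sample response copied from the curl output above.
SAMPLE = """
{
  "request": {
    "mbean": "kmf.services:name=*,type=produce-service",
    "attribute": "records-produced-rate",
    "type": "read"
  },
  "value": {
    "kmf.services:name=single-cluster-monitor,type=produce-service": {
      "records-produced-rate": 54.47272973797498
    }
  },
  "timestamp": 1510324868,
  "status": 200
}
"""


def extract_attribute(response_text, attribute):
    """Return {mbean_name: value} for one attribute of a Jolokia pattern read.

    Hypothetical helper, not part of Jolokia or kafka-monitor.
    """
    response = json.loads(response_text)
    if response.get("status") != 200:
        raise ValueError(f"Jolokia returned status {response.get('status')}")
    # A pattern read nests the result as MBean name -> attribute -> value.
    return {
        mbean: attrs[attribute]
        for mbean, attrs in response["value"].items()
        if attribute in attrs
    }


if __name__ == "__main__":
    for mbean, rate in extract_attribute(SAMPLE, "records-produced-rate").items():
        print(mbean, rate)
```

In practice the text would come from the same URL as the curl call, through whichever port-forward is active.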

solsson commented Nov 10, 2017

Remaining issues:

  • Need a way to throttle load on test clusters like minikube. It's pretty significant by default.
  • Metrics are logged at INFO level, lots and lots of them. With the GUI, curl and export to monitoring tools, that shouldn't be necessary.

solsson commented Nov 10, 2017

Some info on Prometheus (non-)compatibility: jolokia/jolokia#206. Mentions https://github.com/fabric8io/agent-bond, and also the importance of a whitelist as we discovered in #49.

solsson commented Nov 10, 2017

Rate can probably be reduced using produce.record.delay.ms, see https://github.com/linkedin/kafka-monitor/wiki/Service-Configuration#produce-service-configuration-parameters.

Here is probably the logging statement. Should be possible to exclude using custom log4j config.

There's also a GraphiteMetricsReporterService so maybe it's trivial to produce a PrometheusMetricsReporterService.

Instead you need some kind of metrics export.
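
As a rough idea of what such an export could look like, here is a sketch that turns the `value` part of a Jolokia read response into lines in the Prometheus text exposition format. Everything in it is an assumption for illustration: the helper names are hypothetical, the `domain_attribute` metric naming is just one possible scheme, and it is not the PrometheusMetricsReporterService mentioned above nor how any existing exporter works.

```python
import re


def parse_mbean(name):
    """Split 'domain:k1=v1,k2=v2' into (domain, {k1: v1, k2: v2})."""
    domain, _, props = name.partition(":")
    labels = dict(pair.split("=", 1) for pair in props.split(",") if pair)
    return domain, labels


def sanitize(name):
    """Replace characters that Prometheus names do not allow with '_'."""
    return re.sub(r"[^a-zA-Z0-9_]", "_", name)


def escape_label_value(value):
    """Escape backslash, double quote and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_prometheus(jolokia_value):
    """Render {mbean: {attribute: number}} as Prometheus text format lines.

    Assumed naming scheme: <domain>_<attribute>, with MBean key properties
    as labels. Assumes the domain does not start with a digit.
    """
    lines = []
    for mbean, attrs in sorted(jolokia_value.items()):
        domain, labels = parse_mbean(mbean)
        label_text = ",".join(
            f'{sanitize(key)}="{escape_label_value(val)}"'
            for key, val in sorted(labels.items())
        )
        for attr, value in sorted(attrs.items()):
            metric = sanitize(f"{domain}_{attr}")
            lines.append(f"{metric}{{{label_text}}} {value}")
    return lines


if __name__ == "__main__":
    sample = {
        "kmf.services:name=single-cluster-monitor,type=produce-service": {
            "records-produced-rate": 54.47272973797498
        }
    }
    print("\n".join(to_prometheus(sample)))
```

Serving these lines over HTTP for Prometheus to scrape, and limiting which MBeans are read, are left out here.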

Currently I only get a lot of `records-produced-total` but no latencies etc.
@solsson solsson force-pushed the linkedin-kafka-monitor branch from 5f86aa9 to e5b1acf Compare September 29, 2018 12:49
@solsson solsson modified the milestones: 5.0 - Java 11, 5.1 Nov 28, 2018