Add kafka client metrics #6138

jack-berg · 2022-06-04T18:15:35Z

Proposal to bridge metrics from the kafka client library into OpenTelemetry.

As this PR currently stands, only metrics which align with those already declared in the kafka metric semantic convention are bridged. However, the semantic conventions are focused on monitoring kafka brokers rather than clients. There's a lot more useful data available, and I'd like the semantic conventions to be extended to cover it.

I've added a utility that makes it easier to explore which metrics are available, and if / how they're mapped into OpenTelemetry metrics. The utility prints out all this info in a markdown table, which I've included in the README.

I could use some feedback about:

The best place to include this code from an artifact standpoint.

** Updated 6/8/22 **
I've updated the PR to be a general-purpose bridge between kafka client metrics and opentelemetry metrics, rather than an allow list of known metrics. My reasoning is as follows:

There are 202 metrics exposed by the kafka client. I made it through mapping about 60% of them before I decided that manually mapping all the metrics is madness. It's error-prone, brittle against changes between kafka client versions, and adds little value.
Bridging the metrics the kafka client exposes is conceptually similar to what we do in our micrometer shim: Kafka client has its own internal metrics system. Its authors have already done the analysis to decide what is important to monitor when using the kafka client, and they've provided a hook to access the metrics in a generic format. In bridging these metrics to opentelemetry we should use a light hand. We should map the metrics to the appropriate opentelemetry instrument types, but should minimize changes to metric names, descriptions, etc. Trying to map these to semantic conventions is folly, because we don't control the instrumentation at its source and the authors could change the metrics drastically in the next version.
The lack of mapping to semantic conventions doesn't mean this data is not useful. This data is very useful to folks operating applications that produce to and consume from kafka. We'd be doing a disservice to opentelemetry users and hampering the adoption of opentelemetry by rejecting good telemetry data because of a lack of semantic conventions. Furthermore, I don't think we should try to codify these metrics in the semantic conventions: 202 metrics is just too much data to review. Additionally, there are likely to be significant differences in the metrics that are exposed in kafka clients from language to language. Does that mean we should withhold this data that's available to java kafka client users? No. There's a place for semantic conventions, and there's a place for bridging in telemetry data that is available but not under our control.
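
For illustration, the "light hand" idea above could look roughly like the sketch below. It is not the code in this PR: the `MeasurableKind` and `InstrumentType` enums, the class name, and the naming rule are hypothetical simplifications. The only parts taken from this thread are that names carry a `kafka` prefix and that rates are mapped to gauges; treating cumulative totals as counters is an assumption.

```java
final class KafkaBridgeSketch {

  /** Hypothetical simplification of the kinds of values a kafka client metric reports. */
  enum MeasurableKind { RATE, CUMULATIVE_TOTAL, CURRENT_VALUE }

  /** Hypothetical stand-ins for opentelemetry asynchronous instrument types. */
  enum InstrumentType { OBSERVABLE_GAUGE, OBSERVABLE_COUNTER }

  /** Picks an instrument type from the kind of measurement; nothing else about the metric changes. */
  static InstrumentType instrumentTypeFor(MeasurableKind kind) {
    switch (kind) {
      case CUMULATIVE_TOTAL:
        // Assumption: ever-increasing totals fit a counter.
        return InstrumentType.OBSERVABLE_COUNTER;
      case RATE:
      case CURRENT_VALUE:
      default:
        // Rates and point-in-time values are reported as gauges.
        return InstrumentType.OBSERVABLE_GAUGE;
    }
  }

  /** Hypothetical naming rule: add a kafka prefix and otherwise keep the original names. */
  static String instrumentNameFor(String group, String name) {
    return "kafka." + group + "." + name;
  }
}
```

The point of the sketch is that the mapping depends only on generic properties of each metric, so new or renamed kafka metrics need no per-metric code.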


brunobat

First review in the project


mateuszrzeszutek · 2022-06-21T11:46:33Z

Kafka client has its own internal metrics system. Its authors have already done the analysis to decide what is important to monitor when using the kafka client, and they've provided a hook to access the metrics in a generic format. In bridging these metrics to opentelemetry we should use a light hand. We should map the metrics to the appropriate opentelemetry instrument types, but should minimize changes to metric names, descriptions, etc. Trying to map these to semantic conventions is folly, because we don't control the instrumentation at its source and the authors could change the metrics drastically in the next version.

100% agree with that 👍

The only thing that I'm slightly worried about is that we won't expose metrics that are common to all messaging clients (like the HTTP client/server metrics; think http.server.duration but across all messaging instrumentations) - but since these are not defined yet in the spec I suppose we'll have to put adding them off until later.


jack-berg · 2022-06-21T23:39:52Z

Thanks for the review @mateuszrzeszutek!

The only thing that I'm slightly worried about is that we won't expose metrics that are common to all messaging clients (like the HTTP client/server metrics; think http.server.duration but across all messaging instrumentations) - but since these are not defined yet in the spec I suppose we'll have to put adding them off until later.

👍 My thinking is that bridging these metrics does not prevent us from later adding additional metrics that are consistent with other messaging systems.


jack-berg · 2022-06-22T22:18:50Z

instrumentation/kafka/kafka-clients/kafka-clients-common/library/README.md

+and `[client-id, topic]`. If you analyze the sum of records consumed, ignoring dimensions, backends
+are likely to double count. To alleviate this, `OpenTelemetryKafkaMetrics` detects this
+scenario and only records the most granular set of attributes available. In the case
+of `records-consumed-total`, it reports `[client-id, topic]` and ignores `[client-id]`.


I just added a commit with new logic that does what's described in this comment. I think it's a fairly important improvement.

jack-berg · 2022-06-22T22:22:13Z

...y/src/main/java/io/opentelemetry/instrumentation/kafkaclients/OpenTelemetryKafkaMetrics.java

+        it.remove();
+      } else if (curRegisteredObservable.getInstrumentDescriptor().equals(instrumentDescriptor)
+          && attributeKeys.size() > curAttributeKeys.size()
+          && attributeKeys.containsAll(curAttributeKeys)) {


Here's how I detect whether a metric exists with a less granular set of attribute keys, as explained in this part of the readme.

In the process of adding support for this, I shuffled some stuff around for better organization, including moving the table print method to OpenTelemetryKafkaMetricsTest. It turns out that all the information needed to print the table can be obtained without any additional surface area.
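
A self-contained sketch of the "keep only the most granular attribute set" rule is shown below. It is a simplified illustration, not the code in `OpenTelemetryKafkaMetrics`: it tracks attribute key sets for a single instrument only, and the class and method names are hypothetical. The superset check mirrors the `size()` and `containsAll` comparison in the diff above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class GranularitySketch {

  // Attribute key sets currently kept for one instrument (hypothetical simplification).
  private final List<Set<String>> registered = new ArrayList<>();

  /** Returns true if the key set was kept, false if an equally or more granular set already exists. */
  boolean register(Set<String> attributeKeys) {
    for (Set<String> cur : registered) {
      if (cur.size() >= attributeKeys.size() && cur.containsAll(attributeKeys)) {
        return false; // e.g. [client-id] arrives after [client-id, topic]
      }
    }
    // Drop any strictly less granular sets that the new one supersedes.
    registered.removeIf(
        cur -> attributeKeys.size() > cur.size() && attributeKeys.containsAll(cur));
    registered.add(new HashSet<>(attributeKeys));
    return true;
  }

  /** Returns a copy of the key sets that are currently kept. */
  List<Set<String>> registered() {
    return new ArrayList<>(registered);
  }
}
```

With this rule, registering `[client-id]` and then `[client-id, topic]` leaves only `[client-id, topic]`, so summing across dimensions does not double count.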


trask

thx!


jack-berg · 2022-07-01T23:14:13Z

...ava/io/opentelemetry/instrumentation/kafkaclients/internal/OpenTelemetryMetricsReporter.java

+ * <p>This class is internal and is hence not for public use. Its APIs are unstable and can change
+ * at any time.
+ */
+public class OpenTelemetryMetricsReporter implements MetricsReporter {


@anuraaga unfortunately this has to be public, so I've moved it to an internal class.
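
For context on why the class has to be public: kafka clients instantiate metrics reporters reflectively from the class names listed under the `metric.reporters` client property, which requires an accessible class. The sketch below only shows that property being set with plain `java.util.Properties`. The class name `ReporterConfigSketch` and the bootstrap address are placeholders, and the reporter in this PR may need additional configuration entries (for example, an OpenTelemetry instance) that are not shown here.

```java
import java.util.Properties;

final class ReporterConfigSketch {

  // Fully qualified name taken from the package path shown in this thread.
  static final String REPORTER_CLASS =
      "io.opentelemetry.instrumentation.kafkaclients.internal.OpenTelemetryMetricsReporter";

  /** Builds client properties that name the reporter class; values other than the class are placeholders. */
  static Properties clientConfig() {
    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
    // Kafka loads the classes listed here by name, so they must be public.
    props.setProperty("metric.reporters", REPORTER_CLASS);
    return props;
  }
}
```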


trask · 2022-07-06T16:34:21Z

thx @jack-berg!

jack-berg added 3 commits June 4, 2022 13:03

Add kafka client metrics

4d1a07b

Spotless

10b67e3

Refactor to general purpose bridge between kafka client metrics and opentelemetry metrics

e49f04b

jack-berg mentioned this pull request Jun 8, 2022

Library-specific metric semantic conventions open-telemetry/opentelemetry-specification#2610

Open

brunobat reviewed Jun 9, 2022


jack-berg added 2 commits June 10, 2022 14:14

Include kafka prefix, fix typo

fe89d1c

Spotless, update readme with latest metric names

ef866fc

jack-berg marked this pull request as ready for review June 13, 2022 23:08

jack-berg requested a review from a team as a code owner June 13, 2022 23:08

mateuszrzeszutek approved these changes Jun 21, 2022


mateuszrzeszutek reviewed Jun 21, 2022


jack-berg added 2 commits June 21, 2022 18:32

PR feedback

a67afad

Map rate measureables to gauges instead of up down counters

11702a1

mateuszrzeszutek approved these changes Jun 22, 2022


mateuszrzeszutek linked an issue Jun 22, 2022 that may be closed by this pull request

Kafka-metrics #5941

Closed

mateuszrzeszutek removed a link to an issue Jun 22, 2022

Kafka-metrics #5941

Closed

jack-berg added 2 commits June 22, 2022 13:57

Spotless, quote attributes, log placeholder

4965b22

Move metric table printing to test, only retain most granular attribute set

049d80f

jack-berg commented Jun 22, 2022


mateuszrzeszutek reviewed Jun 23, 2022


jack-berg added 2 commits June 23, 2022 12:35

PR feedback

9a4758e

Remove synchornization from metricChange

071edb1

mateuszrzeszutek approved these changes Jun 24, 2022


trask mentioned this pull request Jun 24, 2022

JMX support #6131

Closed

trask reviewed Jun 27, 2022


jack-berg added 2 commits June 27, 2022 12:26

remove kafka dependency

681534f

PR feedback

a630bdb

trask approved these changes Jun 30, 2022


mateuszrzeszutek approved these changes Jun 30, 2022


jack-berg added 2 commits June 30, 2022 10:08

Fix reset

3c6b04d

Adjust configuration pattern to not rely on GlobalOpenTelemetry

8fce56b

jack-berg commented Jul 1, 2022


jack-berg commented Jul 1, 2022


mateuszrzeszutek reviewed Jul 4, 2022


Merge into KafkaTelemetry

b367d2d

mateuszrzeszutek mentioned this pull request Jul 5, 2022

Idea: deprecate Config, add agent-only InstrumentationConfig #6264

Merged

mateuszrzeszutek approved these changes Jul 6, 2022


Relocate readme and fix typo

5f290f5

trask merged commit 3e08f36 into open-telemetry:main Jul 6, 2022

mateuszrzeszutek mentioned this pull request Jul 8, 2022

Refactor existing kafka-clients interceptors in library instrumentation #6291

Open

trask mentioned this pull request Aug 21, 2023

Move existing Kafka metrics to a JMX specific namespace open-telemetry/semantic-conventions#269

Closed
