Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add metrics rpc conventions implement #4838

Merged
merged 12 commits into from
Dec 11, 2021
Merged

Conversation

yangtaoran
Copy link
Contributor

No description provided.

@yangtaoran
Copy link
Contributor Author

hi @trask,the EasyCLA check have passed,thank you.

Copy link
Member

@mateuszrzeszutek mateuszrzeszutek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍
Thanks! I've left a couple of comments here, but they're all fairly minor things.

@yangtaoran
Copy link
Contributor Author

Thanks for @mateuszrzeszutek suggestions,and I have resolve them.

Copy link
Member

@trask trask left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx @yangtaoran!

can you add tests similar to HttpClientMetricsTest and HttpServerMetricsTest?

Comment on lines 35 to 42
// the list of optional metrics attributes is from
// https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/semantic_conventions/rpc.md#attributes
Set<AttributeKey> view = new HashSet<>();
view.add(SemanticAttributes.NET_PEER_IP);
view.add(SemanticAttributes.NET_PEER_NAME);
view.add(SemanticAttributes.NET_PEER_PORT);
view.add(SemanticAttributes.NET_TRANSPORT);
return view;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @jsuereth: this follows the spec, but I'm worried about the cardinality issue you raised previously, wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the linked doc:

To avoid high cardinality, implementations should prefer the most stable of net.peer.name or net.peer.ip, depending on expected deployment profile. For many cloud applications, this is likely net.peer.name as names can be recycled even across re-instantiation of a server with a different ip.

I.e. you can fallback to net.peer.ip if it's you're only option, but you should JUST use net.peer.name if you can.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsuereth so we should go ahead and capture these same attributes on server metrics (unlike on http server metrics)? I guess the situation is not as bad in RPC-land as in browser-land (except for maybe android/grpc?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to re-evaluate server side using a notion of endpoint vs. net.peer.ip for similar reasons to http. However the specification is based on what gRPC does. I'll do some homework here to see if this is still how that's instrumented.

Key thing though, we should be using hostname over IP because hostname should be more stable, especially in the presence of a load balancer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understanding is usering net.peer.name instead of net.peer.ip if the attribute net.peer.name exist.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the other issue here is around whether we should be using net.peer.* on RPC server-metrics.

We translated the specification on instrumentation from OpenCensus gRPC that lead to this as a basis. HOWEVER, there's a few caveats:

  1. In OpenCensus instrumentation + Views were separately specified. Instrumentation was thorough, views were more conservative. When this was originally written for OTEL the View spec wasn't known, so that part is unspecified. In OpenCensus, views are where the high cardinality issues were dealt with by use case, and where you could select styles of latency metrics to export (e.g. if you wanted SLO monitoring). I think the balance we struck 13 months ago is likely now wrong given the evolution of our APIs/SDKs unless we provide out-of-the-box VIew configuration for these metrics in addition to the rich instrumentation attributes.
  2. RPC semantic conventions now diverge from HTTP semantic conventions. Effectively.... "client" metrics use net.peer.name and "host" metrics should use host.name in HTTP, but RPC both use net.peer.

For both of these reasons, we may want to re-evaluate RPC metrics. Specifically RPC server metrics.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I can modify the RPC conventions implement like this:
Client metrics use net.peer.name (or if unavailable net.peer.ip),net.peer.port,net.transport.
Server metrics use net.host.name (or if unavailable nethost.ip),net.transport,exclude net.host.port.
Do you think this is ok?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's effectively what I'd do.

For the purpose of this PR, I think if you just don't fill out any net.* attributes on the server side you're fine (because of Resource attributes). As we discussed in the Java-SiG yesterday, it'd be good to mark this instrumentation as "experimental" in some fashion so you can try a few options out and we can see how they work in practice.

@yangtaoran
Copy link
Contributor Author

yangtaoran commented Dec 9, 2021

thx @yangtaoran!

can you add tests similar to HttpClientMetricsTest and HttpServerMetricsTest?

hi @trask I have added RpcClientMetricsTest and RpcServerMetricsTest similar to HttpClientMetricsTest and HttpServerMetricsTest

Comment on lines +72 to +73
// TODO(anuraaga): Remove await from this file after 1.8.0 hopefully fixes
// https://github.com/open-telemetry/opentelemetry-java/issues/3725
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anuraaga I tried removing awaits from HttpClientMetricsTest/HttpServerMetricsTest and it failed for me; looks like this issue is still not solved.

.put(SemanticAttributes.NET_TRANSPORT, "ip_tcp")
.build();

Attributes responseAttributes2 =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT about splitting this into 2 test cases? One for host name, one for IP.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah,according to @jsuereth suggestion,we should pefer to net.peer.name, we can fallback to net.peer.ip if net.peer.name is unavailable.

Copy link
Member

@trask trask left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks great, thx!

@trask trask merged commit c59bebf into open-telemetry:main Dec 11, 2021
@trask
Copy link
Member

trask commented Dec 11, 2021

Thanks again @yangtaoran!

RashmiRam pushed a commit to RashmiRam/opentelemetry-auto-instr-java that referenced this pull request May 23, 2022
* add metrics rpc conventions

* handle format violations

* handle format violations

* resolve the comments suggestion

* update RpcClientMetrics format

* update RpcServerMetrics format

* resolve time precision and cardinality issue

* invoke buildServerFallbackView method

* add RpcServerMetricsTest and RpcClientMetricsTest

* server metrics attibutes remove net.perr* and add net.host

* Update instrumentation-api/src/main/java/io/opentelemetry/instrumentation/api/instrumenter/rpc/RpcServerMetrics.java

* Update instrumentation-api/src/main/java/io/opentelemetry/instrumentation/api/instrumenter/rpc/RpcClientMetrics.java

Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants