Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance AWS DynamoDB instrumentation #1191

Merged
merged 1 commit into from Sep 18, 2020
Merged

Enhance AWS DynamoDB instrumentation #1191

merged 1 commit into from Sep 18, 2020

Conversation

ghost
Copy link

@ghost ghost commented Sep 10, 2020

Resolves #1181

  • enhanced DynamoDB instrumentation for AWS SDK 2.2 (added DB attributed)
  • added tests to cover DynamoDB requests separately
  • a bit of refactoring

@ghost ghost changed the title [otel #1181] Enhance AWS DynamoDB instrumentation Enhance AWS DynamoDB instrumentation Sep 11, 2020
@anuraaga
Copy link
Contributor

This turned out much larger than I was anticipating :-)

One thing I should have mentioned earlier is it's on my plate to refactor the handling of the "custom attributes" in a generic way. AWS has essentially semantic conventions defined for the AWS SDK, which we implement in xray and think otel should as well since they're benifecial for any tracing vendor. They are a generic transformation of the JSON itself like here

https://github.com/aws/aws-xray-sdk-java/blob/master/aws-xray-recorder-sdk-aws-sdk-v2/src/main/java/com/amazonaws/xray/interceptors/TracingInterceptor.java#L187
https://github.com/aws/aws-xray-sdk-java/blob/master/aws-xray-recorder-sdk-aws-sdk-v2/src/main/resources/com/amazonaws/xray/interceptors/DefaultOperationParameterWhitelist.json

So it'd be great if we can hold off on adding refactoring to make a generic way of adding custom attributes since I think much of the logic we have right now will actually go away.

Also, I don't think the intention of the DB conventions is to normalize DB statements into something SQL-like. DynamoDB is essentially the same type of DB as MongoDB, but the example recommends not setting statement probably because it's JSON, not a statement

https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/semantic_conventions/database.md#mongodb

So I don't think we should fill statement, which makes a lot of the complexity go away I think, the PR could probably just be something like

if (awsServiceName.equals("DynamoDB")) {
  db.system = "dynamodb"
  db.operation = awsOperationName
}

Sorry for not being stronger about not including statement in the issue itself ><

@anuraaga
Copy link
Contributor

Alternatively, if we do really want to include statement, I would just put the SdkResult JSON as-is, without any sort of transformation logic on our side. But given the example for MongoDB in the spec, I wouldn't recommend it.

@iNikem
Copy link
Contributor

iNikem commented Sep 11, 2020

@anuraaga can you elaborate on the handling of the "custom attributes" in a generic way? I did not quite get it, what attributes are you talking about :)

@iNikem
Copy link
Contributor

iNikem commented Sep 11, 2020

I can see the benefit of having some sort of statement-like info on DynomoDB span. I am not very familiar with AWS SDK and I don't see what json are we talking about? :) But I do see that subclasses of DynamoDbRequest seem to have nice toString method. May be that will be enough?

@anuraaga
Copy link
Contributor

can you elaborate on the handling of the "custom attributes" in a generic way? I did not quite get it, what attributes are you talking about :)

Yeah basically we have a mapping from JSON fields to semantic convention attributes in this data file. So we generically map SDK requests to richer information using it. One of my "job milestone tasks" is to port those conventions to OTel ;)

I can see the benefit of having some sort of statement-like info on DynomoDB span.

Yeah I think .toString() is probably essentially the JSON that would have been sent, though I haven't checked. I'd be ok with that - but it might be worth confirming that we should be doing similar for mongodb too in the spec itself, I don't see a reason for dynamodb and mongodb to differ in terms of semantic conventions since they're so similar.

@trask
Copy link
Member

trask commented Sep 11, 2020

it might be worth confirming that we should be doing similar for mongodb too in the spec itself, I don't see a reason for dynamodb and mongodb to differ in terms of semantic conventions since they're so similar.

I think the spec section on mongodb is just an example, e.g. it doesn't show db.connection_string being captured either.

Our MongoDB instrumentation captures both db.connection_string and db.statement.

Capturing db.statement does bring PII problems though, e.g. for MongoDB we keep the json keys and scrub most of the json values.

@anuraaga
Copy link
Contributor

I should have looked at the mongodb code :D Seems like we record a scrubbed JSON object there.

@kubawach Do you think it's possible to implement similar scrubbing in a generic way instead of having to manually serialize each different type of request? I think for the most part, any SdkField that ends up as a type of AttributeValue we can scrub and otherwise write them as is looking at the Marshalling information in the field. It would also be great to reduce the scope of the PR, it's hard to judge the refactoring along with the logic changes. Our current instrumentation is very simple logic I think, it's easier to understand refactoring after things get complex not before :) I might recommend

  • PR to add db attributes except for statement
  • PR to add statement attribute using generic serialization mechanism
  • PR to refactor if things crossed a complexity line after those

What do you think?

@iNikem
Copy link
Contributor

iNikem commented Sep 13, 2020

I like the idea of smaller PRs

@ghost ghost marked this pull request as ready for review September 14, 2020 08:26
@iNikem
Copy link
Contributor

iNikem commented Sep 14, 2020

  • PR to add db attributes except for statement
  • PR to add statement attribute using generic serialization mechanism
  • PR to refactor if things crossed a complexity line after those

@kubawach ^^

@ghost
Copy link
Author

ghost commented Sep 14, 2020

Thanks for the comments, I appreciate all of them. I'm really happy to see that the proposal sparked a discussion regarding handling of the DynamoDB.

Before I attend the comments and proposals, let me give you some background my motivations of particular design choices.

  1. I come from business application development (over 10 years). I could say that I switched from consumer / user of telemetry solutions to the programmer / developer side. Over lat years I appreciated a lot when my telemetry gave me as many data points as possible, especially when issues (ie of performance nature) occurred. In such cases attributes like "statement" come very handy tracing root cause. Thus a decision to have that included.

  2. DynamoDB domain-specific language is quite hard to read. It evolved over the years, for example switching from "Expected" to "ConditionExpression". However the first attribute is still present in the API, cluttering serialised form. Therefore I decided to include only relevant attributes and introduce a "normalisation" of the requests, making it akin to well-known SQL and easier to read by most developers. All above - in order to maintain readability.

Regarding ideas that were mentioned in the course of the discussion (thanks once again):

@iNikem I believe that regular toString() serialisation of requests won't do the trick, as for exemplary request it looks like that: "CreateTableRequest(AttributeDefinitions=software.amazon.awssdk.core.util.DefaultSdkAutoConstructList@103f73a, TableName=sometable, KeySchema=software.amazon.awssdk.core.util.DefaultSdkAutoConstructList@103f73a, LocalSecondaryIndexes=software.amazon.awssdk.core.util.DefaultSdkAutoConstructList@103f73a, GlobalSecondaryIndexes=software.amazon.awssdk.core.util.DefaultSdkAutoConstructList@103f73a)"

@anuraaga regarding "generic" scrubbing instead of custom-per-request one - happy to do so, but we need to mind two issues here:

  1. PII as pointed out by @trask
  2. Readability (limiting number of attributes serialized)
    Therefore if we decide, as a community, to resign from custom ("normalised") form and include attributes, I believe we should scrub keep a kind of "allowed attributes" list and include only these.

@anuraaga regarding splitting the PR - happy to do so as well, once we agree on the final approach (mainly attributes scrubbing). I'd then go for 2 separate PRs:

  1. Adding generic DynamoDB attributes + general refactoring (those are really tightly coupled - adding a way to decorate spans per AwsSdkRequest type)
  2. Generating statements for DynamoDbRequests.

Let me know what you think :)

@iNikem
Copy link
Contributor

iNikem commented Sep 14, 2020

I understand your point of view, @kubawach. I think we have a contention around db.statement, but we generally agree that DynamoDB spans should have DB semantic convention. Therefore, I would propose to make first PR which adds those and to use the simplest thing possible for db.statement: just simple class name.

Then, as a follow up task, let us discuss our options how to add proper db statement. That will be tougher task, so let us address it separately.

@ghost
Copy link
Author

ghost commented Sep 14, 2020

@iNikem great idea, splitting the MR then.

@ghost
Copy link
Author

ghost commented Sep 14, 2020

Dear all (@iNikem and @anuraaga in particular) - I've just split the MR to cover base refactoring / DB attributes here and statement generation in #1199 Please move with discussion there or to appropriate issue #1198 :)

Copy link
Contributor

@anuraaga anuraaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kubawach Thanks a lot for the summary of your reasoning it helps me a lot :) I agree that we want to add statement in some form, and thanks for separating it out since it's definitely a much harder topic than the standard DB attributes so glad to focus on those here.

@Override
public void decorate(Span span, SdkRequest sdkRequest, ExecutionAttributes attributes) {

span.setAttribute(SemanticAttributes.DB_SYSTEM.key(), "dynamodb");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SemanticAttributes.DB_SYSTEM.set(span, "dynamodb") makes it easier to find non-semantic attributes in the future

Copy link
Contributor

@anuraaga anuraaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one small nit, and addressing it will kick the CI so hopefully it doesn't flake

@iNikem iNikem merged commit e8b5488 into open-telemetry:master Sep 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enhance AWS DynamoDB instrumentation with db.* attributes
4 participants