Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace two JavaCC lexers with a single JFlex one #2113

Merged
merged 4 commits into from
Jan 26, 2021

Conversation

mateuszrzeszutek
Copy link
Member

I'mm introduce a single property to control the sanitization in next PR - this one just replaces the lexer implementation.

@breedx-splk if you find some time, could you run your petclinic benchmark against the agent built from this branch? I'd be very grateful for that 馃槉

}
}
def jflexTargetDir = file"${project.buildDir}/generated/jflex/sql"
def jflexTask = tasks.create("sqlSanitizerJflex", org.xbib.gradle.plugin.JFlexTask) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The jflex plugin was supposed to automatically create a task for this, but it's currently broken... I'll create an issue in their project. Meanwhile, a more manual approach still works

// main query FROM clause
expectingTableName = true;
return false;
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would prefer to remove redundant else when if{} returns.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may be subjective, and I don't think there's an established convention in this codebase (I've seen lots of both styles). Should we add to https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/master/docs/contributing/style-guideline.md? IIRC @jkwatson has expressed this preference previously as well.

Copy link
Contributor

@breedx-splk breedx-splk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this approach. I definitely appreciate that there are multiple duplicate blocks in type-specific normalizers that have collapsed into one simple statement now! 馃帀

For those who wanted to see some additional context, @mateuszrzeszutek posted some benchmark results here:
#2065 (reply in thread)

I definitely glossed over the middle of the lexer impl, but didn't see anything too shocking. I'm going to also try running the other benchmarks (spring pet clinic rest + a6) and will report back.

@breedx-splk
Copy link
Contributor

breedx-splk commented Jan 26, 2021

I ran the benchmarks this afternoon and got these results:
Looking great! 馃帀

whch jar avg (ms) p95 (ms)
No agent 77.91 103.46
Latest snap (no cache, no jflex) 314.61 499.00
Latest snap normalizer disabled 200.10 306.28
Better-lexer 128.27 265.94
Better-lexer normalizer disabled 101.03 148.36
Better-lexer jdbc inst disabled 80.17 99.85

@breedx-splk
Copy link
Contributor

breedx-splk commented Jan 26, 2021

I also looked at GC. The snapshot build had 85 GCs and this branch only had 29 GCs. The total time in stop the world GC decreased from 0.578s in the snapshot to 0.403s in this branch.

More data from the benchmarks:

Latest SNAPSHOT GC

image

Better-lexer branch

image

@breedx-splk
Copy link
Contributor

breedx-splk commented Jan 26, 2021

And with the profile config, we can look at this significant reduction in TLABs. A very significant reduction in allocations! I guess it was a little surprising to me, because the app does so much HTTP as well (and I expected header and body parsing to also be hot), but the top methods clearly show that sanitization is 馃敟 path.

SNAPSHOT TLABs

image
image
image

Better Lexer TLABs

image
image
image

javaagent-api/src/main/jflex/SqlSanitizer.flex Outdated Show resolved Hide resolved
@@ -0,0 +1,378 @@
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we're doing this, can we move to instrumentation-api folder? This isn't javaagent-specific. OK for another PR though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll move the entire db package in the another PR - there are ~4 classes that need to be moved together.

if (statement == null) {
return new SqlStatementInfo(null, null, null);
}
SqlSanitizer sanitizer = new SqlSanitizer(new java.io.StringReader(statement));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Man, if only we could replace zzBuffer with statement, reading from Reader into another char[] is just so pointless here. Oh well, if we find we don't need to maintain the grammar much, we might vendor in the generated code and make some tweaks to it later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that would be awesome... and probably would improve the performance even more, since char[] copying seems to take a lot of time in Jason's perf test. Unfortunately jflex does not allow this degree of customisation.

The generated code is pretty much unreadable.
I think that maybe we could use a semi-manual approach: keep a flex grammar file + e.g. patch file with customizations applied to the generated code; whenever somebody has to change something they would have to generate+apply changes by themselves. And fix the patch when it breaks 馃槄

Base automatically changed from master to main January 26, 2021 05:50
Copy link
Contributor

@anuraaga anuraaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@mateuszrzeszutek
Copy link
Member Author

Thanks for performance tests @breedx-splk 馃檹 It's great to see how much the response times & GC have improved - and this should get even better once we introduce the cache 馃帀

I definitely appreciate that there are multiple duplicate blocks in type-specific normalizers that have collapsed into one simple statement now!

Those duplicates will disappear in next PR, when I introduce one property to rule them all 馃槃

CLOSE_PAREN = ")"
OPEN_COMMENT = "/*"
CLOSE_COMMENT = "*/"
IDENTIFIER = ([:letter:] | "_") ([:letter:] | [0-9] | [_.])*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm so happy that the abomination I created to represent the unicode letter character class is gone!

@mateuszrzeszutek
Copy link
Member Author

(Rebased to get the netty 4.1 test fix)

Copy link
Member

@trask trask left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

馃憤

@mateuszrzeszutek mateuszrzeszutek deleted the better-lexer branch February 5, 2021 11:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants