feat: improve performance of first_over_time and last_over_time queries by sharding them #11605
e70ccb5 to 8f589fe
Signed-off-by: Callum Styan <callumstyan@gmail.com>
in each vector, not the first
I've extended this to include implementing last_over_time here.
My only comments would be:
- renaming first_over_time.go to something more generic, or to first_last_over_time.go like I did in my branch, probably makes sense
- some comments would be useful (such as where I commented on the PR) for first-time readers of the code
pkg/logql/engine.go (outdated)
- T: ts,
+ //T: ts,
I'm not sure we can do it like this. What do you think, @cstyan?
I'm not sure what you mean?
The current code on main overrides the timestamps in each vector.
Isn't the timestamp from the actual vector samples more accurate than the one we get from stepEvaluator.Next()?
Can you help me understand why it wouldn't make sense?
It does make sense for us. I'm just not sure whether some other component relies on this overridden timestamp.
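To make the timestamp question concrete, here is a minimal Go sketch, assuming a simplified, hypothetical Sample type rather than the real vector types in pkg/logql. It illustrates why keeping each sample's own timestamp matters once shard results are merged: if every sample is restamped with the step timestamp, the merge can no longer tell which shard saw the earlier sample. It does not answer whether other components depend on the overridden timestamp.

```go
package main

import "fmt"

// Sample is a hypothetical, simplified stand-in for one vector sample:
// the series labels, the timestamp of the underlying log sample (ms), and its value.
type Sample struct {
	Series string
	T      int64
	V      float64
}

// overrideTimestamps mimics the behaviour described for main: every sample
// in a step's vector is restamped with the step timestamp.
func overrideTimestamps(vec []Sample, stepTs int64) []Sample {
	out := make([]Sample, len(vec))
	for i, s := range vec {
		s.T = stepTs
		out[i] = s
	}
	return out
}

// earliest returns the sample with the smallest timestamp. On a tie it keeps
// the first one seen, so the answer then depends on shard order.
func earliest(samples []Sample) Sample {
	best := samples[0]
	for _, s := range samples[1:] {
		if s.T < best.T {
			best = s
		}
	}
	return best
}

func main() {
	const step = 60_000
	shardA := []Sample{{Series: `{app="foo"}`, T: 42_000, V: 7}}
	shardB := []Sample{{Series: `{app="foo"}`, T: 15_000, V: 3}}

	// Real sample timestamps: shard B's earlier sample is chosen.
	kept := earliest(append(append([]Sample{}, shardA...), shardB...))
	fmt.Println(kept.V) // 3

	// Restamped: both samples carry T=60000, so shard A's later sample wins the tie.
	restamped := append(overrideTimestamps(shardA, step), overrideTimestamps(shardB, step)...)
	fmt.Println(earliest(restamped).V) // 7
}
```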
pkg/logql/first_last_over_time.go (outdated)
// order (see next, moves the current timestamp forward by step). So this
// means that for each downstreamed shard of a first_over_time, selecting the
// first sample here in each iterator is getting us the earliest timestamped
// value. Later on when we merge we select the earliest from all the shards.
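The comment above relies on the iterators yielding samples in timestamp order. A minimal sketch of what that ordering buys, assuming hypothetical firstInWindow and lastInWindow helpers (not the actual firstWithTimestampBatch and lastWithTimestampBatch code) and a left-open range window (end-rng, end]: the first match is the earliest sample, the last match before passing end is the latest, and both keep their own timestamps for the later merge.

```go
package main

import "fmt"

// point is a hypothetical (timestamp in ms, value) pair for one series.
type point struct {
	T int64
	V float64
}

// firstInWindow returns the first point in (end-rng, end]. Because pts is
// assumed to be sorted by timestamp, the first match is also the earliest.
func firstInWindow(pts []point, end, rng int64) (point, bool) {
	for _, p := range pts {
		if p.T > end {
			break // sorted input: nothing later can be in the window
		}
		if p.T > end-rng {
			return p, true
		}
	}
	return point{}, false
}

// lastInWindow returns the last point in (end-rng, end]; with sorted input
// the final match before passing end is the latest one.
func lastInWindow(pts []point, end, rng int64) (point, bool) {
	var last point
	found := false
	for _, p := range pts {
		if p.T > end {
			break
		}
		if p.T > end-rng {
			last, found = p, true
		}
	}
	return last, found
}

func main() {
	pts := []point{{10, 1}, {20, 2}, {35, 3}, {50, 4}}
	f, _ := firstInWindow(pts, 40, 25) // window (15, 40]
	l, _ := lastInWindow(pts, 40, 25)
	fmt.Println(f, l) // {20 2} {35 3}
}
```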
I believe this comment should be broken into pieces and moved to the three different iterators: firstWithTimestampBatch, lastWithTimestampBatch, and mergeOverTimeEvaluator.
Done.
Overall looks great. Nice work, folks! I learned a few things about sharding in the downstream engine.
Added a few comments. One question for me is: how do we make sure first_over_time and last_over_time produce correct results when run with the old engine vs. the downstream engine?
I see you added tests in TestMappingEquivalence. But do you think they should also be added to other downstream engine tests, like TestRangeMappingEquivalence?
LGTM. Thanks for addressing the feedback.
I think this one still applies:
I see you added tests in TestMappingEquivalence. But do you think they should also be added to other downstream engine tests, like TestRangeMappingEquivalence? The rationale is that the latter tests cover more cases of different range aggregations.
Approving to unblock.
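A simplified model of the equivalence the reviewer is asking about: compute first_over_time per output series over all samples, then again per shard followed by an earliest-timestamp merge, and compare the two. Everything here is hypothetical (the entry type, the stream % n shard assignment in shardByStream, and the assumption that several streams can feed the same output series); it is not TestMappingEquivalence or TestRangeMappingEquivalence, only an illustration of the property such tests would check.

```go
package main

import "fmt"

// entry is a hypothetical unwrapped log sample: source stream, output series,
// timestamp (ms) and value.
type entry struct {
	Stream int
	Series string
	T      int64
	V      float64
}

type result struct {
	T int64
	V float64
}

// firstBySeries keeps, per output series, the sample with the smallest timestamp.
func firstBySeries(entries []entry) map[string]result {
	out := map[string]result{}
	for _, e := range entries {
		if cur, ok := out[e.Series]; !ok || e.T < cur.T {
			out[e.Series] = result{e.T, e.V}
		}
	}
	return out
}

// mergeFirst combines per-shard results by keeping the earliest timestamp per
// series; this relies on shards reporting real sample timestamps.
func mergeFirst(shards []map[string]result) map[string]result {
	out := map[string]result{}
	for _, shard := range shards {
		for series, r := range shard {
			if cur, ok := out[series]; !ok || r.T < cur.T {
				out[series] = r
			}
		}
	}
	return out
}

// shardByStream splits entries into n shards by stream id (hypothetical assignment).
func shardByStream(entries []entry, n int) [][]entry {
	shards := make([][]entry, n)
	for _, e := range entries {
		shards[e.Stream%n] = append(shards[e.Stream%n], e)
	}
	return shards
}

// equivalent reports whether the sharded-then-merged result matches the unsharded one.
func equivalent(entries []entry, n int) bool {
	want := firstBySeries(entries)
	var perShard []map[string]result
	for _, s := range shardByStream(entries, n) {
		perShard = append(perShard, firstBySeries(s))
	}
	got := mergeFirst(perShard)
	if len(got) != len(want) {
		return false
	}
	for k, v := range want {
		if g, ok := got[k]; !ok || g != v {
			return false
		}
	}
	return true
}

func main() {
	entries := []entry{
		{Stream: 0, Series: "a", T: 30, V: 1},
		{Stream: 1, Series: "a", T: 10, V: 2},
		{Stream: 2, Series: "b", T: 20, V: 3},
		{Stream: 3, Series: "b", T: 25, V: 4},
	}
	fmt.Println(equivalent(entries, 2)) // true
}
```

If two shards hold samples for the same series with exactly equal timestamps, the unsharded and merged paths could pick different values, so a real implementation would need a deterministic tie-break.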
Hey @kavirajk, actually something fun I learned this past week when finally finding the cause of the issues with our …
@jeschkies, any opinions on merging this as is vs. behind a feature flag? Given that it's not executed via a probabilistic data structure, I don't think we need a feature flag.
…ueries by sharding them (#11605)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Callum Styan <callumstyan@gmail.com>
What this PR does / why we need it:
With the introduction of sending query plans with custom nodes to the querier, we can now shard first_over_time queries.
Checklist
- CONTRIBUTING.md guide (required)
- CHANGELOG.md updated
- add-to-release-notes label
- docs/sources/setup/upgrade/_index.md
- production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
- deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR
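The PR description credits query plans with custom nodes for making first_over_time sharding possible. One hedged way to picture this: a merge of per-shard first_over_time results presumably has no LogQL string form of its own, so it has to travel to the querier as a plan node. The Node, Downstream, and MergeFirstOverTime types, the shardFirstOverTime rewrite, and the merge_first_over_time rendering below are all hypothetical illustrations, not Loki's plan types.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical query-plan node.
type Node interface {
	String() string
}

// Downstream is a hypothetical leaf: a LogQL query executed on one shard.
type Downstream struct {
	Query string
	Shard int
	Of    int
}

func (d Downstream) String() string {
	return fmt.Sprintf("downstream<%s, shard=%d_of_%d>", d.Query, d.Shard, d.Of)
}

// MergeFirstOverTime is a hypothetical custom node: per series and step, keep
// the sample with the earliest timestamp across its children.
type MergeFirstOverTime struct {
	Children []Node
}

func (m MergeFirstOverTime) String() string {
	parts := make([]string, len(m.Children))
	for i, c := range m.Children {
		parts[i] = c.String()
	}
	return "merge_first_over_time(" + strings.Join(parts, ", ") + ")"
}

// shardFirstOverTime rewrites one query into n downstream shards under a merge node.
func shardFirstOverTime(query string, n int) Node {
	children := make([]Node, n)
	for i := 0; i < n; i++ {
		children[i] = Downstream{Query: query, Shard: i, Of: n}
	}
	return MergeFirstOverTime{Children: children}
}

func main() {
	q := `first_over_time({app="foo"} | unwrap latency [5m])`
	fmt.Println(shardFirstOverTime(q, 2))
}
```

A last_over_time variant would differ only in keeping the latest timestamp instead of the earliest.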