Multifold Improvement in Multi-Clause Boolean Query (Window Scoring Approach) #19046
❌ Gradle check result for 530c12c: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
… clauses Signed-off-by: Sawan Srivastava <[email protected]>
Description
Implements a window-based scoring strategy that speeds up multi-clause boolean queries severalfold.
The speedup comes from optimizations at two levels:
1. Early termination of `ConjunctionDISI` - once 10k hits have been collected from the `ConjunctionDISI`, `BulkScorer.score()` returns `DISI.NO_MORE_DOCS` (essentially max integer) to signal the `CancellableBulkScorer` to stop calling the `score()` method, since scoring and collecting are complete. Previously, `DISI.NO_MORE_DOCS` was returned only when the entire `ConjunctionDISI` had been exhausted. This optimization terminates early at 10k because any conjunction hits past that point are not displayed or sent back to the user (constant score case).
2. Window scoring approach - as outlined in the issue, build clause iterators of only window size, run the conjunction once, and check whether 10k hits have been reached. If not, expand the window, collect a larger iterator, and run the conjunction again. This can be further optimized by caching a copy of the previous iterator and using `visit(DocIdSetIterator iterator)` to bulk-add the already visited docIDs to the new, larger iterator, then continuing the BKD traversal from where the last iterator left off (using BKDState) to build the rest of the iterator. That would eliminate the redundant work done by each window (scoring and collecting docs that the previous iterator already traversed), improving performance further. The additional memory usage would have to be benchmarked.

Related Issues
Resolves #19045
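To make the two optimizations in the description concrete, here is a minimal, dependency-free sketch. It is a toy model under stated assumptions, not the code in this PR: clauses are plain sorted `int[]` arrays rather than BKD-backed iterators, the "window" is assumed to be an upper bound on doc IDs that doubles on each expansion, and the names `WindowScoringSketch`, `buildWindow`, `conjunction`, and `score` are hypothetical. Only the `NO_MORE_DOCS` value mirrors Lucene's constant.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of window scoring with early termination; not the PR's actual code. */
class WindowScoringSketch {
    /** Same value as Lucene's DocIdSetIterator.NO_MORE_DOCS (Integer.MAX_VALUE). */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for building a clause iterator limited to doc IDs below windowEnd. */
    static int[] buildWindow(int[] sortedDocs, int windowEnd) {
        int n = 0;
        while (n < sortedDocs.length && sortedDocs[n] < windowEnd) {
            n++;
        }
        return Arrays.copyOf(sortedDocs, n);
    }

    /** Intersects two sorted doc ID arrays, stopping as soon as hitLimit matches are found. */
    static List<Integer> conjunction(int[] a, int[] b, int hitLimit) {
        List<Integer> hits = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length && hits.size() < hitLimit) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                hits.add(a[i]);
                i++;
                j++;
            }
        }
        return hits;
    }

    /**
     * Re-runs the conjunction over growing windows. Returns NO_MORE_DOCS as soon as
     * hitLimit hits are collected (early termination) or once the window covers
     * every doc ID below maxDoc. Each expansion rebuilds the windows from scratch,
     * which is the redundant work the caching idea in the description would remove.
     */
    static int score(int[] a, int[] b, int maxDoc, int initialWindow, int hitLimit,
                     List<Integer> collected) {
        // Guard against a non-positive window, which would never grow when doubled.
        int windowEnd = Math.min(Math.max(1, initialWindow), maxDoc);
        while (true) {
            List<Integer> hits =
                conjunction(buildWindow(a, windowEnd), buildWindow(b, windowEnd), hitLimit);
            if (hits.size() >= hitLimit || windowEnd >= maxDoc) {
                collected.addAll(hits);
                return NO_MORE_DOCS; // tells the caller to stop calling score()
            }
            windowEnd = (int) Math.min((long) windowEnd * 2, maxDoc);
        }
    }
}
```

For example, with one clause matching every even doc ID and another matching every multiple of 3 below 1000, an initial window of 10, and a hit limit of 5, the sketch expands the window to 20 and then 40 before collecting 0, 6, 12, 18, and 24 and returning `NO_MORE_DOCS`, without ever looking at doc IDs of 40 or above.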
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.