

@sawansri
Contributor

@sawansri commented Aug 12, 2025

Description

Implements a window-based scoring strategy that delivers a multi-fold speedup for multi-clause boolean queries.

The speedup comes from optimizations at two levels:

  1. Early termination of the ConjunctionDISI: once 10k hits have been collected from the ConjunctionDISI, BulkScorer.score() returns DocIdSetIterator.NO_MORE_DOCS (Integer.MAX_VALUE). This signals the CancellableBulkScorer to stop calling score(), because scoring and collection are complete. Previously, NO_MORE_DOCS was returned only after the entire ConjunctionDISI had been exhausted. In the constant-score case, conjunction hits beyond the first 10k are never returned to the user, so terminating at 10k skips work whose results would be discarded anyway.

  2. Window scoring approach: as outlined in the issue, build each clause's iterator only up to the window size, run the conjunction once, and check whether 10k hits have been reached. If not, expand the window, build larger iterators, and run the conjunction again. This can be optimized further by caching the previous iterator and using visit(DocIdSetIterator iterator) to bulk-add the already visited doc IDs to the new, larger iterator, then resuming the BKD traversal from where the previous iterator stopped (using BKDState) to build the rest of it. That would eliminate the redundant work each window repeats (re-collecting docs that the previous iterator already traversed) and improve performance further. The additional memory usage still needs to be benchmarked.
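A minimal Java sketch of both ideas, under a simplified model rather than the actual OpenSearch or Lucene classes. The names Clause, cursor, conjunction, windowedSearch and limit are hypothetical: each Clause holds the doc IDs its BKD traversal would emit, cursor stands in for the saved traversal position that BKDState provides, and limit stands in for the 10k threshold. The conjunction helper stops as soon as limit hits are collected, analogous to BulkScorer.score() returning NO_MORE_DOCS early in point 1. windowedSearch grows the window until enough hits are found or every clause is fully traversed, reading only docs past the cursor on each expansion, as in the caching optimization of point 2. Doubling the window is an assumption; the PR does not specify a growth factor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class WindowScoringSketch {

    // Hypothetical per-clause state (not an OpenSearch API). traversalOrder stands in
    // for the docs a BKD traversal emits; cursor stands in for a saved traversal
    // position (the role BKDState plays in the description).
    static final class Clause {
        final int[] traversalOrder;
        int cursor = 0;
        int[] cached = new int[0]; // docs collected by earlier windows, ascending

        Clause(int[] traversalOrder) { this.traversalOrder = traversalOrder; }

        // Grows the cached set to `window` docs, reading only docs past the cursor
        // instead of re-traversing what earlier windows already visited.
        int[] expandTo(int window) {
            int end = Math.min(window, traversalOrder.length);
            if (end > cursor) {
                int[] grown = Arrays.copyOf(cached, end);
                System.arraycopy(traversalOrder, cursor, grown, cursor, end - cursor);
                Arrays.sort(grown); // the conjunction needs ascending doc IDs
                cached = grown;
                cursor = end;
            }
            return cached;
        }

        boolean exhausted() { return cursor >= traversalOrder.length; }
    }

    // Leapfrog conjunction over ascending doc-ID arrays. It stops as soon as
    // `limit` hits are collected instead of draining every clause.
    static List<Integer> conjunction(int[][] iters, int limit) {
        List<Integer> hits = new ArrayList<>();
        int[] pos = new int[iters.length];
        int target = 0;
        while (hits.size() < limit) {
            boolean allMatch = true;
            for (int i = 0; i < iters.length; i++) {
                int[] it = iters[i];
                while (pos[i] < it.length && it[pos[i]] < target) pos[i]++;
                if (pos[i] == it.length) return hits; // a clause ran out of docs
                if (it[pos[i]] > target) { target = it[pos[i]]; allMatch = false; break; }
            }
            if (allMatch) hits.add(target++); // every clause is on `target`: a hit
        }
        return hits; // early termination: `limit` hits reached
    }

    // Window loop: run the conjunction on partial iterators and double the window
    // until `limit` hits are found or every clause has been fully traversed.
    static List<Integer> windowedSearch(Clause[] clauses, int limit, int initialWindow) {
        int window = Math.max(1, initialWindow);
        while (true) {
            int[][] iters = new int[clauses.length][];
            boolean allExhausted = true;
            for (int i = 0; i < clauses.length; i++) {
                iters[i] = clauses[i].expandTo(window);
                allExhausted &= clauses[i].exhausted();
            }
            List<Integer> hits = conjunction(iters, limit);
            if (hits.size() >= limit || allExhausted) return hits;
            window = window > Integer.MAX_VALUE / 2 ? Integer.MAX_VALUE : window * 2;
        }
    }

    // Test helper: the first `count` multiples of `step`.
    static int[] multiples(int step, int count) {
        int[] out = new int[count];
        for (int i = 0; i < count; i++) out[i] = i * step;
        return out;
    }

    public static void main(String[] args) {
        Clause[] clauses = { new Clause(multiples(2, 100)), new Clause(multiples(3, 100)) };
        List<Integer> hits = windowedSearch(clauses, 10, 4);
        // prints "10 hits, window stopped at 32"
        System.out.println(hits.size() + " hits, window stopped at " + clauses[0].cursor);
    }
}
```

With two clauses matching the even numbers and the multiples of 3, the conjunction reaches the 10-hit limit in the fourth window (32 docs per clause), so neither clause is traversed to the end. Note that each expansion re-runs the conjunction from the start; in this sketch only the per-clause traversal is incremental.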

Related Issues

Resolves #19045

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Contributor

❌ Gradle check result for 530c12c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

❌ Gradle check result for 059c177: FAILURE


github-actions bot added the enhancement and Search:Performance labels on Aug 13, 2025
@github-actions
Contributor

❌ Gradle check result for ecf2a08: null


@github-actions
Contributor

❌ Gradle check result for b368016: FAILURE


@github-actions
Contributor

❌ Gradle check result for 1b603be: FAILURE


@github-actions
Contributor

❌ Gradle check result for f0ffb2a: FAILURE


Signed-off-by: Sawan Srivastava <[email protected]>
@github-actions
Contributor

github-actions bot commented Nov 7, 2025

❌ Gradle check result for 7185528: FAILURE



Labels

enhancement, Search:Performance, v3.4.0

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

[Approximation Framework] Multifold Improvement in Multi-Clause Boolean Query (Window Scoring Approach)

3 participants