@sawansri
Contributor

@sawansri sawansri commented Jul 7, 2025

Description

Implements ApproximateBooleanQuery, which flattens boolean queries and rewrites them to single-clause queries at the OpenSearch level, applying approximation where possible.

Related Issues

Resolves #18692

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions github-actions bot added the _No response_, enhancement (Enhancement or improvement to existing feature or request), and lucene labels Jul 7, 2025
@github-actions
Contributor

github-actions bot commented Jul 7, 2025

❌ Gradle check result for 5e709c3: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

github-actions bot commented Jul 7, 2025

❌ Gradle check result for 0056e3d: FAILURE

@github-actions
Contributor

github-actions bot commented Jul 8, 2025

❌ Gradle check result for 2b9514b: FAILURE

@github-actions
Contributor

github-actions bot commented Jul 8, 2025

❌ Gradle check result for 61fbf45: FAILURE

@github-actions github-actions bot added the Search label (Search query, autocomplete ...etc) Jul 10, 2025
@github-actions
Contributor

❌ Gradle check result for 77c7068: FAILURE

sawansri added 2 commits July 10, 2025 18:06
Signed-off-by: Sawan Srivastava <[email protected]>
Signed-off-by: Sawan Srivastava <[email protected]>
@github-actions
Contributor

❌ Gradle check result for be6a051: FAILURE

Signed-off-by: Sawan Srivastava <[email protected]>
@github-actions
Contributor

❌ Gradle check result for 5a8e024: FAILURE

@github-actions
Contributor

❌ Gradle check result for 6c53755: FAILURE

@github-actions
Contributor

❌ Gradle check result for b1eb7e6: FAILURE

Signed-off-by: Sawan Srivastava <[email protected]>
@github-actions
Contributor

❌ Gradle check result for 6399b32: FAILURE

Signed-off-by: Sawan Srivastava <[email protected]>
@github-actions
Contributor

✅ Gradle check result for 5154731: SUCCESS

@codecov

codecov bot commented Jul 11, 2025

Codecov Report

❌ Patch coverage is 28.30189% with 38 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.81%. Comparing base (5d9695c) to head (5154731).
⚠️ Report is 177 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ch/search/approximate/ApproximateBooleanQuery.java | 23.07% | 30 Missing ⚠️ |
| ...arch/search/approximate/ApproximateScoreQuery.java | 11.11% | 7 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #18693      +/-   ##
============================================
+ Coverage     72.74%   72.81%   +0.07%     
- Complexity    68400    68452      +52     
============================================
  Files          5568     5569       +1     
  Lines        314401   314452      +51     
  Branches      45598    45610      +12     
============================================
+ Hits         228696   228966     +270     
+ Misses        67055    66843     -212     
+ Partials      18650    18643       -7     

@sawansri
Contributor Author

sawansri commented Jul 11, 2025

I'm seeing a large latency reduction for range queries, in line with ApproximatePointRangeQuery: p50 latency drops from about 146 ms to about 9 ms.

Before approximation:

| Metric | Task | Value | Unit |
|---|---|---|---|
| 50th percentile latency | filter-range | 146.287 | ms |
| 90th percentile latency | filter-range | 154.617 | ms |
| 99th percentile latency | filter-range | 158.012 | ms |
| 100th percentile latency | filter-range | 164.048 | ms |
| 50th percentile service time | filter-range | 144.662 | ms |
| 90th percentile service time | filter-range | 153 | ms |
| 99th percentile service time | filter-range | 155.89 | ms |
| 100th percentile service time | filter-range | 162.017 | ms |
| error rate | filter-range | 0 | % |

After approximation:

| Metric | Task | Value | Unit |
|---|---|---|---|
| 50th percentile latency | filter-range | 9.08292 | ms |
| 90th percentile latency | filter-range | 9.70475 | ms |
| 99th percentile latency | filter-range | 10.5631 | ms |
| 100th percentile latency | filter-range | 11.2487 | ms |
| 50th percentile service time | filter-range | 7.37912 | ms |
| 90th percentile service time | filter-range | 7.90009 | ms |
| 99th percentile service time | filter-range | 9.12548 | ms |
| 100th percentile service time | filter-range | 9.60697 | ms |
| error rate | filter-range | 0 | % |

Query:

curl -X POST "http://localhost:9200/big5/_search" \
  -H "Content-Type: application/json" \
  -d '{
    "query": {
      "bool": {
        "filter": {
          "range": {
            "metrics.size": {
              "gte": 1000,
              "lt": 2000
            }
          }
        }
      }
    }
  }' | jq '.'

}

// TODO: Figure out why multi-clause breaks testPhrasePrefix() in HighlighterWithAnalyzersTests.java
return ((BooleanQuery) query).clauses().size() == 1
Contributor

Is it guaranteed that we will only get BooleanQuery instances here?

Contributor Author

Yeah, the result after calling booleanQueryBuilder.build() is always a BooleanQuery.

Contributor

The query here might be MATCH_ALL_DOCS since you call fixNegativeQueryIfNeeded before this check here - so it is not the pure output of BooleanQueryBuilder's build method.

Contributor Author

I see, we can probably add a defensive check before casting. Thanks.
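
A minimal sketch of what such a defensive check could look like. The `Query`, `MatchAllDocsQuery`, and `BooleanQuery` types below are simplified stand-ins declared only so the example compiles on its own; they are not the real Lucene classes, and `isSingleClauseBoolean` is a hypothetical helper name, not code from this PR.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class DefensiveCastSketch {
    // Hypothetical stand-ins for the query hierarchy; not the real Lucene classes.
    interface Query {}

    static final class MatchAllDocsQuery implements Query {}

    static final class BooleanQuery implements Query {
        private final List<String> clauses;

        BooleanQuery(List<String> clauses) {
            this.clauses = clauses;
        }

        List<String> clauses() {
            return clauses;
        }
    }

    // Returns true only when the query really is a BooleanQuery with exactly one clause.
    static boolean isSingleClauseBoolean(Query query) {
        if (!(query instanceof BooleanQuery)) {
            // For example, a match-all query substituted by an earlier fix-up step.
            return false;
        }
        return ((BooleanQuery) query).clauses().size() == 1;
    }

    public static void main(String[] args) {
        System.out.println(isSingleClauseBoolean(new BooleanQuery(Collections.singletonList("range"))));
        System.out.println(isSingleClauseBoolean(new BooleanQuery(Arrays.asList("range", "term"))));
        System.out.println(isSingleClauseBoolean(new MatchAllDocsQuery()));
    }
}
```

With the `instanceof` guard in place, a non-boolean query falls through to the regular path instead of throwing a `ClassCastException`.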

private final int size;
private final List<BooleanClause> clauses;
private ApproximateBooleanQuery booleanQuery;
public boolean isUnwrapped = false;
Contributor

Redundant variable

}

// For single clause boolean queries, check if the clause can be approximated
if (clauses.size() == 1 && clauses.get(0).occur() != BooleanClause.Occur.MUST_NOT) {
Contributor

Do we need clauses.size() == 1 again since it is guaranteed by BoolQueryBuilder?

Contributor Author

No, but this check will be necessary once multi-clause boolean query approximation is implemented.

Contributor

Better to add it then IMO.
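
For reference, the structural part of the check under discussion can be sketched in isolation. This is an illustration under assumptions: `Clause` is a hypothetical stand-in, `Occur` is redeclared locally with the same four values as Lucene's `BooleanClause.Occur`, and `isStructurallyEligible` is an invented name. Whether the nested query itself can be approximated is deliberately left out.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SingleClauseCheckSketch {
    // Same four values as Lucene's BooleanClause.Occur, redeclared so the sketch compiles alone.
    enum Occur { MUST, FILTER, SHOULD, MUST_NOT }

    // Hypothetical stand-in for a boolean clause: a label plus its occur type.
    static final class Clause {
        final String description;
        final Occur occur;

        Clause(String description, Occur occur) {
            this.description = description;
            this.occur = occur;
        }
    }

    // Structural condition only: exactly one clause, and that clause is not a negation.
    static boolean isStructurallyEligible(List<Clause> clauses) {
        return clauses.size() == 1 && clauses.get(0).occur != Occur.MUST_NOT;
    }

    public static void main(String[] args) {
        List<Clause> filterOnly = Collections.singletonList(new Clause("range", Occur.FILTER));
        List<Clause> negationOnly = Collections.singletonList(new Clause("range", Occur.MUST_NOT));
        List<Clause> twoClauses = Arrays.asList(new Clause("range", Occur.FILTER), new Clause("term", Occur.MUST));

        System.out.println(isStructurallyEligible(filterOnly));
        System.out.println(isStructurallyEligible(negationOnly));
        System.out.println(isStructurallyEligible(twoClauses));
    }
}
```

Keeping the size check inside this method makes it independent of whatever the caller has already verified, which is the trade-off being debated in this thread.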

appxResolved.setContext(context);
}
try {
resolvedQuery = resolvedQuery.rewrite(context.searcher());
Contributor

Calling rewrite() inside setContext() breaks typical Lucene patterns where rewrite happens before context setting. Why is additional rewriting needed here?

Contributor Author

I believe I was running into issues with MultiTermQueries within single clause boolean queries not rewriting properly. Forcing the resolved query through a rewrite seemed to fix the issue.
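
As background for this thread, a rough sketch of rewriting a query until it stops changing, which, as far as I understand, is the general approach Lucene's IndexSearcher takes before execution. Everything below is an assumption for illustration: the `Query` interface and its no-argument `rewrite()` are simplified stand-ins (the real method takes a searcher or reader argument), and `Primitive`, `MultiTermLike`, `Wrapper`, and `rewriteFully` are invented names.

```java
public class RewriteLoopSketch {
    // Hypothetical stand-in; the real Lucene rewrite method takes an argument.
    interface Query {
        Query rewrite();
    }

    // Already primitive: rewriting returns the same instance.
    static final class Primitive implements Query {
        final String description;

        Primitive(String description) {
            this.description = description;
        }

        @Override
        public Query rewrite() {
            return this;
        }
    }

    // Multi-term-like query: has to be expanded into a primitive form first.
    static final class MultiTermLike implements Query {
        final String pattern;

        MultiTermLike(String pattern) {
            this.pattern = pattern;
        }

        @Override
        public Query rewrite() {
            return new Primitive("expanded(" + pattern + ")");
        }
    }

    // Wrapper that unwraps to its inner query in a single rewrite step.
    static final class Wrapper implements Query {
        final Query inner;

        Wrapper(Query inner) {
            this.inner = inner;
        }

        @Override
        public Query rewrite() {
            return inner;
        }
    }

    // Applies rewrite() repeatedly until the query returns itself.
    static Query rewriteFully(Query query) {
        Query current = query;
        for (Query next = current.rewrite(); next != current; next = current.rewrite()) {
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        Query resolved = rewriteFully(new Wrapper(new MultiTermLike("foo*")));
        System.out.println(resolved instanceof Primitive); // prints true
    }
}
```

A single `rewrite()` call on the wrapper would stop at the multi-term-like query; the loop is what carries it through to the primitive form.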

@prudhvigodithi
Member

@sawansri can we close this PR in favor of #19046?

@sawansri
Contributor Author

> @sawansri can we close this PR in favor of #19046?

Yeah

Labels

  • enhancement: Enhancement or improvement to existing feature or request
  • lucene
  • _No response_
  • Search: Search query, autocomplete ...etc

Development

Successfully merging this pull request may close these issues.

[Approximation Framework] Extend approximation to single clause boolean queries

3 participants