Description
Is your feature request related to a problem? Please describe
Abstract
This proposal extends OpenSearch's existing approximation framework to support multi-clause boolean queries, targeting significant performance improvements for log and analytics workloads. This approach specifically addresses conjunction operations with constant scoring clauses (such as range + numeric term combinations in filter contexts), covering one of the most common query structures in production workloads.
Motivation
Current Performance Numbers
OpenSearch's approximation framework shows significant gains:
- Point Range Queries: 25x improvement (250ms → <10ms)
- Term Queries on Numeric Fields: 25% latency reduction
- Boolean Filter Queries: Multi-fold improvement potential (unrealized)
A significant gap remains for multi-clause boolean queries, which represent a large portion of queries in log/analytics workloads but cannot leverage the Approximation Framework's early-termination optimizations.
Target Query Pattern
{
"query": {
"bool": {
"must": [
{"range": {"@timestamp": {"gte": "1998-06-10", "lt": "1998-06-13"}}},
{"term": {"status": 200}}
]
}
}
}
Each clause is individually approximatable through early termination on BKD trees, but the boolean combination cannot leverage this speedup.
Related Work
The proposal builds upon ongoing approximation framework development:
- Term Queries on Numeric Fields: PR #18679 implements ApproximateTermQuery for numeric fields using BKD tree early termination
- Single-Clause Boolean Query Approximation: PR #18693 provides foundation for ApproximateBooleanQuery with single approximatable clauses
- Original Approximation Framework: PR #13788 introduces the ApproximatePointRangeQuery implementation, demonstrating BKD tree optimization potential
- Boolean Query Optimization: Recently merged PR #18541 rewrites non-scoring MUST clauses to FILTER clauses, expanding approximation coverage
- Hybrid Query Optimization: OpenSearch 3.1 implemented custom bulk scorer approaches using coordinated multi-subquery processing, batch processing with bitsets, and segment-level optimization. These are architectural patterns similar to this proposal's custom bulk scorer with coordinated clause iteration and bitset-based document tracking.
Technical Challenge
Lucene Search Architecture Context
To understand the approximation challenge, we need to examine the Lucene search abstractions used in boolean query execution:
Weight: Creates scorers and manages IndexSearcher-level state. For boolean queries, BooleanWeight coordinates multiple clause weights.
BulkScorer: Scores a range of documents at once. Boolean queries use specialized bulk scorers for optimization, with the default implementation (ConjunctionBulkScorer in this case) calling collector.collect(docId) for each matching document in the min-max range.
Scorer: Provides an iterator over matching document IDs in ascending order. Boolean queries create multiple scorers (one per clause) that must be coordinated.
ScorerSupplier: Creates bulk scorers and provides cost estimates. Boolean queries use BooleanScorerSupplier to determine the optimal scoring strategy based on clause costs and types.
ConjunctionDISI: This class coordinates multiple DocIdSetIterators to find documents that match ALL clauses (conjunction/AND operation). It implements a "leap-frogging" algorithm where iterators advance each other to find common document IDs, making it essential for FILTER and MUST clause execution but incompatible with naive approximation approaches.
Why Naive Approximation Fails
Simply applying existing approximation techniques to individual clauses in boolean queries fails to guarantee the expected result count:
Problem: If we approximate each clause independently (e.g., limiting each to 10k documents), the intersection of these approximated sets may contain far fewer than 10k documents. For example:
- Clause A (approximated): matches documents [1-10000]
- Clause B (approximated): matches documents [50000-60000]
- Intersection: 0 documents (despite each clause finding 10k matches)
Current Approximation Guarantee: Existing approximation (like ApproximatePointRangeQuery) guarantees to return at least the threshold number of documents (typically 10k) when available in the dataset.
Boolean Query Challenge: Naive per-clause approximation cannot maintain this guarantee because clause intersections are unpredictable, potentially returning zero results even when millions of matching documents exist.
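The failure mode above can be reproduced with a toy simulation (plain Python, not OpenSearch code): truncate each clause's match list at the approximation threshold, then intersect.

```python
def approximate(matches, threshold):
    """Truncate a clause's sorted match list at the approximation threshold."""
    return matches[:threshold]

clause_a = list(range(1, 100_001))       # exact matches for clause A
clause_b = list(range(50_000, 150_001))  # exact matches for clause B

approx_a = approximate(clause_a, 10_000)  # docs 1..10_000
approx_b = approximate(clause_b, 10_000)  # docs 50_000..59_999

exact_hits = set(clause_a) & set(clause_b)
approx_hits = set(approx_a) & set(approx_b)

print(len(exact_hits))   # 50_001 matching documents actually exist...
print(len(approx_hits))  # ...but the per-clause approximated conjunction finds 0
```

Each clause individually honors its 10k guarantee, yet the conjunction returns nothing, which is exactly the guarantee violation described above.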
ConjunctionDISI Leap-Frogging Problem
Lucene's ConjunctionDISI.doNext() implements iterator coordination:
private int doNext(int doc) throws IOException {
advanceHead:
for (;;) {
// Find agreement between lead iterators
final int next2 = lead2.advance(doc);
if (next2 != doc) {
doc = lead1.advance(next2);
if (next2 != doc) continue;
}
// Align all other iterators
for (DocIdSetIterator other : others) {
if (other.docID() < doc) {
final int next = other.advance(doc);
if (next > doc) {
doc = lead1.advance(next);
continue advanceHead;
}
}
}
return doc; // All iterators aligned
}
}
Critical Requirements Violated by Naively Applying Approximation:
- All iterators must remain responsive to advance(docId) calls
- Coordinated advancement requires all iterators to stay functional
- Early termination breaks the DocIdSetIterator contract
- Any iterator might need to advance to positions discovered by others
Performance Optimization Rationale
The proposed solution targets the most expensive operations in multi-clause boolean query execution:
- Clause Scoring Overhead: The most significant performance cost comes from scoring individual clauses, particularly when traversing BKD trees for range and term queries on numeric fields. Our approach optimizes this by implementing early termination and resumable scoring that can pause and continue BKD traversal exactly where it left off.
- Bitset Creation Cost: Creating bitsets for each clause is expensive, especially for large segments. By caching the DISI for each clause, we avoid recreating these bitsets on each scoring cycle, allowing us to directly update existing bitsets instead.
- Conjunction Overhead: The final conjunction operation that finds documents matching all clauses is also relatively costly. Our approach maintains state between scoring cycles, continuing from where it last left off after iteratively scoring more clauses.
- Early Termination Strategy: The entire purpose of this optimization is to enable early termination for scoring. By collecting documents incrementally until we reach the target threshold (10k), we avoid unnecessary scoring of documents that won't appear in the final result set.
Describe the solution you'd like
Proposed Solution: Resumable Iterator Architecture
High-Level Approach: We implement a specialized weight within ApproximateBooleanQuery with a bulkScorer method returning our custom ApproximateBooleanScorerSupplier. This scorer supplier creates resumable DocIdSetIterators for each clause that can pause and resume BKD tree traversal. These iterators are coordinated by a custom ApproximateConjunctionDISI that understands how to work with resumable iterators.
Individual clause iterators maintain their own bitsets and can pause/resume scoring while the custom conjunction coordinator handles intersection of multiple DISIs from different clauses. The solution uses custom bulk scoring with a circular process that builds progressively larger bitsets for each clause until sufficient intersection documents are found.
The key implementation detail is maintaining the state of each iterator's position in the BKD tree, allowing scoring to resume exactly where it left off when more documents are needed.
Core Circular Process
- Weight Creation: ApproximateBooleanQuery.createWeight() creates a custom weight to return our ApproximateBooleanScorerSupplier
- Iterator Creation: When weight.bulkScorer(ctx) is called in ContextIndexSearcher, our ApproximateBooleanScorerSupplier.bulkScorer() creates resumable DocIdSetIterators (DISIs) for each clause using scorerSupplier.get(leadCost)
- Conjunction Setup: These resumable DISIs are passed to our custom ApproximateConjunctionScorer, which creates an ApproximateConjunctionDISI internally that understands resumable iterators
- Bulk Scorer Wrapping: The conjunction scorer is wrapped in a custom ApproximateBooleanBulkScorer that is returned to the original weight.bulkScorer() method
- Document Collection: When bulkScorer.score(collector, liveDocs, minDocID, maxDocID) is called in ContextIndexSearcher, our custom bulk scorer iterates over the conjunction DISI and collects matching documents
- Threshold Tracking: The bulk scorer keeps track of how many documents have been collected
- Resumable Scoring: When collected documents < 10k (the threshold), the scorer returns to the scorer suppliers and scores 10k more documents on each resumable DISI (only for approximated clauses), which resume from their last position in the BKD tree
- State Preservation: The DISI for each clause is cached to avoid recreating expensive bitsets, which is critical for performance as bitset creation is one of the most expensive operations
- Continuation: The bulk scorer updates the minimum bound to the next docID match returned by the previous call, then collects from that docID onwards, maintaining the conjunction state between scoring cycles
- Termination: This process repeats until 10k hits are collected or all documents have been processed
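The circular process above can be sketched as a toy Python simulation (names and batch mechanics are illustrative assumptions, not the proposed Java implementation): each clause scores in batches, remembers its cursor as a stand-in for saved BKD state, grows a cached bitset in place, and the loop repeats until the intersection reaches the threshold or all clauses are exhausted.

```python
class ResumableClause:
    """Toy stand-in for a resumable DISI: scores matches in batches,
    remembering where it left off (analogous to saved BKD traversal state)."""

    def __init__(self, matches):
        self._matches = sorted(matches)
        self._cursor = 0      # resume point (the "BKD state")
        self.bitset = set()   # cached bitset, grown across scoring rounds

    def score_more(self, batch_size):
        batch = self._matches[self._cursor:self._cursor + batch_size]
        self._cursor += len(batch)
        self.bitset.update(batch)  # update the cached bitset in place
        return bool(batch)         # False once this clause is exhausted

def approximate_conjunction(clauses, threshold, batch_size):
    collected = set()
    while len(collected) < threshold:
        # Score one more batch per clause, resuming from each cursor.
        progressed = [c.score_more(batch_size) for c in clauses]
        # Intersect the cached bitsets (the conjunction step).
        collected = set.intersection(*(c.bitset for c in clauses))
        if not any(progressed):
            break  # every clause exhausted: all segment docs processed
    return sorted(collected)

clauses = [ResumableClause(range(0, 100, 2)),   # matches even docs
           ResumableClause(range(0, 100, 3))]   # matches every 3rd doc
print(approximate_conjunction(clauses, threshold=10, batch_size=10))
```

Three rounds suffice here: the intersection (multiples of 6) grows from 4 to 7 to 10 documents, at which point scoring stops without ever materializing the full match set of either clause.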
Scope and Applicability
This strategy is specifically designed for boolean queries involving conjunction operations (FILTER and MUST clauses):
Future Extensibility: The ApproximateBooleanBulkScorer and supporting custom classes are designed to be expanded to other boolean query types. When approximation strategies for other boolean query combinations (involving SHOULD, MUST_NOT, or mixed clause types) are identified and developed, the existing architecture can be extended to support these additional patterns without requiring fundamental redesign.
Current Limitations
This approximation strategy is currently designed only for conjunction-based boolean queries:
- SHOULD Queries: Not applicable because SHOULD clauses sum scores across clauses. Early termination after reaching the target document count cannot guarantee the actual top-N highest scoring documents, as remaining unprocessed clauses might contain higher-scoring matches.
- Mixed Clause Types: Boolean queries combining FILTER/MUST with SHOULD or MUST_NOT clauses cannot use this approximation strategy due to the scoring and exclusion requirements mentioned above.
Note: These limitations reflect the current conjunction-focused implementation. Future versions may extend the strategy to handle other boolean query patterns as appropriate approximation techniques are developed for those scenarios.
Core Components
The implementation requires custom classes to enable resumable scoring:
- ApproximateBooleanQuery: Extends BooleanQuery to create a custom Weight that supports approximation for multi-clause boolean queries
- ApproximateBooleanScorerSupplier: Creates resumable DocIdSetIterators for each clause and coordinates their usage
- ResumableDISI: Custom DocIdSetIterator that is maintained for all documents in the segment, allowing progressive expansion as more documents are scored
- BKDState: Tracks current position in BKD tree traversal for each approximatable clause, enabling resumption from the exact point where scoring was paused
- ApproximateConjunctionScorer: Custom Scorer that creates and manages an ApproximateConjunctionDISI
- ApproximateConjunctionDISI: Custom conjunction coordinator that understands resumable iterators and handles intersection of multiple DISIs from different clauses, replacing standard ConjunctionDISI logic
- ApproximateBooleanBulkScorer: Controls the circular scoring process, managing iterator creation, conjunction coordination, and adaptive continuation based on collected document count
The key performance optimization comes from:
- Caching clause iterators to avoid recreating expensive bitsets
- Maintaining BKD traversal state to resume exactly where we left off
- Preserving conjunction state between scoring cycles
- Incrementally collecting documents until we reach the target threshold
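Purely illustrative Python skeletons of how these components relate (the real components would be Java classes extending Lucene types; the fields and signatures here are assumptions, not the actual API):

```python
class BKDState:
    """Tracks where a clause paused its BKD tree traversal."""
    def __init__(self):
        self.node_path = []   # path from the root to the current BKD node
        self.doc_cursor = 0   # offset inside the current leaf block

class ResumableDISI:
    """Per-clause iterator backed by a cached, progressively grown bitset."""
    def __init__(self, clause):
        self.clause = clause
        self.state = BKDState()  # resume point for the next scoring round
        self.bitset = set()      # expanded in place instead of being rebuilt

    def score_more(self, count):
        ...  # resume traversal from self.state and add up to `count` docs

class ApproximateConjunctionDISI:
    """Conjunction coordinator that knows its children can grow on demand."""
    def __init__(self, disis):
        self.disis = disis  # resumable iterators, one per clause

    def next_match(self):
        ...  # leap-frog over the cached bitsets, requesting more docs as needed
```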
Lead Iterator Selection Challenge
An additional challenge comes from Lucene's standard practice of selecting the cheapest (most sparse) iterator as the lead in ConjunctionDISI. If the lead iterator is not an approximated clause, it could skip over many documents that approximated clauses would have matched, forcing approximated clauses to be rescored excessively during the leap-frogging coordination process.
Problem Scenario:
- Approximated clause A: matches documents [1, 5, 10, 15, 20, ...]
- Non-approximated lead clause B: matches documents [50, 100, 150, ...]
- Clause A must advance to document 50, then 100, then 150, potentially exhausting its approximation threshold while finding very few conjunction matches
Potential Solution:
- Adaptive Threshold Strategy: When an approximated clause is not the lead iterator, increase its scoring threshold (e.g., from 10k to 25k or 50k documents) to reduce the frequency of circular rescoring rounds and minimize coordination overhead
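A rough Python model of the trade-off (assumed behavior for illustration, not the proposed implementation): count how many scoring rounds the approximated clause performs while a sparse, non-approximated lead drives the conjunction, for two different thresholds.

```python
def rounds_needed(lead_docs, clause_matches, threshold):
    """Rounds of batch scoring the approximated clause needs so that its
    bitset always covers the doc proposed by the sparse lead iterator."""
    rounds, covered, cursor = 0, -1, 0
    for doc in lead_docs:
        # Keep scoring batches until the clause's coverage reaches the lead doc.
        while covered < doc and cursor < len(clause_matches):
            batch = clause_matches[cursor:cursor + threshold]
            cursor += len(batch)
            covered = batch[-1]
            rounds += 1
    return rounds

lead = list(range(0, 1_000_000, 50_000))     # sparse lead: every 50,000th doc
matches = list(range(0, 1_000_000, 5))       # approximated clause: every 5th doc

print(rounds_needed(lead, matches, 10_000))  # 20 rescoring rounds at 10k
print(rounds_needed(lead, matches, 50_000))  # 4 rescoring rounds at 50k
```

Raising the per-round threshold cuts the number of coordination round trips at the cost of scoring more documents per round, which is the trade-off the adaptive threshold strategy targets.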
Performance Expectations
| Query Type | Current | Projected | Improvement |
|---|---|---|---|
| Multi-clause Boolean Filter | 850ms | TBD | Multifold improvement |
| Single-clause Boolean | 150ms | <10ms | Significant improvement |
| Term Query on Numeric Field | 12ms | 9ms | ~25% |
Note on Multi-Clause Performance: The improvement for multi-clause boolean filter queries represents a potential estimate based on theoretical analysis. Since no proof of concept has been implemented yet, we are conservatively projecting a multifold performance improvement rather than specific numbers. The actual performance gains will depend on various factors including data distribution, query complexity, and segment characteristics.
In the worst case scenario, if the target document count is not reached within the approximation threshold, this strategy will continue the circular process until all documents in the segment are traversed and scored, potentially resulting in performance equivalent to or slightly worse than current exact scoring due to state management overhead.
Queries used are in the Benchmarks section.
Conclusion
Multi-clause boolean query approximation addresses the highest-impact optimization gap in OpenSearch's approximation framework. The resumable iterator architecture solves the fundamental ConjunctionDISI coordination challenges while maintaining correctness and delivering substantial performance improvements for log/analytics workloads.
Benchmarks
Term Queries on numeric fields
Related component
Search:Performance
Describe alternatives you've considered
Alternatives
Alternative 1: Adaptive Window-Based Bulk Scoring
This alternative proposes making multiple calls to bulkScorer.score() on approximated clauses with dynamically sized windows to achieve the target conjunction hit count.
High-Level Approach:
- Start with an initial scoring window (e.g., docIDs 0-10k)
- Calculate hit density from initial window
- Expand window size based on observed hit density
- Continue until target hit count is reached or maximum iterations/timeout occurs
- Failover to full document range if target hits aren't reached
Adaptive Window Sizing Heuristic:
- If initial 10k document window yields 2k hits (20% density):
- For target of 10k hits, we need 8k more hits
- Next window size = (8k hits / 0.2 density) = 40k documents
- Apply safety factor (1.25) = 50k documents
- Score documents 0-60k in next iteration
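The sizing arithmetic above can be expressed as a small helper (the function name and safety factor are illustrative, taken from the heuristic in this section):

```python
def next_window_size(hits_so_far, docs_scored, target_hits, safety=1.25):
    """Size the next scoring window from the observed hit density."""
    density = hits_so_far / docs_scored        # e.g. 2k / 10k = 0.2
    remaining = target_hits - hits_so_far      # e.g. 10k - 2k = 8k
    return int(remaining / density * safety)   # e.g. 8k / 0.2 * 1.25 = 50k

# Worked example from the heuristic: 10k docs scored, 2k hits, target 10k.
print(next_window_size(2_000, 10_000, 10_000))  # 50_000 additional docs
# Next iteration scores documents 0-60k (10k already scored + 50k new).
```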
Key Features:
- Only applies windowing to approximatable clauses
- Uses Lucene's existing ConjunctionUtils for conjunction operations
- Tracks collected documents with a single bitset
- Adapts window size based on observed hit density
- Adjusts maximum iterations based on segment size (more iterations for larger segments)
- Fails over to full document range if target hits aren't reached after maximum iterations
Pros:
- Simpler implementation than resumable iterators
- Leverages existing Lucene bulk scoring mechanisms
- More maintainable with fewer Lucene extensions
- Guarantees target hit count when enough matching documents exist
Cons:
- May require multiple scoring passes
- Less predictable performance for skewed data distributions
- Small performance impact when failover to full range is triggered
- Effectively re-runs the query with larger windows: Each window expansion is essentially like running the query again with a larger size
- Potential worst-case performance: If the failover mechanism is triggered, it would score all documents on top of the previously processed windows, potentially resulting in worse performance than the non-approximated version
- No state preservation: Unlike the resumable approach, this method doesn't preserve BKD traversal state between windows, resulting in redundant work
Edge Case Handling:
- Maximum iteration count scales with segment size (e.g., 5 for small segments, 20+ for large segments)
- Timeout threshold to prevent excessive resource usage
- Exponential window growth for sparse datasets
- Minimum window size increase for very low hit densities
- Failover mechanism ensures target hit count is reached when possible
Alternative 2: Simple Multi-Clause Approximation Without Hit Guarantees
This approach would simply apply approximation to each clause independently without guaranteeing a specific number of hits in the final conjunction result.
Implementation Approach:
- Apply existing approximation techniques to each clause independently
- Use standard Lucene conjunction mechanisms to combine results
- Accept whatever number of hits result from the conjunction
Pros:
- Minimal implementation complexity - no custom Lucene extensions required
- Faster development
- Leverages existing approximation infrastructure directly
- Significant performance gains for common use cases
Cons:
- Cannot guarantee specific hit counts (e.g., the standard 10k document threshold)
- May return very few or zero results for sparse datasets
- Would require feature flag protection due to non-deterministic result sizes
Practical Applications:
This approach could be valuable for users who:
- Only need a small number of results (e.g., 10-100 hits)
- Are displaying only the first page of results
- Don't need exact hit counts
- Prioritize query speed over result completeness
For example, dashboard visualizations or search interfaces that only display the first 10-20 results could benefit from this approach with appropriate warnings about potential result limitations on sparse datasets.
Alternative 3: Contribute Approximation Framework to Lucene Core
Pros:
- Reduces Lucene extensions within OpenSearch codebase
- Benefits the broader Lucene ecosystem
- Potentially better long-term maintenance as part of core Lucene
Cons:
- Significantly longer integration timeline due to Lucene's release cycle
- Uncertain acceptance - approximation may not align with Lucene's design philosophy
- Would require porting existing OpenSearch approximation features to Lucene first
- Dependency on upstream decisions outside OpenSearch control
Additional context
No response