Description
Is your feature request related to a problem? Please describe
Abstract
This proposal extends OpenSearch's existing approximation framework to support multi-clause boolean queries, targeting significant performance improvements for log and analytics workloads. This approach specifically addresses conjunction operations with constant scoring clauses (such as range + numeric term combinations in filter contexts), covering one of the most common query structures in production workloads.
Motivation
Current Performance Numbers
OpenSearch's approximation framework shows significant gains:
- Point Range Queries: 25x improvement (250ms → <10ms)
- Term Queries on Numeric Fields: 25% latency reduction
- Boolean Filter Queries: Multi-fold improvement potential (unrealized)
A significant gap remains for multi-clause boolean queries, which represent a large portion of queries in log/analytics workloads but cannot leverage the Approximation Framework's early-termination optimizations.
Target Query Pattern
{
"query": {
"bool": {
"must": [
{"range": {"@timestamp": {"gte": "1998-06-10", "lt": "1998-06-13"}}},
{"term": {"status": 200}}
]
}
}
}
Each clause is individually approximatable through early termination on BKD trees, but the boolean combination cannot leverage this speedup.
Related Work
The proposal builds upon ongoing approximation framework development:
- Term Queries on Numeric Fields: PR #18679 implements ApproximateTermQuery for numeric fields using BKD tree early termination
- Single-Clause Boolean Query Approximation: PR #18693 provides foundation for ApproximateBooleanQuery with single approximatable clauses
- Original Approximation Framework: PR #13788 introduces the ApproximatePointRangeQuery implementation, demonstrating BKD tree optimization potential
- Boolean Query Optimization: Recently merged PR #18541 rewrites non-scoring MUST clauses to FILTER clauses, expanding approximation coverage
- Hybrid Query Optimization: OpenSearch 3.1 implemented custom bulk scorer approaches using coordinated multi-subquery processing, batch processing with bitsets, and segment-level optimization. These are architectural patterns similar to this proposal's custom bulk scorer with coordinated clause iteration and bitset-based document tracking.
Technical Challenge
Lucene Search Architecture Context
To understand the approximation challenge, we need to examine the Lucene search abstractions used in boolean query execution:
Weight: Creates scorers and manages IndexSearcher-level state. For boolean queries, BooleanWeight coordinates multiple clause weights.
BulkScorer: Scores a range of documents at once. Boolean queries use specialized bulk scorers for optimization, with the default implementation (ConjunctionBulkScorer in this case) calling collector.collect(docId) for each matching document in the min-max range.
Scorer: Provides an iterator over matching document IDs in ascending order. Boolean queries create multiple scorers (one per clause) that must be coordinated.
ScorerSupplier: Creates bulk scorers and provides cost estimates. Boolean queries use BooleanScorerSupplier to determine the optimal scoring strategy based on clause costs and types.
ConjunctionDISI: This class coordinates multiple DocIdSetIterators to find documents that match ALL clauses (conjunction/AND operation). It implements a "leap-frogging" algorithm where iterators advance each other to find common document IDs, making it essential for FILTER and MUST clause execution but incompatible with naive approximation approaches.
Why Naive Approximation Fails
Simply applying existing approximation techniques to individual clauses in boolean queries fails to guarantee the expected result count:
Problem: If we approximate each clause independently (e.g., limiting each to 10k documents), the intersection of these approximated sets may contain far fewer than 10k documents. For example:
- Clause A (approximated): matches documents [1-10000]
- Clause B (approximated): matches documents [50000-60000]
- Intersection: 0 documents (despite each clause finding 10k matches)
Current Approximation Guarantee: Existing approximation (like ApproximatePointRangeQuery) guarantees to return at least the threshold number of documents (typically 10k) when available in the dataset.
Boolean Query Challenge: Naive per-clause approximation cannot maintain this guarantee because clause intersections are unpredictable, potentially returning zero results even when millions of matching documents exist.
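The failure mode above can be reproduced with a toy simulation (plain Python, not OpenSearch code): truncate each clause's match list at the approximation threshold, then intersect.

```python
def approximate(matches, threshold):
    """Truncate a clause's sorted match list at the approximation threshold."""
    return matches[:threshold]

clause_a = list(range(1, 100_001))       # exact matches for clause A
clause_b = list(range(50_000, 150_001))  # exact matches for clause B

approx_a = approximate(clause_a, 10_000)  # docs 1..10_000
approx_b = approximate(clause_b, 10_000)  # docs 50_000..59_999

exact_hits = set(clause_a) & set(clause_b)
approx_hits = set(approx_a) & set(approx_b)

print(len(exact_hits))   # 50_001 matching documents actually exist...
print(len(approx_hits))  # ...but the per-clause approximated conjunction finds 0
```

Each clause individually honors its 10k guarantee, yet the conjunction returns nothing, which is exactly the guarantee violation described above.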
ConjunctionDISI Leap-Frogging Problem
Lucene's ConjunctionDISI.doNext() implements iterator coordination:
private int doNext(int doc) throws IOException {
advanceHead:
for (;;) {
// Find agreement between lead iterators
final int next2 = lead2.advance(doc);
if (next2 != doc) {
doc = lead1.advance(next2);
if (next2 != doc) continue;
}
// Align all other iterators
for (DocIdSetIterator other : others) {
if (other.docID() < doc) {
final int next = other.advance(doc);
if (next > doc) {
doc = lead1.advance(next);
continue advanceHead;
}
}
}
return doc; // All iterators aligned
}
}
Critical Requirements Violated by Naively Applying Approximation:
- All iterators must remain responsive to advance(docId) calls
- Coordinated advancement requires all iterators to stay functional
- Early termination breaks the DocIdSetIterator contract
- Any iterator might need to advance to positions discovered by others
Performance Optimization Rationale
The proposed solution targets the most expensive operations in multi-clause boolean query execution:
- Clause Scoring Overhead: The most significant performance cost comes from scoring individual clauses, particularly when traversing BKD trees for range and term queries on numeric fields. Our approach optimizes this by implementing early termination and resumable scoring that can pause and continue BKD traversal exactly where it left off.
- Bitset Creation Cost: Creating bitsets for each clause is expensive, especially for large segments. By caching the DISI for each clause, we avoid recreating these bitsets on each scoring cycle, allowing us to directly update existing bitsets instead.
- Conjunction Overhead: The final conjunction operation that finds documents matching all clauses is also relatively costly. Our approach maintains state between scoring cycles, continuing from where it last left off after iteratively scoring more clauses.
- Early Termination Strategy: The entire purpose of this optimization is to enable early termination for scoring. By collecting documents incrementally until we reach the target threshold (10k), we avoid unnecessary scoring of documents that won't appear in the final result set.
Describe the solution you'd like
Proposed Solution: Resumable Iterator Architecture
High-Level Approach: We implement a specialized weight within ApproximateBooleanQuery with a bulkScorer method returning our custom ApproximateBooleanScorerSupplier. This scorer supplier creates resumable DocIdSetIterators for each clause that can pause and resume BKD tree traversal. These iterators are coordinated by a custom ApproximateConjunctionDISI that understands how to work with resumable iterators.
Individual clause iterators maintain their own bitsets and can pause/resume scoring while the custom conjunction coordinator handles intersection of multiple DISIs from different clauses. The solution uses custom bulk scoring with a circular process that builds progressively larger bitsets for each clause until sufficient intersection documents are found.
The key implementation detail is maintaining the state of each iterator's position in the BKD tree, allowing scoring to resume exactly where it left off when more documents are needed.
Core Circular Process
- Weight Creation: ApproximateBooleanQuery.createWeight() creates a custom weight to return our ApproximateBooleanScorerSupplier
- Iterator Creation: When weight.bulkScorer(ctx) is called in ContextIndexSearcher, our ApproximateBooleanScorerSupplier.bulkScorer() creates resumable DocIdSetIterators (DISIs) for each clause using scorerSupplier.get(leadCost)
- Conjunction Setup: These resumable DISIs are passed to our custom ApproximateConjunctionScorer, which creates an ApproximateConjunctionDISI internally that understands resumable iterators
- Bulk Scorer Wrapping: The conjunction scorer is wrapped in a custom ApproximateBooleanBulkScorer that is returned to the original weight.bulkScorer() method
- Document Collection: When bulkScorer.score(collector, liveDocs, minDocID, maxDocID) is called in ContextIndexSearcher, our custom bulk scorer iterates over the conjunction DISI and collects matching documents
- Threshold Tracking: The bulk scorer keeps track of how many documents have been collected
- Resumable Scoring: When collected documents < 10k (the threshold), the scorer returns to the scorer suppliers and scores 10k more documents on each resumable DISI (only for approximated clauses), which resume from their last position in the BKD tree
- State Preservation: The DISI for each clause is cached to avoid recreating expensive bitsets, which is critical for performance as bitset creation is one of the most expensive operations
- Continuation: The bulk scorer updates the minimum bound to the next docID match returned by the previous call, then collects from that docID onwards, maintaining the conjunction state between scoring cycles
- Termination: This process repeats until 10k hits are collected or all documents have been processed
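The circular process above can be sketched as a toy Python simulation (names and batch mechanics are illustrative assumptions, not the proposed Java implementation): each clause scores in batches, remembers its cursor as a stand-in for saved BKD state, grows a cached bitset in place, and the loop repeats until the intersection reaches the threshold or all clauses are exhausted.

```python
class ResumableClause:
    """Toy stand-in for a resumable DISI: scores matches in batches,
    remembering where it left off (analogous to saved BKD traversal state)."""

    def __init__(self, matches):
        self._matches = sorted(matches)
        self._cursor = 0      # resume point (the "BKD state")
        self.bitset = set()   # cached bitset, grown across scoring rounds

    def score_more(self, batch_size):
        batch = self._matches[self._cursor:self._cursor + batch_size]
        self._cursor += len(batch)
        self.bitset.update(batch)  # update the cached bitset in place
        return bool(batch)         # False once this clause is exhausted

def approximate_conjunction(clauses, threshold, batch_size):
    collected = set()
    while len(collected) < threshold:
        # Score one more batch per clause, resuming from each cursor.
        progressed = [c.score_more(batch_size) for c in clauses]
        # Intersect the cached bitsets (the conjunction step).
        collected = set.intersection(*(c.bitset for c in clauses))
        if not any(progressed):
            break  # every clause exhausted: all segment docs processed
    return sorted(collected)

clauses = [ResumableClause(range(0, 100, 2)),   # matches even docs
           ResumableClause(range(0, 100, 3))]   # matches every 3rd doc
print(approximate_conjunction(clauses, threshold=10, batch_size=10))
```

Three rounds suffice here: the intersection (multiples of 6) grows from 4 to 7 to 10 documents, at which point scoring stops without ever materializing the full match set of either clause.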
Scope and Applicability
This strategy is specifically designed for boolean queries involving conjunction operations (FILTER and MUST clauses):
Future Extensibility: The ApproximateBooleanBulkScorer and supporting custom classes are designed to be expanded to other boolean query types. When approximation strategies for other boolean query combinations (involving SHOULD, MUST_NOT, or mixed clause types) are identified and developed, the existing architecture can be extended to support these additional patterns without requiring fundamental redesign.
Current Limitations
This approximation strategy is currently designed only for conjunction-based boolean queries:
- SHOULD Queries: Not applicable because SHOULD clauses sum scores across clauses. Early termination after reaching the target document count cannot guarantee the actual top-N highest scoring documents, as remaining unprocessed clauses might contain higher-scoring matches.
- Mixed Clause Types: Boolean queries combining FILTER/MUST with SHOULD or MUST_NOT clauses cannot use this approximation strategy due to the scoring and exclusion requirements mentioned above.
Note: These limitations reflect the current conjunction-focused implementation. Future versions may extend the strategy to handle other boolean query patterns as appropriate approximation techniques are developed for those scenarios.
Core Components
The implementation requires custom classes to enable resumable scoring:
- ApproximateBooleanQuery: Extends BooleanQuery to create a custom Weight that supports approximation for multi-clause boolean queries
- ApproximateBooleanScorerSupplier: Creates resumable DocIdSetIterators for each clause and coordinates their usage
- ResumableDISI: Custom DocIdSetIterator that is maintained for all documents in the segment, allowing progressive expansion as more documents are scored
- BKDState: Tracks current position in BKD tree traversal for each approximatable clause, enabling resumption from the exact point where scoring was paused
- ApproximateConjunctionScorer: Custom Scorer that creates and manages an ApproximateConjunctionDISI
- ApproximateConjunctionDISI: Custom conjunction coordinator that understands resumable iterators and handles intersection of multiple DISIs from different clauses, replacing standard ConjunctionDISI logic
- ApproximateBooleanBulkScorer: Controls the circular scoring process, managing iterator creation, conjunction coordination, and adaptive continuation based on collected document count
The key performance optimization comes from:
- Caching clause iterators to avoid recreating expensive bitsets
- Maintaining BKD traversal state to resume exactly where we left off
- Preserving conjunction state between scoring cycles
- Incrementally collecting documents until we reach the target threshold
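Purely illustrative Python skeletons of how these components relate (the real components would be Java classes extending Lucene types; the fields and signatures here are assumptions, not the actual API):

```python
class BKDState:
    """Tracks where a clause paused its BKD tree traversal."""
    def __init__(self):
        self.node_path = []   # path from the root to the current BKD node
        self.doc_cursor = 0   # offset inside the current leaf block

class ResumableDISI:
    """Per-clause iterator backed by a cached, progressively grown bitset."""
    def __init__(self, clause):
        self.clause = clause
        self.state = BKDState()  # resume point for the next scoring round
        self.bitset = set()      # expanded in place instead of being rebuilt

    def score_more(self, count):
        ...  # resume traversal from self.state and add up to `count` docs

class ApproximateConjunctionDISI:
    """Conjunction coordinator that knows its children can grow on demand."""
    def __init__(self, disis):
        self.disis = disis  # resumable iterators, one per clause

    def next_match(self):
        ...  # leap-frog over the cached bitsets, requesting more docs as needed
```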
Lead Iterator Selection Challenge
An additional challenge comes from Lucene's standard practice of selecting the cheapest (most sparse) iterator as the lead in ConjunctionDISI. If the lead iterator is not an approximated clause, it could skip over many documents that approximated clauses would have matched, forcing approximated clauses to be rescored excessively during the leap-frogging coordination process.
Problem Scenario:
- Approximated clause A: matches documents [1, 5, 10, 15, 20, ...]
- Non-approximated lead clause B: matches documents [50, 100, 150, ...]
- Clause A must advance to document 50, then 100, then 150, potentially exhausting its approximation threshold while finding very few conjunction matches
Potential Solution:
- Adaptive Threshold Strategy: When an approximated clause is not the lead iterator, increase its scoring threshold (e.g., from 10k to 25k or 50k documents) to reduce the frequency of circular rescoring rounds and minimize coordination overhead
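A rough Python model of the trade-off (assumed behavior for illustration, not the proposed implementation): count how many scoring rounds the approximated clause performs while a sparse, non-approximated lead drives the conjunction, for two different thresholds.

```python
def rounds_needed(lead_docs, clause_matches, threshold):
    """Rounds of batch scoring the approximated clause needs so that its
    bitset always covers the doc proposed by the sparse lead iterator."""
    rounds, covered, cursor = 0, -1, 0
    for doc in lead_docs:
        # Keep scoring batches until the clause's coverage reaches the lead doc.
        while covered < doc and cursor < len(clause_matches):
            batch = clause_matches[cursor:cursor + threshold]
            cursor += len(batch)
            covered = batch[-1]
            rounds += 1
    return rounds

lead = list(range(0, 1_000_000, 50_000))     # sparse lead: every 50,000th doc
matches = list(range(0, 1_000_000, 5))       # approximated clause: every 5th doc

print(rounds_needed(lead, matches, 10_000))  # 20 rescoring rounds at 10k
print(rounds_needed(lead, matches, 50_000))  # 4 rescoring rounds at 50k
```

Raising the per-round threshold cuts the number of coordination round trips at the cost of scoring more documents per round, which is the trade-off the adaptive threshold strategy targets.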
Performance Expectations
| Query Type | Current | Projected | Improvement |
|---|---|---|---|
| Multi-clause Boolean Filter | 850ms | TBD | Multifold improvement |
| Single-clause Boolean | 150ms | <10ms | Significant improvement |
| Term Query on Numeric Field | 12ms | 9ms | ~25% |
Note on Multi-Clause Performance: The improvement for multi-clause boolean filter queries represents a potential estimate based on theoretical analysis. Since no proof of concept has been implemented yet, we are conservatively projecting a multifold performance improvement rather than specific numbers. The actual performance gains will depend on various factors including data distribution, query complexity, and segment characteristics.
In the worst case scenario, if the target document count is not reached within the approximation threshold, this strategy will continue the circular process until all documents in the segment are traversed and scored, potentially resulting in performance equivalent to or slightly worse than current exact scoring due to state management overhead.
Queries used are in the Benchmarks section.
Conclusion
Multi-clause boolean query approximation addresses the highest-impact optimization gap in OpenSearch's approximation framework. The resumable iterator architecture solves the fundamental ConjunctionDISI coordination challenges while maintaining correctness and delivering substantial performance improvements for log/analytics workloads.
Benchmarks
Term Queries on numeric fields
Related component
Search:Performance
Describe alternatives you've considered
Alternatives
Alternative 1: Adaptive Window-Based Bulk Scoring
This alternative proposes making multiple calls to bulkScorer.score() on approximated clauses with dynamically sized windows to achieve the target conjunction hit count.
High-Level Approach:
- Start with an initial scoring window (e.g., docIDs 0-10k)
- Calculate hit density from initial window
- Expand window size based on observed hit density
- Continue until target hit count is reached or maximum iterations/timeout occurs
- Failover to full document range if target hits aren't reached
Adaptive Window Sizing Heuristic:
- If initial 10k document window yields 2k hits (20% density):
- For target of 10k hits, we need 8k more hits
- Next window size = (8k hits / 0.2 density) = 40k documents
- Apply safety factor (1.25) = 50k documents
- Score documents 0-60k in next iteration
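The sizing arithmetic above can be expressed as a small helper (the function name and safety factor are illustrative, taken from the heuristic in this section):

```python
def next_window_size(hits_so_far, docs_scored, target_hits, safety=1.25):
    """Size the next scoring window from the observed hit density."""
    density = hits_so_far / docs_scored        # e.g. 2k / 10k = 0.2
    remaining = target_hits - hits_so_far      # e.g. 10k - 2k = 8k
    return int(remaining / density * safety)   # e.g. 8k / 0.2 * 1.25 = 50k

# Worked example from the heuristic: 10k docs scored, 2k hits, target 10k.
print(next_window_size(2_000, 10_000, 10_000))  # 50_000 additional docs
# Next iteration scores documents 0-60k (10k already scored + 50k new).
```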
Key Features:
- Only applies windowing to approximatable clauses
- Uses Lucene's existing ConjunctionUtils for conjunction operations
- Tracks collected documents with a single bitset
- Adapts window size based on observed hit density
- Adjusts maximum iterations based on segment size (more iterations for larger segments)
- Fails over to full document range if target hits aren't reached after maximum iterations
Pros:
- Simpler implementation than resumable iterators
- Leverages existing Lucene bulk scoring mechanisms
- More maintainable with fewer Lucene extensions
- Guarantees target hit count when enough matching documents exist
Cons:
- May require multiple scoring passes
- Less predictable performance for skewed data distributions
- Small performance impact when failover to full range is triggered
- Effectively re-runs the query with larger windows: Each window expansion is essentially like running the query again with a larger size
- Potential worst-case performance: If the failover mechanism is triggered, it would score all documents on top of the previously processed windows, potentially resulting in worse performance than the non-approximated version
- No state preservation: Unlike the resumable approach, this method doesn't preserve BKD traversal state between windows, resulting in redundant work
Edge Case Handling:
- Maximum iteration count scales with segment size (e.g., 5 for small segments, 20+ for large segments)
- Timeout threshold to prevent excessive resource usage
- Exponential window growth for sparse datasets
- Minimum window size increase for very low hit densities
- Failover mechanism ensures target hit count is reached when possible
Alternative 2: Simple Multi-Clause Approximation Without Hit Guarantees
This approach would simply apply approximation to each clause independently without guaranteeing a specific number of hits in the final conjunction result.
Implementation Approach:
- Apply existing approximation techniques to each clause independently
- Use standard Lucene conjunction mechanisms to combine results
- Accept whatever number of hits result from the conjunction
Pros:
- Minimal implementation complexity - no custom Lucene extensions required
- Faster development
- Leverages existing approximation infrastructure directly
- Significant performance gains for common use cases
Cons:
- Cannot guarantee specific hit counts (e.g., the standard 10k document threshold)
- May return very few or zero results for sparse datasets
- Would require feature flag protection due to non-deterministic result sizes
Practical Applications:
This approach could be valuable for users who:
- Only need a small number of results (e.g., 10-100 hits)
- Are displaying only the first page of results
- Don't need exact hit counts
- Prioritize query speed over result completeness
For example, dashboard visualizations or search interfaces that only display the first 10-20 results could benefit from this approach with appropriate warnings about potential result limitations on sparse datasets.
Alternative 3: Contribute Approximation Framework to Lucene Core
Pros:
- Reduces Lucene extensions within OpenSearch codebase
- Benefits the broader Lucene ecosystem
- Potentially better long-term maintenance as part of core Lucene
Cons:
- Significantly longer integration timeline due to Lucene's release cycle
- Uncertain acceptance - approximation may not align with Lucene's design philosophy
- Would require porting existing OpenSearch approximation features to Lucene first
- Dependency on upstream decisions outside OpenSearch control
Additional context
No response