
@mdashti mdashti commented Oct 2, 2025

Ticket(s) Closed

What

Implements filter aggregation support in Tantivy, enabling multiple filtered aggregations in a single query.

Why

Currently, there's no way to compute aggregations over different filtered subsets of documents in a single query. Users must run one query per filter, which repeats the search work for every subset. For example, computing "average price overall + average price for t-shirts + count of electronics" requires three separate queries.

Elasticsearch's filter aggregation solves this by creating a single bucket containing documents matching a query, with support for nested sub-aggregations. This is a common analytics pattern that Tantivy now supports!
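To make the motivating example concrete, here is a minimal sketch of the single-pass idea in plain Rust. It does not use Tantivy: `Product`, `Bucket`, and `aggregate_once` are hypothetical names invented for illustration, and the filters are ordinary functions rather than queries.

```rust
/// Hypothetical in-memory document; not a Tantivy type.
struct Product {
    category: &'static str,
    price: f64,
}

/// Running count and price sum for one filtered bucket.
#[derive(Default)]
struct Bucket {
    doc_count: u64,
    price_sum: f64,
}

impl Bucket {
    fn collect(&mut self, doc: &Product) {
        self.doc_count += 1;
        self.price_sum += doc.price;
    }

    /// Average price of the bucket, or `None` when no document matched.
    fn avg_price(&self) -> Option<f64> {
        if self.doc_count == 0 {
            None
        } else {
            Some(self.price_sum / self.doc_count as f64)
        }
    }
}

/// Visits every document once and feeds each bucket whose filter matches.
fn aggregate_once(docs: &[Product], filters: &[fn(&Product) -> bool]) -> Vec<Bucket> {
    let mut buckets: Vec<Bucket> = filters.iter().map(|_| Bucket::default()).collect();
    for doc in docs {
        for (filter, bucket) in filters.iter().zip(buckets.iter_mut()) {
            if filter(doc) {
                bucket.collect(doc);
            }
        }
    }
    buckets
}

fn match_all(_: &Product) -> bool {
    true
}

fn is_tshirt(p: &Product) -> bool {
    p.category == "t-shirt"
}

fn is_electronics(p: &Product) -> bool {
    p.category == "electronics"
}

/// Small sample data set used for illustration only.
fn sample_docs() -> Vec<Product> {
    vec![
        Product { category: "t-shirt", price: 10.0 },
        Product { category: "t-shirt", price: 20.0 },
        Product { category: "electronics", price: 300.0 },
        Product { category: "electronics", price: 70.0 },
    ]
}
```

Calling `aggregate_once(&sample_docs(), &filters)` with the three filters `match_all`, `is_tshirt`, and `is_electronics` yields one bucket per filter from a single loop over the documents, which is the pattern the three separate queries would otherwise emulate.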

How

Added a new FilterAggregation bucket aggregation type and the collector plumbing it needs:

  1. FilterAggregation accepts both query strings (parsed via QueryParser) and direct Query objects for custom query types
  2. DocumentQueryEvaluator evaluates filter queries per document during aggregation collection, avoiding separate query executions
  3. Aggregation collectors now receive SegmentReader references, enabling filter aggregations to create query weights and scorers per segment
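The per-segment flow described in the list can be sketched roughly as follows. This is a toy model, not the PR's code: `ToySegment`, `ToyTermFilter`, `ToyEvaluator`, and `ToyFilterCollector` are hypothetical stand-ins, and precomputing a set of matching documents is a simplification chosen for brevity (the real evaluator presumably works with weights and scorers, as noted above).

```rust
use std::collections::HashSet;

type DocId = u32;

/// Toy stand-in for a segment: document `i` has category `categories[i]`.
struct ToySegment {
    categories: Vec<&'static str>,
}

/// Toy filter query: matches documents whose category equals `term`.
struct ToyTermFilter {
    term: &'static str,
}

/// Per-segment state that answers "does this document match the filter?".
struct ToyEvaluator {
    matching: HashSet<DocId>,
}

impl ToyTermFilter {
    /// Built once per segment, which is why the collector needs segment access.
    fn evaluator_for(&self, segment: &ToySegment) -> ToyEvaluator {
        let matching = segment
            .categories
            .iter()
            .enumerate()
            .filter(|(_, category)| **category == self.term)
            .map(|(doc, _)| doc as DocId)
            .collect();
        ToyEvaluator { matching }
    }
}

/// Counts the documents that pass both the main query and the filter.
struct ToyFilterCollector {
    evaluator: ToyEvaluator,
    doc_count: u64,
}

impl ToyFilterCollector {
    fn for_segment(filter: &ToyTermFilter, segment: &ToySegment) -> Self {
        ToyFilterCollector {
            evaluator: filter.evaluator_for(segment),
            doc_count: 0,
        }
    }

    /// Called for each document the main query matched in this segment.
    fn collect(&mut self, doc: DocId) {
        if self.evaluator.matching.contains(&doc) {
            self.doc_count += 1;
        }
    }
}

/// Drives one segment: `main_hits` are the documents the main query matched.
fn count_filtered(segment: &ToySegment, filter: &ToyTermFilter, main_hits: &[DocId]) -> u64 {
    let mut collector = ToyFilterCollector::for_segment(filter, segment);
    for &doc in main_hits {
        collector.collect(doc);
    }
    collector.doc_count
}
```

The point of the sketch is the ordering: filter state is created when a segment's collector is set up, and the filter is then checked only for documents the main query already produced, so no second query execution is needed.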

Some Implementation Details:

  • FilterAggregation supports two modes:
    • FilterQuery::QueryString: Parsed using Tantivy's standard QueryParser
    • FilterQuery::Direct: Accepts Box<dyn Query> for custom query extensions
  • FilterSegmentCollector evaluates the filter query on each document collected by the main query
  • Documents matching the filter are counted and passed to sub-aggregation collectors
  • Results include doc_count and flattened sub-aggregation results
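As a rough illustration of the two filter modes and the result shape, the sketch below models them without Tantivy. Only the variant names `QueryString` and `Direct` come from the description above; the `Query` trait here, `CategoryEquals`, the toy `field:value` parser, and `run_filter` are assumptions made for the example. It also assumes "flattened" means that sub-aggregation results sit next to doc_count in the same result object, as in Elasticsearch responses.

```rust
use std::collections::BTreeMap;

/// Hypothetical query trait; Tantivy's real `Query` trait is much richer.
trait Query {
    fn matches(&self, category: &str) -> bool;
}

/// Hypothetical custom query used for the direct mode.
struct CategoryEquals(String);

impl Query for CategoryEquals {
    fn matches(&self, category: &str) -> bool {
        category == self.0
    }
}

/// The two modes named in the description, modeled on toy types.
enum FilterQuery {
    QueryString(String),
    Direct(Box<dyn Query>),
}

impl FilterQuery {
    /// Resolves either mode into a query object. The `category:value`
    /// parsing is a toy stand-in for a real query parser.
    fn into_query(self) -> Result<Box<dyn Query>, String> {
        match self {
            FilterQuery::QueryString(s) => match s.split_once(':') {
                Some(("category", value)) if !value.is_empty() => {
                    Ok(Box::new(CategoryEquals(value.to_string())))
                }
                _ => Err(format!("malformed filter query: {}", s)),
            },
            FilterQuery::Direct(query) => Ok(query),
        }
    }
}

/// Counts matching documents and feeds them to one sub-aggregation
/// (average price). The result map holds doc_count and the
/// sub-aggregation value side by side.
fn run_filter(
    filter: FilterQuery,
    docs: &[(&str, f64)],
) -> Result<BTreeMap<String, f64>, String> {
    let query = filter.into_query()?;
    let mut doc_count = 0u64;
    let mut price_sum = 0.0;
    for (category, price) in docs {
        if query.matches(category) {
            doc_count += 1;
            price_sum += price;
        }
    }
    let mut result = BTreeMap::new();
    result.insert("doc_count".to_string(), doc_count as f64);
    if doc_count > 0 {
        result.insert("avg_price".to_string(), price_sum / doc_count as f64);
    }
    Ok(result)
}
```

In this model, a query string and an equivalent direct query object produce the same bucket, while a string the toy parser cannot handle surfaces as an error instead of an empty bucket.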

Tests

Test suite with 20 tests covering:

  • Basic Filtering: Single filters, no matches, multiple independent filters
  • Query Types: Term queries, range queries, boolean queries, bool field queries
  • Nested Filters: 2-level nesting, deep nesting (4+ levels), multiple branches at each level
  • Sub-Aggregations: Terms aggregations, multiple metric aggregations
  • Edge Cases: Empty indexes, malformed queries, base query interaction
  • Custom Queries: Direct Query objects, serialization behavior, equality checks
  • Correctness: Validation against equivalent separate query execution

All tests use the assert_agg_results! macro for clean, consistent result validation with floating-point tolerance.
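For readers unfamiliar with tolerance-based assertions, a minimal sketch of the idea follows. It is not the PR's `assert_agg_results!` macro, whose definition is not shown here; `assert_approx_eq!` and `average` are hypothetical helpers, and the default tolerance of 1e-9 is an arbitrary choice for the example.

```rust
/// Hypothetical helper: asserts that two f64 values differ by at most a
/// tolerance (1e-9 unless a third argument is given).
macro_rules! assert_approx_eq {
    ($left:expr, $right:expr) => {
        assert_approx_eq!($left, $right, 1e-9)
    };
    ($left:expr, $right:expr, $tol:expr) => {{
        let (left, right, tol): (f64, f64, f64) = ($left, $right, $tol);
        assert!(
            (left - right).abs() <= tol,
            "expected {} to be within {} of {}",
            left,
            tol,
            right
        );
    }};
}

/// Plain arithmetic mean, standing in for an average metric aggregation.
fn average(values: &[f64]) -> f64 {
    values.iter().sum::<f64>() / values.len() as f64
}
```

A tolerance matters because aggregated floating-point values can differ in the last bits depending on summation order: `0.1 + 0.2` is not exactly equal to `0.3` in f64 arithmetic, yet `assert_approx_eq!(0.1 + 0.2, 0.3)` passes.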

@mdashti mdashti changed the title feat: implemented filter aggregation feat: added filter aggregation Oct 2, 2025
@philippemnoel
Member

Would it make sense for us to get feedback from upstream Tantivy team for this?

@mdashti mdashti changed the title feat: added filter aggregation feat: implemented filter aggregation Oct 3, 2025

@stuhood stuhood left a comment

Partial review: overall, looks really solid.

Thanks @mdashti!

@mdashti
Author

mdashti commented Oct 3, 2025

Would it make sense for us to get feedback from upstream Tantivy team for this?

@philippemnoel Here's the upstream PR: quickwit-oss#2711

Author

@mdashti mdashti left a comment

@stuhood Thanks for the feedback.

@mdashti mdashti requested a review from stuhood October 3, 2025 23:09

@stuhood stuhood left a comment

Looks good to me, thanks!

Please wait to land until the consumer PR is almost ready.


@stuhood stuhood left a comment

Thanks!

@mdashti mdashti merged commit 1ad7f07 into main Oct 8, 2025
5 checks passed
@mdashti mdashti deleted the paradedb-current-main/filter-impl branch October 8, 2025 21:50
mdashti added a commit that referenced this pull request Oct 21, 2025
mdashti added a commit that referenced this pull request Oct 22, 2025
mdashti added a commit that referenced this pull request Oct 22, 2025

Development

Successfully merging this pull request may close these issues.

Add Elasticsearch Filter Aggregation Support

4 participants