feat: implemented filter aggregation #67

mdashti · 2025-10-02T21:08:25Z

Ticket(s) Closed

Closes quickwit-oss/tantivy#2706 (Add Elasticsearch Filter Aggregation Support)

What

Implements filter aggregation support in Tantivy, enabling multiple filtered aggregations in a single query.

Why

Currently, there's no way to compute aggregations on different filtered subsets of documents in a single query. Users must run a separate query for each filter, which repeats the search work for every filter. For example, computing "average price overall + average price for t-shirts + count of electronics" requires three separate queries.

Elasticsearch's filter aggregation solves this by creating a single bucket containing documents matching a query, with support for nested sub-aggregations. This is a common analytics pattern that Tantivy now supports!
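
To make the motivating example concrete, here is a minimal, self-contained sketch of the single-pass idea. It does not use Tantivy's API: `Doc`, `Bucket`, `aggregate`, and `sample_docs` are hypothetical names invented for this illustration, and the filters are plain field comparisons rather than parsed queries.

```rust
// Hypothetical document shape, used only for this illustration.
struct Doc {
    category: &'static str,
    price: f64,
}

/// Running state for one bucket: matching document count and sum of prices.
#[derive(Default)]
struct Bucket {
    doc_count: u64,
    price_sum: f64,
}

impl Bucket {
    fn collect(&mut self, doc: &Doc) {
        self.doc_count += 1;
        self.price_sum += doc.price;
    }

    /// Average price of the collected documents, or None for an empty bucket.
    fn avg_price(&self) -> Option<f64> {
        if self.doc_count == 0 {
            None
        } else {
            Some(self.price_sum / self.doc_count as f64)
        }
    }
}

/// One pass over the documents feeds three buckets:
/// all documents, t-shirts only, and electronics only.
fn aggregate(docs: &[Doc]) -> (Bucket, Bucket, Bucket) {
    let mut overall = Bucket::default();
    let mut tshirts = Bucket::default();
    let mut electronics = Bucket::default();
    for doc in docs {
        overall.collect(doc);
        if doc.category == "tshirt" {
            tshirts.collect(doc);
        }
        if doc.category == "electronics" {
            electronics.collect(doc);
        }
    }
    (overall, tshirts, electronics)
}

fn sample_docs() -> Vec<Doc> {
    vec![
        Doc { category: "tshirt", price: 20.0 },
        Doc { category: "tshirt", price: 30.0 },
        Doc { category: "electronics", price: 100.0 },
        Doc { category: "electronics", price: 250.0 },
    ]
}

fn main() {
    let (overall, tshirts, electronics) = aggregate(&sample_docs());
    println!("average price overall: {:?}", overall.avg_price());
    println!("average price for t-shirts: {:?}", tshirts.avg_price());
    println!("count of electronics: {}", electronics.doc_count);
}
```

The point of the sketch is only that each document is visited once and routed to every bucket whose filter it satisfies, instead of running one query per filter.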

How

Added a new FilterAggregation bucket aggregation type, along with the plumbing it needs:

- FilterAggregation accepts both query strings (parsed via QueryParser) and direct Query objects for custom query types.
- It uses DocumentQueryEvaluator to evaluate filter queries per document during aggregation collection, avoiding separate query executions.
- Aggregation collectors were extended to receive SegmentReader references, so that filter aggregations can create query weights and scorers per segment.
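
The collection flow described above can be sketched roughly as follows. This is a simplified, standalone model and not the code in this PR: the `DocFilter` and `SubCollector` traits, `SortedDocIds`, `SumCollector`, `FilterCollector`, and `run_example` are all hypothetical stand-ins, and the filter is modelled as a sorted list of matching doc ids for one segment.

```rust
/// Hypothetical stand-in for a per-segment, per-document filter check.
trait DocFilter {
    fn matches(&self, doc_id: u32) -> bool;
}

/// Hypothetical stand-in for a sub-aggregation collector.
trait SubCollector {
    fn collect(&mut self, doc_id: u32);
}

/// Filter backed by a sorted list of the doc ids that satisfy the filter query.
struct SortedDocIds(Vec<u32>);

impl DocFilter for SortedDocIds {
    fn matches(&self, doc_id: u32) -> bool {
        self.0.binary_search(&doc_id).is_ok()
    }
}

/// Sub-aggregation that sums a per-document value (indexed by doc id).
struct SumCollector<'a> {
    values: &'a [f64],
    sum: f64,
}

impl SubCollector for SumCollector<'_> {
    fn collect(&mut self, doc_id: u32) {
        self.sum += self.values[doc_id as usize];
    }
}

/// Wraps a sub-collector and forwards only the documents that pass the filter.
struct FilterCollector<F, S> {
    filter: F,
    sub: S,
    doc_count: u64,
}

impl<F: DocFilter, S: SubCollector> FilterCollector<F, S> {
    fn new(filter: F, sub: S) -> Self {
        FilterCollector { filter, sub, doc_count: 0 }
    }

    /// Called once for every document matched by the main query.
    fn collect(&mut self, doc_id: u32) {
        if self.filter.matches(doc_id) {
            self.doc_count += 1;
            self.sub.collect(doc_id);
        }
    }
}

/// Returns (doc_count, sum) for a tiny hand-built segment.
fn run_example() -> (u64, f64) {
    let prices = [10.0, 20.0, 30.0, 40.0, 50.0];
    // The filter query matches docs 1, 3 and 4.
    let filter = SortedDocIds(vec![1, 3, 4]);
    let sub = SumCollector { values: &prices, sum: 0.0 };
    let mut collector = FilterCollector::new(filter, sub);
    // The main query matched docs 0 through 3, so doc 4 is never offered.
    for doc_id in 0..4u32 {
        collector.collect(doc_id);
    }
    (collector.doc_count, collector.sub.sum)
}

fn main() {
    let (doc_count, sum) = run_example();
    println!("doc_count = {}, sum = {}", doc_count, sum);
}
```

In this model only docs 1 and 3 are both matched by the main query and accepted by the filter, so the bucket sees two documents and the filter never triggers a second search.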

Some Implementation Details:

- FilterAggregation supports two modes:
  - FilterQuery::QueryString: Parsed using Tantivy's standard QueryParser
  - FilterQuery::Direct: Accepts Box<dyn Query> for custom query extensions
- FilterSegmentCollector evaluates the filter query on each document collected by the main query
- Documents matching the filter are counted and passed to sub-aggregation collectors
- Results include doc_count and flattened sub-aggregation results
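
As a rough illustration of the two modes, the sketch below models them with a simplified enum. The variant names follow the description above, but everything else is an assumption made for this example: the `Query` trait here is a toy one (not Tantivy's), `parse_toy` is a hypothetical parser that understands only `category:<value>`, and `Doc`, `CategoryIs`, `InStock`, and `count_matching` are invented helpers.

```rust
// Hypothetical document shape for this illustration.
struct Doc {
    category: &'static str,
    in_stock: bool,
}

/// Toy query trait; a stand-in, not Tantivy's `Query`.
trait Query {
    fn matches(&self, doc: &Doc) -> bool;
}

/// The two ways a filter can be supplied.
enum FilterQuery {
    /// A query string that must be parsed before use.
    QueryString(String),
    /// An already-built query object, for custom query types.
    Direct(Box<dyn Query>),
}

struct CategoryIs(String);

impl Query for CategoryIs {
    fn matches(&self, doc: &Doc) -> bool {
        doc.category == self.0.as_str()
    }
}

/// A "custom" query that no query string in this toy grammar can express.
struct InStock;

impl Query for InStock {
    fn matches(&self, doc: &Doc) -> bool {
        doc.in_stock
    }
}

/// Toy parser (assumption): accepts only `category:<value>`.
fn parse_toy(query: &str) -> Result<Box<dyn Query>, String> {
    match query.split_once(':') {
        Some(("category", value)) if !value.is_empty() => {
            Ok(Box::new(CategoryIs(value.to_string())))
        }
        _ => Err(format!("unsupported query: {}", query)),
    }
}

impl FilterQuery {
    /// Resolves either mode into a query object that can be evaluated.
    fn into_query(self) -> Result<Box<dyn Query>, String> {
        match self {
            FilterQuery::QueryString(s) => parse_toy(&s),
            FilterQuery::Direct(q) => Ok(q),
        }
    }
}

/// Computes the doc_count a filter bucket would report for these documents.
fn count_matching(filter: FilterQuery, docs: &[Doc]) -> Result<u64, String> {
    let query = filter.into_query()?;
    Ok(docs.iter().filter(|d| query.matches(d)).count() as u64)
}

fn sample_docs() -> Vec<Doc> {
    vec![
        Doc { category: "tshirt", in_stock: true },
        Doc { category: "tshirt", in_stock: false },
        Doc { category: "electronics", in_stock: true },
    ]
}

fn main() {
    let docs = sample_docs();
    let by_string = FilterQuery::QueryString("category:tshirt".to_string());
    let by_object = FilterQuery::Direct(Box::new(InStock));
    println!("{:?}", count_matching(by_string, &docs));
    println!("{:?}", count_matching(by_object, &docs));
}
```

The design point being illustrated is that both modes converge on the same evaluable query object, so the collector does not need to care how the filter was supplied, while a malformed query string surfaces as an error instead of an empty bucket.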

Tests

Test suite with 20 tests covering:

- Basic Filtering: Single filters, no matches, multiple independent filters
- Query Types: Term queries, range queries, boolean queries, bool field queries
- Nested Filters: 2-level nesting, deep nesting (4+ levels), multiple branches at each level
- Sub-Aggregations: Terms aggregations, multiple metric aggregations
- Edge Cases: Empty indexes, malformed queries, base query interaction
- Custom Queries: Direct Query objects, serialization behavior, equality checks
- Correctness: Validation against equivalent separate query execution

All tests use the assert_agg_results! macro for clean, consistent result validation with floating-point tolerance.
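
For readers unfamiliar with tolerance-based comparison, here is a minimal sketch of the general idea. It is not the `assert_agg_results!` macro from this PR: `approx_eq` and `assert_approx_eq!` are hypothetical helpers, and the default tolerance of `1e-9` is an arbitrary choice for the example.

```rust
/// Returns true when `actual` is within `tolerance` of `expected`.
fn approx_eq(actual: f64, expected: f64, tolerance: f64) -> bool {
    (actual - expected).abs() <= tolerance
}

/// Hypothetical assertion macro with an optional tolerance argument.
macro_rules! assert_approx_eq {
    ($actual:expr, $expected:expr) => {
        assert_approx_eq!($actual, $expected, 1e-9)
    };
    ($actual:expr, $expected:expr, $tol:expr) => {{
        let (a, e, t): (f64, f64, f64) = ($actual, $expected, $tol);
        assert!(approx_eq(a, e, t), "expected {} to be within {} of {}", a, t, e);
    }};
}

fn main() {
    // An average computed from floats may differ from the "obvious" value
    // in the last few bits, so compare with a tolerance instead of `==`.
    let avg = (0.1 + 0.2 + 0.3) / 3.0;
    assert_approx_eq!(avg, 0.2);
    assert_approx_eq!(avg, 0.25, 0.1);
    println!("avg = {}", avg);
}
```

Comparing aggregation results this way keeps tests stable when the order of floating-point additions changes, for example between a single filtered pass and equivalent separate queries.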

philippemnoel · 2025-10-02T23:26:13Z

Would it make sense for us to get feedback from upstream Tantivy team for this?

stuhood

Partial review: overall, looks really solid.

Thanks @mdashti!

examples/filter_aggregation.rs

src/aggregation/bucket/filter.rs

src/aggregation/metric/stats.rs

mdashti · 2025-10-03T20:53:45Z

> Would it make sense for us to get feedback from upstream Tantivy team for this?

@philippemnoel Here's the upstream PR: quickwit-oss#2711

mdashti

@stuhood Thanks for the feedback.

examples/filter_aggregation.rs

src/aggregation/bucket/filter.rs

stuhood

Looks good to me, thanks!

Please wait to land until the consumer PR is almost ready.

stuhood

Thanks!

src/aggregation/bucket/filter.rs

mdashti added 30 commits September 28, 2025 10:24

Initial impl

e3012b7

Added Filter impl in `build_single_agg_segment_collector_with_reader` + Added tests

1e0346a

Added Filter(FilterBucketResult) + Made tests work.

80dd75e

Fixed type issues.

a77343a

Fixed a test.

7c93dd1

8a7a73a: Pass segment_reader

1cf56c0

Added more tests.

499d753

Improved parsing + tests

15baaa7

refactoring

be46f6e

Added more tests.

953b419

refactoring: moved parsing code under QueryParser

71edc1d

Use Tantivy syntax instead of ES

bca4a6b

Added a sanity check test.

f1e2e6b

Simplified impl + tests

5f7d714

Added back tests in a more maintable way

49585c8

nitz.

aa6296d

nitz

d534d72

implemented very simple fast-path

725c829

improved a comment

9ce3425

implemented fast field support

22950df

Used BoundsRange

bf3d3f6

Improved fast field impl + tests

621933a

Simplified execution.

a82877e

Fixed exports + nitz

22735f9

Improved the tests to check to the expected result.

622c9c5

Improved test by checking the whole result JSON

564ee32

Removed brittle perf checks.

c6cb527

Added efficiency verification tests.

d249699

Added one more efficiency check test.

cb5cce3

Improved the efficiency tests.

420c9d1

mdashti changed the title ~~feat: implemented filter aggregation~~ feat: added filter aggregation Oct 2, 2025

mdashti added 2 commits October 2, 2025 15:30

nitz.

567ca87

Added an example

4deca43

mdashti changed the title ~~feat: added filter aggregation~~ feat: implemented filter aggregation Oct 3, 2025

stuhood reviewed Oct 3, 2025

Fixed PR comments.

dc100d3

mdashti commented Oct 3, 2025

mdashti added 3 commits October 3, 2025 15:02

Applied PR comments + nitz

c973235

nitz.

2f550d8

Improved the code.

319d2b3

mdashti requested a review from stuhood October 3, 2025 23:09

stuhood approved these changes Oct 6, 2025

mdashti added 6 commits October 7, 2025 16:05

Fixed a perf issue.

04eb526

Added batch processing.

c8accb6

Made the example more interesting

0b84fb2

Fixed bucket count

2c69c15

Renamed Direct to CustomQuery

b75a477

Fixed lint issues.

807b7de

stuhood approved these changes Oct 7, 2025

src/aggregation/bucket/filter.rs (outdated)

mdashti added 4 commits October 8, 2025 10:12

No need for scorer to be an Option

b6305e9

nitz

5660916

Used BitSet

5faf5e1

Added an optimization for AllQuery

fa374da

mdashti merged commit 1ad7f07 into main Oct 8, 2025
5 checks passed

mdashti deleted the paradedb-current-main/filter-impl branch October 8, 2025 21:50

mdashti added a commit that referenced this pull request Oct 21, 2025

feat: implemented filter aggregation (#67)

3de7d92

mdashti added a commit that referenced this pull request Oct 22, 2025

feat: implemented filter aggregation (#67)

08b49ee

mdashti added a commit that referenced this pull request Oct 22, 2025

feat: implemented filter aggregation (#67)

6450a61
