feat: added filter aggregation #2711

mdashti · 2025-10-03T00:05:56Z

Ticket(s) Closed

Closes Add Elasticsearch Filter Aggregation Support #2706

What

Implements filter aggregation support in Tantivy, enabling multiple filtered aggregations in a single query.

Why

Currently, there's no way to compute aggregations on different filtered subsets of documents in a single query. Users must run separate queries for each filter, which is slow and inefficient. For example, computing "average price overall + average price for t-shirts + count of electronics" requires three separate queries.

Elasticsearch's filter aggregation solves this by creating a single bucket containing documents matching a query, with support for nested sub-aggregations. This is a common analytics pattern that Tantivy now supports!

How

Added a new FilterAggregation bucket aggregation type that:

- Accepts both query strings (parsed via QueryParser) and direct Query objects for custom query types
- Uses DocumentQueryEvaluator to evaluate filter queries per document during aggregation collection, avoiding separate query executions

Aggregation collectors were also extended to receive SegmentReader references, so filter aggregations can create query weights and scorers per segment.
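To make the single-pass idea concrete, here is a minimal, self-contained sketch using the "average price overall + average price for t-shirts + count of electronics" example from above. It does not use Tantivy's API: `Doc`, `FilterBucket`, `collect_all`, and `sample_docs` are hypothetical names, and the filters are plain closures standing in for real queries.

```rust
/// Stand-in for an indexed document; the fields are illustrative only.
#[derive(Debug, Clone)]
pub struct Doc {
    pub category: &'static str,
    pub price: f64,
}

/// One filter bucket: a predicate plus a running count and price sum.
pub struct FilterBucket {
    predicate: Box<dyn Fn(&Doc) -> bool>,
    pub doc_count: u64,
    price_sum: f64,
}

impl FilterBucket {
    pub fn new(predicate: impl Fn(&Doc) -> bool + 'static) -> Self {
        Self {
            predicate: Box::new(predicate),
            doc_count: 0,
            price_sum: 0.0,
        }
    }

    /// Called once per document collected by the main query.
    pub fn collect(&mut self, doc: &Doc) {
        if (self.predicate)(doc) {
            self.doc_count += 1;
            self.price_sum += doc.price;
        }
    }

    /// Average price of matching documents, or `None` for an empty bucket.
    pub fn avg_price(&self) -> Option<f64> {
        if self.doc_count == 0 {
            None
        } else {
            Some(self.price_sum / self.doc_count as f64)
        }
    }
}

/// Feeds every document to every bucket in a single pass over `docs`.
pub fn collect_all(docs: &[Doc], buckets: &mut [FilterBucket]) {
    for doc in docs {
        for bucket in buckets.iter_mut() {
            bucket.collect(doc);
        }
    }
}

/// Small fixed data set used for illustration.
pub fn sample_docs() -> Vec<Doc> {
    vec![
        Doc { category: "t-shirt", price: 20.0 },
        Doc { category: "t-shirt", price: 30.0 },
        Doc { category: "electronics", price: 100.0 },
        Doc { category: "electronics", price: 50.0 },
    ]
}
```

With three buckets (match-all, `category == "t-shirt"`, `category == "electronics"`), one call to `collect_all` yields all three results, where previously three separate queries were needed.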

Some implementation details:

- FilterAggregation supports two modes:
  - FilterQuery::QueryString: parsed using Tantivy's standard QueryParser
  - FilterQuery::Direct: accepts Box<dyn Query> for custom query extensions
- FilterSegmentCollector evaluates the filter query on each document collected by the main query
- Documents matching the filter are counted and passed to sub-aggregation collectors
- Results include doc_count and flattened sub-aggregation results
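The details above can be sketched with stand-in types. This is an illustration under stated assumptions, not the code in this PR: the `Query` trait here matches single documents (Tantivy's real trait works through per-segment weights and scorers), the `field:value` parser is a toy replacement for QueryParser, and `Doc`, `TermQuery`, `SumCollector`, `FilterCollector`, and `doc` are hypothetical names.

```rust
use std::collections::{BTreeMap, HashMap};

/// Illustrative document: field name -> raw value. Not a Tantivy type.
pub type Doc = HashMap<String, String>;

/// Stand-in for a query object (assumption: per-document matching).
pub trait Query {
    fn matches(&self, doc: &Doc) -> bool;
}

/// Matches documents whose `field` equals `value` exactly.
pub struct TermQuery {
    pub field: String,
    pub value: String,
}

impl Query for TermQuery {
    fn matches(&self, doc: &Doc) -> bool {
        doc.get(&self.field).map_or(false, |v| v == &self.value)
    }
}

/// The two modes named in the description.
pub enum FilterQuery {
    QueryString(String),
    Direct(Box<dyn Query>),
}

impl FilterQuery {
    /// Resolves either mode into a query object.
    pub fn into_query(self) -> Result<Box<dyn Query>, String> {
        match self {
            FilterQuery::QueryString(s) => {
                let (field, value) = s
                    .split_once(':')
                    .ok_or_else(|| format!("malformed query: {}", s))?;
                let query: Box<dyn Query> = Box::new(TermQuery {
                    field: field.to_string(),
                    value: value.to_string(),
                });
                Ok(query)
            }
            FilterQuery::Direct(query) => Ok(query),
        }
    }
}

/// A sub-aggregation that sums a numeric field over the documents it receives.
pub struct SumCollector {
    pub name: String,
    pub field: String,
    pub sum: f64,
}

impl SumCollector {
    fn collect(&mut self, doc: &Doc) {
        if let Some(v) = doc.get(&self.field).and_then(|raw| raw.parse::<f64>().ok()) {
            self.sum += v;
        }
    }
}

/// Counts matching documents and forwards them to sub-aggregations.
pub struct FilterCollector {
    query: Box<dyn Query>,
    doc_count: u64,
    subs: Vec<SumCollector>,
}

impl FilterCollector {
    pub fn new(filter: FilterQuery, subs: Vec<SumCollector>) -> Result<Self, String> {
        Ok(Self { query: filter.into_query()?, doc_count: 0, subs })
    }

    /// Called for each document collected by the main query.
    pub fn collect(&mut self, doc: &Doc) {
        if self.query.matches(doc) {
            self.doc_count += 1;
            for sub in self.subs.iter_mut() {
                sub.collect(doc);
            }
        }
    }

    /// `doc_count` and sub-aggregation values side by side in one map.
    pub fn result(&self) -> BTreeMap<String, f64> {
        let mut out = BTreeMap::new();
        out.insert("doc_count".to_string(), self.doc_count as f64);
        for sub in &self.subs {
            out.insert(sub.name.clone(), sub.sum);
        }
        out
    }
}

/// Builds a document from `(field, value)` pairs.
pub fn doc(pairs: &[(&str, &str)]) -> Doc {
    pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}
```

A malformed query string surfaces as an `Err` when the collector is built, before any document is visited, which is one plausible way to handle the "malformed queries" edge case.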

Tests

Test suite with 20 tests covering:

- Basic Filtering: Single filters, no matches, multiple independent filters
- Query Types: Term queries, range queries, boolean queries, bool field queries
- Nested Filters: 2-level nesting, deep nesting (4+ levels), multiple branches at each level
- Sub-Aggregations: Terms aggregations, multiple metric aggregations
- Edge Cases: Empty indexes, malformed queries, base query interaction
- Custom Queries: Direct Query objects, serialization behavior, equality checks
- Correctness: Validation against equivalent separate query execution

All tests use the assert_agg_results! macro for clean, consistent result validation with floating-point tolerance.
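The definition of assert_agg_results! is not shown in this thread. As a rough illustration of why floating-point tolerance matters when checking metric results, here is a hypothetical `assert_close!` macro with helper functions `approx_eq` and `average`; none of these names come from the PR.

```rust
/// True when `actual` and `expected` differ by at most `tol`.
pub fn approx_eq(actual: f64, expected: f64, tol: f64) -> bool {
    (actual - expected).abs() <= tol
}

/// Asserts approximate equality. The two-argument form uses a default
/// tolerance of 1e-9 (an arbitrary choice for this sketch).
macro_rules! assert_close {
    ($actual:expr, $expected:expr) => {
        assert_close!($actual, $expected, 1e-9)
    };
    ($actual:expr, $expected:expr, $tol:expr) => {{
        let (a, e, t): (f64, f64, f64) = ($actual, $expected, $tol);
        assert!(approx_eq(a, e, t), "expected {} to be within {} of {}", a, t, e);
    }};
}

/// Average of a slice, the kind of metric a filter bucket might report.
pub fn average(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        None
    } else {
        Some(values.iter().sum::<f64>() / values.len() as f64)
    }
}
```

An exact comparison such as `0.1 + 0.2 == 0.3` is false for `f64`, so averages and sums computed in a different order than the expected value was derived need a tolerance-based check like this.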


PSeitz · 2025-10-21T14:01:37Z

The failing CI is a flaky test from me

Should be fixed now (after rebase)

mdashti · 2025-10-21T19:24:07Z

Thanks @PSeitz. I've also noticed and fixed it in #2723. Closed that PR.
Now, this PR is good for final review.

PSeitz-dd · 2025-10-22T08:28:29Z

benches/agg_bench.rs

+
+// Filter aggregation benchmarks
+
+fn filter_agg_all_query(index: &Index) {


Suggested change

-fn filter_agg_all_query(index: &Index) {
+fn filter_agg_all_query_count_agg(index: &Index) {

PSeitz-dd · 2025-10-22T08:28:46Z

benches/agg_bench.rs

+    execute_agg(index, agg_req);
+}
+
+fn filter_agg_term_query(index: &Index) {


Suggested change

-fn filter_agg_term_query(index: &Index) {
+fn filter_agg_term_query_count_agg(index: &Index) {

PSeitz-dd · 2025-10-22T08:29:09Z

benches/agg_bench.rs

+    execute_agg(index, agg_req);
+}
+
+fn filter_agg_all_query_with_sub_agg(index: &Index) {


Suggested change

-fn filter_agg_all_query_with_sub_agg(index: &Index) {
+fn filter_agg_all_query_with_sub_aggs(index: &Index) {

PSeitz-dd · 2025-10-22T08:29:17Z

benches/agg_bench.rs

+    execute_agg(index, agg_req);
+}
+
+fn filter_agg_term_query_with_sub_agg(index: &Index) {


Suggested change

-fn filter_agg_term_query_with_sub_agg(index: &Index) {
+fn filter_agg_term_query_with_sub_aggs(index: &Index) {

mdashti

@PSeitz-dd Thanks for the comments. I also noticed there was a bug and fixed it in 94bdd5d


PSeitz · 2025-10-23T08:45:20Z

src/aggregation/bucket/filter.rs

+    fn parse_query(&self, schema: &Schema) -> crate::Result<Box<dyn Query>> {
+        match &self.query {
+            FilterQuery::QueryString(query_str) => {
+                let tokenizer_manager = TokenizerManager::default();


The default tokenizer manager will fail for any field with a custom tokenizer. We'll need a mechanism to pass the TokenizerManager in there.
Probably the same way we pass the aggregation limits; we could put them both in an AggContextParams struct or similar.

    fn for_segment(
        &self,
        segment_local_id: crate::SegmentOrdinal,
        reader: &crate::SegmentReader,
    ) -> crate::Result<Self::Child> {
        AggregationSegmentCollector::from_agg_req_and_reader(
            &self.agg,
            reader,
            segment_local_id,
            &self.limits,
        )
    }

Thanks for catching this. Used the default and forgot to pipe it through. Fixed it.
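One possible shape for the suggested context struct, sketched with stand-in types. `AggregationLimits` and `TokenizerManager` below are simplified placeholders rather than Tantivy's real types, and the fields of `AggContextParams` and the toy `parse_query` are assumptions made for illustration only.

```rust
use std::collections::HashSet;

/// Stand-in for the aggregation limits already passed to collectors.
#[derive(Clone, Debug, Default)]
pub struct AggregationLimits {
    pub max_buckets: u32,
}

/// Stand-in for a tokenizer registry; it only tracks registered names.
#[derive(Clone, Debug)]
pub struct TokenizerManager {
    names: HashSet<String>,
}

impl Default for TokenizerManager {
    /// The default registry knows only a built-in tokenizer.
    fn default() -> Self {
        let mut names = HashSet::new();
        names.insert("default".to_string());
        Self { names }
    }
}

impl TokenizerManager {
    pub fn register(&mut self, name: &str) {
        self.names.insert(name.to_string());
    }

    pub fn contains(&self, name: &str) -> bool {
        self.names.contains(name)
    }
}

/// Hypothetical bundle following the `AggContextParams` suggestion:
/// everything a collector needs beyond the request itself.
#[derive(Clone, Debug, Default)]
pub struct AggContextParams {
    pub limits: AggregationLimits,
    pub tokenizers: TokenizerManager,
}

/// Toy query parsing: fails when the field's tokenizer is not registered
/// in the manager carried by the context.
pub fn parse_query(
    ctx: &AggContextParams,
    field_tokenizer: &str,
    query: &str,
) -> Result<String, String> {
    if ctx.tokenizers.contains(field_tokenizer) {
        Ok(format!("parsed({})", query))
    } else {
        Err(format!("unknown tokenizer: {}", field_tokenizer))
    }
}
```

With a default context, a field that uses a custom tokenizer fails to parse, which mirrors the problem described above; passing the index's own manager through the context avoids it.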

PSeitz · 2025-10-23T09:04:30Z

src/query/query_parser/query_parser.rs

    }
-
-    #[test]
-    pub fn test_set_default_field_integer() {


I think this was removed by accident

Oops. Yeah. Fixed it.

PSeitz · 2025-10-23T09:06:08Z

tests/filter_aggregation.rs

@@ -0,0 +1,1013 @@
+//! Test suite for Filter Aggregation


Can you move this to the filter aggregation implementation?

mdashti

@PSeitz Thanks for the comments. Please take another look.

mdashti · 2025-10-23T21:15:57Z

src/query/query_parser/query_parser.rs

    }
-
-    #[test]
-    pub fn test_set_default_field_integer() {


Oops. Yeah. Fixed it.

mdashti · 2025-10-23T21:30:17Z

tests/filter_aggregation.rs

@@ -0,0 +1,1013 @@
+//! Test suite for Filter Aggregation


mdashti · 2025-10-23T21:30:51Z

src/aggregation/bucket/filter.rs

+    fn parse_query(&self, schema: &Schema) -> crate::Result<Box<dyn Query>> {
+        match &self.query {
+            FilterQuery::QueryString(query_str) => {
+                let tokenizer_manager = TokenizerManager::default();


Thanks for catching this. Used the default and forgot to pipe it through. Fixed it.

mdashti · 2025-10-23T21:31:37Z

src/aggregation/bucket/filter.rs

+    /// Get the fast field names used by this aggregation (none for filter aggregation)
+    pub fn get_fast_field_names(&self) -> Vec<&str> {
+        // Filter aggregation doesn't use fast fields directly
+        vec![]


Added further comments. IMO, it should be fixed with a broader change in a follow-up PR.

PSeitz · 2025-10-24T10:15:46Z

src/aggregation/bucket/filter.rs

+    /// - Extension query types
+    ///
+    /// Note: This variant cannot be serialized to JSON (only QueryString can be serialized)
+    CustomQuery(Box<dyn SerializableQuery>),


Why do we use SerializableQuery when the query cannot be serialized?

I think a query constructor would be more suitable here than de/serializing runtime objects, which may carry state.
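One way to read the constructor suggestion, as a sketch with stand-in types: the aggregation stores a factory closure and builds a fresh query wherever one is needed, so no runtime state is shared or serialized. The `Query` trait, `EvenDocs`, `QueryFactory`, and `query_for_segment` are hypothetical and do not reflect Tantivy's actual API.

```rust
use std::sync::Arc;

/// Stand-in for a query that may carry per-use state.
pub trait Query {
    fn matches(&mut self, doc_id: u32) -> bool;
    fn evaluations(&self) -> u32;
}

/// Matches even doc ids and counts how often it was asked.
pub struct EvenDocs {
    calls: u32,
}

impl EvenDocs {
    pub fn new() -> Self {
        Self { calls: 0 }
    }
}

impl Query for EvenDocs {
    fn matches(&mut self, doc_id: u32) -> bool {
        self.calls += 1;
        doc_id % 2 == 0
    }

    fn evaluations(&self) -> u32 {
        self.calls
    }
}

/// A constructor stored in place of a query instance.
pub type QueryFactory = Arc<dyn Fn() -> Box<dyn Query> + Send + Sync>;

pub struct FilterAggregation {
    make_query: QueryFactory,
}

impl FilterAggregation {
    pub fn new(factory: impl Fn() -> Box<dyn Query> + Send + Sync + 'static) -> Self {
        Self { make_query: Arc::new(factory) }
    }

    /// Builds a fresh query, for example once per segment.
    pub fn query_for_segment(&self) -> Box<dyn Query> {
        (self.make_query)()
    }
}

/// Counts how many of `doc_ids` the query matches.
pub fn count_matches(query: &mut dyn Query, doc_ids: &[u32]) -> u64 {
    doc_ids.iter().filter(|id| query.matches(**id)).count() as u64
}
```

Each call to `query_for_segment` returns an independent instance, so state accumulated while evaluating one segment does not leak into another.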

PSeitz · 2025-10-24T10:24:32Z

src/aggregation/bucket/filter.rs

+        //
+        // This limitation exists because:
+        // - Query::weight() is called during execution, not during planning
+        // - The fallback decision is made per-segment based on field configuration


I think the decision depends on the schema, which is not segment-specific.

mdashti added 30 commits September 25, 2025 12:19

Initial impl

e8a0bb0

Added Filter impl in `build_single_agg_segment_collector_with_reade…

dd3eefc

…r` + Added tests

Added Filter(FilterBucketResult) + Made tests work.

bd03cb0

Fixed type issues.

baa4790

Fixed a test.

7ad2379

8a7a73a: Pass segment_reader

225d19d

Added more tests.

5500245

Improved parsing + tests

e38c7a4

refactoring

fb2e9fc

Added more tests.

ac01943

refactoring: moved parsing code under QueryParser

150387a

Use Tantivy syntax instead of ES

5e3b23d

Added a sanity check test.

d3cafc0

Simplified impl + tests

b0de0ca

Added back tests in a more maintable way

a02489c

nitz.

6cedc91

nitz

5208690

implemented very simple fast-path

f942d1f

improved a comment

7d3c054

implemented fast field support

04c2913

Used BoundsRange

4848335

Improved fast field impl + tests

cf209ef

Simplified execution.

9806e10

Fixed exports + nitz

b817391

Improved the tests to check to the expected result.

10d9d26

Improved test by checking the whole result JSON

49736ca

Removed brittle perf checks.

569208e

Added efficiency verification tests.

1e19e02

Added one more efficiency check test.

a2e270a

Improved the efficiency tests.

6401d56

mdashti added 2 commits October 21, 2025 12:25

Merge branch 'tantivy-main' into paradedb/filter-agg-feature

5ac6c9d

nitz.

f55387f

mdashti mentioned this pull request Oct 22, 2025

chore: rebase on Tantivy main paradedb/tantivy#72

Open

PSeitz-dd reviewed Oct 22, 2025


mdashti added 2 commits October 22, 2025 01:48

Applied PR comments.

d2cf3de

Fixed the AllQuery optimization

94bdd5d

mdashti commented Oct 22, 2025


mdashti requested a review from PSeitz-dd October 22, 2025 09:33

PSeitz reviewed Oct 23, 2025


mdashti added 2 commits October 23, 2025 13:46

Applied PR comments.

bc565b0

feat: used erased_serde to allow filter query to be serialized

42c9935

mdashti force-pushed the paradedb/filter-agg-feature branch from 6fa68d6 to 42c9935 Compare October 23, 2025 20:58

mdashti added 7 commits October 23, 2025 14:10

further improved a comment

c1fb35f

Added back tests.

d9b148c

removed an unused method

144465d

removed an unused method

996a966

Added documentation

b8e823f

nitz.

7e75da5

Merge branch 'main' into paradedb/filter-agg-feature

6bc7cf0

mdashti requested a review from PSeitz October 23, 2025 21:55

mdashti commented Oct 23, 2025


PSeitz reviewed Oct 24, 2025


