feat: implemented filter aggregation #67
…r` + Added tests
Would it make sense for us to get feedback from upstream Tantivy team for this?
stuhood
left a comment
Partial review: overall, looks really solid.
Thanks @mdashti!
@philippemnoel Here's the upstream PR: quickwit-oss#2711
mdashti
left a comment
@stuhood Thanks for the feedback.
stuhood
left a comment
Looks good to me, thanks!
Please wait to land until the consumer PR is almost ready.
stuhood
left a comment
Thanks!
Ticket(s) Closed
What
Implements filter aggregation support in Tantivy, enabling multiple filtered aggregations in a single query.
Why
Currently, there's no way to compute aggregations on different filtered subsets of documents in a single query. Users must run separate queries for each filter, which is slow and inefficient. For example, computing "average price overall + average price for t-shirts + count of electronics" requires three separate queries.
Elasticsearch's filter aggregation solves this by creating a single bucket containing documents matching a query, with support for nested sub-aggregations. This is a common analytics pattern that Tantivy now supports!
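As a rough illustration of the pattern (several filtered buckets fed in one pass over the collected documents), here is a minimal std-only sketch. The `Doc` struct, the `FilterBucket` type, the closure predicates, and `collect_all` are hypothetical stand-ins chosen for this example; they are not Tantivy's API and not this PR's implementation.

```rust
/// Hypothetical document shape, used only for this illustration.
struct Doc {
    category: &'static str,
    price: f64,
}

/// One filtered bucket: a predicate plus running stats for the matching docs.
/// A closure stands in for a real filter query here (an assumption).
struct FilterBucket<'a> {
    predicate: Box<dyn Fn(&Doc) -> bool + 'a>,
    doc_count: u64,
    price_sum: f64,
}

impl<'a> FilterBucket<'a> {
    fn new(predicate: impl Fn(&Doc) -> bool + 'a) -> Self {
        FilterBucket {
            predicate: Box::new(predicate),
            doc_count: 0,
            price_sum: 0.0,
        }
    }

    /// Called once per document; only matching documents are counted.
    fn collect(&mut self, doc: &Doc) {
        if (self.predicate)(doc) {
            self.doc_count += 1;
            self.price_sum += doc.price;
        }
    }

    /// Average price of the matching documents, or `None` for an empty bucket.
    fn avg_price(&self) -> Option<f64> {
        if self.doc_count == 0 {
            None
        } else {
            Some(self.price_sum / self.doc_count as f64)
        }
    }
}

/// Feeds every document to every bucket in a single pass over the data,
/// instead of running one query per filter.
fn collect_all(docs: &[Doc], buckets: &mut [FilterBucket<'_>]) {
    for doc in docs {
        for bucket in buckets.iter_mut() {
            bucket.collect(doc);
        }
    }
}
```

With three buckets (match-all, `category == "t-shirt"`, `category == "electronics"`), a single call to `collect_all` yields the overall average, the t-shirt average, and the electronics count from the example above.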
How
Added a new `FilterAggregation` bucket aggregation type that:
- … `QueryParser`) and direct `Query` objects for custom query types
- … `DocumentQueryEvaluator` to evaluate filter queries per-document during aggregation collection, avoiding separate query executions
- … `SegmentReader` references, enabling filter aggregations to create query weights and scorers per segment
Some Implementation Details:
- `FilterAggregation` supports two modes:
  - `FilterQuery::QueryString`: Parsed using Tantivy's standard `QueryParser`
  - `FilterQuery::Direct`: Accepts `Box<dyn Query>` for custom query extensions
- `FilterSegmentCollector` evaluates the filter query on each document collected by the main query
- … `doc_count` and flattened sub-aggregation results
Tests
Test suite with 20 tests covering:
All tests use the `assert_agg_results!` macro for clean, consistent result validation with floating-point tolerance.
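The definition of `assert_agg_results!` is not shown in this description. To illustrate what validation with floating-point tolerance generally looks like, here is a small sketch of a hypothetical `assert_close!` macro. The name, the default tolerance of `1e-9`, and the absolute-difference comparison are all assumptions made for this example; the PR's macro may work differently.

```rust
/// Hypothetical helper, not the PR's `assert_agg_results!` macro.
/// Asserts that two f64 values differ by at most an absolute tolerance.
macro_rules! assert_close {
    // Two-argument form: falls back to an assumed default tolerance.
    ($actual:expr, $expected:expr) => {
        assert_close!($actual, $expected, 1e-9)
    };
    // Three-argument form: the caller supplies the tolerance.
    ($actual:expr, $expected:expr, $tol:expr) => {{
        let (actual, expected, tol): (f64, f64, f64) = ($actual, $expected, $tol);
        assert!(
            (actual - expected).abs() <= tol,
            "expected {} within {}, got {}",
            expected,
            tol,
            actual
        );
    }};
}
```

A comparison such as `assert_close!(0.1 + 0.2, 0.3)` passes under this scheme even though the two sides are not bit-identical, which is the usual reason aggregation tests over averages avoid exact equality.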