Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Dec 1, 2023

Which issue does this PR close?

POC for #8376

Rationale for this change

See #8376

TLDR is I would like to use PruningPredicate for "bloom filter like" operations downstream in IOx, and thus I would like to refactor the code that applies Bloom Filters today into the PruningPredicate.

What changes are included in this PR?

  1. New PruningStatistics::contains API proposed in Support general pruning based on <col> = 'const' in PruningPredicate #8376
  2. Add LiteralGuarantee for extracting col = <const> predicates, along with many new tests
  3. Rewrite the BloomFilter code to use the new predicate API

TODO

Are these changes tested?

  • New tests for PruningPredicate contains (=, !=, = with OR, IN and NOT IN)
  • Existing tests pass

Are there any user-facing changes?

The PruningPredicate has a new API

Plans for merging

I do not plan to propose merging this PR as is. Instead I plan a series of PRs:

  1. A PR with some new bloom filter tests
  2. A PR that introduces the contains API with tests
  3. A second PR that refactors the the row group pruning code to use it
  4. A third PR that refactors the RowGroupPruning code so it runs on more than one RowGroupMetadatat at a time.

🤔 musing: At the moment, the row group pruning happens in two phases -- first the min/max statistics are applied, and then the bloom filters are applied which means the potentially bloom filters are only fetched for groups that passed the first predicate.

@github-actions github-actions bot added the core Core DataFusion crate label Dec 1, 2023
Rewrite BloomFilterPruningPredicate in terms of BloomFilterPruningPredicate
@alamb alamb force-pushed the alamb/general_bloom_filter branch from 298da7f to 76ee618 Compare December 4, 2023 21:28
@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Dec 4, 2023
@alamb
Copy link
Contributor Author

alamb commented Dec 18, 2023

The pieces of this PR are in, so closing this PR and focusing on merging the parts

@alamb alamb closed this Dec 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant