
@alamb alamb commented Jul 23, 2025

Which issue does this PR close?

Merging this PR will update apache#7850.

Rationale for this change

We need a way to turn off predicate caching as an escape valve for users who run into problems with the cache. This PR adds an option to the ArrowReaderBuilder to control the predicate cache size.

To verify that this setting actually takes effect, I also added tests.

What changes are included in this PR?

Changes:

  1. Add ArrowReaderMetrics struct that tracks the IO and CPU operations of the reader.
  2. Add ArrowReaderBuilder::with_max_predicate_cache_size configuration option to control the predicate cache size.
  3. Add ArrowReaderBuilder::with_metrics for configuring metrics.
  4. Add tests to ensure that the predicate cache is used correctly and that the metrics are reported accurately.
  5. TODO: Hook up the cached reader in the sync reader
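
As a rough illustration of how a cache size option and a shared metrics handle can work together, here is a minimal, self-contained sketch. It does not use the parquet crate: `SketchReaderBuilder`, `SketchReader`, `SketchMetrics`, `with_max_cache_bytes`, and `read_twice` are hypothetical stand-ins, and the rule that a budget of zero disables caching is an assumption of this sketch, not a statement about the real `with_max_predicate_cache_size`.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a metrics struct shared between caller and reader.
#[derive(Debug, Default)]
pub struct SketchMetrics {
    cache_hits: AtomicUsize,
    decodes: AtomicUsize,
}

impl SketchMetrics {
    pub fn cache_hits(&self) -> usize {
        self.cache_hits.load(Ordering::Relaxed)
    }
    pub fn decodes(&self) -> usize {
        self.decodes.load(Ordering::Relaxed)
    }
}

/// Hypothetical builder: a byte budget for the cache plus a metrics handle.
pub struct SketchReaderBuilder {
    max_cache_bytes: usize,
    metrics: Arc<SketchMetrics>,
}

impl SketchReaderBuilder {
    pub fn new() -> Self {
        Self {
            max_cache_bytes: 1024, // arbitrary default chosen for this sketch
            metrics: Arc::new(SketchMetrics::default()),
        }
    }

    /// Assumption of this sketch: a budget of 0 means nothing is ever cached.
    pub fn with_max_cache_bytes(mut self, max_cache_bytes: usize) -> Self {
        self.max_cache_bytes = max_cache_bytes;
        self
    }

    pub fn with_metrics(mut self, metrics: Arc<SketchMetrics>) -> Self {
        self.metrics = metrics;
        self
    }

    pub fn build(self) -> SketchReader {
        SketchReader {
            max_cache_bytes: self.max_cache_bytes,
            cached_bytes: 0,
            cache: HashMap::new(),
            metrics: self.metrics,
        }
    }
}

pub struct SketchReader {
    max_cache_bytes: usize,
    cached_bytes: usize,
    cache: HashMap<usize, Vec<u8>>,
    metrics: Arc<SketchMetrics>,
}

impl SketchReader {
    /// Return the column from the cache if present; otherwise "decode" it and
    /// cache the result only if it still fits within the byte budget.
    pub fn read_column(&mut self, column: usize) -> Vec<u8> {
        if let Some(data) = self.cache.get(&column) {
            self.metrics.cache_hits.fetch_add(1, Ordering::Relaxed);
            return data.clone();
        }
        self.metrics.decodes.fetch_add(1, Ordering::Relaxed);
        let data = vec![column as u8; 16]; // placeholder for real decoding work
        if self.cached_bytes + data.len() <= self.max_cache_bytes {
            self.cached_bytes += data.len();
            self.cache.insert(column, data.clone());
        }
        data
    }
}

/// Read the same column twice and report (decodes, cache_hits).
pub fn read_twice(max_cache_bytes: usize) -> (usize, usize) {
    let metrics = Arc::new(SketchMetrics::default());
    let mut reader = SketchReaderBuilder::new()
        .with_max_cache_bytes(max_cache_bytes)
        .with_metrics(Arc::clone(&metrics))
        .build();
    reader.read_column(0);
    reader.read_column(0);
    (metrics.decodes(), metrics.cache_hits())
}
```

In this sketch, `read_twice(1024)` reports one decode and one cache hit, while `read_twice(0)` reports two decodes and no hits. Comparing counters like these is one way a test could check that a cache setting actually changes reader behavior.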

Are these changes tested?

Yes, all the new code is covered by tests.

Are there any user-facing changes?

  1. New config options
  2. New API to get metrics from the arrow reader

@alamb alamb force-pushed the alamb/test_memory_limit branch from 6e0232b to ed3ce13 July 24, 2025 19:30
@alamb alamb changed the title from "WIP: Add option to control predicate cache, documentation, ArrowReaderMetrics and tests" to "Add option to control predicate cache, documentation, ArrowReaderMetrics and tests" Jul 24, 2025
@alamb alamb marked this pull request as ready for review July 24, 2025 19:43
alamb commented Jul 24, 2025

FYI @XiangpengHao

XiangpengHao commented Jul 24, 2025

Thank you, these are very nice improvements @alamb !

@XiangpengHao XiangpengHao merged commit 6e618b3 into XiangpengHao:pushdown-v4 Jul 24, 2025
16 checks passed