Refactor build_array_reader into a struct
#7521
mod test_util;

pub use builder::build_array_reader;
pub(crate) use builder::ArrayReaderBuilder;
I personally find the use of `pub use` in non-`pub` modules confusing when reasoning about what is public and what is not. I think using `pub(crate)` makes it explicit that this is crate-private, which is easier to understand.
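A minimal sketch of the visibility point above, assuming a private parent module. The names `reader`, `builder`, `make_reader` and `ReaderBuilder` are hypothetical and are not the actual parquet module layout.

```rust
// `reader` is NOT `pub`, so nothing inside it is reachable from other crates,
// whatever visibility its re-exports claim.
mod reader {
    mod builder {
        pub struct ReaderBuilder {
            pub batch_size: usize,
        }

        pub fn make_reader() -> &'static str {
            "reader"
        }
    }

    // Reads as "public", but is effectively crate-private because the
    // enclosing module is private.
    pub use builder::make_reader;

    // States the real visibility directly.
    pub(crate) use builder::ReaderBuilder;
}

/// Both re-exports are usable from elsewhere in the same crate.
fn describe() -> (usize, &'static str) {
    let builder = reader::ReaderBuilder { batch_size: 1024 };
    (builder.batch_size, reader::make_reader())
}
```

Both forms behave the same inside the crate here; the difference is only in how clearly the declaration communicates the effective visibility.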
LGTM, thank you @alamb.
Thanks for the review @zhuqi-lucas
Make `ArrayReaderBuilder` public (under experimental). I think the ability to build an array reader was public prior to #7521. This will allow downstream users to build their own array readers, and it is consistent with the many other array readers that are public.
Note to reviewers
I think the whitespace-blind diff (https://github.com/apache/arrow-rs/pull/7521/files?w=1) makes it easier to see what changed in this PR: the plumbing of how parameters are passed around, rather than any logic.
Which issue does this PR close?
Rationale for this change
I am trying to avoid potentially decoding arrays twice
Applying `ArrowPredicate` is sometimes slower than filtering afterwards. Part of the reason for this is that filter columns are decoded twice.

I want a way to inject a pre-calculated filter result into the record batch decoding machinery, and one way I found that works well is to provide an `ArrayBuilder` instance that uses the cached result. I found it convenient to have a struct on which to hang the cache rather than a bunch of free functions.

You can see how this is used here:
Also, even if we choose not to go with the result
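
The idea of hanging a cache on a builder struct, rather than threading it through free functions, can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `ColumnReaderBuilder`, `with_cached`, `build` and `decode_count` are hypothetical names, and plain `Vec<i64>` values stand in for real Arrow arrays.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a decoded Arrow array.
type Decoded = Vec<i64>;

/// Hypothetical builder: holds the state that free functions would
/// otherwise have to receive as extra parameters.
struct ColumnReaderBuilder<'a> {
    /// Raw (undecoded) columns keyed by column index.
    raw: &'a HashMap<usize, Vec<i64>>,
    /// Pre-computed results, e.g. columns already decoded for a filter.
    cache: HashMap<usize, Decoded>,
    /// Number of times a column was actually decoded.
    decode_count: usize,
}

impl<'a> ColumnReaderBuilder<'a> {
    fn new(raw: &'a HashMap<usize, Vec<i64>>) -> Self {
        Self { raw, cache: HashMap::new(), decode_count: 0 }
    }

    /// Inject a pre-calculated result for one column.
    fn with_cached(mut self, column: usize, values: Decoded) -> Self {
        self.cache.insert(column, values);
        self
    }

    /// Return the cached column if present, otherwise decode it.
    fn build(&mut self, column: usize) -> Option<Decoded> {
        if let Some(hit) = self.cache.get(&column) {
            return Some(hit.clone());
        }
        // "Decoding" is just a copy in this sketch.
        let values = self.raw.get(&column)?.clone();
        self.decode_count += 1;
        Some(values)
    }
}
```

With this shape, a caller that has already decoded a filter column can pass it in through `with_cached`, and later calls to `build` for that column skip the second decode.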
What changes are included in this PR?
ArrayBuilders into a new struct
Are there any user-facing changes?
No, this code is entirely internal (e.g. this code is not public: https://docs.rs/parquet/latest/parquet/?search=array_builder)