Closed
Labels
enhancement: Any new improvement worthy of an entry in the changelog
parquet: Changes to the parquet crate
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Even when a row selection is active, async_reader::ReaderFactory::read_row_group still fetches the entire column chunk from object storage.
Describe the solution you'd like
When an OffsetIndex is available, we can identify the pages that overlap with the row selection and fetch only the corresponding byte ranges.
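A rough sketch of the idea, not the crate's actual implementation: the PageLocation struct below is a simplified, hypothetical stand-in for an OffsetIndex entry (field names follow the Parquet format, types are simplified), and page_ranges, num_rows, and the plain row ranges used for the selection are assumptions made for illustration.

```rust
use std::ops::Range;

/// Hypothetical, simplified stand-in for one OffsetIndex entry.
#[derive(Debug, Clone, Copy)]
struct PageLocation {
    /// Byte offset of the page within the file.
    offset: u64,
    /// Size of the page in bytes, including its header.
    compressed_page_size: u64,
    /// Index of the page's first row within the row group.
    first_row_index: u64,
}

/// Returns the byte ranges of the pages that contain at least one selected row.
///
/// Assumes `pages` is sorted by `first_row_index`, `num_rows` is the row count
/// of the row group, and `selected` holds half-open row ranges.
fn page_ranges(
    pages: &[PageLocation],
    num_rows: u64,
    selected: &[Range<u64>],
) -> Vec<Range<u64>> {
    let mut out = Vec::new();
    for (i, page) in pages.iter().enumerate() {
        // A page covers rows from its first row up to the next page's first row
        // (or the end of the row group for the last page).
        let row_start = page.first_row_index;
        let row_end = pages
            .get(i + 1)
            .map_or(num_rows, |next| next.first_row_index);

        // Keep the page if any non-empty selected range intersects its rows.
        let overlaps = selected
            .iter()
            .any(|s| s.start < s.end && s.start < row_end && row_start < s.end);

        if overlaps {
            out.push(page.offset..page.offset + page.compressed_page_size);
        }
    }
    out
}
```

With three pages of ten rows each, a selection of rows 12..15 would yield only the second page's byte range, so the other two pages would never be requested.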
Describe alternatives you've considered
We could choose not to do this.
Additional context
This will likely benefit from integrating ObjectStore::get_ranges (added in #2336) into DataFusion, so that the more granular ranges do not cause a regression from issuing many small get requests.
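To illustrate the concern about many small requests, here is a minimal sketch of merging nearby byte ranges before fetching them. The coalesce function and its max_gap parameter are hypothetical names chosen for this example. It is not presented as how ObjectStore::get_ranges works internally.

```rust
use std::ops::Range;

/// Merges byte ranges that overlap or are separated by at most `max_gap` bytes.
///
/// Assumes `ranges` is sorted by start offset. Reading a few unneeded bytes in
/// the gap is traded for issuing fewer requests.
fn coalesce(ranges: &[Range<u64>], max_gap: u64) -> Vec<Range<u64>> {
    let mut out: Vec<Range<u64>> = Vec::new();
    for r in ranges {
        if let Some(last) = out.last_mut() {
            // Extend the previous range when the next one starts close enough.
            if r.start <= last.end.saturating_add(max_gap) {
                last.end = last.end.max(r.end);
                continue;
            }
        }
        out.push(r.clone());
    }
    out
}
```

The right gap threshold would depend on the latency and per-request cost of the object store, so it is left as a parameter here.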