Efficiently and correctly Extract Page Index statistics into `ArrayRef`s

### Is your feature request related to a problem or challenge?

Related to https://github.com/apache/datafusion/issues/10453

There are at least two types of statistics stored in Parquet files:

1. `ColumnChunk` level statistics (a min/max/null_count per column per row group): [`RowGroupMetaData`](https://docs.rs/parquet/latest/parquet/file/metadata/struct.RowGroupMetaData.html) --> [ColumnChunkMetaData](https://docs.rs/parquet/latest/parquet/file/metadata/struct.ColumnChunkMetaData.html) --> [Option](https://doc.rust-lang.org/nightly/core/option/enum.Option.html)<&[Statistics](https://docs.rs/parquet/latest/parquet/file/statistics/enum.Statistics.html)>
2. "Page Index" statistics (stored per page, may be more than one page per column per row group): [ColumnChunkMetaData](https://docs.rs/parquet/latest/parquet/file/metadata/struct.ColumnChunkMetaData.html) --> [read_columns_indexes](https://docs.rs/parquet/latest/parquet/file/page_index/index_reader/fn.read_columns_indexes.html#) --> [Vec](https://doc.rust-lang.org/nightly/alloc/vec/struct.Vec.html)<[Index](https://docs.rs/parquet/latest/parquet/file/page_index/index/enum.Index.html)>
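The practical difference between the two is the shape of the extracted data. The std-only sketch below models both levels for a single `Int64` column. `MinMax`, `row_group_values`, and `page_values` are hypothetical names, not the `parquet` crate's API, and `Option<i64>` stands in for an entry that would be null in the resulting `ArrayRef` (for example, a page containing only nulls).

```rust
/// Hypothetical stand-in for one min/max pair of an `Int64` column. In the
/// real crate the row group pair comes from `Statistics` and the page pairs
/// from an `Index`; here both levels use the same plain struct.
#[derive(Debug, Clone, Copy)]
struct MinMax {
    min: Option<i64>,
    max: Option<i64>,
}

/// `ColumnChunk` level: exactly one output entry per row group.
fn row_group_values(
    row_groups: &[MinMax],
    pick: impl Fn(&MinMax) -> Option<i64>,
) -> Vec<Option<i64>> {
    row_groups.iter().map(|rg| pick(rg)).collect()
}

/// Page Index level: one output entry per page, with the pages of every
/// row group concatenated in row group order. `None` marks an entry that
/// would be null in the resulting array.
fn page_values(
    pages_per_row_group: &[Vec<MinMax>],
    pick: impl Fn(&MinMax) -> Option<i64>,
) -> Vec<Option<i64>> {
    // Allocate the output once, sized to the total number of pages.
    let total: usize = pages_per_row_group.iter().map(Vec::len).sum();
    let mut out = Vec::with_capacity(total);
    for pages in pages_per_row_group {
        out.extend(pages.iter().map(|p| pick(p)));
    }
    out
}
```

For example, two row groups where the first is split into three pages yield two entries from `row_group_values` but four from `page_values`.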

As part of https://github.com/apache/datafusion/issues/10453, we have moved the conversion of the `ColumnChunk` level statistics into `StatisticsConverter`, and https://github.com/apache/datafusion/pull/10802 prunes row groups using this API.

It would be good to apply the same treatment to the statistics in the page index.

### Describe the solution you'd like

1. Add a clear API to efficiently extract page statistics outside of DataFusion
2. Ensure that API is well tested
3. Ensure the API is fast

### Describe alternatives you've considered

1. Move / refactor the code to extract `ArrayRef` from Index in page_filter ([source link](https://github.com/apache/datafusion/blob/ece7ae5eca451bb2599f13f9f9197fd93b2a8bc2/datafusion/core/src/datasource/physical_plan/parquet/page_filter.rs#L394-L567)) to `StatisticsConverter` ([source](https://github.com/apache/datafusion/blob/ece7ae5eca451bb2599f13f9f9197fd93b2a8bc2/datafusion/core/src/datasource/physical_plan/parquet/statistics.rs#L363))
2. Update the tests in `arrow_statistics` ([source](https://github.com/apache/datafusion/blob/ece7ae5eca451bb2599f13f9f9197fd93b2a8bc2/datafusion/core/tests/parquet/arrow_statistics.rs#L180-L237)) to also verify that the page statistics are correct (I believe the page min/maxes should be the same as the row group min/maxes)
3. Update the parquet code `prune_pages_in_one_row_group` ([source](https://github.com/apache/datafusion/blob/ece7ae5eca451bb2599f13f9f9197fd93b2a8bc2/datafusion/core/src/datasource/physical_plan/parquet/page_filter.rs#L301-L365)) to use the new `StatisticsExtractor` code
4. Update the benchmark ([source](https://github.com/apache/datafusion/blob/ece7ae5eca451bb2599f13f9f9197fd93b2a8bc2/datafusion/core/benches/parquet_statistic.rs#L152)) for extracting page statistics and use that to ensure the statistics extraction code is reasonably performant
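Alternative 2 relies on a relationship between the two levels of statistics. A minimal std-only sketch of such a check is below, assuming untruncated `Int64` statistics where a page with no min/max contains only nulls: the row group min should equal the smallest page min, and the row group max the largest page max. If a row group consists of a single page, the two sets of values must be identical. `MinMax` and `page_stats_consistent` are hypothetical names, not part of DataFusion or the `parquet` crate.

```rust
/// Hypothetical stand-in for one column's min/max statistics, used for both
/// a row group (`ColumnChunk` level) and a single data page.
#[derive(Debug, Clone, Copy)]
struct MinMax {
    min: Option<i64>,
    max: Option<i64>,
}

/// Returns true if the row group statistics agree with those of its pages:
/// the row group min is the smallest page min and the row group max is the
/// largest page max. Pages whose min/max is `None` are skipped, so a row
/// group made only of all-null pages must itself have `None` min/max.
fn page_stats_consistent(row_group: MinMax, pages: &[MinMax]) -> bool {
    let min_of_pages = pages.iter().filter_map(|p| p.min).min();
    let max_of_pages = pages.iter().filter_map(|p| p.max).max();
    row_group.min == min_of_pages && row_group.max == max_of_pages
}
```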

### Additional context

_No response_