Improve parquet ListingTable speed with parquet metadata (short clickbench queries)

### Is your feature request related to a problem or challenge?


I spent some time looking at the ClickBench results with DataFusion 40.0.0 
https://github.com/apache/datafusion/issues/11567#issuecomment-2254520675 (thanks @pmcgleenon 🙏 )

Specifically, I looked into how we could make some of the already fast queries on the partitioned dataset faster. Unsurprisingly, for the really fast queries the query time is actually dominated by parquet metadata analysis and DataFusion statistics creation.

For example

ClickBench Q0
```sql
SELECT COUNT(*) FROM hits;
```

To reproduce, run:

```shell
cd datafusion
cargo run --release --bin dfbench -- clickbench --iterations 100 --path benchmarks/data/hits_partitioned  --query 0
```

I profiled this using Instruments. Here are some annotated screenshots:

<img width="1728" alt="Screenshot 2024-07-30 at 6 25 43 AM" src="https://github.com/user-attachments/assets/28592700-dc3f-407b-9287-621c32290a53">
<img width="1728" alt="Screenshot 2024-07-30 at 6 26 53 AM" src="https://github.com/user-attachments/assets/3390a26d-f43f-4338-b92f-d681e3f2c378">


Some of my takeaways are:
1. A substantial amount of time is spent reading the parquet metadata twice
2. A substantial amount of time is spent managing the `ScalarValue`s in statistics


### Describe the solution you'd like

It would be cool to make these queries faster by reducing the per-file metadata handling overhead (e.g. don't read the metadata more than once and figure out some way to make statistics handling more efficient).
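
To make the "don't read the metadata more than once" idea concrete, here is a rough sketch using only the standard library. It is not DataFusion's or the `parquet` crate's API: `FileMetadata`, `MetadataCache`, `get_or_load`, `demo`, and the file names are all hypothetical names made up for illustration, and the loader closure stands in for the real footer read and decode. The point is only that two passes over the same files (say, one for schema/statistics and one for planning) could share a single decoded copy per file.

```rust
use std::cell::Cell;
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a decoded parquet footer (a real one holds much more).
#[derive(Debug)]
struct FileMetadata {
    num_rows: u64,
}

/// Hypothetical cache: keeps one decoded metadata object per file path.
struct MetadataCache {
    entries: Mutex<HashMap<String, Arc<FileMetadata>>>,
}

impl MetadataCache {
    fn new() -> Self {
        Self {
            entries: Mutex::new(HashMap::new()),
        }
    }

    /// Returns the cached metadata for `path`, calling `load` only on the
    /// first request. For simplicity the lock is held while loading.
    fn get_or_load<F>(&self, path: &str, load: F) -> Arc<FileMetadata>
    where
        F: FnOnce() -> FileMetadata,
    {
        let mut entries = self.entries.lock().unwrap();
        if let Some(meta) = entries.get(path) {
            return Arc::clone(meta);
        }
        let meta = Arc::new(load());
        entries.insert(path.to_string(), Arc::clone(&meta));
        meta
    }
}

/// Simulates two passes that both ask for every file's metadata.
/// Returns (total row count seen on the second pass, number of actual loads).
fn demo() -> (u64, usize) {
    let cache = MetadataCache::new();
    let loads = Cell::new(0usize);
    // Made-up file names and row counts, purely for illustration.
    let files = [("hits_0.parquet", 100u64), ("hits_1.parquet", 250u64)];

    let mut total_rows = 0;
    for pass in 0..2 {
        for &(path, rows) in files.iter() {
            let meta = cache.get_or_load(path, || {
                // Stand-in for the expensive footer read + decode.
                loads.set(loads.get() + 1);
                FileMetadata { num_rows: rows }
            });
            if pass == 1 {
                total_rows += meta.num_rows;
            }
        }
    }
    (total_rows, loads.get())
}
```

In this sketch `demo()` touches each file twice but the loader runs only once per file, and the second pass can answer a `COUNT(*)`-style question from the cached row counts alone. A real implementation would also need to think about invalidation (e.g. file size or modification time) and memory limits, which are left out here.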

### Describe alternatives you've considered

Note this project isn't broken down into tasks yet

I think @Ted-Jiang did some work way back to cache parquet metadata

### Additional context

_No response_

Improve parquet ListingTable speed with parquet metadata (short clickbench queries) #11719
