Skip to content

Conversation

@Weijun-H
Copy link
Member

Which issue does this PR close?

Closes #.

Rationale for this change

Refactor the code get_statistics! in statistics.rs to improve the performance of converting Parquet statistics to Arrow arrays. Instead of using conditional checks and try_from conversions, use the ok() method to handle the conversion and filtering of values. This change simplifies the code and improves readability.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Weijun-H -- I think this is a nice code impeovement ❤️

I am not sure this will improve the performance -- but you could run the benchmarks if you wanted to measure (cargo bench --bench parquet_statistic)

I filed #10934 to track adding a benchmark for data pages

@alamb
Copy link
Contributor

alamb commented Jun 16, 2024

cc @xinlifoobar and @marvinlanhenke

@marvinlanhenke
Copy link
Contributor

I think this is a nice code impeovement ❤️

It is indeed. Thanks @Weijun-H really nice refactor.

I am not sure this will improve the performance

I'll guess in terms of performance it won't make any difference.
However, it is more readable and thats always great, so thanks again.

Copy link
Contributor

@xinlifoobar xinlifoobar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great refactor, thanks :)

@alamb alamb merged commit 378b9ee into apache:main Jun 17, 2024
@alamb
Copy link
Contributor

alamb commented Jun 17, 2024

Thanks again everyone

findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants