Skip to content

Conversation

@alchemist51
Copy link
Contributor

@alchemist51 alchemist51 commented Mar 8, 2025

@github-actions github-actions bot added the parquet Changes to the parquet crate label Mar 8, 2025
@alchemist51 alchemist51 changed the title Change AsyncFileReader trait for u64 Change Parquet API interaction for u64 Mar 8, 2025
@alchemist51
Copy link
Contributor Author

alchemist51 commented Mar 8, 2025

I'm trying to change the trait AsyncFileReader to accept the u64 types instead of usize while keeping the other traits the same.

I was trying to do check for what traits we could change in for ParquetWriter though was not able to conclude on it.

Pretty new to the arrow-rs community and raising my first PR! Any feedback is welcome! @alamb @tustvold

@alchemist51 alchemist51 marked this pull request as ready for review March 9, 2025 18:38
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking promising thank you. Just to set expectations as this will be a breaking change it will need to wait for the next breaking release, likely scheduled for April.

impl<T: AsyncFileReader> MetadataFetch for &mut T {
fn fetch(&mut self, range: Range<usize>) -> BoxFuture<'_, Result<Bytes>> {
self.get_bytes(range)
self.get_bytes(range.start as u64..range.end as u64)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also change this trait

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@tustvold tustvold added api-change Changes to the arrow API next-major-release the PR has API changes and it waiting on the next major version labels Mar 11, 2025
@alamb alamb marked this pull request as draft March 12, 2025 20:36
@alamb
Copy link
Contributor

alamb commented Mar 12, 2025

Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look

@alchemist51 alchemist51 force-pushed the change-parquet-usize branch from 48b297d to 11aa2cb Compare March 15, 2025 08:05
@alchemist51 alchemist51 marked this pull request as ready for review March 15, 2025 08:15
Signed-off-by: Arpit Bandejiya <[email protected]>
@alchemist51 alchemist51 force-pushed the change-parquet-usize branch from 03469ad to fef3719 Compare March 15, 2025 08:44
@alchemist51
Copy link
Contributor Author

Sorry about the delay @tustvold @alamb . have updated the PR. Please review.

@alchemist51
Copy link
Contributor Author

This is looking promising thank you. Just to set expectations as this will be a breaking change it will need to wait for the next breaking release, likely scheduled for April.

Is there any annotation or any file where I need to add these traits to mark for breaking change?

@alchemist51 alchemist51 requested a review from tustvold March 15, 2025 13:41
@alamb
Copy link
Contributor

alamb commented Mar 17, 2025

This is looking promising thank you. Just to set expectations as this will be a breaking change it will need to wait for the next breaking release, likely scheduled for April.

Is there any annotation or any file where I need to add these traits to mark for breaking change?

I think we just need to add a github label with "api change" which we have done

@alchemist51
Copy link
Contributor Author

@tustvold could you please run the workflows?

Copy link
Member

@mbrobbel mbrobbel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question in general about this PR: when we convert from u64 to usize (using as) should we instead use .try_into().expect("overflow")?

Comment on lines 203 to +205
let start = page.offset as usize;
let end = start + page.compressed_page_size as usize;
ranges.push(start..end);
ranges.push(start as u64..end as u64);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let start = page.offset as usize;
let end = start + page.compressed_page_size as usize;
ranges.push(start..end);
ranges.push(start as u64..end as u64);
let start = page.offset as u64;
let end = start + page.compressed_page_size as u64;
ranges.push(start..end);

fn fetch(&mut self, range: Range<usize>) -> BoxFuture<'_, Result<Bytes>> {
self.get_bytes(range)
fn fetch(&mut self, range: Range<u64>) -> BoxFuture<'_, Result<Bytes>> {
self.get_bytes(range.start..range.end)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trait was updated to also use Range<u64>:

Suggested change
self.get_bytes(range.start..range.end)
self.get_bytes(range)

if read != to_read {
let mut buffer = Vec::with_capacity(to_read as usize);
let read = self.take(to_read).read_to_end(&mut buffer).await?;
if read != to_read as usize {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if read != to_read as usize {
if read as u64 != to_read {

@kylebarron
Copy link
Member

I put up #7371, updated with latest main and addressing latest comments from @mbrobbel

@alamb alamb marked this pull request as draft April 3, 2025 16:36
@alamb
Copy link
Contributor

alamb commented Apr 3, 2025

Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look

@kylebarron
Copy link
Member

@alamb in case you didn't see, I put up #7371 as an up-to-date version of this PR

@alchemist51
Copy link
Contributor Author

Closing the PR since it's already being worked upon. Thanks @kylebarron @alamb

@alchemist51 alchemist51 closed this Apr 6, 2025
alamb added a commit that referenced this pull request Apr 8, 2025
…n 4GB in WASM) (#7371)

* Change AsyncFileReader trait for u64

Signed-off-by: Arpit Bandejiya <[email protected]>

* update metadatafetch trait

Signed-off-by: Arpit Bandejiya <[email protected]>

* Fix lint issue

* fix tests for latest main

* Address comments by @mbrobbel from #7252

* Fix compile

* Revert suffix length back to usize

* Use `u64` for `file_size`

* address comments

* fix calculation of metadata_start

* change file_size type to u64

* change _sized functions to take u64 file_size

* clippy

* remove some potential panics

* use u64 for page index ranges

* Revert change to deprecated method

---------

Signed-off-by: Arpit Bandejiya <[email protected]>
Co-authored-by: Arpit Bandejiya <[email protected]>
Co-authored-by: Ed Seidl <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api-change Changes to the arrow API next-major-release the PR has API changes and it waiting on the next major version parquet Changes to the parquet crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Parquet Use U64 Instead of Usize (wasm support for files greater than 4GB)

5 participants