Skip to content

Conversation

@zhuqi-lucas
Copy link
Contributor

@zhuqi-lucas zhuqi-lucas commented Jun 19, 2025

Which issue does this PR close?

Currently, no pub api to support write the internal buffer for SerializedFileWriter, it's very helpful when we want to add low level API for example:

Because that we want to update the buf bytes written, if we use the buf internal file to write, we can't update the internal buf written bytes.

The consistent update for the bytes written metrics is the key for our custom index write.

Rationale for this change

Add API to support write with buf byteswritten updating.

What changes are included in this PR?

Add API to support write with buf byteswritten updating.

Are there any user-facing changes?

No

If there are user-facing changes then we may require documentation to be updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.

@github-actions github-actions bot added the parquet Changes to the parquet crate label Jun 19, 2025
Copy link
Contributor

@etseidl etseidl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, seems sensible. I think the documentation for SerializedFileWriter could stand some improving to highlight why it's a bad idea to write directly to the underlying writer.

@zhuqi-lucas
Copy link
Contributor Author

Thank you @etseidl for review, i also add the reminder docs in latest PR for users who want to write directly.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @zhuqi-lucas and @etseidl

I have some more comments about comments 😆 but the code looks great here.

Thank you!

///
/// It is inadvisable to directly write to the underlying writer, doing so
/// will likely result in a corrupt parquet file
/// **Warning**: if you write directly to this writer, you will skip
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you


/// Returns a reference to the underlying writer.
///
/// **Warning**: if you write directly to this writer, you will skip
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise here since it is not possible to write to an immutable buffer we can probably skip this warning

/// the file footer’s recorded offsets and sizes to diverge from reality,
/// resulting in an unreadable or corrupted Parquet file.
///
/// If you want to write safely to the underlying writer, use [`Self::write_all`].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@zhuqi-lucas
Copy link
Contributor Author

Thank you @zhuqi-lucas and @etseidl

I have some more comments about comments 😆 but the code looks great here.

Thank you!

Thank you @alamb for review, addressed comments and suggestions in latest PR.

@alamb
Copy link
Contributor

alamb commented Jun 20, 2025

@zhuqi-lucas
Copy link
Contributor Author

Sounds great to include in this release!

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's do it -- thanks again @zhuqi-lucas

@alamb alamb merged commit fbaf7ce into apache:main Jun 20, 2025
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet Changes to the parquet crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants