Support write to buffer api for SerializedFileWriter #7714
Thanks, seems sensible. I think the documentation for SerializedFileWriter could stand some improving to highlight why it's a bad idea to write directly to the underlying writer.
Thank you @etseidl for the review. I also added reminder docs in the latest PR for users who want to write directly.
Thank you @zhuqi-lucas and @etseidl
I have some more comments about comments 😆 but the code looks great here.
Thank you!
///
/// It is inadvisable to directly write to the underlying writer, doing so
/// will likely result in a corrupt parquet file
/// **Warning**: if you write directly to this writer, you will skip
Thank you
parquet/src/file/writer.rs
/// Returns a reference to the underlying writer.
///
/// **Warning**: if you write directly to this writer, you will skip
Likewise here: since it is not possible to write to an immutable buffer, we can probably skip this warning.
/// the file footer’s recorded offsets and sizes to diverge from reality,
/// resulting in an unreadable or corrupted Parquet file.
///
/// If you want to write safely to the underlying writer, use [`Self::write_all`].
❤️
Co-authored-by: Andrew Lamb
Thank you @alamb for the review. I addressed the comments and suggestions in the latest PR.
Sounds great to include in this release!
let's do it -- thanks again @zhuqi-lucas
Which issue does this PR close?
Currently there is no public API for writing to the internal buffer of SerializedFileWriter. Such an API is very helpful when we want to add low-level functionality, for example:
We need the buffer's bytes-written count to stay up to date. If we write directly to the file inside the buffer, that internal count is not updated.
Keeping the bytes-written metric consistent is key to writing our custom index.
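To illustrate the point about the bytes-written count, here is a minimal, self-contained sketch. It does not use the parquet crate and is not its implementation: `TrackedWrite`, `SketchFileWriter`, `inner_mut`, and `write_index` are hypothetical names made up for this example. Only the idea comes from the description above: a counting wrapper whose count goes stale when bytes are written to the wrapped sink directly.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a buffer that counts the bytes written through it.
struct TrackedWrite<W: Write> {
    inner: W,
    bytes_written: usize,
}

impl<W: Write> Write for TrackedWrite<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = self.inner.write(buf)?;
        // Only writes that pass through this wrapper are counted.
        self.bytes_written += n;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

/// Hypothetical file writer that derives offsets from the tracked count.
struct SketchFileWriter<W: Write> {
    buf: TrackedWrite<W>,
}

impl<W: Write> SketchFileWriter<W> {
    fn new(sink: W) -> Self {
        Self { buf: TrackedWrite { inner: sink, bytes_written: 0 } }
    }

    /// Safe path: the bytes go through the counting wrapper.
    fn write_all(&mut self, bytes: &[u8]) -> io::Result<()> {
        self.buf.write_all(bytes)
    }

    /// Risky path: hands out the raw sink, bypassing the counter.
    fn inner_mut(&mut self) -> &mut W {
        &mut self.buf.inner
    }

    fn bytes_written(&self) -> usize {
        self.buf.bytes_written
    }
}

/// Writes a 4-byte header and a 16-byte "custom index", then returns
/// (bytes recorded by the writer, bytes actually present in the sink).
fn write_index(direct: bool) -> io::Result<(usize, usize)> {
    let mut writer = SketchFileWriter::new(Vec::new());
    writer.write_all(b"PAR1")?;
    let index = [0u8; 16];
    if direct {
        // Bypasses the counter: the recorded count no longer matches the sink.
        writer.inner_mut().write_all(&index)?;
    } else {
        writer.write_all(&index)?;
    }
    let recorded = writer.bytes_written();
    let actual = writer.inner_mut().len();
    Ok((recorded, actual))
}

fn main() -> io::Result<()> {
    println!("tracked write: {:?}", write_index(false)?);
    println!("direct write:  {:?}", write_index(true)?);
    Ok(())
}
```

With the tracked path the recorded count and the sink length agree. With the direct path the recorded count stays at the header size, so any offset computed from it would point at the wrong place.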
Rationale for this change
Add an API that writes to the buffer while keeping its bytes-written count updated.
What changes are included in this PR?
Add an API that writes to the buffer while keeping its bytes-written count updated.
Are there any user-facing changes?
No
If there are user-facing changes then we may require documentation to be updated before approving the PR.
If there are any breaking changes to public APIs, please call them out.