-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Perform direct write operation if input data are larger than buffer size #11203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Flush(); // Flush buffer before writing every things else | ||
} | ||
idx_t remaining_to_write = write_size - to_copy; | ||
fs.Write(*handle, (void *)(buffer + to_copy), remaining_to_write); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only point that is bothering me... Here I have done a C cast style because the input data is a const std::uint8_t
type...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks fine to me, it's important that the + to_copy
is done before the cast and you did just that 👍
However we don't like having c-style casts, use reinterpret_cast
instead here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this case unfortunately it's a const_cast
operation :(
cf 3931b0b
// Perform direct IO, the buffer to write is larger than the internal buffer | ||
// it does not make sens to split a larger buffer into smaller one | ||
idx_t to_copy = 0; | ||
if (offset != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this comment be clarified further?
It sounds like we're not appending to our buffer and flushing it as is, but in reality we're filling our buffer up to FILE_BUFFER_SIZE and then flushing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it better ? 3931b0b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @quentingodeau thanks for the PR! I added one comment, the rest looks good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Looks good - this is a good idea. One comment:
to_copy = FILE_BUFFER_SIZE - offset; | ||
memcpy(data.get() + offset, buffer, to_copy); | ||
offset += to_copy; | ||
Flush(); // Flush buffer before writing every things else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we call Flush()
followed by fs.Write()
of the entire incoming buffer below without first copying part of the buffer into the existing buffer? I don't quite see the benefit of first copying part of the buffer into the original file buffer, and it complicates the code here. Two write calls are happening in both cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion the underlying FS expect that the buffered are size at least to the size that has been configured. On a remote FS I think it may impact some performance (depending of course of the implementation), and I wanted to avoid the underlying FS to add an extract layer of buffers.
This is only my opinion and you have the final words regarding this. Please confirm if you want me to do the change or not :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see that - let's keep it as-is then
Thanks! |
Merge pull request duckdb/duckdb#11203 from quentingodeau/feature/buffer-writer
Merge pull request duckdb/duckdb#11203 from quentingodeau/feature/buffer-writer
Merge pull request duckdb/duckdb#11203 from quentingodeau/feature/buffer-writer
Hello,
While debugging I saw that the
BufferedFileWriter
was splitting incoming buffer to smaller one before writing then.Here the result I was expecting better improvement...
Before the change
without_changes.log
After
with_changes.log
But I still propose this PR because depending on the underlying filesystem it may be preferable to have larger buffer.