-
Notifications
You must be signed in to change notification settings - Fork 2.6k
add TRIM support to virtual filesystem, and implementation on linux #11258
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have one substantial comment on the testing side of this, that I think might require @Mytherin to take a look and make a call on how to move forward, and then took a look also at the rest and left two minor clarifications.
@@ -547,6 +551,28 @@ void SingleFileBlockManager::WriteHeader(DatabaseHeader header) { | |||
active_header = 1 - active_header; | |||
//! Ensure the header write ends up on disk | |||
handle->Sync(); | |||
// Release the free blocks to the filesystem. | |||
TrimFreeBlocks(); | |||
newly_freed_list.clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
newly_freed_list.clear(); |
This is done already as part of TrimFreeBlocks.
Potentially to consider an D_ASSERT that `newly_freed_list is actually empty.
free_list.erase(free_list.lower_bound(max_block), free_list.end()); | ||
newly_freed_list.erase(newly_freed_list.lower_bound(max_block), newly_freed_list.end()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was not obvious to me looking here that both free_list and newly_freed_list are guaranteed to be sorted (given they are backed by a set<>).
Unsure if that should be made more visible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's part of the spec that std::set is sorted. Are you suggesting that we rename these members, to make it clearer that they're sorted?
test/helpers/test_helpers.cpp
Outdated
// Currently this is only supported on linux. | ||
// Because we test this mode on linux, and the no-trim mode on other platforms, | ||
// we get coverage with the option both on and off. | ||
result->options.trim_free_blocks = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about this part. I think it's great to test this thoroughly, but it looks this also means we will stop testing duckdb on Linux with its default setting, and I can see potentially some problems being visible only with one of the two modes.
Unsure what's the way forward.
Maybe leaving default to false but having some unittester options to turn this on based on environment variable?
And also, I would consider this to not be linux only, and when this is turned on all filesystems handle this properly (via Trim being a no-op).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. I wasn't sure what to do here and was hoping for feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can enable the setting if DUCKDB_ALTERNATIVE_VERIFY
is defined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can make that change. Although DUCKDB_ALTERNATIVE_VERIFY also tests a slightly different checkpoint/shutdown behavior, which could conceivably interact?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that is true, maybe putting it behind DESTROY_UNPINNED_BLOCKS
would be better? Or RUN_SLOW_VERIFIERS
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to use DUCKDB_RUN_SLOW_VERIFIERS
Thanks for the PR! Cool functionality. I was not aware this was even possible. I'm very curious how well this works on various file systems/operating systems, but we can experiment with that separately. Could you have a look at the failing CI? |
Thanks! |
Merge pull request duckdb/duckdb#11258 from jkub/fs_trim_support Merge pull request duckdb/duckdb#11268 from lnkuiper/shell_json Merge pull request duckdb/duckdb#11271 from lnkuiper/json_stuff
Merge pull request duckdb/duckdb#11258 from jkub/fs_trim_support Merge pull request duckdb/duckdb#11268 from lnkuiper/shell_json Merge pull request duckdb/duckdb#11271 from lnkuiper/json_stuff
Merge pull request duckdb/duckdb#11258 from jkub/fs_trim_support Merge pull request duckdb/duckdb#11268 from lnkuiper/shell_json Merge pull request duckdb/duckdb#11271 from lnkuiper/json_stuff
Add TRIM method to file system, which enables excising a range of the DB file. The OS can drop pages from the page-cache, and the file-system is free to deallocate the range (sparse file support). Reads to the range will succeed but will return undefined data (though most platforms will return 0s).
This is implemented and tested on linux. Windows/ntfs also has support for trim but I have not implemented it, and so tests do not exercise it.
The feature is enabled via a dbconfig.options.trim_free_blocks, which defaults to false.