-
Notifications
You must be signed in to change notification settings - Fork 37.7k
Fix possible data race when committing block files #12696
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Apparently macOS erroneously reports that it has |
It was recently pointed out to me that calling fsync() or fdatasync() on a new file is not sufficient to ensure it's persisted to disk, a the existence of the file itself is stored in the directory inode. This means that ensuring that a new file is actually committed also requires an fsync() on the parent directory. This change ensures that we call fsync() on the blocks directory after committing new block files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK 4894e36.
* Sync directory contents. This is required on some environments to ensure that | ||
* newly created files are committed to disk. | ||
*/ | ||
void DirectoryCommit(const fs::path &dirname); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, fs:path& dirname
. The same in the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK 4894e36
Just in case (I'm not very familiar with this part of Bitcoin): is there a reason this isn't done for the undo data also? If it makes sense there then maybe it's okay to do it once at the end of the function instead of in both? |
While I was looking at this again (originally to commit the undo data as @donaloconnor suggested) I decided it would be nice if we could only sync the parent directory if we know there's actually a new block file on disk. The block files are 128 MB, so most of the time we're flushing we don't need to sync the parent directory. The full logic for all of the paths that can cause a file to be created is really confusing. First I tried adding a global variable that tracks if a new file has been opened, so that we can only sync the directory when that's set. Then I was trying to understand the Since this is pretty important logic I feel like some of these code paths at least need better comments. I am going to take another look through the code tomorrow and see if I can work it out. Maybe we end up always syncing the parent directory anyway since that's always safe, but I want to make sure I have a better grasp of all of the logic in this file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Realise you are still working on this but, utACK 4894e36 for the current version of the code, which does look good. PR description is a little out of date, though. Part about AC_CHECK_FUNCS
should probably be removed.
@@ -1614,6 +1614,7 @@ void static FlushBlockFile(bool fFinalize = false) | |||
if (fFinalize) | |||
TruncateFile(fileOld, vinfoBlockFile[nLastBlockFile].nSize); | |||
FileCommit(fileOld); | |||
DirectoryCommit(GetDataDir() / "blocks"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Realize you are currently working on this, but if you decide to go with this code as is, you should definitely comment here that call this isn't needed in the case where new files aren't being added, in case somebody wants to try to optimize this later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't it be a reasonably fast noop if the directory hasn't changed?
#elif HAVE_FDATASYNC | ||
fdatasync(fileno(file)); | ||
#else | ||
fsync(fileno(file)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You aren't changing this, but it would be nice if these functions returned errors so failures could be logged.
void DirectoryCommit(const fs::path &dirname) | ||
{ | ||
#ifndef WIN32 | ||
FILE* file = fsbridge::fopen(dirname, "r"); | ||
fsync(fileno(file)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is fsync
really enough here, or should we be calling FileCommit
on it?
Closing with |
ef71229 util: Check for file being NULL in DirectoryCommit (Luke Dashjr) 4574904 Fix possible data race when committing block files (Evan Klitzke) 220bb16 util: Introduce DirectoryCommit commit function to sync a directory (Evan Klitzke) ce5cbae util.h: Document FileCommit function (Evan Klitzke) 844d650 util: Prefer Mac-specific F_FULLSYNC over fdatasync in FileCommit (Evan Klitzke) f6cec0b util: Refactor FileCommit from an #if sequence nested in #else, to a sequence of #elif (Evan Klitzke) Pull request description: Reviving #12696 ACKs for top commit: laanwj: Code review ACK ef71229 Tree-SHA512: 07d650990ef4c18d645dee3f9a199a940683ad17557d79d93979a76c4e710d8d70e6eae01d1a5991494a24a7654eb7db868be0c34a31e70b2509945d95bc9cce
ef71229 util: Check for file being NULL in DirectoryCommit (Luke Dashjr) 4574904 Fix possible data race when committing block files (Evan Klitzke) 220bb16 util: Introduce DirectoryCommit commit function to sync a directory (Evan Klitzke) ce5cbae util.h: Document FileCommit function (Evan Klitzke) 844d650 util: Prefer Mac-specific F_FULLSYNC over fdatasync in FileCommit (Evan Klitzke) f6cec0b util: Refactor FileCommit from an #if sequence nested in #else, to a sequence of #elif (Evan Klitzke) Pull request description: Reviving bitcoin#12696 ACKs for top commit: laanwj: Code review ACK ef71229 Tree-SHA512: 07d650990ef4c18d645dee3f9a199a940683ad17557d79d93979a76c4e710d8d70e6eae01d1a5991494a24a7654eb7db868be0c34a31e70b2509945d95bc9cce
It was recently pointed out to me that calling
fsync()
orfdatasync()
on a new file is not sufficient to ensure it's persisted to disk, as the existence of the file itself is stored in the directory inode. This means that ensuring that a new file is actually committed also requires anfsync()
on the parent directory. There are lots of discussions about this topic online, e.g. here. This only applies to new files, callingfsync()
on an old file is always fine.In theory this means that the block index can race block persistence, as a poorly timed power outage could cause us to commit data to the block index that gets lost in the filesystem. The change here ensures that we call
fsync()
on the blocks directory after committing new block files. I checked the LevelDB source code and they already do this when updating their writeahead log. In theory this could happen at the same time as a chain split and that could cause you to come back online and then miss the block you had committed to the index, which would put you permanently out of sync between the two. This seems pretty far fetched, but we should handle this case correctly anyway.I'm using a new autoconf macro as well,
AC_CHECK_FUNCS()
. It checks that a function is available and then defines aHAVE_*
macro if it is, analogous toAC_CHECK_HEADERS
. Right nowautoscan
complains a lot about the fact that we're not using this, so I figured I might as well start now while I was in this part of the code.Apparently Windows doesn't have an similar method of syncing filesystem metadata---I'm not an expert on that though.
Also not strictly related to this change, but I have been working on a lot of platform-specific PRs recently and want to refactor
util.h
andutil.cpp
so the platform-specific bits are isolated from the generic util stuff. I intend to create an issue later today to describe how I think that should be done so I can get feedback before starting that work.