Skip to content

Conversation

eklitzke
Copy link
Contributor

@eklitzke eklitzke commented Mar 15, 2018

It was recently pointed out to me that calling fsync() or fdatasync() on a new file is not sufficient to ensure it's persisted to disk, as the existence of the file itself is stored in the directory inode. This means that ensuring that a new file is actually committed also requires an fsync() on the parent directory. There are lots of discussions about this topic online, e.g. here. This only applies to new files, calling fsync() on an old file is always fine.

In theory this means that the block index can race block persistence, as a poorly timed power outage could cause us to commit data to the block index that gets lost in the filesystem. The change here ensures that we call fsync() on the blocks directory after committing new block files. I checked the LevelDB source code and they already do this when updating their writeahead log. In theory this could happen at the same time as a chain split and that could cause you to come back online and then miss the block you had committed to the index, which would put you permanently out of sync between the two. This seems pretty far fetched, but we should handle this case correctly anyway.

I'm using a new autoconf macro as well, AC_CHECK_FUNCS(). It checks that a function is available and then defines a HAVE_* macro if it is, analogous to AC_CHECK_HEADERS. Right now autoscan complains a lot about the fact that we're not using this, so I figured I might as well start now while I was in this part of the code.

Apparently Windows doesn't have an similar method of syncing filesystem metadata---I'm not an expert on that though.

Also not strictly related to this change, but I have been working on a lot of platform-specific PRs recently and want to refactor util.h and util.cpp so the platform-specific bits are isolated from the generic util stuff. I intend to create an issue later today to describe how I think that should be done so I can get feedback before starting that work.

@eklitzke
Copy link
Contributor Author

Apparently macOS erroneously reports that it has fdatasync(), discussion here: haskell/unix#37

It was recently pointed out to me that calling fsync() or fdatasync() on a new
file is not sufficient to ensure it's persisted to disk, a the existence of the
file itself is stored in the directory inode. This means that ensuring that a
new file is actually committed also requires an fsync() on the parent directory.

This change ensures that we call fsync() on the blocks directory after
committing new block files.
Copy link
Contributor

@promag promag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 4894e36.

* Sync directory contents. This is required on some environments to ensure that
* newly created files are committed to disk.
*/
void DirectoryCommit(const fs::path &dirname);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, fs:path& dirname. The same in the implementation.

Copy link
Contributor

@donaloconnor donaloconnor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 4894e36

@donaloconnor
Copy link
Contributor

Just in case (I'm not very familiar with this part of Bitcoin): is there a reason this isn't done for the undo data also? If it makes sense there then maybe it's okay to do it once at the end of the function instead of in both?

@eklitzke
Copy link
Contributor Author

While I was looking at this again (originally to commit the undo data as @donaloconnor suggested) I decided it would be nice if we could only sync the parent directory if we know there's actually a new block file on disk. The block files are 128 MB, so most of the time we're flushing we don't need to sync the parent directory.

The full logic for all of the paths that can cause a file to be created is really confusing. First I tried adding a global variable that tracks if a new file has been opened, so that we can only sync the directory when that's set. Then I was trying to understand the fFinalize flag so I could add a doxygen doc string to FlushBlockToDisk(). The full code path for how this can get called is pretty confusing and the methods aren't well documented, but it seemed like it was true in the same cases that there would be a new file, since we "finalize" (i.e. truncate) the file when the is not "known to already reside on disk". But why exactly we would need to truncate a file that doesn't exist yet is unclear. I think it's to support reorgs, but it was kind of hard to wrap my head around.

Since this is pretty important logic I feel like some of these code paths at least need better comments. I am going to take another look through the code tomorrow and see if I can work it out. Maybe we end up always syncing the parent directory anyway since that's always safe, but I want to make sure I have a better grasp of all of the logic in this file.

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Realise you are still working on this but, utACK 4894e36 for the current version of the code, which does look good. PR description is a little out of date, though. Part about AC_CHECK_FUNCS should probably be removed.

@@ -1614,6 +1614,7 @@ void static FlushBlockFile(bool fFinalize = false)
if (fFinalize)
TruncateFile(fileOld, vinfoBlockFile[nLastBlockFile].nSize);
FileCommit(fileOld);
DirectoryCommit(GetDataDir() / "blocks");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Realize you are currently working on this, but if you decide to go with this code as is, you should definitely comment here that call this isn't needed in the case where new files aren't being added, in case somebody wants to try to optimize this later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't it be a reasonably fast noop if the directory hasn't changed?

#elif HAVE_FDATASYNC
fdatasync(fileno(file));
#else
fsync(fileno(file));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You aren't changing this, but it would be nice if these functions returned errors so failures could be logged.

void DirectoryCommit(const fs::path &dirname)
{
#ifndef WIN32
FILE* file = fsbridge::fopen(dirname, "r");
fsync(fileno(file));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is fsync really enough here, or should we be calling FileCommit on it?

@fanquake
Copy link
Member

Closing with "Up for grabs".

@fanquake fanquake closed this Jul 25, 2018
laanwj added a commit that referenced this pull request Jan 7, 2021
ef71229 util: Check for file being NULL in DirectoryCommit (Luke Dashjr)
4574904 Fix possible data race when committing block files (Evan Klitzke)
220bb16 util: Introduce DirectoryCommit commit function to sync a directory (Evan Klitzke)
ce5cbae util.h: Document FileCommit function (Evan Klitzke)
844d650 util: Prefer Mac-specific F_FULLSYNC over fdatasync in FileCommit (Evan Klitzke)
f6cec0b util: Refactor FileCommit from an #if sequence nested in #else, to a sequence of #elif (Evan Klitzke)

Pull request description:

  Reviving #12696

ACKs for top commit:
  laanwj:
    Code review ACK ef71229

Tree-SHA512: 07d650990ef4c18d645dee3f9a199a940683ad17557d79d93979a76c4e710d8d70e6eae01d1a5991494a24a7654eb7db868be0c34a31e70b2509945d95bc9cce
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jan 7, 2021
ef71229 util: Check for file being NULL in DirectoryCommit (Luke Dashjr)
4574904 Fix possible data race when committing block files (Evan Klitzke)
220bb16 util: Introduce DirectoryCommit commit function to sync a directory (Evan Klitzke)
ce5cbae util.h: Document FileCommit function (Evan Klitzke)
844d650 util: Prefer Mac-specific F_FULLSYNC over fdatasync in FileCommit (Evan Klitzke)
f6cec0b util: Refactor FileCommit from an #if sequence nested in #else, to a sequence of #elif (Evan Klitzke)

Pull request description:

  Reviving bitcoin#12696

ACKs for top commit:
  laanwj:
    Code review ACK ef71229

Tree-SHA512: 07d650990ef4c18d645dee3f9a199a940683ad17557d79d93979a76c4e710d8d70e6eae01d1a5991494a24a7654eb7db868be0c34a31e70b2509945d95bc9cce
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Aug 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants