Skip to content

Conversation

laanwj
Copy link
Member

@laanwj laanwj commented Apr 20, 2018

Add logging and error handling inside, and outside of FileCommit.
Functions such as fsync, fdatasync will return error in case of hardware I/O errors, and ignoring this means it can silently continue through data corruption.
(c.f. https://lwn.net/SubscriberLink/752063/12b232ab5039efbe/)

EINVAL is handled specially to avoid crashing out on (network, fuse) filesystems that don't handle f[data]sync.

I checked that the syncing inside leveldb is already generating an I/O error as appropriate.

@laanwj laanwj force-pushed the 2018_04_fsync_noignore branch from 54299bf to 5657cc5 Compare April 20, 2018 10:17
@laanwj laanwj requested review from theuni and sipa April 20, 2018 10:18
@practicalswift
Copy link
Contributor

Strong concept ACK! Nice!

Copy link
Member

@theuni theuni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

concept ACK

src/util.cpp Outdated
if (fflush(file) != 0) { // harmless if redundantly called
LogPrintf("%s: fflush failed: %d\n", __func__, errno);
return false;
}
#ifdef WIN32
HANDLE hFile = (HANDLE)_get_osfhandle(_fileno(file));
FlushFileBuffers(hFile);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

#ifdef WIN32
HANDLE hFile = (HANDLE)_get_osfhandle(_fileno(file));
FlushFileBuffers(hFile);
#else
#if defined(__linux__) || defined(__NetBSD__)
fdatasync(fileno(file));
if (fdatasync(fileno(file)) != 0 && errno != EINVAL) { // Ignore EINVAL for filesystems that don't support sync
LogPrintf("%s: fdatasync failed: %d\n", __func__, errno);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't come up with a specific reason other than "gut feeling", but potentially logging to the same disk that just failed to sync feels icky.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not necessarily the case, e.g. they might be logging to stdout, or piping the log to something else over the network. Logging the error is extremely important for troubleshooting in any case, even if it might fail.

fclose(fileOld);
}

fileOld = OpenUndoFile(posOld);
if (fileOld) {
if (fFinalize)
TruncateFile(fileOld, vinfoBlockFile[nLastBlockFile].nUndoSize);
FileCommit(fileOld);
flush_failed = FileCommit(fileOld);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

|=

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, and the logic for these is backwards. They should be:

 flush_failed |= !FileCommit(fileOld);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes this should be flush_ok


FILE *fileOld = OpenBlockFile(posOld);
if (fileOld) {
if (fFinalize)
TruncateFile(fileOld, vinfoBlockFile[nLastBlockFile].nSize);
FileCommit(fileOld);
flush_failed = FileCommit(fileOld);
fclose(fileOld);
}

fileOld = OpenUndoFile(posOld);
if (fileOld) {
if (fFinalize)
TruncateFile(fileOld, vinfoBlockFile[nLastBlockFile].nUndoSize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't hurt to check the return value here too, but I guess it's not crucial since any error would just propagate to the sync.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

@@ -49,7 +49,8 @@ bool SerializeFileDB(const std::string& prefix, const fs::path& path, const Data

// Serialize
if (!SerializeDB(fileout, data)) return false;
FileCommit(fileout.Get());
if (!FileCommit(fileout.Get()))
return error("%s: Failed to flush file %s", __func__, pathTmp.string());
fileout.fclose();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this attempt to close before returning? I assume we're in bad shape already if we get here, but I'm afraid that some one-off sync failure could lead to multiple copies open and weird fd behavior.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't CAutofile destructor take care of this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, nevermind missed that this was a CAutofile.


FILE *fileOld = OpenBlockFile(posOld);
if (fileOld) {
if (fFinalize)
TruncateFile(fileOld, vinfoBlockFile[nLastBlockFile].nSize);
FileCommit(fileOld);
flush_failed = FileCommit(fileOld);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See fclose comment above.

@@ -4760,7 +4765,8 @@ bool DumpMempool(void)
}

file << mapDeltas;
FileCommit(file.Get());
if (!FileCommit(file.Get()))
throw std::runtime_error("FileCommit failed");
file.fclose();
Copy link
Member

@theuni theuni Apr 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aaand here :)

src/util.cpp Outdated
#ifdef WIN32
HANDLE hFile = (HANDLE)_get_osfhandle(_fileno(file));
FlushFileBuffers(hFile);
#else
#if defined(__linux__) || defined(__NetBSD__)
fdatasync(fileno(file));
if (fdatasync(fileno(file)) != 0 && errno != EINVAL) { // Ignore EINVAL for filesystems that don't support sync
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, leveldb does not compare the result to EINVAL. Which makes sense for a database, as if writes can't be synchronized at any point, the db can't really ever be trusted.

I think we might want to operate the same way. If a FlushBlockFile() fails due to a disk/filesystem that can't sync, do we really want to continue with updating the index db?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leveldb does compare against EINVAL, we had this issue a long time ago where leveldb didn't work with certain filesystems (e.g. NTFS mounted from Linux). This was eventually fixed by adding the compare against EINVAL.
We certainly don't want to repeat this issue with the block files!

$ git grep fsync
util/env_posix.cc:        if (fsync(fd) < 0 && errno != EINVAL) {
``

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. leveldb checks the fsync() in SyncDirIfManifest() against EINVAL, but not the fdatasync() in Sync().

I have no knowledge of the past issue though, so I'll defer to you on that.

Copy link
Member Author

@laanwj laanwj Apr 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as if writes can't be synchronized at any point, the db can't really ever be trusted.

I don't agree. When shutting down bitcoind and the OS cleanly, f(data)sync is a no-op. It's only an issue if there are sudden crashes or power outages. In which case the user will just have to accept potential corruption on crashes when choosing to use a filesystem that doesn't support these things. Certainly for the block files, which are huge and more commonly hosted externally.

My intent here is not to break any current usecases.

Copy link
Member

@theuni theuni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK after squash.

@jonasschnelli
Copy link
Contributor

utACK 36424a4f09de38e1ab54381aa7844781688f5c75 (squash recommended).

Add logging and error handling inside, and outside of FileCommit.
Functions such as fsync, fdatasync will return error in case of hardware
I/O errors, and ignoring this means it can silently continue through
data corruption.  (c.f.
https://lwn.net/SubscriberLink/752063/12b232ab5039efbe/)
@laanwj
Copy link
Member Author

laanwj commented Apr 23, 2018

squashed 36424a4 → cf02779

@laanwj laanwj force-pushed the 2018_04_fsync_noignore branch from 36424a4 to cf02779 Compare April 23, 2018 12:26
@laanwj laanwj merged commit cf02779 into bitcoin:master Apr 23, 2018
laanwj added a commit that referenced this pull request Apr 23, 2018
cf02779 Add logging and error handling for file syncing (Wladimir J. van der Laan)

Pull request description:

  Add logging and error handling inside, and outside of FileCommit.
  Functions such as fsync, fdatasync will return error in case of hardware I/O errors, and ignoring this means it can silently continue through data corruption.
  (c.f. https://lwn.net/SubscriberLink/752063/12b232ab5039efbe/)

  EINVAL is handled specially to avoid crashing out on (network, fuse) filesystems that don't handle `f[data]sync`.

  I checked that the syncing inside leveldb is already generating an I/O error as appropriate.

Tree-SHA512: 64cc9bbedca3ecc97ff4bac0a7b7ac6526a7ed763c66f6786d03ca4f2e9e366e42b152cb908299c060448d98ca39ff03395280bffaca51d592e728aa2516f5dd
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Nov 10, 2020
cf02779 Add logging and error handling for file syncing (Wladimir J. van der Laan)

Pull request description:

  Add logging and error handling inside, and outside of FileCommit.
  Functions such as fsync, fdatasync will return error in case of hardware I/O errors, and ignoring this means it can silently continue through data corruption.
  (c.f. https://lwn.net/SubscriberLink/752063/12b232ab5039efbe/)

  EINVAL is handled specially to avoid crashing out on (network, fuse) filesystems that don't handle `f[data]sync`.

  I checked that the syncing inside leveldb is already generating an I/O error as appropriate.

Tree-SHA512: 64cc9bbedca3ecc97ff4bac0a7b7ac6526a7ed763c66f6786d03ca4f2e9e366e42b152cb908299c060448d98ca39ff03395280bffaca51d592e728aa2516f5dd
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Nov 12, 2020
cf02779 Add logging and error handling for file syncing (Wladimir J. van der Laan)

Pull request description:

  Add logging and error handling inside, and outside of FileCommit.
  Functions such as fsync, fdatasync will return error in case of hardware I/O errors, and ignoring this means it can silently continue through data corruption.
  (c.f. https://lwn.net/SubscriberLink/752063/12b232ab5039efbe/)

  EINVAL is handled specially to avoid crashing out on (network, fuse) filesystems that don't handle `f[data]sync`.

  I checked that the syncing inside leveldb is already generating an I/O error as appropriate.

Tree-SHA512: 64cc9bbedca3ecc97ff4bac0a7b7ac6526a7ed763c66f6786d03ca4f2e9e366e42b152cb908299c060448d98ca39ff03395280bffaca51d592e728aa2516f5dd
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Nov 17, 2020
cf02779 Add logging and error handling for file syncing (Wladimir J. van der Laan)

Pull request description:

  Add logging and error handling inside, and outside of FileCommit.
  Functions such as fsync, fdatasync will return error in case of hardware I/O errors, and ignoring this means it can silently continue through data corruption.
  (c.f. https://lwn.net/SubscriberLink/752063/12b232ab5039efbe/)

  EINVAL is handled specially to avoid crashing out on (network, fuse) filesystems that don't handle `f[data]sync`.

  I checked that the syncing inside leveldb is already generating an I/O error as appropriate.

Tree-SHA512: 64cc9bbedca3ecc97ff4bac0a7b7ac6526a7ed763c66f6786d03ca4f2e9e366e42b152cb908299c060448d98ca39ff03395280bffaca51d592e728aa2516f5dd
random-zebra added a commit to PIVX-Project/PIVX that referenced this pull request Apr 20, 2021
…+ Add file syncing logging and error handling

d593e7e Migrate remaining FLATDATA serialization, we are natively supporting it now. (furszy)
051970d addrdb: Remove temporary files created in SerializeFileDB. Fixes non-determinism in unit tests. (practicalswift)
4be426f Add logging and error handling for file syncing (Wladimir J. van der Laan)
8662fb3 Remove unused double_safe_addition & double_safe_multiplication functions. (furszy)
a7c3885 Add native support for serializing char arrays without FLATDATA (Pieter Wuille)
459ecb9 Deduplicate addrdb.cpp and use CHashWriter/Verifier (Pieter Wuille)
a5d2f8a Fix copypasted comment. (Pavel Janík)

Pull request description:

  Digging down the peers and ban databases commits path in upstream found some good stuff, back ported the following PRs:

  * bitcoin#9216.
  * bitcoin#10248.
  * bitcoin#12740.
  * bitcoin#13039.
  * bitcoin#16212.

ACKs for top commit:
  random-zebra:
    ACK d593e7e
  Fuzzbawls:
    ACK d593e7e

Tree-SHA512: 8d567c68a2c36f43c749d5faa4b7f8a299dbfdda133495df434ac642c1a2ac96dc2259123ad85decc9039c62c46602665f6be4aadf6d5bf729344c51692ec60e
gades pushed a commit to cosanta/cosanta-core that referenced this pull request Jun 26, 2021
cf02779 Add logging and error handling for file syncing (Wladimir J. van der Laan)

Pull request description:

  Add logging and error handling inside, and outside of FileCommit.
  Functions such as fsync, fdatasync will return error in case of hardware I/O errors, and ignoring this means it can silently continue through data corruption.
  (c.f. https://lwn.net/SubscriberLink/752063/12b232ab5039efbe/)

  EINVAL is handled specially to avoid crashing out on (network, fuse) filesystems that don't handle `f[data]sync`.

  I checked that the syncing inside leveldb is already generating an I/O error as appropriate.

Tree-SHA512: 64cc9bbedca3ecc97ff4bac0a7b7ac6526a7ed763c66f6786d03ca4f2e9e366e42b152cb908299c060448d98ca39ff03395280bffaca51d592e728aa2516f5dd
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants