-
Notifications
You must be signed in to change notification settings - Fork 37.7k
Fix locking on WSL using flock instead of fcntl #18700
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Gitian builds
|
If this is a bug in WSL, why not wait for them to fix it instead? I don't think it's desirable to change our handling of locks on all UNIX platforms just for a bug in their implementation. If we need is, please add it for WSL only and add a comment that it can be removed again when the upstream bug is fixed (with a link to the upstream bug). |
Can you please test that this fixes your issue @mtrycz. |
@laanwj the bug was reported upstream in 2017 with no fix in sight, so it seems it is up to us |
I've ran my test plan (from the related issue), and I can confirm that the test fails in the latest master However, I'm running more extended tests and some are failing. Please allow some time to make an adequate description. |
Github @wumpus is not bitcoin wumpus. |
src/fs.cpp
Outdated
// Exclusive file locking is broken on WSL using fcntl (issue #18622) | ||
// This workaround can be removed once the bug on WSL is fixed | ||
struct utsname uname_data; | ||
if (uname(&uname_data) == 0 && std::string(uname_data.version).find("Microsoft") != std::string::npos) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition should probably be cached in a static bool local variable.
EDIT: and only conditionally compiled for WIN32?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sipa it isn't "win32" because its the linux subsystem, so it runs in the non-windows branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still, please cache the result.
I have ran The specific issue is solved for WSL on this PR's code. There are also several test failures (on both branches) that are reproducible, and some that are non-deterministic. If this sounds new, should I open a new issue with this? |
src/fs.cpp
Outdated
// Exclusive file locking is broken on WSL using fcntl (issue #18622) | ||
// This workaround can be removed once the bug on WSL is fixed | ||
// Cache the result to avoid multiple system calls | ||
static bool is_wsl_cache_exists = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a crazy edge case, and unlikely on realistic platforms, but this is technically UB if two threads simultaneously reach the if (!is_wsl_cache_exists)
check. The most C++ish solution is this I think: sipa@9470c59.
Is there any reason we can't check/take both locks? |
Co-authored-by: sipa <pieter@wuille.net>
Thanks @sipa, updated :) |
I'm removing this from the 0.20.0 milestone. It's not a regression in 0.20, it's an upstream bug in "Linux emulation", |
Code review ACK e8fa0a3 |
@@ -47,20 +50,38 @@ FileLock::~FileLock() | |||
} | |||
} | |||
|
|||
static bool IsWSL() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized after merging that we could shortcut this check to false
on non-x86 platforms. But it doesn't matter too much.
Tagged with " Needs backport (0.20) " |
e8fa0a3 Fix WSL file locking by using flock instead of fcntl (Samuel Dobson) Pull request description: Fixes bitcoin#18622 A bug in WSL means that fcntl does not exclusively lock files, allowing multiple instances of bitcoin to use the same datadir. If we instead use flock, it works correctly. Passes Travis, but testing on some OS variety would be sensible. From what I can tell, flock and fcntl don't work with each other on linux, so it would still be possible to run a node with this code change and a node before it with the same datadir (this isn't true for Mac/FreeBSD). flock also doesn't support NFS on MacOS and linux<2.6.12 while fcntl did. See here for example: https://gavv.github.io/articles/file-locks/ If changing to flock for all systems is inadvisable, it would also be possible to just detect WSL and use flock when on that platform to avoid the bug. ACKs for top commit: laanwj: Code review ACK e8fa0a3 Tree-SHA512: ca1009e171970101f1dc2332c5e998717aee00eebc80bb586b826927a74bd0d4c94712e46d1396821bc30533d76deac391b6e1c406c406865661f57fa062c702
Summary: Co-authored-by: sipa <pieter@wuille.net> Backport of Core [[bitcoin/bitcoin#18700 | PR18700]] Test Plan: On WSL ninja check ninja check-functional Confirm all tests pass except `feature_proxy.py` (fails), `rpc_bind.py` (skipped), and `interface_zmq.py` (skipped). Using two different consoles: src/bitcoind Verify this fails on the second console with the following error: Error: Cannot obtain a lock on data directory <path>/.bitcoin. Bitcoin ABC is probably already running. Reviewers: O1 Bitcoin ABC, #bitcoin_abc, jasonbcox Reviewed By: O1 Bitcoin ABC, #bitcoin_abc, jasonbcox Differential Revision: https://reviews.bitcoinabc.org/D6299
Co-authored-by: sipa <pieter@wuille.net> Github-Pull: bitcoin#18700 Rebased-From: e8fa0a3
e8fa0a3 Fix WSL file locking by using flock instead of fcntl (Samuel Dobson) Pull request description: Fixes bitcoin#18622 A bug in WSL means that fcntl does not exclusively lock files, allowing multiple instances of bitcoin to use the same datadir. If we instead use flock, it works correctly. Passes Travis, but testing on some OS variety would be sensible. From what I can tell, flock and fcntl don't work with each other on linux, so it would still be possible to run a node with this code change and a node before it with the same datadir (this isn't true for Mac/FreeBSD). flock also doesn't support NFS on MacOS and linux<2.6.12 while fcntl did. See here for example: https://gavv.github.io/articles/file-locks/ If changing to flock for all systems is inadvisable, it would also be possible to just detect WSL and use flock when on that platform to avoid the bug. ACKs for top commit: laanwj: Code review ACK e8fa0a3 Tree-SHA512: ca1009e171970101f1dc2332c5e998717aee00eebc80bb586b826927a74bd0d4c94712e46d1396821bc30533d76deac391b6e1c406c406865661f57fa062c702
Note that this patch only answers the question WSL2 uts_name.version string: $ uname -a
Linux DESKTOP-TPEL8HN 4.19.104-microsoft-standard #1 SMP Wed Feb 19 06:37:35 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux I am running on Win10 WSL2 with Ubuntu 20.04. See the attached C++ test program to convince yourself. #include <iostream>
#include <string>
#include <sys/file.h>
#include <sys/utsname.h>
static bool IsWSL(std::string *out = nullptr) {
struct utsname uname_data;
auto res = uname(&uname_data);
std::string ver{uname_data.version};
bool ret = res == 0 && ver.find("Microsoft") != std::string::npos;
if (out) *out = std::move(ver);
return ret;
}
int main()
{
using std::cout;
std::string ver;
auto b = IsWSL(&ver);
cout << "Ver: " << ver << '\n';
cout << "IsWSL?: " << b << '\n';
return 0;
} I don't know if the bug #18622 is still an issue on WSL2. I'm going to try and determine this and write back here when I do. |
Follow-up: Can confirm that the upstream bug from microsoft on Suggest: Rename this function to |
2b79ac7 Clean up separated ban/discourage interface (Pieter Wuille) 0477348 Replace automatic bans with discouragement filter (Pieter Wuille) e7f06f9 test: remove Cirrus CI FreeBSD job (fanquake) eb6b82a qa: Test concurrent wallet loading (João Barbosa) c9b49d2 wallet: Handle concurrent wallet loading (João Barbosa) cf0b5a9 tests: Check that segwit inputs in psbt have both UTXO types (Andrew Chow) 3228b59 psbt: always put a non_witness_utxo and don't remove it (Andrew Chow) ed5ec30 psbt: Allow both non_witness_utxo and witness_utxo (Andrew Chow) 68e0e6f rpc: show both UTXOs in decodepsbt (Andrew Chow) 27786d0 trivial: Suggested cleanups to surrounding code (Russell Yanofsky) 654420d wallet: Minimal fix to restore conflicted transaction notifications (Russell Yanofsky) febebc4 Fix WSL file locking by using flock instead of fcntl (Samuel Dobson) 5c7151a gui: update Qt base translations for macOS release (fanquake) c219d21 build: improved output of configure for build OS (sachinkm77) 0596a6e util: Don't reference errno when pthread fails. (MIZUTA Takeshi) Pull request description: Currently backports the following to the 0.20 branch: * #18700 - Fix locking on WSL using flock instead of fcntl * #18982 - wallet: Minimal fix to restore conflicted transaction notifications * #19059 - gui: update Qt base translations for macOS release * #19152 - build: improve build OS configure output * #19194 - util: Don't reference errno when pthread fails. * #19215 - psbt: Include and allow both non_witness_utxo and witness_utxo for segwit inputs * #19219 - Replace automatic bans with discouragement filter * #19300 - wallet: Handle concurrent wallet loading ACKs for top commit: amitiuttarwar: ACK 0477348 2b79ac7 by comparing to original changes, double checking the diff sipa: utACK 2b79ac7 laanwj: ACK 2b79ac7 Tree-SHA512: e5eb31d08288fa4a6e8aba08e60b16ce1988f14f249238b1cfd18ab2c8f6fcd9f07e3c0c491d32d2361fca26e3037fb0374f37464bddcabecea29069d6737539
Summary: Co-authored-by: sipa <pieter@wuille.net> Backport of Core [[bitcoin/bitcoin#18700 | PR18700]] Test Plan: On WSL ninja check ninja check-functional Confirm all tests pass except `feature_proxy.py` (fails), `rpc_bind.py` (skipped), and `interface_zmq.py` (skipped). Using two different consoles: src/bitcoind Verify this fails on the second console with the following error: Error: Cannot obtain a lock on data directory <path>/.bitcoin. Bitcoin ABC is probably already running. Reviewers: O1 Bitcoin ABC, #bitcoin_abc, jasonbcox Reviewed By: O1 Bitcoin ABC, #bitcoin_abc, jasonbcox Differential Revision: https://reviews.bitcoinabc.org/D6299
Co-authored-by: sipa <pieter@wuille.net> Github-Pull: #18700 Rebased-From: e8fa0a3
Co-authored-by: sipa <pieter@wuille.net> Github-Pull: bitcoin#18700 Rebased-From: e8fa0a3
e8fa0a3 Fix WSL file locking by using flock instead of fcntl (Samuel Dobson) Pull request description: Fixes bitcoin#18622 A bug in WSL means that fcntl does not exclusively lock files, allowing multiple instances of bitcoin to use the same datadir. If we instead use flock, it works correctly. Passes Travis, but testing on some OS variety would be sensible. From what I can tell, flock and fcntl don't work with each other on linux, so it would still be possible to run a node with this code change and a node before it with the same datadir (this isn't true for Mac/FreeBSD). flock also doesn't support NFS on MacOS and linux<2.6.12 while fcntl did. See here for example: https://gavv.github.io/articles/file-locks/ If changing to flock for all systems is inadvisable, it would also be possible to just detect WSL and use flock when on that platform to avoid the bug. ACKs for top commit: laanwj: Code review ACK e8fa0a3 Tree-SHA512: ca1009e171970101f1dc2332c5e998717aee00eebc80bb586b826927a74bd0d4c94712e46d1396821bc30533d76deac391b6e1c406c406865661f57fa062c702
e8fa0a3 Fix WSL file locking by using flock instead of fcntl (Samuel Dobson) Pull request description: Fixes bitcoin#18622 A bug in WSL means that fcntl does not exclusively lock files, allowing multiple instances of bitcoin to use the same datadir. If we instead use flock, it works correctly. Passes Travis, but testing on some OS variety would be sensible. From what I can tell, flock and fcntl don't work with each other on linux, so it would still be possible to run a node with this code change and a node before it with the same datadir (this isn't true for Mac/FreeBSD). flock also doesn't support NFS on MacOS and linux<2.6.12 while fcntl did. See here for example: https://gavv.github.io/articles/file-locks/ If changing to flock for all systems is inadvisable, it would also be possible to just detect WSL and use flock when on that platform to avoid the bug. ACKs for top commit: laanwj: Code review ACK e8fa0a3 Tree-SHA512: ca1009e171970101f1dc2332c5e998717aee00eebc80bb586b826927a74bd0d4c94712e46d1396821bc30533d76deac391b6e1c406c406865661f57fa062c702
e8fa0a3 Fix WSL file locking by using flock instead of fcntl (Samuel Dobson) Pull request description: Fixes bitcoin#18622 A bug in WSL means that fcntl does not exclusively lock files, allowing multiple instances of bitcoin to use the same datadir. If we instead use flock, it works correctly. Passes Travis, but testing on some OS variety would be sensible. From what I can tell, flock and fcntl don't work with each other on linux, so it would still be possible to run a node with this code change and a node before it with the same datadir (this isn't true for Mac/FreeBSD). flock also doesn't support NFS on MacOS and linux<2.6.12 while fcntl did. See here for example: https://gavv.github.io/articles/file-locks/ If changing to flock for all systems is inadvisable, it would also be possible to just detect WSL and use flock when on that platform to avoid the bug. ACKs for top commit: laanwj: Code review ACK e8fa0a3 Tree-SHA512: ca1009e171970101f1dc2332c5e998717aee00eebc80bb586b826927a74bd0d4c94712e46d1396821bc30533d76deac391b6e1c406c406865661f57fa062c702
e8fa0a3 Fix WSL file locking by using flock instead of fcntl (Samuel Dobson) Pull request description: Fixes bitcoin#18622 A bug in WSL means that fcntl does not exclusively lock files, allowing multiple instances of bitcoin to use the same datadir. If we instead use flock, it works correctly. Passes Travis, but testing on some OS variety would be sensible. From what I can tell, flock and fcntl don't work with each other on linux, so it would still be possible to run a node with this code change and a node before it with the same datadir (this isn't true for Mac/FreeBSD). flock also doesn't support NFS on MacOS and linux<2.6.12 while fcntl did. See here for example: https://gavv.github.io/articles/file-locks/ If changing to flock for all systems is inadvisable, it would also be possible to just detect WSL and use flock when on that platform to avoid the bug. ACKs for top commit: laanwj: Code review ACK e8fa0a3 Tree-SHA512: ca1009e171970101f1dc2332c5e998717aee00eebc80bb586b826927a74bd0d4c94712e46d1396821bc30533d76deac391b6e1c406c406865661f57fa062c702
e8fa0a3 Fix WSL file locking by using flock instead of fcntl (Samuel Dobson) Pull request description: Fixes bitcoin#18622 A bug in WSL means that fcntl does not exclusively lock files, allowing multiple instances of bitcoin to use the same datadir. If we instead use flock, it works correctly. Passes Travis, but testing on some OS variety would be sensible. From what I can tell, flock and fcntl don't work with each other on linux, so it would still be possible to run a node with this code change and a node before it with the same datadir (this isn't true for Mac/FreeBSD). flock also doesn't support NFS on MacOS and linux<2.6.12 while fcntl did. See here for example: https://gavv.github.io/articles/file-locks/ If changing to flock for all systems is inadvisable, it would also be possible to just detect WSL and use flock when on that platform to avoid the bug. ACKs for top commit: laanwj: Code review ACK e8fa0a3 Tree-SHA512: ca1009e171970101f1dc2332c5e998717aee00eebc80bb586b826927a74bd0d4c94712e46d1396821bc30533d76deac391b6e1c406c406865661f57fa062c702
e8fa0a3 Fix WSL file locking by using flock instead of fcntl (Samuel Dobson) Pull request description: Fixes bitcoin#18622 A bug in WSL means that fcntl does not exclusively lock files, allowing multiple instances of bitcoin to use the same datadir. If we instead use flock, it works correctly. Passes Travis, but testing on some OS variety would be sensible. From what I can tell, flock and fcntl don't work with each other on linux, so it would still be possible to run a node with this code change and a node before it with the same datadir (this isn't true for Mac/FreeBSD). flock also doesn't support NFS on MacOS and linux<2.6.12 while fcntl did. See here for example: https://gavv.github.io/articles/file-locks/ If changing to flock for all systems is inadvisable, it would also be possible to just detect WSL and use flock when on that platform to avoid the bug. ACKs for top commit: laanwj: Code review ACK e8fa0a3 Tree-SHA512: ca1009e171970101f1dc2332c5e998717aee00eebc80bb586b826927a74bd0d4c94712e46d1396821bc30533d76deac391b6e1c406c406865661f57fa062c702
e8fa0a3 Fix WSL file locking by using flock instead of fcntl (Samuel Dobson) Pull request description: Fixes bitcoin#18622 A bug in WSL means that fcntl does not exclusively lock files, allowing multiple instances of bitcoin to use the same datadir. If we instead use flock, it works correctly. Passes Travis, but testing on some OS variety would be sensible. From what I can tell, flock and fcntl don't work with each other on linux, so it would still be possible to run a node with this code change and a node before it with the same datadir (this isn't true for Mac/FreeBSD). flock also doesn't support NFS on MacOS and linux<2.6.12 while fcntl did. See here for example: https://gavv.github.io/articles/file-locks/ If changing to flock for all systems is inadvisable, it would also be possible to just detect WSL and use flock when on that platform to avoid the bug. ACKs for top commit: laanwj: Code review ACK e8fa0a3 Tree-SHA512: ca1009e171970101f1dc2332c5e998717aee00eebc80bb586b826927a74bd0d4c94712e46d1396821bc30533d76deac391b6e1c406c406865661f57fa062c702
8c4b365 Fix WSL file locking by using flock instead of fcntl (Samuel Dobson) 189de2f Avoid redefine warning (Peter Bushnell) Pull request description: Straight forward backport of bitcoin#15782 and bitcoin#18700 to address a bug in WSL1 environments that results in improper locking behavior; ie, a directory lock is not made exclusive as intended, thus resulting in multiple instances of the wallet/daemon being able to access the same datadir simultaneously instead of erroring out due to a locking conflict as intended. This is specific to WSL1 environments, as WSL2 (not yet fully supported/documented), standard linux, and macOS environments behave as intended. ACKs for top commit: furszy: good, utACK 8c4b365 random-zebra: utACK 8c4b365 and merging... Tree-SHA512: 7ba7d054a858baae4df2ae6daa4d3ffc694bcff0b48958ba28b3203d6dc0d950d25596eb324dc3973b09a65a251946c29be035959b3da7ecf126f9110b7056d0
Fixes #18622
A bug in WSL means that fcntl does not exclusively lock files, allowing multiple instances of bitcoin to use the same datadir. If we instead use flock, it works correctly. Passes Travis, but testing on some OS variety would be sensible.
From what I can tell, flock and fcntl don't work with each other on linux, so it would still be possible to run a node with this code change and a node before it with the same datadir (this isn't true for Mac/FreeBSD). flock also doesn't support NFS on MacOS and linux<2.6.12 while fcntl did. See here for example: https://gavv.github.io/articles/file-locks/
If changing to flock for all systems is inadvisable, it would also be possible to just detect WSL and use flock when on that platform to avoid the bug.