Conversation

@laanwj (Member) commented Oct 30, 2015

Use a simple Win32WritableFile, equivalent to PosixWritableFile in posix_env.cc

The goal is to solve leveldb corruption issues on Windows; this issue has been reported many times, see #5865, #6727, #5610, ...

In my testing with this change, I was unable to cause the database to be corrupted while crashing a Windows laptop in various ways using NotMyFault. Without the change, it happened the first time I tried.

Use a simple Win32WritableFile, equivalent to PosixWritableFile in
posix_env.cc
@sipa (Member) commented Oct 30, 2015

Please submit to https://github.com/bitcoin/leveldb first?

@laanwj (Member, Author) commented Oct 30, 2015

Thought about that, but I think it gets more exposure and testing here.
(obviously it should still go there when accepted...)

@sipa (Member) commented Oct 30, 2015

@laanwj Sure!

@jgarzik (Contributor) commented Oct 30, 2015

utACK

@dcousens (Contributor)

utACK

@sipa (Member) commented Oct 31, 2015 via email

@gmaxwell (Contributor)

@sipa Yes, and if it's much slower, that's reason to figure out what's strictly required for the syncs. And if it's not, we can call it done. Reindex time right now is dominated by signature validation; anyone benchmarking must be sure to bypass it.

@laanwj (Member, Author) commented Oct 31, 2015

This is the same as how it already works on POSIX, directly converted to the Win32 API. By all means benchmark, but IMO we should apply this regardless (provided it introduces no new bugs). We should first get rid of stability issues and only then worry about optimization. This will likely save many people from having to reindex at all.

@maflcko (Member) commented Oct 31, 2015

Concept ACK

Flush() is supposed to clear application-side buffers (like fflush). As
this writable file implementation uses OS functions directly, there are
no buffers to flush.
@laanwj (Member, Author) commented Oct 31, 2015

The last commit is optional and potentially risky. From what I understand, leveldb's WritableFiles are supposed to do their own caching, and Flush() is supposed to flush this cache to the OS (its POSIX implementation is fflush(FILE*)). As this implementation uses OS calls directly, there is no need for that; with that reasoning, Flush() can be empty.

It is of course also possible to use libc FILE* fwrite/fflush on Windows instead of using the Win32 API directly, and then use this hack to sync. That would provide local buffering, but I was afraid the extra level of buffering might introduce syncing issues (see my post below for an implementation of this).

Then again, these are all performance concerns. As said above, it may be wrong to worry about that here; I'm mostly concerned with this being correct.

@Diapolo commented Oct 31, 2015

Isn't LevelDB effectively dead? Their GitHub repo didn't seem to be actively developed the last time I checked.

@laanwj (Member, Author) commented Oct 31, 2015

@Diapolo That is a good observation; that's why I'm making an attempt at maintaining it.
Can you help with testing?

@Diapolo commented Oct 31, 2015

@laanwj Yes, I will first do a -reindex on testnet to be able to compare before and after; then I'm going to kill the testnet instance and see whether it corrupts.

@laanwj (Member, Author) commented Oct 31, 2015

Thanks!

@laanwj (Member, Author) commented Oct 31, 2015

Here is an alternative implementation of Win32WritableFile that uses application-buffered libc primitives (fopen, fwrite, ...): https://github.com/laanwj/bitcoin/tree/2015_10_leveldb_win_nomap2. It is barely tested, and I would not necessarily suggest using it instead of this one, but if someone is going to benchmark, it should be included: it reduces the number of system calls in the case of many small writes, which we don't do, but leveldb may internally.

@laanwj (Member, Author) commented Nov 1, 2015

@jonasschnelli it would be useful to have executables here, mind pointing your nightly builder at this?

@jonasschnelli (Contributor)

@NicolasDorier (Contributor)

I am so glad about this; you can't imagine the time I have lost because of this bug. It is the only reason I don't have a full node on my laptop. Trying it right now.

```cpp
return ((x + y - 1) / y) * y;
std::wstring path;
ToWidePath(fname, path);
DWORD Flag = PathFileExistsW(path.c_str()) ? OPEN_EXISTING : CREATE_ALWAYS;
```
Review comment (Member):

Any reason not to use OPEN_ALWAYS (which should open the file if it exists and create it if not, according to https://msdn.microsoft.com/en-us/library/windows/desktop/aa363858%28v=vs.85%29.aspx)?

Review comment (Member):

Oh, this code is just moved.

Review comment (Member, Author):

The goal is to have f=fopen(filename, "wb") semantics. It's possible that this could be improved, but indeed I just copied the code from the previous WritableFile implementation.

@sipa (Member) commented Nov 1, 2015

@laanwj We don't do small writes anyway, so I doubt an application-level cache would help.

@sipa (Member) commented Nov 1, 2015

Concept ACK in any case.

@laanwj (Member, Author) commented Nov 2, 2015

> @laanwj We don't do small writes anyway, so I doubt an application-level cache would help.

We don't, but maybe leveldb does internally? WritableFiles are used for all of leveldb's file creation and writing, including when building/rebuilding tables, not just when writing out database batches. I'm not sure how careful it is to avoid, say, calling Append multiple times to write different parts of data structures.

@jonasschnelli (Contributor)

I have just started a fresh node (mainnet / this PR) on a Win 8.1 Pro machine with McAfee antivirus enabled (SSD / 2 GB RAM / 1.4 GHz Core i5)... will report soon.

@jonasschnelli (Contributor)

I have synced up to block 318301 and restarted bitcoin-qt twice (just to see whether stop/start/-checkblocks works). No problems with the database so far.

My node is still syncing (very slowly, several blocks per minute). But a strange problem appeared: the Qt console window says I'm on block 318301, while in debug.log I can see that I'm at 318613. The peers window is updating, but with missing data.

[screenshot: main window]

Peers: [screenshot]

Task Manager: [screenshot]

@laanwj (Member, Author) commented Nov 3, 2015

@jonasschnelli that's a known issue and unrelated to these changes, see #5664

@NicolasDorier (Contributor)

Pro tip for the corruption problem: attach a disk, index the blockchain data onto it, and unplug it abruptly.

I had corruption every time my cat played with the ethernet cable between my computer and my network disk.

I have not managed to reproduce the problem so far by just killing the process; I'll try again once I find a spare disk somewhere.

@sipa (Member) commented Nov 4, 2015 via email

@NicolasDorier (Contributor)

A mapped drive, just pointing to an SMB share. I'm not at my office, so as soon as I can I'll try abruptly unplugging an external drive instead of a real network disk.

@sipa (Member) commented Nov 4, 2015

Well, it's good to know that it seems to work even with a network filesystem. But network filesystems usually offer much weaker synchronization guarantees, so they are often inappropriate for databases with strong consistency requirements.

@NicolasDorier (Contributor)

Yes, I agree; this is why I later switched to a normal VM on Microsoft Azure. But since I moved to a VM, the machines are sometimes rebooted automatically for maintenance, and the database has been corrupted several times.

The good thing about a mapped disk is that, thanks to the weaker synchronization guarantees, it makes data-corruption problems easier to reproduce! :)

@laanwj (Member, Author) commented Nov 4, 2015

The issue that this change tries to improve is extremely easy to reproduce already: it happens nearly every time bitcoind or bitcoin-qt running on Windows crashes, or Windows itself crashes, or the power is pulled. This happens with local filesystems on drives connected through SATA.

The PR does not purport to fix issues with external hard disks, network filesystems, corrupting cables, felines, and other additional sources of complications. Being robust to those would be great, if realistically possible, but it is not the immediate goal here.

@jonasschnelli (Contributor)

I have powered down (unclean sudden shutdown) my Win 8.1 VM and restarted Windows and Bitcoin-Qt. Bitcoin-Qt started without issues and is at roughly the same height (353900). I'll now wait a couple of minutes, then power down (unplug) the VM host system (a Mac mini) and see what happens.

Would it be worth doing the same procedure with current master WITHOUT this PR, to compare?

@laanwj (Member, Author) commented Nov 4, 2015

> Would it be worth to do the same procedure with the current master WITHOUT this PR to compare it?

Sure. If you have a VM environment where you can take a snapshot, that's helpful: you can return to it after messing up the database.

@jonasschnelli (Contributor)

I made a snapshot before the first sudden shutdown, then powered down the host by unplugging the power cable... the best chain could be activated and the node continued syncing. I will do some more shutdowns and see whether anything breaks, then switch to current master and try again.

@jonasschnelli (Contributor)

Now, after a "power off" (sudden shutdown), the db is corrupted (still using this PR).

[screenshot: corruption error dialog]

"Rebuild the block databases" ends with "error opening database":
[screenshot]

Log:
[screenshot]

@laanwj (Member, Author) commented Nov 4, 2015

Re: "bad undo data", see #6923; it is a potential issue, but not something that could be solved by this PR (as it isn't a leveldb problem).

@jonasschnelli (Contributor)

I have now restored my snapshot several times and done some sudden power-downs. About 50% of the time it ends up with a corrupt database (100% of those cases with "bad undo data").

Will now try to run my snapshot with current master, without this PR.

@laanwj (Member, Author) commented Nov 4, 2015

That's strange: I only managed to get the undo-data error once in all my tries, whereas without this PR it produced consistent leveldb errors.

@gmaxwell (Contributor) commented Nov 4, 2015

Difference in VM flushing behavior, I guess: it may just be that the VM buffers and reorders writes and ignores fsync.

Although... we appear to be missing a FileCommit in UndoWriteToDisk. I think we need to have synced the blocks and undo data before calling the insert.

It would probably be better for performance if it wrote the block, then the undo data, and then did the syncs on both, however.

Edit: oh, never mind, FlushBlockFile hits the undo file too. :-/

@jonasschnelli (Contributor)

With the current master (without this PR), after every sudden shutdown I get "Corruption: error in middle of record".

[screenshot]

This PR looks like an improvement, but I get #6923 for roughly 50% of power-off shutdowns.

@laanwj (Member, Author) commented Nov 4, 2015

If it improves things, that is good.

Filed this for the leveldb repository: bitcoin-core/leveldb-old#9

@laanwj (Member, Author) commented Nov 4, 2015

Closing the pull request here, as it has been moved to the leveldb repository. It should come back to the bitcoin repository in a subtree update.
