Skip to content

Conversation

maflcko
Copy link
Member

@maflcko maflcko commented Nov 9, 2023

The code not only modifies block dat files, but also leveldb files, which may be of smaller size. Such corruption may not force leveldb to abort, according to the intermittent test failures.

Fix the intermittent test failures by reverting 5ab6419 .

@DrahtBot
Copy link
Contributor

DrahtBot commented Nov 9, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK kevkevinpal, theStack, fjahr

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

@DrahtBot DrahtBot added the Tests label Nov 9, 2023
@@ -1,11 +1,10 @@
#!/usr/bin/env python3
# Copyright (c) 2021-2022 The Bitcoin Core developers
# Copyright (c) 2021-present The Bitcoin Core developers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

running this command
grep -nr --text "\-present" ./src

I only see these files using present

./src/streams.cpp:1:// Copyright (c) 2009-present The Bitcoin Core developers
./src/kernel/mempool_removal_reason.cpp:1:// Copyright (c) 2016-present The Bitcoin Core developers
./src/kernel/mempool_removal_reason.h:1:// Copyright (c) 2016-present The Bitcoin Core developers

if this is something we're going towards would we want to update the copy right of all files whenever there is a change to the file in any future PR's?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scripted-diff for all files should be easy to rebase, see the thread: #26817 (comment)

Let me know if I should drop it here and leave it for later. I don't really mind either way, as I mostly care about not running into intermittent issues.

@kevkevinpal
Copy link
Contributor

lgtm ACK 44445ae

@fanquake
Copy link
Member

cc @L0laL33tz @theStack

@fjahr
Copy link
Contributor

fjahr commented Nov 10, 2023

Can you link to the intermittent failures? Why not reduce the values instead?

@maflcko
Copy link
Member Author

maflcko commented Nov 10, 2023

Can you link to the intermittent failures? Why not reduce the values instead?

You can simply run the test in a loop to see it fail. For example:

./test/functional/feature_init.py --timeout-factor=.1 --randomseed=4861605774237699286

...

AssertionError: [node 0] bitcoind should have exited within 6s with expected error Error opening block database.

@maflcko
Copy link
Member Author

maflcko commented Nov 10, 2023

Why not reduce the values instead?

Yes, this patch is reducing the value.

@maflcko
Copy link
Member Author

maflcko commented Nov 10, 2023

Also, on a second thought, I don't really understand the goal of the randomized byte corruptions. Are they supposed to increase the test coverage for different code paths? If yes, I think it makes sense to list those code paths. Otherwise, if only a single error-code-path is covered, I think a deterministic test is better.

I understand that the good first issue mentioned that advanced perturbations should be tested as well, but may require pulling in a leveldb parser. I don't think it makes sense to pull in a leveldb parser for this, unless there is a good reason. (Recall that Bitcoin Core does not support running on hardware that corrupts the datadir). If there was a "honest" pertubation caused by a user setting or different leveldb version that causes issues, that'd be a different case, and it would be a good reason to check that specifically.

The same holds for mutating block files.

Other than that, I don't think it makes sense to add a test for this. And in any case, adding a test for this, or modifying the existing test, should be a separate follow-up from fixing a bug by restoring the previous test.

@fjahr
Copy link
Contributor

fjahr commented Nov 10, 2023

Yes, this patch is reducing the value.

Maybe as a side-effect but it's rolling back the randomization completely. What I mean is reducing the max value of seek, i.e. 15000 to something lower that is safe.

@maflcko
Copy link
Member Author

maflcko commented Nov 10, 2023

Maybe as a side-effect but it's rolling back the randomization completely. What I mean is reducing the max value of seek, i.e. 15000 to something lower that is safe.

Why? Did you see my previous comment?

@fjahr
Copy link
Contributor

fjahr commented Nov 10, 2023

Why? Did you see my previous comment?

It's fine if you want to remove the randomization as well as you say in the other comment but that's not what I meant when I said "Why not reduce the values instead?". You replied to that "Yes, this patch is reducing the value.". But that is not the only thing that it does and that's why it's not what I suggested. What I suggested is reducing the max value in seek.

@L0laL33tz
Copy link
Contributor

If the test fails because the leveldb is smaller than the block files, I would propose keeping the randomization for the block files and adding a second deterministic test for leveldb?

@fjahr
Copy link
Contributor

fjahr commented Nov 10, 2023

./test/functional/feature_init.py --timeout-factor=.1 --randomseed=4861605774237699286

Hm, this doesn't fail for me on master.
$ ./test/functional/feature_init.py --timeout-factor=.1 --randomseed=4861605774237699286
2023-11-10T13:14:53.906000Z TestFramework (INFO): User supplied random seed 4861605774237699286
2023-11-10T13:14:53.907000Z TestFramework (INFO): PRNG seed is: 4861605774237699286
2023-11-10T13:14:53.907000Z TestFramework (INFO): Initializing test directory /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_oiro5318
2023-11-10T13:14:54.678000Z TestFramework (INFO): Starting node and will exit after line b'Validating signatures for all blocks'
2023-11-10T13:14:54.848000Z TestFramework (INFO): Starting node and will exit after line b'scheduler thread start'
2023-11-10T13:14:54.971000Z TestFramework (INFO): Starting node and will exit after line b'Starting HTTP server'
2023-11-10T13:14:55.095000Z TestFramework (INFO): Starting node and will exit after line b'Loading P2P addresses'
2023-11-10T13:14:55.219000Z TestFramework (INFO): Starting node and will exit after line b'Loading banlist'
2023-11-10T13:14:55.342000Z TestFramework (INFO): Starting node and will exit after line b'Loading block index'
2023-11-10T13:14:55.483000Z TestFramework (INFO): Starting node and will exit after line b'Checking all blk files are present'
2023-11-10T13:14:55.678000Z TestFramework (INFO): Starting node and will exit after line b'Loaded best chain:'
2023-11-10T13:14:55.889000Z TestFramework (INFO): Starting node and will exit after line b'init message: Verifying blocks'
2023-11-10T13:14:56.121000Z TestFramework (INFO): Starting node and will exit after line b'init message: Starting network threads'
2023-11-10T13:14:56.508000Z TestFramework (INFO): Starting node and will exit after line b'net thread start'
2023-11-10T13:14:56.901000Z TestFramework (INFO): Starting node and will exit after line b'addcon thread start'
2023-11-10T13:14:57.299000Z TestFramework (INFO): Starting node and will exit after line b'initload thread start'
2023-11-10T13:14:57.720000Z TestFramework (INFO): Starting node and will exit after line b'txindex thread start'
2023-11-10T13:14:58.155000Z TestFramework (INFO): Starting node and will exit after line b'block filter index thread start'
2023-11-10T13:14:58.551000Z TestFramework (INFO): Starting node and will exit after line b'coinstatsindex thread start'
2023-11-10T13:14:58.953000Z TestFramework (INFO): Starting node and will exit after line b'msghand thread start'
2023-11-10T13:14:59.381000Z TestFramework (INFO): Starting node and will exit after line b'net thread start'
2023-11-10T13:14:59.815000Z TestFramework (INFO): Starting node and will exit after line b'addcon thread start'
2023-11-10T13:15:00.214000Z TestFramework (INFO): Starting node and will exit after line b'Verifying wallet'
2023-11-10T13:15:00.715000Z TestFramework (INFO): Test startup errors after removing certain essential files
2023-11-10T13:15:00.719000Z TestFramework (INFO): Deleting file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_oiro5318/node0/regtest/blocks/index/000048.ldb
2023-11-10T13:15:00.720000Z TestFramework (INFO): Deleting file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_oiro5318/node0/regtest/blocks/index/000054.ldb
2023-11-10T13:15:00.720000Z TestFramework (INFO): Deleting file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_oiro5318/node0/regtest/blocks/index/000051.ldb
2023-11-10T13:15:00.721000Z TestFramework (INFO): Deleting file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_oiro5318/node0/regtest/blocks/index/000046.ldb
2023-11-10T13:15:01.298000Z TestFramework (INFO): Deleting file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_oiro5318/node0/regtest/chainstate/000005.ldb
2023-11-10T13:15:01.305000Z TestFramework (INFO): Deleting file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_oiro5318/node0/regtest/chainstate/000090.ldb
2023-11-10T13:15:01.868000Z TestFramework (INFO): Deleting file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_oiro5318/node0/regtest/blocks/blk00000.dat
2023-11-10T13:15:02.440000Z TestFramework (INFO): Test startup errors after perturbing certain essential files
2023-11-10T13:15:02.473000Z TestFramework (INFO): Perturbing file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_oiro5318/node0/regtest/blocks/index/000061.ldb
2023-11-10T13:15:02.473000Z TestFramework (INFO): Perturbing file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_oiro5318/node0/regtest/blocks/index/000059.ldb
2023-11-10T13:15:02.473000Z TestFramework (INFO): Perturbing file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_oiro5318/node0/regtest/blocks/index/000066.ldb
2023-11-10T13:15:02.673000Z TestFramework (INFO): Perturbing file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_oiro5318/node0/regtest/chainstate/000005.ldb
2023-11-10T13:15:02.674000Z TestFramework (INFO): Perturbing file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_oiro5318/node0/regtest/chainstate/000097.ldb
2023-11-10T13:15:02.674000Z TestFramework (INFO): Perturbing file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_oiro5318/node0/regtest/chainstate/000092.ldb
2023-11-10T13:15:02.674000Z TestFramework (INFO): Perturbing file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_oiro5318/node0/regtest/chainstate/000090.ldb
2023-11-10T13:15:02.935000Z TestFramework (INFO): Perturbing file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_oiro5318/node0/regtest/blocks/blk00000.dat
2023-11-10T13:15:03.235000Z TestFramework (INFO): Stopping nodes
2023-11-10T13:15:03.236000Z TestFramework (INFO): Cleaning up /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_oiro5318 on exit
2023-11-10T13:15:03.236000Z TestFramework (INFO): Tests successful

@maflcko
Copy link
Member Author

maflcko commented Nov 10, 2023

Hm, this doesn't fail for me on master.

Jup, that is what I mean when I say we don't want to pull in a leveldb dependency into this test, unless there is a reason. leveldb has different code paths depending on the operating system. You are using macOS and I was using Linux. I presume you'll be able to reproduce if you also use Linux. Alternatively, you can try to run the test in a loop on macOS, but I can't do that, because I don't have macOS.

@maflcko
Copy link
Member Author

maflcko commented Nov 10, 2023

If the test fails because the leveldb is smaller than the block files, I would propose keeping the randomization for the block files and adding a second deterministic test for leveldb?

Yes, I considered this and I think it could make sense. However, it may require reading the file size and using that as the maximum, or other changes. So for now, I made it trivial to review by just reverting, so that the CI and tests are no longer failing intermittently.

Copy link
Contributor

@theStack theStack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the test fails because the leveldb is smaller than the block files, I would propose keeping the randomization for the block files and adding a second deterministic test for leveldb?

That was also my initial thought when reading this PR's description. In contrast to the chainstate leveldb files, the block files are fully in our control and possible faulty behaviour caused by perturbation should (hopefully) be identical on all operating systems, as there is no external library involved.

I agree with @maflcko though that the highest priority now is to fix the CI, so reverting seems to be a reasonable first step. If someone is working on a follow-up introducing the random perturbation, feel free to ping me as reviewer.

ACK 44445ae

@maflcko
Copy link
Member Author

maflcko commented Nov 10, 2023

If someone is working on a follow-up introducing the random perturbation, feel free to ping me as reviewer.

Same here, happy to review

@L0laL33tz
Copy link
Contributor

Cool, Im working on a new follow-up excluding the leveldb from randomization @maflcko @theStack

@fjahr
Copy link
Contributor

fjahr commented Nov 11, 2023

Jup, that is what I mean when I say we don't want to pull in a leveldb dependency into this test, unless there is a reason. leveldb has different code paths depending on the operating system. You are using macOS and I was using Linux. I presume you'll be able to reproduce if you also use Linux. Alternatively, you can try to run the test in a loop on macOS, but I can't do that, because I don't have macOS.

I couldn't reproduce the failure on Ubuntu 20.04 either.
root@vmd96750:~/bitcoin# ./test/functional/feature_init.py --timeout-factor=.1 --randomseed=4861605774237699286
2023-11-11T09:42:34.291000Z TestFramework (INFO): User supplied random seed 4861605774237699286
2023-11-11T09:42:34.291000Z TestFramework (INFO): PRNG seed is: 4861605774237699286
2023-11-11T09:42:34.296000Z TestFramework (INFO): Initializing test directory /tmp/bitcoin_func_test_20yqsr48
2023-11-11T09:42:36.852000Z TestFramework (INFO): Starting node and will exit after line b'Validating signatures for all blocks'
2023-11-11T09:42:37.111000Z TestFramework (INFO): Starting node and will exit after line b'scheduler thread start'
2023-11-11T09:42:37.264000Z TestFramework (INFO): Starting node and will exit after line b'Starting HTTP server'
2023-11-11T09:42:37.422000Z TestFramework (INFO): Starting node and will exit after line b'Loading P2P addresses'
2023-11-11T09:42:37.571000Z TestFramework (INFO): Starting node and will exit after line b'Loading banlist'
2023-11-11T09:42:37.722000Z TestFramework (INFO): Starting node and will exit after line b'Loading block index'
2023-11-11T09:42:37.875000Z TestFramework (INFO): Starting node and will exit after line b'Checking all blk files are present'
2023-11-11T09:42:38.041000Z TestFramework (INFO): Starting node and will exit after line b'Loaded best chain:'
2023-11-11T09:42:38.230000Z TestFramework (INFO): Starting node and will exit after line b'init message: Verifying blocks'
2023-11-11T09:42:38.417000Z TestFramework (INFO): Starting node and will exit after line b'init message: Starting network threads'
2023-11-11T09:42:38.747000Z TestFramework (INFO): Starting node and will exit after line b'net thread start'
2023-11-11T09:42:39.071000Z TestFramework (INFO): Starting node and will exit after line b'addcon thread start'
2023-11-11T09:42:39.365000Z TestFramework (INFO): Starting node and will exit after line b'initload thread start'
2023-11-11T09:42:39.620000Z TestFramework (INFO): Starting node and will exit after line b'txindex thread start'
2023-11-11T09:42:39.858000Z TestFramework (INFO): Starting node and will exit after line b'block filter index thread start'
2023-11-11T09:42:40.189000Z TestFramework (INFO): Starting node and will exit after line b'coinstatsindex thread start'
2023-11-11T09:42:40.494000Z TestFramework (INFO): Starting node and will exit after line b'msghand thread start'
2023-11-11T09:42:40.803000Z TestFramework (INFO): Starting node and will exit after line b'net thread start'
2023-11-11T09:42:41.104000Z TestFramework (INFO): Starting node and will exit after line b'addcon thread start'
2023-11-11T09:42:41.395000Z TestFramework (INFO): Starting node and will exit after line b'Verifying wallet'
2023-11-11T09:42:41.956000Z TestFramework (INFO): Test startup errors after removing certain essential files
2023-11-11T09:42:41.957000Z TestFramework (INFO): Deleting file to ensure failure /tmp/bitcoin_func_test_20yqsr48/node0/regtest/blocks/index/000046.ldb
2023-11-11T09:42:41.957000Z TestFramework (INFO): Deleting file to ensure failure /tmp/bitcoin_func_test_20yqsr48/node0/regtest/blocks/index/000054.ldb
2023-11-11T09:42:41.957000Z TestFramework (INFO): Deleting file to ensure failure /tmp/bitcoin_func_test_20yqsr48/node0/regtest/blocks/index/000048.ldb
2023-11-11T09:42:41.957000Z TestFramework (INFO): Deleting file to ensure failure /tmp/bitcoin_func_test_20yqsr48/node0/regtest/blocks/index/000051.ldb
2023-11-11T09:42:42.540000Z TestFramework (INFO): Deleting file to ensure failure /tmp/bitcoin_func_test_20yqsr48/node0/regtest/chainstate/000090.ldb
2023-11-11T09:42:42.540000Z TestFramework (INFO): Deleting file to ensure failure /tmp/bitcoin_func_test_20yqsr48/node0/regtest/chainstate/000005.ldb
2023-11-11T09:42:43.121000Z TestFramework (INFO): Deleting file to ensure failure /tmp/bitcoin_func_test_20yqsr48/node0/regtest/blocks/blk00000.dat
2023-11-11T09:42:43.608000Z TestFramework (INFO): Test startup errors after perturbing certain essential files
2023-11-11T09:42:43.648000Z TestFramework (INFO): Perturbing file to ensure failure /tmp/bitcoin_func_test_20yqsr48/node0/regtest/blocks/index/000061.ldb
2023-11-11T09:42:43.648000Z TestFramework (INFO): Perturbing file to ensure failure /tmp/bitcoin_func_test_20yqsr48/node0/regtest/blocks/index/000066.ldb
2023-11-11T09:42:43.648000Z TestFramework (INFO): Perturbing file to ensure failure /tmp/bitcoin_func_test_20yqsr48/node0/regtest/blocks/index/000059.ldb
2023-11-11T09:42:44.044000Z TestFramework (INFO): Perturbing file to ensure failure /tmp/bitcoin_func_test_20yqsr48/node0/regtest/chainstate/000092.ldb
2023-11-11T09:42:44.046000Z TestFramework (INFO): Perturbing file to ensure failure /tmp/bitcoin_func_test_20yqsr48/node0/regtest/chainstate/000090.ldb
2023-11-11T09:42:44.047000Z TestFramework (INFO): Perturbing file to ensure failure /tmp/bitcoin_func_test_20yqsr48/node0/regtest/chainstate/000097.ldb
2023-11-11T09:42:44.047000Z TestFramework (INFO): Perturbing file to ensure failure /tmp/bitcoin_func_test_20yqsr48/node0/regtest/chainstate/000005.ldb
2023-11-11T09:42:44.302000Z TestFramework (INFO): Perturbing file to ensure failure /tmp/bitcoin_func_test_20yqsr48/node0/regtest/blocks/blk00000.dat
2023-11-11T09:42:44.577000Z TestFramework (INFO): Stopping nodes
2023-11-11T09:42:44.577000Z TestFramework (INFO): Cleaning up /tmp/bitcoin_func_test_20yqsr48 on exit
2023-11-11T09:42:44.577000Z TestFramework (INFO): Tests successful

highest priority now is to fix the CI

@theStack Can you link to a CI failure? Maybe I can reproduce it with that info.

@maflcko
Copy link
Member Author

maflcko commented Nov 11, 2023

Did you run the test in a loop on Linux?

So far two people reported local issues. There are also CI failures, for example https://cirrus-ci.com/task/5031659184062464?logs=ci#L4069

(This one also shows a memory leak due to the corruption, so that'd be another reason to not fiddle with levedb files, as it will cause an CI error, even if Bitcoin Core shut down with the correct error message)

@fjahr
Copy link
Contributor

fjahr commented Nov 11, 2023

So I was actually able to reproduce this by just hardcoding the max values of the random range:

                    tf.seek(15000)
                    tf.write(b'1' * 2000)
failure details
$ ./test/functional/feature_init.py
2023-11-11T15:46:50.985000Z TestFramework (INFO): PRNG seed is: 7272134511591997577
2023-11-11T15:46:50.986000Z TestFramework (INFO): Initializing test directory /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_6x2iwdcm
2023-11-11T15:46:51.480000Z TestFramework (INFO): Starting node and will exit after line b'Validating signatures for all blocks'
2023-11-11T15:46:51.650000Z TestFramework (INFO): Starting node and will exit after line b'scheduler thread start'
2023-11-11T15:46:51.772000Z TestFramework (INFO): Starting node and will exit after line b'Starting HTTP server'
2023-11-11T15:46:51.896000Z TestFramework (INFO): Starting node and will exit after line b'Loading P2P addresses'
2023-11-11T15:46:52.019000Z TestFramework (INFO): Starting node and will exit after line b'Loading banlist'
2023-11-11T15:46:52.142000Z TestFramework (INFO): Starting node and will exit after line b'Loading block index'
2023-11-11T15:46:52.285000Z TestFramework (INFO): Starting node and will exit after line b'Checking all blk files are present'
2023-11-11T15:46:52.483000Z TestFramework (INFO): Starting node and will exit after line b'Loaded best chain:'
2023-11-11T15:46:52.694000Z TestFramework (INFO): Starting node and will exit after line b'init message: Verifying blocks'
2023-11-11T15:46:52.929000Z TestFramework (INFO): Starting node and will exit after line b'init message: Starting network threads'
2023-11-11T15:46:53.312000Z TestFramework (INFO): Starting node and will exit after line b'net thread start'
2023-11-11T15:46:53.709000Z TestFramework (INFO): Starting node and will exit after line b'addcon thread start'
2023-11-11T15:46:54.117000Z TestFramework (INFO): Starting node and will exit after line b'initload thread start'
2023-11-11T15:46:54.551000Z TestFramework (INFO): Starting node and will exit after line b'txindex thread start'
2023-11-11T15:46:54.983000Z TestFramework (INFO): Starting node and will exit after line b'block filter index thread start'
2023-11-11T15:46:55.385000Z TestFramework (INFO): Starting node and will exit after line b'coinstatsindex thread start'
2023-11-11T15:46:55.784000Z TestFramework (INFO): Starting node and will exit after line b'msghand thread start'
2023-11-11T15:46:56.215000Z TestFramework (INFO): Starting node and will exit after line b'net thread start'
2023-11-11T15:46:56.654000Z TestFramework (INFO): Starting node and will exit after line b'addcon thread start'
2023-11-11T15:46:57.052000Z TestFramework (INFO): Starting node and will exit after line b'Verifying wallet'
2023-11-11T15:46:57.555000Z TestFramework (INFO): Test startup errors after removing certain essential files
2023-11-11T15:46:57.557000Z TestFramework (INFO): Deleting file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_6x2iwdcm/node0/regtest/blocks/index/000048.ldb
2023-11-11T15:46:57.557000Z TestFramework (INFO): Deleting file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_6x2iwdcm/node0/regtest/blocks/index/000054.ldb
2023-11-11T15:46:57.558000Z TestFramework (INFO): Deleting file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_6x2iwdcm/node0/regtest/blocks/index/000051.ldb
2023-11-11T15:46:57.559000Z TestFramework (INFO): Deleting file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_6x2iwdcm/node0/regtest/blocks/index/000046.ldb
2023-11-11T15:46:58.132000Z TestFramework (INFO): Deleting file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_6x2iwdcm/node0/regtest/chainstate/000005.ldb
2023-11-11T15:46:58.133000Z TestFramework (INFO): Deleting file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_6x2iwdcm/node0/regtest/chainstate/000090.ldb
2023-11-11T15:46:58.704000Z TestFramework (INFO): Deleting file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_6x2iwdcm/node0/regtest/blocks/blk00000.dat
2023-11-11T15:46:59.279000Z TestFramework (INFO): Test startup errors after perturbing certain essential files
2023-11-11T15:46:59.303000Z TestFramework (INFO): Perturbing file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_6x2iwdcm/node0/regtest/blocks/index/000061.ldb
2023-11-11T15:46:59.303000Z TestFramework (INFO): Perturbing file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_6x2iwdcm/node0/regtest/blocks/index/000059.ldb
2023-11-11T15:46:59.303000Z TestFramework (INFO): Perturbing file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_6x2iwdcm/node0/regtest/blocks/index/000066.ldb
2023-11-11T15:46:59.506000Z TestFramework (INFO): Perturbing file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_6x2iwdcm/node0/regtest/chainstate/000005.ldb
2023-11-11T15:46:59.506000Z TestFramework (INFO): Perturbing file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_6x2iwdcm/node0/regtest/chainstate/000097.ldb
2023-11-11T15:46:59.507000Z TestFramework (INFO): Perturbing file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_6x2iwdcm/node0/regtest/chainstate/000092.ldb
2023-11-11T15:46:59.507000Z TestFramework (INFO): Perturbing file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_6x2iwdcm/node0/regtest/chainstate/000090.ldb
2023-11-11T15:47:59.510000Z TestFramework (ERROR): Assertion failed
Traceback (most recent call last):
  File "/Users/FJ/projects/clones/bitcoin/test/functional/test_framework/test_node.py", line 605, in assert_start_raises_init_error
    ret = self.process.wait(timeout=self.rpc_timeout)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/subprocess.py", line 1264, in wait
    return self._wait(timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/subprocess.py", line 2038, in _wait
    raise TimeoutExpired(self.args, timeout)
subprocess.TimeoutExpired: Command '['/Users/FJ/projects/clones/bitcoin/src/bitcoind', '-datadir=/var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_6x2iwdcm/node0', '-logtimemicros', '-debug', '-debugexclude=libevent', '-debugexclude=leveldb', '-debugexclude=rand', '-uacomment=testnode0', '-logthreadnames', '-logsourcelocations', '-loglevel=trace', '-txindex=1', '-blockfilterindex=1', '-coinstatsindex=1', '-checkblocks=200', '-checklevel=4']' timed out after 60 seconds

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/FJ/projects/clones/bitcoin/test/functional/test_framework/test_framework.py", line 132, in main
    self.run_test()
  File "/Users/FJ/projects/clones/bitcoin/./test/functional/feature_init.py", line 144, in run_test
    start_expecting_error(err_fragment)
  File "/Users/FJ/projects/clones/bitcoin/./test/functional/feature_init.py", line 51, in start_expecting_error
    node.assert_start_raises_init_error(
  File "/Users/FJ/projects/clones/bitcoin/test/functional/test_framework/test_node.py", line 635, in assert_start_raises_init_error
    self._raise_assertion_error(assert_msg)
  File "/Users/FJ/projects/clones/bitcoin/test/functional/test_framework/test_node.py", line 176, in _raise_assertion_error
    raise AssertionError(self._node_msg(msg))
AssertionError: [node 0] bitcoind should have exited within 60s with expected error Error opening block database.
2023-11-11T15:47:59.581000Z TestFramework (INFO): Stopping nodes
2023-11-11T15:47:59.581000Z TestFramework (WARNING): Not cleaning up dir /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_6x2iwdcm
2023-11-11T15:47:59.582000Z TestFramework (ERROR): Test failed. Test logging available at /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_6x2iwdcm/test_framework.log
2023-11-11T15:47:59.582000Z TestFramework (ERROR):
2023-11-11T15:47:59.583000Z TestFramework (ERROR): Hint: Call /Users/FJ/projects/clones/bitcoin/test/functional/combine_logs.py '/var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_6x2iwdcm' to consolidate all logs
2023-11-11T15:47:59.583000Z TestFramework (ERROR):
2023-11-11T15:47:59.583000Z TestFramework (ERROR): If this failure happened unexpectedly or intermittently, please file a bug and provide a link or upload of the combined log.
2023-11-11T15:47:59.583000Z TestFramework (ERROR): https://github.com/bitcoin/bitcoin/issues
2023-11-11T15:47:59.583000Z TestFramework (ERROR):

I tried to look at the file sizes of the ldb files involved but the file where the test fails is deleted, not really sure why as of now

ls result
$ la /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_6x2iwdcm/node0/regtest/chainstate/
total 88
drwx------   8 FJ  staff    256 Nov 11 16:46 ./
drwx------  16 FJ  staff    512 Nov 11 16:46 ../
-rw-------   1 FJ  staff  17000 Nov 11 16:46 000005.ldb
-rw-------   1 FJ  staff      0 Nov 11 16:46 000105.log
-rw-------   1 FJ  staff  14919 Nov 11 16:46 000106.ldb
-rw-------   1 FJ  staff     16 Nov 11 16:46 CURRENT
-rw-------   1 FJ  staff      0 Nov 11 16:25 LOCK
-rw-------   1 FJ  staff    347 Nov 11 16:46 MANIFEST-000104

Instead I edited the log message to include a file size for the ldb files:

                self.log.info(f"Perturbing file to ensure failure {target_file}, size: {os.path.getsize(target_file)}")
failure result
2023-11-11T15:56:25.464000Z TestFramework (INFO): Test startup errors after perturbing certain essential files
2023-11-11T15:56:25.506000Z TestFramework (INFO): Perturbing file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_i9rjmk54/node0/regtest/blocks/index/000061.ldb, size: 193
2023-11-11T15:56:25.506000Z TestFramework (INFO): Perturbing file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_i9rjmk54/node0/regtest/blocks/index/000059.ldb, size: 28142
2023-11-11T15:56:25.506000Z TestFramework (INFO): Perturbing file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_i9rjmk54/node0/regtest/blocks/index/000066.ldb, size: 193
2023-11-11T15:56:25.714000Z TestFramework (INFO): Perturbing file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_i9rjmk54/node0/regtest/chainstate/000005.ldb, size: 198
2023-11-11T15:56:25.714000Z TestFramework (INFO): Perturbing file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_i9rjmk54/node0/regtest/chainstate/000097.ldb, size: 451
2023-11-11T15:56:25.714000Z TestFramework (INFO): Perturbing file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_i9rjmk54/node0/regtest/chainstate/000092.ldb, size: 451
2023-11-11T15:56:25.714000Z TestFramework (INFO): Perturbing file to ensure failure /var/folders/9z/n7rz_6cj3bq__11k5kcrsvvm0000gn/T/bitcoin_func_test_i9rjmk54/node0/regtest/chainstate/000090.ldb, size: 14919
2023-11-11T15:57:25.719000Z TestFramework (ERROR): Assertion failed

So there are smaller ldb files that are perturbed first that don't trigger the issue, it's actually a bigger one where the failure happens.

I haven't looked into this deeper yet but my best guess would be that ldb doesn't care about this last, big file and doesn't check it?

@fjahr
Copy link
Contributor

fjahr commented Nov 11, 2023

I haven't looked into this deeper yet but my best guess would be that ldb doesn't care about this last, big file and doesn't check it?

No, that's not it, otherwise the hardcoded test wouldn't fail. If we don't understand what leveldb is doing then I guess we should probably remove perturbing its files altogether.

@maflcko
Copy link
Member Author

maflcko commented Nov 11, 2023

If we don't understand what leveldb is doing then I guess we should probably remove perturbing its files altogether.

sgtm, for a follow-up or alternative pull. The goal of this pull is not to change any test, but only to restore the previous code, which didn't intermittently fail.

@fjahr
Copy link
Contributor

fjahr commented Nov 11, 2023

ACK 44445ae

@maflcko maflcko added this to the 27.0 milestone Nov 12, 2023
@fanquake fanquake merged commit e11b758 into bitcoin:master Nov 13, 2023
@maflcko maflcko deleted the 2311-test-ldb- branch November 13, 2023 10:17
@bitcoin bitcoin locked and limited conversation to collaborators Nov 12, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants