Skip to content

Conversation

maflcko
Copy link
Member

@maflcko maflcko commented Apr 22, 2020

When a validationinterface has itself unregistered in one thread, but is about to get executed in another thread [1], there is a race:

  • The validationinterface destructing itself
  • The validationinterface getting dereferenced for execution

[1]

REVERSE_LOCK(lock);
f(*it->callbacks);

This happens in the miner. More generally it happens everywhere where at least one thread is generating notifications and another one is unregistering a validationinterface.

This issue has been fixed in commit ab31b9d, but the fix has not been applied to the miner.

Example where this happened in practice: https://travis-ci.org/github/bitcoin/bitcoin/jobs/675322230#L4414

@DrahtBot
Copy link
Contributor

DrahtBot commented Apr 22, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@practicalswift
Copy link
Contributor

Oh, great find!

How did you find this one and in what commit was it introduced? :)

@maflcko
Copy link
Member Author

maflcko commented Apr 22, 2020

Good question. This is the traceback:
118/145 - rpc_getblockstats.py failed, Duration: 3 s

stdout:

2020-04-15T15:10:09.960000Z TestFramework (INFO): Initializing test directory /home/travis/build/bitcoin/bitcoin/ci/scratch/test_runner/test_runner_₿_🏃_20200415_145914/rpc_getblockstats_25

2020-04-15T15:10:12.198000Z TestFramework (ERROR): Unexpected exception caught during testing

Traceback (most recent call last):

  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/authproxy.py", line 108, in _request

    return self._get_response()

  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/authproxy.py", line 166, in _get_response

    http_response = self.__conn.getresponse()

  File "/usr/lib/python3.6/http/client.py", line 1346, in getresponse

    response.begin()

  File "/usr/lib/python3.6/http/client.py", line 307, in begin

    version, status, reason = self._read_status()

  File "/usr/lib/python3.6/http/client.py", line 276, in _read_status

    raise RemoteDisconnected("Remote end closed connection without"

http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 112, in main

    self.run_test()

  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/test/functional/rpc_getblockstats.py", line 97, in run_test

    self.load_test_data(test_data)

  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/test/functional/rpc_getblockstats.py", line 89, in load_test_data

    self.nodes[0].submitblock(b)

  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/coverage.py", line 47, in __call__

    return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)

  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/authproxy.py", line 142, in __call__

    response, status = self._request('POST', self.__url.path, postdata.encode('utf-8'))

  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/authproxy.py", line 113, in _request

    self.__conn.request(method, path, postdata, headers)

  File "/usr/lib/python3.6/http/client.py", line 1254, in request

    self._send_request(method, url, body, headers, encode_chunked)

  File "/usr/lib/python3.6/http/client.py", line 1300, in _send_request

    self.endheaders(body, encode_chunked=encode_chunked)

  File "/usr/lib/python3.6/http/client.py", line 1249, in endheaders

    self._send_output(message_body, encode_chunked=encode_chunked)

  File "/usr/lib/python3.6/http/client.py", line 1036, in _send_output

    self.send(msg)

  File "/usr/lib/python3.6/http/client.py", line 974, in send

    self.connect()

  File "/usr/lib/python3.6/http/client.py", line 946, in connect

    (self.host,self.port), self.timeout, self.source_address)

  File "/usr/lib/python3.6/socket.py", line 724, in create_connection

    raise err

  File "/usr/lib/python3.6/socket.py", line 713, in create_connection

    sock.connect(sa)

ConnectionRefusedError: [Errno 111] Connection refused

2020-04-15T15:10:12.261000Z TestFramework (INFO): Stopping nodes

2020-04-15T15:10:12.261000Z TestFramework.node0 (ERROR): Unable to stop node.

Traceback (most recent call last):

  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_node.py", line 266, in stop_node

    self.stop(wait=wait)

  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/coverage.py", line 47, in __call__

    return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)

  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/authproxy.py", line 142, in __call__

    response, status = self._request('POST', self.__url.path, postdata.encode('utf-8'))

  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/authproxy.py", line 107, in _request

    self.__conn.request(method, path, postdata, headers)

  File "/usr/lib/python3.6/http/client.py", line 1254, in request

    self._send_request(method, url, body, headers, encode_chunked)

  File "/usr/lib/python3.6/http/client.py", line 1265, in _send_request

    self.putrequest(method, url, **skips)

  File "/usr/lib/python3.6/http/client.py", line 1118, in putrequest

    raise CannotSendRequest(self.__state)

http.client.CannotSendRequest: Request-sent

[node 0] Cleaning up leftover process

stderr:

Traceback (most recent call last):

  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/test/functional/rpc_getblockstats.py", line 165, in <module>

    GetblockstatsTest().main()

  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 135, in main

    exit_code = self.shutdown()

  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 242, in shutdown

    self.stop_nodes()

  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 453, in stop_nodes

    node.stop_node(wait=wait)

  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_node.py", line 280, in stop_node

    raise AssertionError("Unexpected stderr {} != {}".format(stderr, expected_stderr))

AssertionError: Unexpected stderr =================================================================

==3550==ERROR: AddressSanitizer: stack-use-after-return on address 0x7fdbfbb7dc40 at pc 0x5567a688f34f bp 0x7fdbfee66f10 sp 0x7fdbfee66f08

READ of size 8 at 0x7fdbfbb7dc40 thread T4 (b-httpworker.0)

    #0 0x5567a688f34e in CMainSignals::NewPoWValidBlock(CBlockIndex const*, std::shared_ptr<CBlock const> const&)::$_15::operator()(CValidationInterface&) const /home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/src/validationinterface.cpp:246:75

    #1 0x5567a688f34e in void MainSignalsInstance::Iterate<CMainSignals::NewPoWValidBlock(CBlockIndex const*, std::shared_ptr<CBlock const> const&)::$_15>(CMainSignals::NewPoWValidBlock(CBlockIndex const*, std::shared_ptr<CBlock const> const&)::$_15&&) /home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/src/validationinterface.cpp:83

    #2 0x5567a688f34e in CMainSignals::NewPoWValidBlock(CBlockIndex const*, std::shared_ptr<CBlock const> const&) /home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/src/validationinterface.cpp:246

    #3 0x5567a67a6cf1 in CChainState::AcceptBlock(std::shared_ptr<CBlock const> const&, BlockValidationState&, CChainParams const&, CBlockIndex**, bool, FlatFilePos const*, bool*) /home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/src/validation.cpp:3758:26

    #4 0x5567a67aa38c in ProcessNewBlock(CChainParams const&, std::shared_ptr<CBlock const>, bool, bool*) /home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/src/validation.cpp:3798:40

    #5 0x5567a64467b6 in submitblock(JSONRPCRequest const&) /home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/src/rpc/mining.cpp:947:21

    #6 0x5567a63fe3e6 in CRPCCommand::CRPCCommand(char const*, char const*, UniValue (*)(JSONRPCRequest const&), std::initializer_list<char const*>)::'lambda'(JSONRPCRequest const&, UniValue&, bool)::operator()(JSONRPCRequest const&, UniValue&, bool) const /home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/src/./rpc/server.h:104:94

    #7 0x5567a63fdce8 in std::_Function_handler<bool (JSONRPCRequest const&, UniValue&, bool), CRPCCommand::CRPCCommand(char const*, char const*, UniValue (*)(JSONRPCRequest const&), std::initializer_list<char const*>)::'lambda'(JSONRPCRequest const&, UniValue&, bool)>::_M_invoke(std::_Any_data const&, JSONRPCRequest const&, UniValue&, bool&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:301:9

    #8 0x5567a602f02e in std::function<bool (JSONRPCRequest const&, UniValue&, bool)>::operator()(JSONRPCRequest const&, UniValue&, bool) const /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:706:14

    #9 0x5567a654f18f in ExecuteCommand(CRPCCommand const&, JSONRPCRequest const&, UniValue&, bool) /home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/src/rpc/server.cpp:452:20

    #10 0x5567a654ea20 in CRPCTable::execute(JSONRPCRequest const&) const /home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/src/rpc/server.cpp:435:17

    #11 0x5567a6942a88 in HTTPReq_JSONRPC(HTTPRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/src/httprpc.cpp:208:40

    #12 0x5567a6386af0 in std::_Function_handler<bool (HTTPRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&), bool (*)(HTTPRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>::_M_invoke(std::_Any_data const&, HTTPRequest*&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:301:9

    #13 0x5567a69631e9 in std::function<bool (HTTPRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>::operator()(HTTPRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:706:14

    #14 0x5567a6962e2b in HTTPWorkItem::operator()() /home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/src/httpserver.cpp:55:9

    #15 0x5567a696c777 in WorkQueue<HTTPClosure>::Run() /home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/src/httpserver.cpp:114:13

    #16 0x5567a695507f in HTTPWorkQueueRun(WorkQueue<HTTPClosure>*, int) /home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/src/httpserver.cpp:342:12

    #17 0x5567a6972c29 in void std::__invoke_impl<void, void (*)(WorkQueue<HTTPClosure>*, int), WorkQueue<HTTPClosure>*, int>(std::__invoke_other, void (*&&)(WorkQueue<HTTPClosure>*, int), WorkQueue<HTTPClosure>*&&, int&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/invoke.h:60:14

    #18 0x5567a6972aba in std::__invoke_result<void (*)(WorkQueue<HTTPClosure>*, int), WorkQueue<HTTPClosure>*, int>::type std::__invoke<void (*)(WorkQueue<HTTPClosure>*, int), WorkQueue<HTTPClosure>*, int>(void (*&&)(WorkQueue<HTTPClosure>*, int), WorkQueue<HTTPClosure>*&&, int&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/invoke.h:95:14

    #19 0x5567a6972881 in decltype(std::__invoke(_S_declval<0ul>(), _S_declval<1ul>(), _S_declval<2ul>())) std::thread::_Invoker<std::tuple<void (*)(WorkQueue<HTTPClosure>*, int), WorkQueue<HTTPClosure>*, int> >::_M_invoke<0ul, 1ul, 2ul>(std::_Index_tuple<0ul, 1ul, 2ul>) /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/thread:234:13

    #20 0x5567a6971fe1 in std::thread::_Invoker<std::tuple<void (*)(WorkQueue<HTTPClosure>*, int), WorkQueue<HTTPClosure>*, int> >::operator()() /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/thread:243:11

    #21 0x5567a6971fe1 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(WorkQueue<HTTPClosure>*, int), WorkQueue<HTTPClosure>*, int> > >::_M_run() /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/thread:186

    #22 0x7fdc0a3a76de  (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0xbd6de)

    #23 0x7fdc0b3436da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)

    #24 0x7fdc0965888e in clone (/lib/x86_64-linux-gnu/libc.so.6+0x12188e)

Address 0x7fdbfbb7dc40 is located in stack of thread T6 (b-httpworker.2) at offset 3136 in frame

    #0 0x5567a6445a2f in submitblock(JSONRPCRequest const&) /home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/src/rpc/mining.cpp:896

  This frame has 57 object(s):

    [32, 33) 'ref.tmp.i'

    [48, 192) 'ref.tmp' (line 898)

    [256, 288) 'agg.tmp'

    [320, 321) 'ref.tmp1' (line 898)

    [336, 368) 'agg.tmp6'

    [400, 401) 'ref.tmp7' (line 898)

    [416, 440) 'agg.tmp16'

    [480, 864) 'ref.tmp18' (line 898)

    [928, 960) 'agg.tmp19'

    [992, 993) 'ref.tmp20' (line 898)

    [1008, 1048) 'agg.tmp29'

    [1088, 1092) 'ref.tmp30' (line 898)

    [1104, 1136) 'agg.tmp37'

    [1168, 1169) 'ref.tmp38' (line 898)

    [1184, 1216) 'agg.tmp47'

    [1248, 1249) 'ref.tmp48' (line 898)

    [1264, 1288) 'agg.tmp57'

    [1328, 1360) 'agg.tmp64'

    [1392, 1393) 'ref.tmp65' (line 898)

    [1408, 1448) 'agg.tmp74'

    [1488, 1520) 'agg.tmp81'

    [1552, 1553) 'ref.tmp82' (line 898)

    [1568, 1600) 'agg.tmp91'

    [1632, 1633) 'ref.tmp92' (line 898)

    [1648, 1672) 'agg.tmp101'

    [1712, 1713) 'ref.tmp108' (line 898)

    [1728, 1752) 'agg.tmp117'

    [1792, 1928) 'agg.tmp118'

    [2000, 2032) 'agg.tmp119'

    [2064, 2065) 'ref.tmp120' (line 898)

    [2080, 2112) 'agg.tmp129'

    [2144, 2145) 'ref.tmp130' (line 898)

    [2160, 2184) 'agg.tmp139'

    [2224, 2256) 'agg.tmp150'

    [2288, 2320) 'agg.tmp151'

    [2352, 2384) 'ref.tmp152' (line 898)

    [2416, 2448) 'ref.tmp153' (line 898)

    [2480, 2481) 'ref.tmp154' (line 898)

    [2496, 2528) 'ref.tmp165' (line 898)

    [2560, 2561) 'ref.tmp166' (line 898)

    [2576, 2608) 'ref.tmp181' (line 898)

    [2640, 2672) 'ref.tmp182' (line 898)

    [2704, 2705) 'ref.tmp183' (line 898)

    [2720, 2752) 'ref.tmp194' (line 898)

    [2784, 2785) 'ref.tmp195' (line 898)

    [2800, 2816) 'blockptr' (line 912)

    [2832, 2864) 'ref.tmp326' (line 915)

    [2896, 2897) 'ref.tmp327' (line 915)

    [2912, 2944) 'ref.tmp369' (line 919)

    [2976, 2977) 'ref.tmp370' (line 919)

    [2992, 3024) 'hash' (line 922)

    [3056, 3072) 'criticalblock37' (line 924)

    [3088, 3104) 'criticalblock38' (line 937)

    [3120, 3121) 'new_block' (line 944)

    [3136, 3256) 'sc' (line 945) <== Memory access at offset 3136 is inside this variable

    [3296, 3328) 'ref.tmp469' (line 945)

    [3360, 3376) 'agg.tmp491'

HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork

      (longjmp and C++ exceptions *are* supported)

Thread T6 (b-httpworker.2) created by T0 here:

    #0 0x5567a5fbe02d in pthread_create (/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/src/bitcoind+0x1bc202d)

    #1 0x7fdc0a3a7994 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0xbd994)

    #2 0x5567a69717a0 in void __gnu_cxx::new_allocator<std::thread>::construct<std::thread, void (&)(WorkQueue<HTTPClosure>*, int), WorkQueue<HTTPClosure>*&, int&>(std::thread*, void (&)(WorkQueue<HTTPClosure>*, int), WorkQueue<HTTPClosure>*&, int&) /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/ext/new_allocator.h:136:23

    #3 0x5567a695c31b in void std::vector<std::thread, std::allocator<std::thread> >::emplace_back<void (&)(WorkQueue<HTTPClosure>*, int), WorkQueue<HTTPClosure>*&, int&>(void (&)(WorkQueue<HTTPClosure>*, int), WorkQueue<HTTPClosure>*&, int&) /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/vector.tcc:105:4

SUMMARY: AddressSanitizer: stack-use-after-return /home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/src/validationinterface.cpp:246:75 in CMainSignals::NewPoWValidBlock(CBlockIndex const*, std::shared_ptr<CBlock const> const&)::$_15::operator()(CValidationInterface&) const

Shadow bytes around the buggy address:

  0x0ffbff767b30: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5

  0x0ffbff767b40: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5

  0x0ffbff767b50: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5

  0x0ffbff767b60: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5

  0x0ffbff767b70: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5

=>0x0ffbff767b80: f5 f5 f5 f5 f5 f5 f5 f5[f5]f5 f5 f5 f5 f5 f5 f5

  0x0ffbff767b90: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5

  0x0ffbff767ba0: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5

  0x0ffbff767bb0: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5

  0x0ffbff767bc0: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5

  0x0ffbff767bd0: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5

Shadow byte legend (one shadow byte represents 8 application bytes):

  Addressable:           00

  Partially addressable: 01 02 03 04 05 06 07 

  Heap left redzone:       fa

  Freed heap region:       fd

  Stack left redzone:      f1

  Stack mid redzone:       f2

  Stack right redzone:     f3

  Stack after return:      f5

  Stack use after scope:   f8

  Global redzone:          f9

  Global init order:       f6

  Poisoned by user:        f7

  Container overflow:      fc

  Array cookie:            ac

  Intra object redzone:    bb

  ASan internal:           fe

  Left alloca redzone:     ca

  Right alloca redzone:    cb

  Shadow gap:              cc

Thread T4 (b-httpworker.0) created by T0 here:

    #0 0x5567a5fbe02d in pthread_create (/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-pc-linux-gnu/src/bitcoind+0x1bc202d)

    #1 0x7fdc0a3a7994 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0xbd994)

    #2 0x5567a69717a0 in void __gnu_cxx::new_allocator<std::thread>::construct<std::thread, void (&)(WorkQueue<HTTPClosure>*, int), WorkQueue<HTTPClosure>*&, int&>(std::thread*, void (&)(WorkQueue<HTTPClosure>*, int), WorkQueue<HTTPClosure>*&, int&) /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/ext/new_allocator.h:136:23

    #3 0x5567a695c31b in void std::vector<std::thread, std::allocator<std::thread> >::emplace_back<void (&)(WorkQueue<HTTPClosure>*, int), WorkQueue<HTTPClosure>*&, int&>(void (&)(WorkQueue<HTTPClosure>*, int), WorkQueue<HTTPClosure>*&, int&) /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/vector.tcc:105:4

==3550==ABORTING != 

@maflcko
Copy link
Member Author

maflcko commented Apr 22, 2020

I might write a test tomorrow with steps to reproduce

@maflcko
Copy link
Member Author

maflcko commented Apr 23, 2020

@practicalswift This is a race, so I couldn't find a test that is reproducible. Though, I have attached a unit test, that fails when run long enough in a loop.

All you need to do is ./configure --with-sanitizers=address and then run the tests with

export ASAN_OPTIONS=detect_leaks=0
make
while ./src/test/test_bitcoin -t validationinterface_tests/unregister_validation_interface_race --catch_system_errors=no  ; do true; done

@maflcko
Copy link
Member Author

maflcko commented Apr 23, 2020

If someone has issues reproducing the crash in the unit test, the following diff might help:

diff --git a/src/validationinterface.cpp b/src/validationinterface.cpp
index 11000774c0..6a311eeb44 100644
--- a/src/validationinterface.cpp
+++ b/src/validationinterface.cpp
@@ -11,6 +11,7 @@
 #include <primitives/block.h>
 #include <primitives/transaction.h>
 #include <scheduler.h>
+#include <util/time.h>
 
 #include <future>
 #include <unordered_map>
@@ -80,6 +81,7 @@ public:
             ++it->count;
             {
                 REVERSE_LOCK(lock);
+                UninterruptibleSleep(std::chrono::microseconds(500));
                 f(*it->callbacks);
             }
             it = --it->count ? std::next(it) : m_list.erase(it);

@fanquake
Copy link
Member

Concept ACK. Running just fa9f3a5 (no additional patch), I generally see a crash between 5 - 20 iterations:

/bitcoin# while ./src/test/test_bitcoin -t validationinterface_tests/unregister_validation_interface_race --catch_system_errors=no  ; do true; done
Running 1 test case...

*** No errors detected
Running 1 test case...

*** No errors detected
Running 1 test case...

*** No errors detected
Running 1 test case...

*** No errors detected
Running 1 test case...
=================================================================
==95792==ERROR: AddressSanitizer: unknown-crash on address 0x7f8f04f60dc0 at pc 0x55abc9388623 bp 0x7f8f057777f0 sp 0x7f8f057777e8
READ of size 8 at 0x7f8f04f60dc0 thread T4
    #0 0x55abc9388622 in CMainSignals::BlockChecked(CBlock const&, BlockValidationState const&)::$_14::operator()(CValidationInterface&) const /bitcoin/src/validationinterface.cpp:241:75
    #1 0x55abc9388622 in void MainSignalsInstance::Iterate<CMainSignals::BlockChecked(CBlock const&, BlockValidationState const&)::$_14>(CMainSignals::BlockChecked(CBlock const&, BlockValidationState const&)::$_14&&) /bitcoin/src/validationinterface.cpp:83
    #2 0x55abc9388622 in CMainSignals::BlockChecked(CBlock const&, BlockValidationState const&) /bitcoin/src/validationinterface.cpp:241
    #3 0x55abc8b9c94c in validationinterface_tests::unregister_validation_interface_race::test_method()::$_0::operator()() const /bitcoin/src/test/validationinterface_tests.cpp:28:30
    #4 0x55abc8b9c94c in void std::__invoke_impl<void, validationinterface_tests::unregister_validation_interface_race::test_method()::$_0>(std::__invoke_other, validationinterface_tests::unregister_validation_interface_race::test_method()::$_0&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/invoke.h:60
    #5 0x55abc8b9c94c in std::__invoke_result<validationinterface_tests::unregister_validation_interface_race::test_method()::$_0>::type std::__invoke<validationinterface_tests::unregister_validation_interface_race::test_method()::$_0>(validationinterface_tests::unregister_validation_interface_race::test_method()::$_0&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/invoke.h:95
    #6 0x55abc8b9c94c in decltype(std::__invoke(_S_declval<0ul>())) std::thread::_Invoker<std::tuple<validationinterface_tests::unregister_validation_interface_race::test_method()::$_0> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/thread:244
    #7 0x55abc8b9c94c in std::thread::_Invoker<std::tuple<validationinterface_tests::unregister_validation_interface_race::test_method()::$_0> >::operator()() /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/thread:253
    #8 0x55abc8b9c94c in std::thread::_State_impl<std::thread::_Invoker<std::tuple<validationinterface_tests::unregister_validation_interface_race::test_method()::$_0> > >::_M_run() /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/thread:196
    #9 0x7f8f0c6d4b2e  (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0xbbb2e)
    #10 0x7f8f0cccefa2 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x7fa2)
    #11 0x7f8f0c3a34ce in clone (/lib/x86_64-linux-gnu/libc.so.6+0xf94ce)

Address 0x7f8f04f60dc0 is located in stack of thread T5 at offset 32 in frame
    #0 0x55abc8b9d1af in std::thread::_State_impl<std::thread::_Invoker<std::tuple<validationinterface_tests::unregister_validation_interface_race::test_method()::$_1> > >::_M_run() /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/thread:196

  This frame has 1 object(s):
    [32, 40) 'sub.i.i.i.i.i' <== Memory access at offset 32 is inside this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T5 created by T0 here:
    #0 0x55abc7d1002d in pthread_create (/bitcoin/src/test/test_bitcoin+0x37702d)
    #1 0x7f8f0c6d4db4 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0xbbdb4)
    #2 0x55abc8b97b1e in validationinterface_tests::unregister_validation_interface_race_invoker() /bitcoin/src/test/validationinterface_tests.cpp:19:1
    #3 0x55abc7ded77f in boost::detail::function::void_function_invoker0<void (*)(), void>::invoke(boost::detail::function::function_buffer&) /usr/include/boost/function/function_template.hpp:118:11

SUMMARY: AddressSanitizer: unknown-crash /bitcoin/src/validationinterface.cpp:241:75 in CMainSignals::BlockChecked(CBlock const&, BlockValidationState const&)::$_14::operator()(CValidationInterface&) const
Shadow bytes around the buggy address:
  0x0ff2609e4160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff2609e4170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff2609e4180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff2609e4190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff2609e41a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0ff2609e41b0: 00 00 00 00 00 00 00 00[00]00 00 00 00 00 00 00
  0x0ff2609e41c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff2609e41d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff2609e41e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff2609e41f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff2609e4200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
Thread T4 created by T0 here:
    #0 0x55abc7d1002d in pthread_create (/bitcoin/src/test/test_bitcoin+0x37702d)
    #1 0x7f8f0c6d4db4 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0xbbdb4)
    #2 0x55abc8b97b1e in validationinterface_tests::unregister_validation_interface_race_invoker() /bitcoin/src/test/validationinterface_tests.cpp:19:1
    #3 0x55abc7ded77f in boost::detail::function::void_function_invoker0<void (*)(), void>::invoke(boost::detail::function::function_buffer&) /usr/include/boost/function/function_template.hpp:118:11

==95792==ABORTING

@maflcko
Copy link
Member Author

maflcko commented Apr 26, 2020

cc @promag , @ryanofsky You seem qualified to review this?

Copy link
Contributor

@promag promag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK bbbbf38.

Like @ryanofsky suggested, we should only have RegisterValidationInterface(shared_ptr) so this is a step in that direction.

bool accepted = ProcessNewBlock(Params(), blockptr, /* fForceProcessing */ true, /* fNewBlock */ &new_block);
UnregisterValidationInterface(&sc);
UnregisterSharedValidationInterface(sc);
if (!new_block && accepted) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative is to call SyncWithValidationInterfaceQueue here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative to what? The notification we are listening to is never put in the queue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that after UnregisterValidationInterface + SyncWithValidationInterfaceQueue there is no pointer to sc in validation queue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does SyncWithValidationInterfaceQueue prevent new notifications from being scheduled for sc?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New notifications won't be scheduled because the UnregisterValidationInterface above. Only problem is a concurrent notification being called, so SyncWithValidationInterfaceQueue ensures that case doesn't happen.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I understand. Yeah, correct.

@ryanofsky
Copy link
Contributor

Could you revert or move formatting changes in validationinterface.h/cpp files to a separate commit? I'm starting at that looking for actual changes but is it all reformatting?

@ryanofsky
Copy link
Contributor

Concept ACK. This is a good find and all changes look ok, but there's so much going on it's a headache to reverse engineer everything that's happening. Suggest making one change per commit or perhaps dropping some of the changes

@ryanofsky
Copy link
Contributor

It is also interesting if the bug described #18742 (comment) started happening recently. Unless I missed something, it seems like the bug has been present a very long time, and I wonder if #18524 made it more likely to happen.

@maflcko maflcko force-pushed the 2004-minerNoCrash branch from bbbbf38 to fa33a63 Compare April 27, 2020 14:23
@maflcko
Copy link
Member Author

maflcko commented Apr 27, 2020

Addressed feedback by @ryanofsky. No code changes, just more commits.

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK fa33a63. Great find and changes are very clear now!

@promag
Copy link
Contributor

promag commented May 13, 2020

@jnewbery in this case it's the test that's fixed so what you suggest is to just add the file test and then explain how it could be broken.

@maflcko
Copy link
Member Author

maflcko commented May 13, 2020

I can put the test in /** ... */ in the first commit. This way:

  • People can still reproduce by removing the /** and */ and then running the test in a loop until it fails.
  • The diff in the last commit is applied to the test in the same way that it is applied to the miner, "proving" the fix is correct.
  • It doesn't break git bisect (well to some extent, as the bug will always exist in the functional tests in previous commit until the fix commit, and the unit test passes anyway most of the time with address sanitizer enabled and probably passes all the time without sanitizers enabled)

MarcoFalke added 4 commits May 13, 2020 19:57
This commit is (intentionally) adding a broken test. The test is broken
because it registering a subscriber object that can go out of scope
while events are still being sent.

To run the broken test and reproduce the bug:
  - Remove comment /** and */
  - ./configure --with-sanitizers=address
  - export ASAN_OPTIONS=detect_leaks=0
  - make
  - while ./src/test/test_bitcoin -t validationinterface_tests/unregister_validation_interface_race --catch_system_errors=no  ; do true; done
…hValidationInterfaceQueue

For the purpose of this test the two have the same outcome, but this one
is shorter and avoids a sleep for 0.1 seconds.
This is achieved by switching to a shared_ptr.

Also, switch the validationinterfaces in the tests to use shared_ptrs
for the same reason.
@maflcko maflcko force-pushed the 2004-minerNoCrash branch from fa6d1a0 to 7777f2a Compare May 13, 2020 23:58
@maflcko
Copy link
Member Author

maflcko commented May 14, 2020

Pushed an empty diff to defuse the test. Added instructions to the commit message on how to add the fuse back in.

$ git diff fa6d1a04ed 7777f2a4bb | wc
      0       0       0

@promag
Copy link
Contributor

promag commented May 14, 2020

@MarcoFalke did you see #18742 (comment)?

@maflcko
Copy link
Member Author

maflcko commented May 14, 2020

@promag Yes, but that causes a deadlock on shutdown. Also the shutdown sequence should be bug free right now because the message handler is stopped before any unregisters should happen. I still like your idea, but I think it should be prepared, reviewed and merged for 0.21, not as a backport.

For the backport we should aim at a fix that has little chance of breaking stuff. The fix here should satisfy this because the same fix has already been applied to the wallet in #18338, which is also in 0.20

@laanwj
Copy link
Member

laanwj commented May 14, 2020

Code review ACK 7777f2a

@promag
Copy link
Contributor

promag commented May 14, 2020

Code review ACK 7777f2a.

@fanquake fanquake merged commit b9c504c into bitcoin:master May 14, 2020
@maflcko maflcko deleted the 2004-minerNoCrash branch May 14, 2020 12:43
@maflcko
Copy link
Member Author

maflcko commented May 14, 2020

For backport, only the first and last commit are needed.

@fanquake
Copy link
Member

For backport, only the first and last commit are needed.

Thanks. Will have this up in #18973 shortly.

@fjahr
Copy link
Contributor

fjahr commented May 14, 2020

post merge code review ACK 7777f2a

fanquake pushed a commit to fanquake/bitcoin that referenced this pull request May 14, 2020
This commit is (intentionally) adding a broken test. The test is broken
because it registering a subscriber object that can go out of scope
while events are still being sent.

To run the broken test and reproduce the bug:
  - Remove comment /** and */
  - ./configure --with-sanitizers=address
  - export ASAN_OPTIONS=detect_leaks=0
  - make
  - while ./src/test/test_bitcoin -t validationinterface_tests/unregister_validation_interface_race --catch_system_errors=no  ; do true; done

Github-Pull: bitcoin#18742
Rebased-From: fab6d06
fanquake pushed a commit to fanquake/bitcoin that referenced this pull request May 14, 2020
This is achieved by switching to a shared_ptr.

Also, switch the validationinterfaces in the tests to use shared_ptrs
for the same reason.

Github-Pull: bitcoin#18742
Rebased-From: 7777f2a
fanquake pushed a commit to fanquake/bitcoin that referenced this pull request May 15, 2020
This commit is (intentionally) adding a broken test. The test is broken
because it registering a subscriber object that can go out of scope
while events are still being sent.

To run the broken test and reproduce the bug:
  - Remove comment /** and */
  - ./configure --with-sanitizers=address
  - export ASAN_OPTIONS=detect_leaks=0
  - make
  - while ./src/test/test_bitcoin -t validationinterface_tests/unregister_validation_interface_race --catch_system_errors=no  ; do true; done

Github-Pull: bitcoin#18742
Rebased-From: fab6d06
fanquake pushed a commit to fanquake/bitcoin that referenced this pull request May 15, 2020
This is achieved by switching to a shared_ptr.

Also, switch the validationinterfaces in the tests to use shared_ptrs
for the same reason.

Github-Pull: bitcoin#18742
Rebased-From: 7777f2a
maflcko pushed a commit that referenced this pull request May 15, 2020
245c862 test: disable script fuzz tests (fanquake)
9a8fb4c fuzz: Remove enumeration of expected deserialization exceptions in ProcessMessage(...) fuzzer (practicalswift)
6161c94 [net processing] Only send a getheaders for one block in an INV (John Newbery)
cf2a6e2 test: Remove const to work around compiler error on xenial (Wladimir J. van der Laan)
cc7d344 miner: Avoid stack-use-after-return in validationinterface (MarcoFalke)
37a6207 test: Add unregister_validation_interface_race test (MarcoFalke)
ff4dc20 gui: Fix manual coin control with multiple wallets loaded (João Barbosa)
ed0afe8 test: Add test for conflicted wallet tx notifications (Russell Yanofsky)
251e321 rpc: Relock wallet only if most recent callback (João Barbosa)
ca4dac4 rpc: Add mutex to guard deadlineTimers (João Barbosa)
a3fe458 [docs] Improve commenting in ProcessGetData() (John Newbery)
011532e [test] test that an invalid GETDATA doesn't prevent processing of future messages (Amiti Uttarwar)
1e73d72 [net processing] ignore unknown INV types in GETDATA messages (Amiti Uttarwar)
fb82173 [net processing] ignore tx GETDATA from blocks-only peers (Amiti Uttarwar)
315ae14 gui: Fix itemWalletAddress leak when not tree mode (João Barbosa)

Pull request description:

  Backports the following PRs to the 0.20 branch:

  * #18578: gui: Fix leak in CoinControlDialog::updateView
  * #18808: [net processing] Drop unknown types in getdata
  * #18814: rpc: Relock wallet only if most recent callback
  * #18878: test: Add test for conflicted wallet tx notifications
  * #18894: gui: Fix manual coin control with multiple wallets loaded
  * #18742: miner: Avoid stack-use-after-return in validationinterface
  * #18962: net processing: Only send a getheaders for one block in an INV
  * #18975: test: Remove const to work around compiler error on xenial

ACKs for top commit:
  promag:
    Tested ACK 245c862 coin control with multiple wallets.
  laanwj:
    ACK 245c862
  MarcoFalke:
    ACK 245c862 solved the conflicts myself as a sanity check. Did not re-review 🍷

Tree-SHA512: 285e5a5fad5bbeba6032742c65dc68836e8eccfcceda9e69fec4ddd162a3f61679a96f9bbe3d434267835af67c21ac4c05accf6f63e827c2eb47203c6daafe31
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Jan 28, 2021
Summary:
This is a backport of Core [[bitcoin/bitcoin#18742 | PR18742]] [1/3]
bitcoin/bitcoin@fa770ce

Test Plan: `ninja`

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Differential Revision: https://reviews.bitcoinabc.org/D9103
deadalnix pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Jan 28, 2021
…hValidationInterfaceQueue

Summary:
For the purpose of this test the two have the same outcome, but this one
is shorter and avoids a sleep for 0.1 seconds.

This is a backport of Core [[bitcoin/bitcoin#18742 | PR18742]] [2/3]
bitcoin/bitcoin@fa5ceb2

Depends on D9103

Test Plan: `ninja check`

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Differential Revision: https://reviews.bitcoinabc.org/D9104
deadalnix pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Jan 28, 2021
Summary:
This is a squash of 2 commits, to add the unbroken test and the solution
at the same time, and a revert of D5140 (different solution to the same problem)

> test: Add unregister_validation_interface_race test
bitcoin/bitcoin@fab6d06

> miner: Avoid stack-use-after-return in validationinterface
>
> This is achieved by switching to a shared_ptr.
>
> Also, switch the validationinterfaces in the tests to use shared_ptrs
> for the same reason.

bitcoin/bitcoin@7777f2a

This is a backport of Core [[bitcoin/bitcoin#18742 | PR18742]] [3/3]

This replaces the solution in D5140
See comment https://reviews.bitcoinabc.org/D5140#124338

Depends on D9104

Test Plan:
Test suggested by this PR:
```
export ASAN_OPTIONS=detect_leaks=0
ninja && ninja check
while ./src/test/test_bitcoin -t validationinterface_tests/unregister_validation_interface_race --catch_system_errors=no  ; do true; done
```

Test from D5140:
```
CC=clang CXX=clang++ cmake -GNinja .. -DCMAKE_BUILD_TYPE=Debug
ninja check-extended
```

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Subscribers: Fabien

Differential Revision: https://reviews.bitcoinabc.org/D9105
backpacker69 referenced this pull request in peercoin/peercoin Mar 28, 2021
This commit is (intentionally) adding a broken test. The test is broken
because it registering a subscriber object that can go out of scope
while events are still being sent.

To run the broken test and reproduce the bug:
  - Remove comment /** and */
  - ./configure --with-sanitizers=address
  - export ASAN_OPTIONS=detect_leaks=0
  - make
  - while ./src/test/test_bitcoin -t validationinterface_tests/unregister_validation_interface_race --catch_system_errors=no  ; do true; done

Github-Pull: #18742
Rebased-From: fab6d06
backpacker69 referenced this pull request in peercoin/peercoin Mar 28, 2021
This is achieved by switching to a shared_ptr.

Also, switch the validationinterfaces in the tests to use shared_ptrs
for the same reason.

Github-Pull: #18742
Rebased-From: 7777f2a
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants