test: Fix and clean p2p_invalid_messages functional tests #19177

troygiorshev · 2020-06-05T13:20:33Z

This PR touches only the p2p_invalid_messages.py functional test module. There are two main goals accomplished here. First, it fixes the "invalid message size" test, which previously made a message that was invalid for multiple reasons. Second, it refactors the file into a single consistent style. This file appears to have originally had two authors, with different styles and some test duplication.

It should now be easier and quicker to understand this module, anticipating the upcoming BIP324 and AltNet changes.

This should probably go in ahead of #19107, but the two are not strictly related.

maflcko

Thanks for splitting out the test changes. Though I am not sure about removing the _tweak_msg_data_size without replacement.
We should test the following cases:

header indicates size a, but payload is of size a-1
header indicates size a, but payload is of size a+1

Is this tested anywhere else?

Concept ACK 2719499 📇

Show signature and timestamp

Signature:

Timestamp of file with hash d78ad05b778745908e2ab5b35cdd924d160c6acceba435f073770d6a7bdc6aac -

test/functional/p2p_invalid_messages.py

jnewbery · 2020-06-05T22:00:36Z

@MarcoFalke

We should test the following cases:

header indicates size a, but payload is of size a-1

header indicates size a, but payload is of size a+1

These aren't strictly header errors. In (1), the message won't finish being read, because the deserializer is waiting for another byte. When the following message is received:

the first byte of the header will be read as the final byte of the payload of the previous message
that payload may or may not succeed deserialization and processing in net_processing
the following message header will fail deserialization because it's misaligned. It'll fail the message start bytes check and be disconnected.

In (2), the following message header will be misaligned for similar reasons and will result in disconnect.

In both cases, it's the subsequent message that causes the disconnect. I don't think we necessarily need to test this, because invalid headers are tested elsewhere.

Test 1 is a duplicate of test_size() later in the file. Inexplicably, this test does not work on macOS, whereas test_size() does. Test 2 is problematic for two reasons. First, it always fails with an invalid checksum, which is probably not what was intended. Second, it's not defined at this layer what the behavior should be. Hypothetically, if this test was fixed so that it gave messages with valid checksums, then the message would pass successfully thought the network layer and fail only in the processing layer. A priori the network layer has no idea what the size of a message "actually" is. The "Why does behavior change at 78 bytes" is because of the following: print(len(node.p2p.build_message(msg))) # 125 => Payload size = 125 - 24 = 101 If we take 77 bytes, then there are 101 - 77 = 24 left That's exactly the size of a header So, bitcoind deserializes the header and rejects it for some other reason (Almost always an invalid size (too large)) But, if we take 78 bytes, then there are 101 - 78 = 23 left That's not enough to fill a header, so the socket stays open waiting for more data. That's why we sometimes have to push additional data in order for the peer to disconnect. Additionally, both of these tests use the "conn" variable. For fun, go look at where it's declared. (Hint: test_large_inv(). Don't we all love python's idea of scope?)

As well, this renames those variables to match PEP8 and this clears up the comment relating to VALID_DATA_LIMIT. Admittedly, this commit is mainly to make the following ones cleaner.

This test originally made a message with an invalid stated length, and an invalid checksum. This was because only the header was changed, but the checksum stayed the same. This was fine for now because we check the header first to see if it has a valid stated size, and we disconnect if it does not, so we never end up checking for the checksum. If this behavior was to change, this test would become a problem. (Indeed I discovered this when playing around with this behavior). By instead creating a message with an oversized payload from the start, we create a message with an invalid stated length but a valid checksum, as intended. Additionally, this takes advantage to the newly module-global VALID_DATA_LIMIT as opposed to the magic 0x02000000. Yes, 4MB < 32MiB, but at the moment when receiving a message we check both, so this makes the test tighter.

troygiorshev · 2020-06-08T05:00:47Z

Rebased and made suggested changes. Will continue thinking about what the correct test should be for a malicious indicated size.

test/functional/p2p_invalid_messages.py

This is a simple refactor of the specified test. It is now brought in line with the rest of the tests in the module. This should make things easier to debug, as all of the tests are now grouped together at the top.

maflcko · 2020-06-08T12:59:48Z

I don't see the downside of having a test that does:

Send a corrupt msg with wrongly indicates size (off by one in either way)
Send a valid msg and observe a disconnect

It should be possible to write such a test in 5-10 lines of python. Is there any reason I am missing that such a test would be inappropriate?

troygiorshev · 2020-06-08T17:03:13Z

Regarding the situation where the payload is too large, consider the following test. A node can not tell the difference between these two "messages", they're equivalent. As opposed to testing for an oversized payload, we should test separately for:

Checking that a node responds well to a message with an invalid checksum
Checking that a node responds well to <24 bytes of complete junk

Sending >=24 bytes of junk (enough to fill a header or more) is covered by the rest of the tests in this module.

The situation where the payload is too small is effectively the same. The beginning of the next message will be lost, and we'll be back to receiving junk (the rest of the next message).

An incorrect payload size is IMO too high up the chain of events to be included. It's what results from an incorrect payload size that we should be checking for.

def test_equivalent_large(self):
    '''Make a message with an oversized payload
    Or, make a smaller message with the wrong checksum, then send a junk byte
    '''
    conn = self.nodes[0].add_p2p_connection(P2PDataStore())
    msg_base = msg_generic(b'badmsg', data=b'd'*5)
    msg_1 = conn.build_message(msg_base)
    cut_len = (
        4 +  # magic
        12  # msgtype
    )
    msg_1 = msg_1[:cut_len] + struct.pack("<I", len(msg_base.data) - 1) + msg_1[cut_len + 4:]
    self.nodes[0].p2p.send_raw_message(msg_1)
    with self.nodes[0].assert_debug_log(['CHECKSUM ERROR (badmsg, 4 bytes), expected a359d53a was cdbcd284']):
        # The usual connected check no longer works because the messages are misaligned
        #conn.sync_with_ping(timeout=1)
        assert self.nodes[0].p2p.is_connected
        self.nodes[0].disconnect_p2ps()
    #
    # Equivalent
    #
    conn = self.nodes[0].add_p2p_connection(P2PDataStore())
    msg_base = msg_generic(b'badmsg', data=b'd'*(5-1))
    msg_1 = conn.build_message(msg_base)
    cut_len = (
        4 +  # magic
        12 +  # msgtype
        4 # size
    )
    msg_1 = msg_1[:cut_len] + b'\xcd\xbc\xd2\x84' + msg_1[cut_len + 4:]
    self.nodes[0].p2p.send_raw_message(msg_1)
    self.nodes[0].p2p.send_raw_message(b'd')
    with self.nodes[0].assert_debug_log(['CHECKSUM ERROR (badmsg, 4 bytes), expected a359d53a was cdbcd284']):
        # The usual connected check no longer works because the messages are misaligned
        #conn.sync_with_ping(timeout=1)
        assert self.nodes[0].p2p.is_connected
        self.nodes[0].disconnect_p2ps()

troygiorshev · 2020-06-08T21:00:31Z

Just for fun, I'll point out that we wont always disconnect on a corrupted payload size. I've made a proof of concept where the corruption neither results in a disconnect nor a misalignment. All it needs is two normal bitcoin messages and two single bit flips (one in the first message's payload size, and the other in the second message's command). It can be found here.

DrahtBot · 2020-06-10T02:12:59Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

p2p: Move all header verification into the network layer, extend logging #19107 (p2p: Refactor, move all header verification into the network layer, without changing behavior by troygiorshev)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Coverage

Coverage	Change (pull 19177, `d8ef251`)	Reference (master, `19e9192`)
Lines	+0.0373 %	90.6714 %
Functions	+0.0000 %	85.6033 %
Branches	+0.0194 %	51.9388 %

^{Updated at: 2020-06-12T18:13:45.902164.}

jnewbery · 2020-06-12T15:13:47Z

ACK af2a145

This is a good change and removes a lot of confused and confusing test code that isn't testing what it purports to.

Future PRs could clean this test further:

the "resource_exhaustion" test should be removed/renamed from resource exhaustion. This was originally added to verify that receiving large messages wouldn't result in memory blowup, but that was a bad idea (tests: add invalid P2P message tests #14522 (comment)) and had to be removed.
the "even though the node is being hammered by nonsense from one connection, it can still service other peers in a timely way." comment is wrong. All outgoing messages from the Python message thread are serial, so by the time we get to sending the pings, all of the previous large messages have been processed already.
I can't see any reason for launching a new async task to test the magic bytes. Why not just send a message with invalid magic bytes using send_raw_message? I see this was merged without review (qa: Make swap_magic_bytes in p2p_invalid_messages atomic #15697) and there's no indication of why it's needed.
the msg_unrecognized class can be removed and replaced with a msg_generic. The serialized length of a msg_unrecognized message seems to have caused confusion in the past (see VALID_DATA_LIMIT ) because people haven't realised the length of the length field for a serialized string

EDIT: struck out point 2. It was untrue as pointed out by Marco below.

maflcko · 2020-06-12T17:32:15Z

all of the previous large messages have been processed already.

I don't think this is true. At least on travis the test failed when the ping was sent on a different connection. See #19177 (comment)

jnewbery · 2020-06-12T18:22:47Z

I don't think this is true. At least on travis the test failed when the ping was sent on a different connection. See #19177 (comment)

You're right. I was wrong. When I run this locally, the message sending appears to be serial because bitcoind is able to keep up with the messages as quickly as they're delivered. When bitcoind is running more slowly, such as under sanitizers, it can't keep up and Python's asyncio is able to write to both the sockets for conn and conn2 while bitcoind is processing the large invalid messages. See https://travis-ci.org/github/bitcoin/bitcoin/jobs/695869826#L3577 for example.

maflcko · 2020-06-12T18:33:43Z

I don't think we necessarily need to test this, because invalid headers are tested elsewhere.

Where?

In general, I still don't understand why it is ok to remove test coverage without any replacement. We have never done this in the past and I don't see why this pull should be an exception. Don't get me wrong, I like the cleanups here and I'd like to see it merged, but I think this should and can be done without removing coverage.

If the functional tests are the wrong place to put such a test, great then move it to the unit tests, but simply removing should be the option of last resort.

For reference (master):

this pull:

maflcko · 2020-06-12T19:21:50Z

Sorry, false alarm. The "red" line will only become "blue" when the socket buffer is filled in two chunks or more. This might happen non-deterministically based on network-congestion on the test machine and has nothing to do with the changes in this pull request.

maflcko · 2020-06-12T19:29:33Z

re-ACK af2a145 🍦

Show signature and timestamp

Signature:

Timestamp of file with hash c5c47520e34b1feab503ae488c8195cfa5311aa8dfb1311b7826eb27cebc4497 -

jnewbery · 2020-06-13T02:57:35Z

I can't see any reason for launching a new async task to test the magic bytes. Why not just send a message with invalid magic bytes using send_raw_message?

Done in #19264

…nal tests af2a145 Refactor resource exhaustion test (Troy Giorshev) 5c4648d Fix "invalid message size" test (Troy Giorshev) ff1e7b8 Move size limits to module-global (Troy Giorshev) 57890ab Remove two unneeded tests (Troy Giorshev) Pull request description: This PR touches only the p2p_invalid_messages.py functional test module. There are two main goals accomplished here. First, it fixes the "invalid message size" test, which previously made a message that was invalid for multiple reasons. Second, it refactors the file into a single consistent style. This file appears to have originally had two authors, with different styles and some test duplication. It should now be easier and quicker to understand this module, anticipating the upcoming [BIP324](bitcoin#18242) and [AltNet](bitcoin#18989) changes. This should probably go in ahead of bitcoin#19107, but the two are not strictly related. ACKs for top commit: jnewbery: ACK af2a145 MarcoFalke: re-ACK af2a145 🍦 Tree-SHA512: 9b57561e142c5eaefac5665f7355c8651670400b4db1a89525d2dfdd20e872d6873c4f6175c4222b6f5a8e5210cf5d6a52da69b925b673a2e2ac30a15d670d1c

…tests Summary: af2a145e575f23c64909e6cf1fb323c603bda7ca Refactor resource exhaustion test (Troy Giorshev) 5c4648d17ba18e4194959963994cc6b37053f127 Fix "invalid message size" test (Troy Giorshev) ff1e7b884447a5ba10553b2d964625f94e255bdc Move size limits to module-global (Troy Giorshev) 57890abf2c7919eddfec36178b1136cd44ffe883 Remove two unneeded tests (Troy Giorshev) Pull request description: This PR touches only the p2p_invalid_messages.py functional test module. There are two main goals accomplished here. First, it fixes the "invalid message size" test, which previously made a message that was invalid for multiple reasons. Second, it refactors the file into a single consistent style. This file appears to have originally had two authors, with different styles and some test duplication. It should now be easier and quicker to understand this module, anticipating the upcoming [BIP324](bitcoin/bitcoin#18242) and [AltNet](bitcoin/bitcoin#18989) changes. This should probably go in ahead of #19107, but the two are not strictly related. --- Depends on D9511 Backport of [[bitcoin/bitcoin#19177 | core#19177]] Test Plan: ninja all && ./test/functional/test_runner.py p2p_invalid_messages Reviewers: #bitcoin_abc, PiRK Reviewed By: #bitcoin_abc, PiRK Subscribers: PiRK Differential Revision: https://reviews.bitcoinabc.org/D9519

…t framework update) 7b04c0a [Test] Clean duplicate connections creation in mining_pos_coldStaking.py (furszy) 15a799e [Test] MAX_PROTOCOL_MESSAGE_LENGTH PIVXified. (furszy) 0aedf35 [tests] Don't import asyncio to test magic bytes (John Newbery) 9bb5309 Refactor resource exhaustion test (Troy Giorshev) 589a780 Fix "invalid message size" test (Troy Giorshev) 8a0c12b Move size limits to module-global (Troy Giorshev) b3391c5 Remove two unneeded tests (Troy Giorshev) 1404e68 test: Add various low-level p2p tests (furszy) 8aaf7e1 test: Remove fragile assert_memory_usage_stable (MarcoFalke) f68e22c [Test] p2p_invalid_messages.py do not change msg.command in test_command and raise sync_with_ping timeout (furszy) c851c0b Test framework: Wait for verack by default on every new connection (furszy) c02e9a0 [Test] move framework subversion to string (furszy) 3472a39 Replace coroutine with async def in p2p_invalid_messages.py (Elichai Turkel) 33c7b19 net_processing: align msg checksum check properly. (furszy) 7f71b1b Hash P2P messages as they are received instead of at process-time (furszy) 215a638 test: Skip flaky p2p_invalid_messages test on macOS (Fabian Jahr) c11f565 qa: Make swap_magic_bytes in p2p_invalid_messages atomic (MarcoFalke) be979ad test: Fix race in p2p_invalid_messages (MarcoFalke) 6a72f0c qa: Add tests for invalid message headers (MarcoFalke) 51ddd3d Introduce p2p_invalid_messages.py test (furszy) 55a37b5 net: fix missing jump line in "Oversized message from peer.." error. (furszy) 0edfce5 test_node: get_mem_rss fixups (MarcoFalke) 6f21213 tests: add P2PConnection.send_raw_message (James O'Beirne) ae68c6e tests: add utility to assert node memory usage hasn't increased (James O'Beirne) 8469afe test: forward timeouts properly in send_blocks_and_test (James O'Beirne) db28a53 Skip is_closing() check when not available. (Daniel Kraft) be9dacb tests: fixes mininode's P2PConnection sending messages on closing transport (marcoagner) 53599c8 qa: Avoid start/stop of the network thread mid-test (furszy) 688190c [qa] mininode: Expose connection state through is_connected (MarcoFalke) Pull request description: Part of the deep and long net and ser work that I'm doing (and Tor v3 onion addresses support). Friend of #2359. Focused on the end goal of implementing the `p2p_invalid_messages` functional test which validates that invalid msg headers, msg types, oversized payloads and inventory msgs aren't accepted nor can cause a resource exhaustion. And an extra covered scenario, in `p2p_invalid_tx.py`, for the orphan pool overflow. Plus, to get up to the goal, had to work over the functional test framework as well. So.. adapted list: * bitcoin#9045. * bitcoin#13512. * bitcoin#13517. * bitcoin#13715. * bitcoin#13747. * bitcoin#14456. * bitcoin#14522. * bitcoin#14672. * bitcoin#14693. * bitcoin#14812. * bitcoin#15246. * bitcoin#15330. * bitcoin#15697. * bitcoin#16445. * bitcoin#17931. * bitcoin#17469. * bitcoin#18628 (only `p2p_invalid_tx.py` and `p2p_invalid_messages.py`. We don't support the other tests yet). * bitcoin#19177. * bitcoin#19264. ACKs for top commit: random-zebra: utACK 7b04c0a and merging... Tree-SHA512: 48d1f1a6acc24a9abce033fbf1f281ba4392147da39d118a694a09d63c3e0610cc1a29d711b434b16cc0d0e030dd9f1a89843870091b6999b862b9ab18a20679

fanquake added the Tests label Jun 5, 2020

maflcko changed the title ~~p2p: Fix and clean p2p_invalid_messages functional tests~~ test: Fix and clean p2p_invalid_messages functional tests Jun 5, 2020

maflcko reviewed Jun 5, 2020

View reviewed changes

test/functional/p2p_invalid_messages.py Outdated Show resolved Hide resolved

test/functional/p2p_invalid_messages.py Outdated Show resolved Hide resolved

test/functional/p2p_invalid_messages.py Outdated Show resolved Hide resolved

troygiorshev force-pushed the p2p-refactor-fix-tests branch 2 times, most recently from eaaa04e to da3a874 Compare June 8, 2020 04:46

troygiorshev added 3 commits June 8, 2020 00:54

Move size limits to module-global

ff1e7b8

As well, this renames those variables to match PEP8 and this clears up the comment relating to VALID_DATA_LIMIT. Admittedly, this commit is mainly to make the following ones cleaner.

troygiorshev force-pushed the p2p-refactor-fix-tests branch from da3a874 to f7a2ef0 Compare June 8, 2020 04:57

maflcko reviewed Jun 8, 2020

View reviewed changes

test/functional/p2p_invalid_messages.py Outdated Show resolved Hide resolved

Refactor resource exhaustion test

af2a145

This is a simple refactor of the specified test. It is now brought in line with the rest of the tests in the module. This should make things easier to debug, as all of the tests are now grouped together at the top.

troygiorshev force-pushed the p2p-refactor-fix-tests branch from f7a2ef0 to af2a145 Compare June 8, 2020 12:52

gabbieian38 approved these changes Jun 8, 2020

View reviewed changes

troygiorshev mentioned this pull request Jun 9, 2020

p2p: Move all header verification into the network layer, extend logging #19107

Merged

maflcko merged commit f154071 into bitcoin:master Jun 12, 2020

furszy mentioned this pull request Jun 4, 2021

[Test] Add p2p invalid messages functional test (and test framework update) PIVX-Project/PIVX#2410

Merged

bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022

test: Fix and clean p2p_invalid_messages functional tests #19177

test: Fix and clean p2p_invalid_messages functional tests #19177

Uh oh!

Conversation

troygiorshev commented Jun 5, 2020

Uh oh!

maflcko left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jnewbery commented Jun 5, 2020

Uh oh!

troygiorshev commented Jun 8, 2020

Uh oh!

Uh oh!

maflcko commented Jun 8, 2020

Uh oh!

troygiorshev commented Jun 8, 2020

Uh oh!

troygiorshev commented Jun 8, 2020

Uh oh!

DrahtBot commented Jun 10, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Conflicts

Coverage

Uh oh!

jnewbery commented Jun 12, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

maflcko commented Jun 12, 2020

Uh oh!

jnewbery commented Jun 12, 2020

Uh oh!

maflcko commented Jun 12, 2020

Uh oh!

maflcko commented Jun 12, 2020

Uh oh!

maflcko commented Jun 12, 2020

Uh oh!

jnewbery commented Jun 13, 2020

Uh oh!

Uh oh!

DrahtBot commented Jun 10, 2020 •

edited

Loading

jnewbery commented Jun 12, 2020 •

edited

Loading