Do not answer GETDATA for to-be-announced tx #18861

sipa · 2020-05-04T07:20:31Z

This PR intends to improve transaction-origin privacy.

In general, we should try to not leak information about what transactions we have (recently) learned about before deciding to announce them to our peers. There is a controlled transaction dissemination process that reveals our transactions to peers that has various safeguards for privacy (it's rate-limited, delayed & batched, deterministically sorted, ...), and ideally there is no way to test which transactions we have before that controlled process reveals them. The handling of the mempool BIP35 message has protections in this regard as well, as it would be an obvious way to bypass these protections (handled asynchronously after a delay, also deterministically sorted).

However, currently, if we receive a GETDATA for a transaction that we have not yet announced to the requester, we will still respond to it if it was announced to some other peer already (because it needs to be in mapRelay, which only happens on the first announcement). This is a slight privacy leak.

Thankfully, this seems easy to solve: setInventontoryTxToSend keeps track of the txids we have yet to announce to a peer - which almost(*) exactly corresponds to the transactions we know of that we haven't revealed to that peer. By checking whether a txid is in that set before responding to a GETDATA, we can filter these out.

(*) Locally resubmitted or rebroadcasted transactions may end up in setInventoryTxToSend while the peer already knows we have them, which could result in us incorrectly claiming we don't have such transactions if coincidentally requested right after we schedule reannouncing them, but before they're actually INVed. This is made even harder by the fact that filterInventoryKnown will generally keep known reannouncements out of setInventoryTxToSend unless it overflows (which needs 50000 INVs in either direction before it happens).

The condition for responding now becomes:

  (not in setInventoryTxToSend) AND
  (
    (in relay map) OR
    (
      (in mempool) AND
      (old enough that it could have expired from relay map) AND
      (older than our last getmempool response)
    )
  )

DrahtBot · 2020-05-04T07:52:14Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

No conflicts as of last run.

laanwj · 2020-05-04T13:35:27Z

Concept ACK

practicalswift · 2020-05-04T14:03:45Z

Concept ACK

amitiuttarwar · 2020-05-04T18:16:59Z

Concept ACK. code looks reasonable, just checks for presence on setInventoryTxToSend before sending the TX. I'll review in more depth later this week, but one piece of feedback for the description

I got a bit confused by this part:

I believe it is however still possible to GETDATA a transaction that we have just learned about (from another peer, or from our local wallet), and answer it before we have announced it to the requesting peer.

An attempt to reword based on my understanding:
if we recently learned about a transaction & a peer sends us a GETDATA before we announce it to them, we should not fulfill the request. That is a privacy leak.

luke-jr

Concept ACK, but I think this implementation is buggy:

luke-jr · 2020-05-04T20:06:32Z

src/net_processing.cpp

+                // Check if the requested transaction is so recent that we're just
+                // about to announce it to the peer; if so, they certainly shouldn't
+                // know we already have it.


setInventoryTxToSend isn't only for initial broadcasts, though. At least the RPC method sendrawtransaction can add a transaction to it that has already been relayed previously.

That's a good point.

One possibility is making setInventoryTxToSend track a boolean determining if it's a possible re-submission.

Another is just not caring. The entries in setInventoryTxToSend are filtered by filterInventoryKnown, so it won't contain anything we know the other party already knows about. Of course, that filter is limited in size, but with 50000 entries I believe it can't expire entries faster than after 47 minutes (50000 invs / 35 (invs / msg) * (2s / msg) = 2857s), long after the entries are gone from the relay map (from which they expire after 15 minutes).

In other words: this is worrying about the situation where (a) you announced an INV (b) at least 47 minutes passed, during which you respond to a BIP35 mempool request from the peer (c) you try to rebroadcast (d) in the few seconds after the rebroadcast is scheduled but before it's sent out, you receive a GETDATA.

naumenkogs · 2020-05-05T02:13:48Z

Concept ACK.

jnewbery · 2020-05-06T22:04:50Z

Concept ACK. I agree with @amitiuttarwar that the PR description is confusing (you switch the perspective of 'we' being the node sending the GETDATA and 'we' being the node receiving it). PR descriptions end up in the merge commit, so it'd be good to fix that up.

I think that rather than adding another bool to the already complex ProcessGetData which only exists to skip over a code block, it'd be clearer to refactor the getdata tx processing into its own function. That turns your change into a two line fix.

I've done that here: https://github.com/jnewbery/bitcoin/tree/pr18861.1. Feel free to use it if you think it's useful.

(The function signature for ProcessGetTransactionData() isn't great, and I'd prefer to send individual NOTFOUND messages within that function instead of batching them, but that's no longer a pure refactor and could be done later.)

sipa · 2020-05-07T08:30:16Z

(The function signature for ProcessGetTransactionData() isn't great, and I'd prefer to send individual NOTFOUND messages within that function instead of batching them, but that's no longer a pure refactor and could be done later.)

It would be unfortunate to make the protocol implementation less efficient just because it results in slightly cleaner code. How about an alternative, where a helper function is introduced that just returns true or false, which ProcessGetData then calls to determine which request should be turned into a TX, and which batched into NOTFOUND (where the actual construction/sending of those messages remains in ProcessGetData)?

jnewbery · 2020-05-07T16:16:10Z

(apologies that this conversation has wandered off-topic for the fix in this PR)

It would be unfortunate to make the protocol implementation less efficient just because it results in slightly cleaner code.

Let me turn that around, and say "it would be unfortunate to make less clean code, in order to make the wire protocol infinitesimally more efficient."

Remember that the notfound path is the rare, failure case. How rare? Here are some stats from my node's first two peers:

> bcli getpeerinfo
[
  {
    "id": 0,
    [...]
    "bytessent_per_msg": {
      [...]
      "notfound": 122,
      [...]
      "tx": 212229,
    },
    "bytesrecv_per_msg": {
      [...]
      "notfound": 427,
      [...]
      "tx": 31755274,
    }
  },
  {
    "id": 1,
    [...]
    "bytessent_per_msg": {
      [...]
      "notfound": 1098,
      [...]
      "tx": 108708,
    },
    "bytesrecv_per_msg": {
      [...]
      "notfound": 926,
      [...]
      "tx": 93436726,
    }
  },
[...]

Almost all notfound messages are 61 bytes (24 bytes header + 37 bytes data). Let's assume that an average tx message is 250 bytes for header+data.

That means for my first two peers I sent/received ~42 notfound messages and sent/received ~500,000 tx messages.

Looking through my debug logs, the vast majority of notfound messages are 37 bytes, indicating a single item. So, for almost all calls of ProcessGetData(), there are zero or one notfound items, and batching has no effect on wire efficiency. Batching notfounds would save <100 messages per day for me (run grep "sending notfound" .bitcoin/debug.log | grep -v "37 bytes" to see for yourself).

In the adversarial case, it's more expensive for us to serve transactions than notfounds, so being more efficient doesn't protect us from any attacks.

Sending notfounds individually also has the nice effect of making getdata responses serial in the order we received them.

How about [...] a helper function [...] that just returns true or false

I think that's better than the status quo, but I prefer delgating out to a function to do all the transaction GETDATA processing and message sending, so that it matches the responsibilities of ProcessGetBlockData() exactly.

sipa · 2020-05-07T18:28:43Z

@jnewbery I've abstracted out a function for determining what tx requests to respond to. We can talk about further abstractions later.

@amitiuttarwar I've also rewritten the PR description with a bunch of background information, and summarized the issue @luke-jr's brought up above.

jnewbery

utACK a00dae0

src/net_processing.cpp

jnewbery · 2020-05-08T01:27:45Z

utACK 353a391

naumenkogs · 2020-05-08T13:41:49Z

utACK 353a391
I like that we got rid of the push variable, looks cleaner now.

Do you think there can be timing analysis, to distinguish "it's in the setInventoryTxToSend, not sharing with you" and "I couldn't find it in the mempool".
Maybe I'm overthinking and mempool lookup is fast enough to not make any difference (should be same order as internet latencies or lower), but is that true for full mempools?

If really only applies to txs left MapRelay (but still in the mempool. (because I assume MapRelay lookups are very fast, unlike maybe mempool lookups).

jnewbery · 2020-05-08T15:25:25Z

Maybe I'm overthinking and mempool lookup is fast enough to not make any difference (should be same order as internet latencies or lower), but is that true for full mempools?

If really only applies to txs left MapRelay (but still in the mempool. (because I assume MapRelay lookups are very fast, unlike maybe mempool lookups).

Both mapRelay and mempool lookups should be fast. mapRelay is a map, so lookups are O(logn), but the map is small. Mempool lookups are by an unordered map of txid, so are O(1). We could make mapRelay an unordered map for O(1) lookups, but I think we probably just want to remove it entirely eventually (#17303).

amitiuttarwar

code review ACK 353a391

a couple non-blocking nits / questions. probably not worth fixing unless you touch the code for other reasons
definitely like the code cleanup, its easier to reason about with less nested conditionals (although still a bit tricky.. I can't wait to get rid of mapRelay! )
checked all the code paths of how a transaction could end up on setInventoryTxToSend
thought through possible unexpected cases that would lead to us incorrectly claiming we don't have the transaction. agree that the likelihood is minuscule & don't think the consequence would be dramatically negative, so the benefit far outweighs.
nice PR description :)

thanks!

src/net_processing.cpp

jnewbery · 2020-05-11T18:29:37Z

This probably about ready for merge now, so no need to make any changes now, but would it make sense to just take cs_main inside FindTxForGetData()? It would mean taking and releasing the lock multilple times, but would avoid holding onto it while serializing and sending the TX message.

jonatack

ACK 353a391

I agree with Amiti's and John's latest comments for a follow-up or happy to re-ACK.

sipa · 2020-05-12T20:31:27Z

@jnewbery Done. Added a middle commit (review with -w) that pushes the cs_main lock down.

Additionally, got rid of the Optional; CTransactionRefs can already store a nullptr.

jonatack · 2020-05-12T22:03:27Z

re-ACK 2b3f101

naumenkogs · 2020-05-12T22:14:06Z

The solution doesn't help if an attacker connects to us right after we put the transaction in MapRelay (by announcing to someone else).

I think the fix should be made as a follow-up. We discussed several alternatives with @sipa and it's not a super-trivial fix to add it here right away.

This PR is still an improvement over existing behavior (now an attacker has to reconnect), and a nice refactor.
utACK 2b3f101

sipa · 2020-05-12T22:21:57Z

Good find, @naumenkogs.

My thinking is that the best solution is actually introducing a separate rolling bloom filter (like filterInventoryKnown) for just outgoing announcements. It can be much smaller (we control the rate we announce things at, so given a limit of say 15 minutes to respond, there is an upper bound on the size), and then only permit GETDATA'ing things in that filter. I believe that can replace the setInventoryTxToSend test introduced here (while solving @luke-jr's concern above). It would also make me more comfortable with dropping mapRelay (after fixing the rescheduling of downloads after a NOTFOUND).

I think the changes here are still an improvement, so I suggest going ahead with this if reviewers agree, and work on the above as a follow-up.

naumenkogs · 2020-05-12T22:35:37Z

utACK 2896c41

jonatack · 2020-05-12T22:53:27Z

ACK 2896c41 per git diff 2b3f101 2896c41 only change since previous review is moving the recency check up to be verified first in FindTxForGetData, as it was originally in 353a391 (good catch), before looking up the transaction in the relay pool.

sipa · 2020-05-12T22:58:58Z

@jonatack It was originally in the right place, I accidentally moved it in the push-cs_main-down change, which @naumenkogs discovered, so I quickly pushed again. Happy to see you and him discovering that; that's a good sign for our review process.

jnewbery · 2020-05-13T00:51:49Z

code review ACK 2896c41

Very nice cleanup :)

amitiuttarwar · 2020-05-14T21:07:51Z

src/net_processing.cpp

@@ -1608,6 +1608,37 @@ void static ProcessGetBlockData(CNode* pfrom, const CChainParams& chainparams, c
    }
 }

+//! Determine whether or not a peer can request a transaction, and return it (or nullptr if not found or not allowed).
+CTransactionRef static FindTxForGetData(CNode* peer, const uint256& txid, const std::chrono::seconds mempool_req, const std::chrono::seconds longlived_mempool_time) LOCKS_EXCLUDED(cs_main)


I'm trying to better understand the lock enforcements & have some questions..

I noticed that ProcessGetData has both LOCKS_EXCLUDED thread safety annotation as well as an AssertLockNotHeld check. I suspect its a product of development over time & is now redundant, but want to confirm my understanding.

LOCKS_EXCLUDED enables clang to do static analysis when we compile with the -Wthread-safety option. Does the travis build that says sanitizers: thread (TSan) run with this option?

I was trying to understand the limitations of the clang analysis. All I found was a lack of negative capabilities (aka can't check if a function doesn't have the excluded annotation), which doesn't seem relevant here.

And then AssertLockNotHeld() looks like our custom logic that would do runtime assertions.

From a kind-of-clumsy search through git history, it looks like AssertLockNotHeld was added to ProcessGetData before the LOCKS_EXCLUDED annotation was. But it looks like threadsafety.h has defined LOCKS_EXCLUDED since 2014. So I can't color the context of decisions.

To bring this back to this PR, my fundamental question is: what is the difference of guarantees between LOCKS_EXCLUDED and AssertLockNotHeld? Would there be a reason to add the second check as well?

They're somewhat orthogonal, I think.

Annotations:

[+] Is a compile-time check, and guarantee absence of issues in every possible code path

[-] Only works in clang

[-] Can't be used in some more advanced locking scenarios

Assertions:

[+] Works in GCC and Clang

[+] Isn't restricted to analyzable cases

[-] Is only a runtime check; it needs test cases that actually exercise the bug

[-] Needs building with -DDEBUG_LOCKORDER

Perhaps at some point it's time to evaluate if we can pick just one over the other (which I guess would be the annotations approach), but they're not exactly identical.

ok gotcha. thanks thats very helpful.

so assuming we regularly run the compile-time checks, for this specific case the guarantee from the annotations > guarantee from the assertion.

I don't see any travis jobs that are running -Wthread-safety (and since my previous comment learned that clang thread safety analysis is different than the clang thread sanitizer)

does that mean if somebody added code that acquired cs_main and invoked FindTxForGetData, we'd be relying on a reviewer to either notice or compile with -Wthread-safety to raise an error?

I do see travis jobs that pass through -DDEBUG_LOCKORDER, so assuming we have good functional test coverage, throwing the assertion could cause a travis failure in that circumstance?

It might not be super important / worth invalidating ACKs to add the additional check. The lock expectations have been made very apparent and if new code tried to call this function with the lock acquired, I think we'd catch it in the review process. But on the other hand if we have tooling that reduces dependency on careful review, might as well use it?

I'm about ready to ACK this PR, just would like to make sure my understanding is clear

I think configure.ac turns on -Wthread-safety-analysis and -Werror=thread-safety-analysis by default if the compiler supports it.

ah, thank you.

amitiuttarwar · 2020-05-14T22:33:47Z

code review ACK 2896c41

ajtowns · 2020-05-19T05:11:07Z

ACK 2896c41

fanquake · 2020-05-19T07:12:39Z

Thanks for following up @ajtowns.

2896c41 Do not answer GETDATA for to-be-announced tx (Pieter Wuille) f2f32a3 Push down use of cs_main into FindTxForGetData (Pieter Wuille) c6131bf Abstract logic to determine whether to answer tx GETDATA (Pieter Wuille) Pull request description: This PR intends to improve transaction-origin privacy. In general, we should try to not leak information about what transactions we have (recently) learned about before deciding to announce them to our peers. There is a controlled transaction dissemination process that reveals our transactions to peers that has various safeguards for privacy (it's rate-limited, delayed & batched, deterministically sorted, ...), and ideally there is no way to test which transactions we have before that controlled process reveals them. The handling of the `mempool` BIP35 message has protections in this regard as well, as it would be an obvious way to bypass these protections (handled asynchronously after a delay, also deterministically sorted). However, currently, if we receive a GETDATA for a transaction that we have not yet announced to the requester, we will still respond to it if it was announced to *some* other peer already (because it needs to be in `mapRelay`, which only happens on the first announcement). This is a slight privacy leak. Thankfully, this seems easy to solve: `setInventontoryTxToSend` keeps track of the txids we have yet to announce to a peer - which almost(*) exactly corresponds to the transactions we know of that we haven't revealed to that peer. By checking whether a txid is in that set before responding to a GETDATA, we can filter these out. (*) Locally resubmitted or rebroadcasted transactions may end up in setInventoryTxToSend while the peer already knows we have them, which could result in us incorrectly claiming we don't have such transactions if coincidentally requested right after we schedule reannouncing them, but before they're actually INVed. This is made even harder by the fact that filterInventoryKnown will generally keep known reannouncements out of setInventoryTxToSend unless it overflows (which needs 50000 INVs in either direction before it happens). The condition for responding now becomes: ``` (not in setInventoryTxToSend) AND ( (in relay map) OR ( (in mempool) AND (old enough that it could have expired from relay map) AND (older than our last getmempool response) ) ) ``` ACKs for top commit: naumenkogs: utACK 2896c41 ajtowns: ACK 2896c41 amitiuttarwar: code review ACK 2896c41 jonatack: ACK 2896c41 per `git diff 2b3f101 2896c41` only change since previous review is moving the recency check up to be verified first in `FindTxForGetData`, as it was originally in 353a391 (good catch), before looking up the transaction in the relay pool. jnewbery: code review ACK 2896c41 Tree-SHA512: e7d5bc006e626f60a2c108a9334f3bbb67205ace04a7450a1e4d4db1d85922a7589e0524500b7b4953762cf70554c4a08eec62c7b38b486cbca3d86321600868

narula · 2020-07-08T16:18:43Z

src/net_processing.cpp

+    // Process as many TX items from the front of the getdata queue as
+    // possible, since they're common and it's efficient to batch process
+    // them.
+    while (it != pfrom->vRecvGetData.end() && (it->type == MSG_TX || it->type == MSG_WITNESS_TX)) {


You are no longer holding cs_main. Why is it OK to access vRecvGetData here without a lock?

vRecvGetData isn't guarded by cs_main. In fact, it's not guarded by anything 😱

All access of vRecvGetData is in ProcessMessages(), which only ever runs single threaded in the Message Handler thread, so this is safe, but ideally it'd be guarded by some synchronization method in case that ever changes.

Right. Previous to this commit, the mempool, vRecvGetData, and m_tx_relay were all accessed atomically under cs_main. This changes that and uses finer-grained locking.

I think all the other accesses are safe but I wasn't certain about vRecvGetData.

@narula Nice catch. Indeed, as @jnewbery says, it isn't formally protected by anything right now. That's fine because (a) it's only accessed from one thread (b) every access to it actually holds cs_vRecv (just checked).

Could you point out where cs_vRecv is taken before vRecvGetData is accessed here? I only see it locked in two places in net.cpp, and I can't connect either of them to ProcessMessages. The non-stats one is in ReceiveMsgBytes.

~~net locks cs_vRecv and holds it while calling ProcessMessages. I checked by adding the annotation and recompiling.~~

EDIT: i'm wrong

ok! based on discussion in PR review club today I'll PR a change to guard it by cs_vRecv, and look at orphan_work_set too. To be clear: there's no data race now, but it's worth fixing up in case these fields are accessed from multiple threads in the future.

f32c408 Make sure unconfirmed parents are requestable (Pieter Wuille) c4626bc Drop setInventoryTxToSend based filtering (Pieter Wuille) 43f02cc Only respond to requests for recently announced transactions (Pieter Wuille) b24a17f Introduce constant for mempool-based relay separate from mapRelay caching (Pieter Wuille) a9bc563 Swap relay pool and mempool lookup (Pieter Wuille) Pull request description: This implements the follow-up suggested here: #18861 (comment) . Instead of checking `setInventoryTxToSend`, maintain an explicit bloom filter with the 3500 most recently announced invs, and permit fetching any of these as long as they're in the relay pool or the mempool. In addition, permit relay from the mempool after just 2 minutes instead of 15. This: * Fixes the brief opportunity an attacker has to request unannounced invs just after the connection is established (pointed out by naumenkogs, see #18861 (comment)). * Guarantees that locally resubmitted invs after `filterInventoryKnown` rolls over can still be requested (pointed out by luke-jr, see #18861 (comment)). It adds 37 KiB of filter per peer. This is also a step towards dropping the relay pool entirely and always relaying from the mempool directly (see #17303), but that is still blocked by dealing properly with NOTFOUNDs (see #18238). ACKs for top commit: jnewbery: reACK f32c408 jonatack: re-ACK f32c408 per `git range-diff f7c19e8 2da7ee3 f32c408` and redid the following: code review, thought about motivation, DoS and privacy aspects, debug build to check for warnings after updating Clang from 6 to 11 since last review. ajtowns: re-ACK f32c408 Tree-SHA512: aa05b9fd01bad59581c4ec91836a52d7415dc933fa49d4c4adced79aa25aaad51e11166357e8c8b29fbf6021a7401b98c21b850b5d8e8ad773fdb5d6608e1e85

f32c408 Make sure unconfirmed parents are requestable (Pieter Wuille) c4626bc Drop setInventoryTxToSend based filtering (Pieter Wuille) 43f02cc Only respond to requests for recently announced transactions (Pieter Wuille) b24a17f Introduce constant for mempool-based relay separate from mapRelay caching (Pieter Wuille) a9bc563 Swap relay pool and mempool lookup (Pieter Wuille) Pull request description: This implements the follow-up suggested here: bitcoin#18861 (comment) . Instead of checking `setInventoryTxToSend`, maintain an explicit bloom filter with the 3500 most recently announced invs, and permit fetching any of these as long as they're in the relay pool or the mempool. In addition, permit relay from the mempool after just 2 minutes instead of 15. This: * Fixes the brief opportunity an attacker has to request unannounced invs just after the connection is established (pointed out by naumenkogs, see bitcoin#18861 (comment)). * Guarantees that locally resubmitted invs after `filterInventoryKnown` rolls over can still be requested (pointed out by luke-jr, see bitcoin#18861 (comment)). It adds 37 KiB of filter per peer. This is also a step towards dropping the relay pool entirely and always relaying from the mempool directly (see bitcoin#17303), but that is still blocked by dealing properly with NOTFOUNDs (see bitcoin#18238). ACKs for top commit: jnewbery: reACK f32c408 jonatack: re-ACK f32c408 per `git range-diff f7c19e8 2da7ee3 f32c408` and redid the following: code review, thought about motivation, DoS and privacy aspects, debug build to check for warnings after updating Clang from 6 to 11 since last review. ajtowns: re-ACK f32c408 Tree-SHA512: aa05b9fd01bad59581c4ec91836a52d7415dc933fa49d4c4adced79aa25aaad51e11166357e8c8b29fbf6021a7401b98c21b850b5d8e8ad773fdb5d6608e1e85

…et with g_cs_orphans da0988d scripted-diff: rename vRecvGetData (Neha Narula) ba95181 Guard vRecvGetData (now in net processing) with its own mutex (Neha Narula) 2d9f2fc Move vRecvGetData to net processing (Neha Narula) 673247b Lock before checking if orphan_work_set is empty; indicate it is guarded (Neha Narula) 8803aee Move m_orphan_work_set to net_processing (Neha Narula) 9c47cb2 [Rename only] Rename orphan_work_set to m_orphan_work_set. (Neha Narula) Pull request description: Add annotations to guard `vRecvGetData` and `orphan_work_set` and fix up places where they were accessed without a lock. There is no current data race because they happen to be accessed by only one thread, but this might not always be the case. Original discussion: bitcoin/bitcoin#18861 (comment) ACKs for top commit: MarcoFalke: review ACK da0988d 🐬 jnewbery: Code review ACK da0988d hebasto: ACK da0988d, I have reviewed the code and it looks correct, I agree it can be merged. Tree-SHA512: 31cadd319ddc9273a87e77afc4db7339fd636e816b5e742eba5cb32927ac5cc07a672b2268d2d38a75a0f1b17d93836adab9acf7e52f26ea9a43f54efa57257e

…_work_set with g_cs_orphans da0988d scripted-diff: rename vRecvGetData (Neha Narula) ba95181 Guard vRecvGetData (now in net processing) with its own mutex (Neha Narula) 2d9f2fc Move vRecvGetData to net processing (Neha Narula) 673247b Lock before checking if orphan_work_set is empty; indicate it is guarded (Neha Narula) 8803aee Move m_orphan_work_set to net_processing (Neha Narula) 9c47cb2 [Rename only] Rename orphan_work_set to m_orphan_work_set. (Neha Narula) Pull request description: Add annotations to guard `vRecvGetData` and `orphan_work_set` and fix up places where they were accessed without a lock. There is no current data race because they happen to be accessed by only one thread, but this might not always be the case. Original discussion: bitcoin#18861 (comment) ACKs for top commit: MarcoFalke: review ACK da0988d 🐬 jnewbery: Code review ACK da0988d hebasto: ACK da0988d, I have reviewed the code and it looks correct, I agree it can be merged. Tree-SHA512: 31cadd319ddc9273a87e77afc4db7339fd636e816b5e742eba5cb32927ac5cc07a672b2268d2d38a75a0f1b17d93836adab9acf7e52f26ea9a43f54efa57257e

Summary: 2896c412fadbc03916a33028f4f50fd87ac48edb Do not answer GETDATA for to-be-announced tx (Pieter Wuille) f2f32a3dee9a965c8198f9ddd3aaebc627c273e4 Push down use of cs_main into FindTxForGetData (Pieter Wuille) c6131bf407c1ada78a0e5509a702bc7da0bfd57d Abstract logic to determine whether to answer tx GETDATA (Pieter Wuille) Pull request description: This PR intends to improve transaction-origin privacy. In general, we should try to not leak information about what transactions we have (recently) learned about before deciding to announce them to our peers. There is a controlled transaction dissemination process that reveals our transactions to peers that has various safeguards for privacy (it's rate-limited, delayed & batched, deterministically sorted, ...), and ideally there is no way to test which transactions we have before that controlled process reveals them. The handling of the `mempool` BIP35 message has protections in this regard as well, as it would be an obvious way to bypass these protections (handled asynchronously after a delay, also deterministically sorted). However, currently, if we receive a GETDATA for a transaction that we have not yet announced to the requester, we will still respond to it if it was announced to *some* other peer already (because it needs to be in `mapRelay`, which only happens on the first announcement). This is a slight privacy leak. Thankfully, this seems easy to solve: `setInventontoryTxToSend` keeps track of the txids we have yet to announce to a peer - which almost(*) exactly corresponds to the transactions we know of that we haven't revealed to that peer. By checking whether a txid is in that set before responding to a GETDATA, we can filter these out. (*) Locally resubmitted or rebroadcasted transactions may end up in setInventoryTxToSend while the peer already knows we have them, which could result in us incorrectly claiming we don't have such transactions if coincidentally requested right after we schedule reannouncing them, but before they're actually INVed. This is made even harder by the fact that filterInventoryKnown will generally keep known reannouncements out of setInventoryTxToSend unless it overflows (which needs 50000 INVs in either direction before it happens). The condition for responding now becomes: ``` (not in setInventoryTxToSend) AND ( (in relay map) OR ( (in mempool) AND (old enough that it could have expired from relay map) AND (older than our last getmempool response) ) ) ``` --- Backport of Core [[bitcoin/bitcoin#18861 | PR18861]] notes: * uses references to CNode rather than pointers as in the original PR, see D8480 * changed the CNode* parameter in FindTxForGet Data to const CNode& because that's what it is Test Plan: CC=clang CXX=clang++ cmake -GNinja -DENABLE_SANITIZERS=thread ninja check-all cmake -GNinja -DCMAKE_BUILD_TYPE=Debug ninja check-all Reviewers: #bitcoin_abc, PiRK Reviewed By: #bitcoin_abc, PiRK Differential Revision: https://reviews.bitcoinabc.org/D9162

fanquake added the P2P label May 4, 2020

sipa added the Privacy label May 4, 2020

DrahtBot mentioned this pull request May 4, 2020

p2p: Stop relaying non-mempool txs, improve tx privacy slightly #17303

Closed

luke-jr suggested changes May 4, 2020

View reviewed changes

sipa force-pushed the 202004_private_getdata branch from 998d436 to a00dae0 Compare May 7, 2020 17:31

jnewbery reviewed May 7, 2020

View reviewed changes

src/net_processing.cpp Outdated Show resolved Hide resolved

sipa force-pushed the 202004_private_getdata branch 2 times, most recently from 58f9d5d to 353a391 Compare May 7, 2020 22:54

amitiuttarwar reviewed May 9, 2020

View reviewed changes

src/net_processing.cpp Outdated Show resolved Hide resolved

src/net_processing.cpp Outdated Show resolved Hide resolved

jonatack reviewed May 11, 2020

View reviewed changes

sipa and others added 2 commits May 12, 2020 13:16

Abstract logic to determine whether to answer tx GETDATA

c6131bf

Push down use of cs_main into FindTxForGetData

f2f32a3

sipa force-pushed the 202004_private_getdata branch from 353a391 to 2b3f101 Compare May 12, 2020 20:20

Do not answer GETDATA for to-be-announced tx

2896c41

sipa force-pushed the 202004_private_getdata branch from 2b3f101 to 2896c41 Compare May 12, 2020 22:35

fanquake requested a review from amitiuttarwar May 13, 2020 01:06

amitiuttarwar reviewed May 14, 2020

View reviewed changes

fanquake merged commit c73bd00 into bitcoin:master May 19, 2020

sipa mentioned this pull request May 29, 2020

Only allow getdata of recently announced invs #19109

Merged

narula reviewed Jul 8, 2020

View reviewed changes

narula mentioned this pull request Sep 7, 2020

net: guard vRecvGetData with cs_vRecv and orphan_work_set with g_cs_orphans #19911

Merged

lehuuhieu7777 approved these changes Oct 14, 2020

View reviewed changes

bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022

Do not answer GETDATA for to-be-announced tx #18861

Do not answer GETDATA for to-be-announced tx #18861

Uh oh!

Conversation

sipa commented May 4, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DrahtBot commented May 4, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Conflicts

Uh oh!

laanwj commented May 4, 2020

Uh oh!

practicalswift commented May 4, 2020

Uh oh!

amitiuttarwar commented May 4, 2020

Uh oh!

luke-jr left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sipa May 4, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

naumenkogs commented May 5, 2020

Uh oh!

jnewbery commented May 6, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sipa commented May 7, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jnewbery commented May 7, 2020

Uh oh!

sipa commented May 7, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jnewbery left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jnewbery commented May 8, 2020

Uh oh!

naumenkogs commented May 8, 2020

Uh oh!

jnewbery commented May 8, 2020

Uh oh!

amitiuttarwar left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

jnewbery commented May 11, 2020

Uh oh!

jonatack left a comment

Choose a reason for hiding this comment

Uh oh!

sipa commented May 12, 2020

Uh oh!

jonatack commented May 12, 2020

Uh oh!

naumenkogs commented May 12, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sipa commented May 12, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

naumenkogs commented May 12, 2020

Uh oh!

jonatack commented May 12, 2020

Uh oh!

sipa commented May 12, 2020

Uh oh!

jnewbery commented May 13, 2020

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sipa commented May 4, 2020 •

edited

Loading

DrahtBot commented May 4, 2020 •

edited

Loading

sipa May 4, 2020 •

edited

Loading

jnewbery commented May 6, 2020 •

edited

Loading

sipa commented May 7, 2020 •

edited

Loading

sipa commented May 7, 2020 •

edited

Loading

jnewbery left a comment •

edited

Loading

naumenkogs commented May 12, 2020 •

edited

Loading

sipa commented May 12, 2020 •

edited

Loading

sipa May 14, 2020 •

edited

Loading

sipa Jul 8, 2020 •

edited

Loading