

davidgumberg
Contributor

@davidgumberg davidgumberg commented Feb 11, 2023

As compact block reconstruction currently works, when a node sends a GETBLOCKTXN request for the transactions it is missing from a newly announced block, it reveals precisely which of that block's transactions it already had in its mempool. The greatest danger here is that a node will never request its own transactions. Given a "sufficient number" of GETBLOCKTXN messages from a single peer, it becomes possible to identify that peer's wallet addresses with some degree of confidence.

Assuming that every transaction other than a node's own has a nonzero probability of being absent from the node's mempool when a block is found, an attacker holding an infinite set of GETBLOCKTXN messages from a single peer that reuses a finite number of pubkeys would know with 100% confidence which addresses belong to that peer.

I am not a statistician, but I am actively trying to work out how large this "sufficient number" is, and whether the number needed for reasonable confidence in a peer-pubkey correlation is realistic.
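To give the "sufficient number" a concrete shape, here is a minimal sketch of the attacker's false-positive rate under a simplifying assumption that the PR does not state: in each round, a foreign address's transaction is independently missing from the victim's mempool with probability p, so it stays unrequested for n rounds with probability (1 - p)^n. The function names and the 5% / 1% parameters below are hypothetical.

```python
import math


def false_positive_probability(p_missing: float, observations: int) -> float:
    """Chance that a *foreign* address is never requested across `observations`
    GETBLOCKTXN rounds, so an attacker would wrongly flag it as the peer's own.

    Hypothetical model: each round the peer independently lacks that address's
    transaction with probability `p_missing`.
    """
    return (1.0 - p_missing) ** observations


def observations_needed(p_missing: float, target: float) -> int:
    """Smallest number of rounds after which the false-positive chance is <= target."""
    if not 0.0 < p_missing < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("p_missing and target must lie strictly between 0 and 1")
    # Solve (1 - p)^n <= target for n; both logarithms are negative.
    return math.ceil(math.log(target) / math.log(1.0 - p_missing))


if __name__ == "__main__":
    # Hypothetical parameters: a foreign tx is missing 5% of the time,
    # and the attacker wants a false-positive rate of at most 1%.
    print(observations_needed(0.05, 0.01))  # 90
```

The victim's own addresses are never requested in this model, so they stay unflagged with probability 1; that asymmetry is what lets them stand out as observations accumulate.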

This PR prevents mempool fingerprinting by randomly adding ~1 in 200 (0.5%) of the block's transactions that are already in our mempool to our GETBLOCKTXN request. Nodes with less complete mempools (worse connections) will request fewer of these excess transactions: with 2000 transactions in a block and 50% of them missing from the mempool, about 1000 × 0.5% = 5 excess transactions will be requested. 0.5% is a number I mostly pulled out of thin air, but a maximum overhead of 0.5% seems like a reasonable price to pay if the fingerprinting attack described is realistic.
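The proposed request logic can be sketched roughly as follows. This is a hedged Python model, not the PR's C++ code: `build_getblocktxn_indexes` and `expected_decoys` are hypothetical names, transactions are plain string IDs, and it returns absolute block positions (BIP152 differentially encodes the indexes on the wire). It also reproduces the 2000-transaction, 50%-missing arithmetic.

```python
import random
from typing import List, Optional, Set

# ~1 in 200 (0.5%): the decoy rate proposed in this PR.
DECOY_RATE = 1 / 200


def build_getblocktxn_indexes(
    block_txids: List[str],
    mempool: Set[str],
    rng: Optional[random.Random] = None,
    decoy_rate: float = DECOY_RATE,
) -> List[int]:
    """Return block positions to request: every missing tx, plus random decoys."""
    rng = rng or random.Random()
    indexes = []
    for i, txid in enumerate(block_txids):
        if txid not in mempool:
            indexes.append(i)  # genuinely missing: must be requested
        elif rng.random() < decoy_rate:
            indexes.append(i)  # already held, but requested anyway as a decoy
    return indexes


def expected_decoys(block_size: int, fraction_missing: float,
                    decoy_rate: float = DECOY_RATE) -> float:
    """Expected decoy count: only transactions we already hold can become decoys."""
    return block_size * (1.0 - fraction_missing) * decoy_rate


if __name__ == "__main__":
    block = [f"tx{i}" for i in range(2000)]
    mempool = set(block[::2])  # we hold every other tx: 50% missing
    request = build_getblocktxn_indexes(block, mempool, random.Random(42))
    print(len(request))                # 1000 missing plus any randomly chosen decoys
    print(expected_decoys(2000, 0.5))  # 5.0
```

Because decoys are drawn only from transactions already in the mempool, a node that is missing more of the block automatically adds fewer of them, matching the behaviour described above.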

@DrahtBot
Contributor

DrahtBot commented Feb 11, 2023


Reviews


Concept NACK: naumenkogs
Approach NACK: sipa


@davidgumberg davidgumberg force-pushed the wip-rndtxinclude branch 2 times, most recently from 7dc8ea1 to 1d80598 February 11, 2023 22:46
In order to prevent fingerprinting, especially of our own txns,
this adds a ~0.5% chance that transactions already in our mempool
get added to our GETBLOCKTXN request. Nodes that have less
complete mempools are likely to have fewer excess txns to request.
@sipa
Member

sipa commented Feb 12, 2023

It's an interesting observation that our responses to compact block announcements reveal something about our mempool, but I'm not sure it's worth the cost of addressing that:

  • Blocks are rare, and very expensive to produce, meaning that per block only a few of our peers even get the chance to query us about it (and it's unaffordable to produce more close-to-tip blocks to trigger that).
  • Increasing the size of compact block responses may actually add to propagation latency, especially when it results in a response that now needs more TCP packets (bandwidth isn't the concern here).
  • Just empirically, compact block relay works very well (on my well-connected node without wallets, 91% of blocks are reconstructed without asking for any transactions; 3.6% need 1 transaction; 3.2% need 2 transactions; 0.5% need 3 transactions; 1.1% need several). So even when our peers get a chance to learn something, there generally is very little to learn.

If we wanted to do something about this information leak nonetheless, I believe the right approach would be using the m_recently_announced_invs filter which we maintain for all our peers, and just add all transactions to the compact block response that we haven't told our peer about yet (and if there are too many, perhaps just immediately fall back to standard block relay).
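The alternative described above can be sketched as a filter over the block's transactions. In Bitcoin Core, m_recently_announced_invs is a per-peer rolling Bloom filter; here it is modeled as a plain set `announced_to_peer`, and the function name and the `MAX_REQUESTED` cutoff are hypothetical (the comment gives no number for "too many"). BIP152 prefilled transactions such as the coinbase are ignored for simplicity.

```python
from typing import List, Optional, Set

# Hypothetical cutoff for "too many"; no specific number was proposed.
MAX_REQUESTED = 100


def build_private_request(
    block_txids: List[str],
    mempool: Set[str],
    announced_to_peer: Set[str],
    max_requested: int = MAX_REQUESTED,
) -> Optional[List[int]]:
    """Request every tx we haven't announced to this peer, whether or not we hold it.

    Returns None when the request would be too large, meaning the caller
    should fall back to requesting the full block instead.
    """
    indexes = [
        i
        for i, txid in enumerate(block_txids)
        # Skip only txs we hold AND already announced to this peer: the peer
        # already knows we have those, so skipping them leaks nothing new.
        if not (txid in mempool and txid in announced_to_peer)
    ]
    if len(indexes) > max_requested:
        return None
    return indexes


if __name__ == "__main__":
    block = ["a", "b", "c", "d"]
    print(build_private_request(block, {"a", "b", "c"}, {"a"}))     # [1, 2, 3]
    print(build_private_request(block, {"a", "b", "c"}, {"a"}, 2))  # None
```

Unlike random decoys, the request here depends only on what this peer already knows about us, so the peer learns nothing about unannounced mempool contents; the cost is larger requests when many block transactions were never announced to that peer.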

@naumenkogs
Member

I agree with @sipa, with a stronger emphasis that I would probably NACK this change because the cost of this fix is too high, and the privacy gain is too low.

You may be interested in contributing to some SPV client implementation instead :) I'm curious how well they preserve privacy when they request transactions/blocks (that subset which is of interest to them specifically). E.g. whether they ask the same node to provide everything, in which case that node can correlate the requests.

@maflcko
Member

maflcko commented Feb 13, 2023

Wouldn't it be better to not add wallet transactions to the mempool if we don't want peers to query our mempool for wallet transactions?

See also #11887 (comment) (and all in- and out- links in this issue)

@glozow glozow added the P2P label Feb 13, 2023
@petertodd
Contributor

> Just empirically, compact block relay works very well

Note that it's very easy for an adversary to change that by simply broadcasting simultaneous double-spends with the same fee. Indeed, n-way double-spends broadcast to n different nodes are easy to do. So I don't think the observation that it works well right now is relevant to the adversarial case.
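A minimal sketch of why an n-way double-spend defeats reconstruction, assuming each node keeps whichever same-fee variant reached it first (replacing it would require paying a higher fee). The node and variant names are hypothetical.

```python
from typing import Dict, List


def nodes_missing_mined_variant(first_seen: Dict[str, str], mined_txid: str) -> List[str]:
    """Nodes whose mempool holds a conflicting variant rather than the mined one.

    Assumes each node keeps the first variant it saw: a same-fee conflict
    cannot pay the higher fee that replacement would require.
    """
    return sorted(node for node, txid in first_seen.items() if txid != mined_txid)


if __name__ == "__main__":
    n = 8
    # n-way double spend: variant i is sent first to node i.
    first_seen = {f"node{i}": f"variant{i}" for i in range(n)}
    missing = nodes_missing_mined_variant(first_seen, "variant3")
    print(len(missing), "of", n)  # 7 of 8
```

Whichever variant gets mined, n - 1 of the n targeted nodes lack it and must send a GETBLOCKTXN, so an adversary can push request rates well above the benign figures quoted earlier.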

@sipa
Member

sipa commented Feb 13, 2023

> I agree with @sipa, with a stronger emphasis that I would probably NACK this change because the cost of this fix is too high, and the privacy gain is too low.

Yeah, Approach NACK. I may be convinced that doing something to avoid mempool fingerprinting through GETBLOCKTXN is worth it, but if we want that, there are better ways than this.

> Note that it's very easy for an adversary to change that by simply broadcasting simultaneous double-spends with the same fee. Indeed, n-way double spends broadcast to n different nodes is easy to do. So I don't think the observation that it works well right now is relevant to the adversarial case.

That's fair; the other arguments are stronger.

> Wouldn't it be better to not add wallet transactions to the mempool if we don't want peers to query our mempool for wallet transactions?

I don't think that's a good idea. The point is that we shouldn't treat wallet transactions any differently from transactions received from other peers. If we don't add wallet transactions to the mempool but still relay them (because otherwise nobody will ever know about them), we're adding a giant fingerprint to identify our transactions (relayed but not in mempool...).

I think the focus of this PR on wallet transactions in general is distracting. The issue, if any, is mempool fingerprinting. That might be used by attackers to learn about our wallet transactions, but also about many other things. But the solution isn't specific to wallet things; it should just be to prevent attackers from learning anything about our mempool transactions that haven't been announced to them.

@maflcko
Member

maflcko commented Feb 13, 2023

> If we don't add wallet transactions to the mempool but still relay them

Yeah, I didn't mention this, but obviously we wouldn't relay them through the mempool either. Doing a one-shot (Tor-only) outbound connection to fan out the tx (one-hop Dandelion) without adding it to the mempool shouldn't leave a fingerprint, other than the one left by the Tor-only connection, no?

@sipa
Member

sipa commented Feb 13, 2023

@MarcoFalke Oh, fair enough, that's a good idea (though it'd probably still need a fallback to normal relay after some delay if we don't observe the transaction being rumoured back to us). I also think it's orthogonal to the idea here, because even absent "first mile" wallet broadcast leakage, we still want the P2P network to obscure transaction relay beyond that.
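The one-shot broadcast idea, combined with the suggested fallback, can be sketched as a small state machine. This is a hedged illustration only: the class name, the hooks, and the 60-second delay are hypothetical, since neither comment specifies an API or a timeout.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class PrivateBroadcaster:
    """One-shot private broadcast with a fallback to normal relay.

    All hooks are hypothetical stand-ins, not Bitcoin Core APIs.
    """
    send_one_shot: Callable[[str], None]   # e.g. short-lived Tor-only outbound connection
    relay_normally: Callable[[str], None]  # add to mempool and announce as usual
    fallback_after: float = 60.0           # hypothetical delay in seconds
    pending: Dict[str, float] = field(default_factory=dict)

    def broadcast(self, txid: str, now: float) -> None:
        # Hand the tx to the network without putting it in our own mempool.
        self.send_one_shot(txid)
        self.pending[txid] = now

    def on_rumoured_back(self, txid: str) -> None:
        # A peer announced it to us: it is propagating, so stop tracking it.
        self.pending.pop(txid, None)

    def tick(self, now: float) -> None:
        # Fall back to ordinary relay for anything not seen again in time.
        for txid, sent_at in list(self.pending.items()):
            if now - sent_at >= self.fallback_after:
                del self.pending[txid]
                self.relay_normally(txid)


if __name__ == "__main__":
    sent, relayed = [], []
    b = PrivateBroadcaster(send_one_shot=sent.append, relay_normally=relayed.append)
    b.broadcast("tx1", now=0.0)
    b.broadcast("tx2", now=0.0)
    b.on_rumoured_back("tx1")  # tx1 came back via a peer
    b.tick(now=61.0)           # tx2 did not, so it falls back
    print(sent, relayed)       # ['tx1', 'tx2'] ['tx2']
```

Until the fallback fires, the wallet transaction never sits in our mempool, so it cannot show up as conspicuously absent from our GETBLOCKTXN requests; once rumoured back, it can enter the mempool like any other peer's transaction.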

@maflcko
Member

maflcko commented Feb 13, 2023

> we still want the P2P network to obscure transaction relay beyond that

I wonder if that is worth it. Given this issue here (and past ones), it just seems hard to think about and any guarantees are at best brittle in an evolving P2P network. So, long term, assuming the private "first mile" privacy-preserving fan out stuff is available, users and wallets caring about it will probably use that. Attempts to optimize the normal relay to be equally privacy-preserving will always have a taste of a false promise and it might be more honest to just tell people to not rely on that.

@sipa
Member

sipa commented Feb 13, 2023

We can't rely on Tor for all wallet privacy, especially given that it's a centralized service that might just fail completely one day (and before that, it's hard to bound how much sufficiently powerful attackers can learn from traffic analysis in Tor).

Privacy on a public network is always multi-faceted, and it's fair to say we can't make strong guarantees. But on the other hand, we go through pretty substantial efforts to hide lots of things on a best-effort basis, especially involving transaction relay. And they're not all reducible to protecting wallet privacy (there is eclipse attack protection, fingerprinting for connection graph information, ...).

@achow101
Member

This PR does not seem to have conceptual support. Please leave a comment if you would like this to be reopened.

@achow101 achow101 closed this Apr 25, 2023
@bitcoin bitcoin locked and limited conversation to collaborators Apr 24, 2024
8 participants