
Conversation

naumenkogs
Member

This is a very simple code change with a big p2p privacy benefit.

It’s currently trivial to scrape any reachable node’s AddrMan (a database of all nodes known to them along with the timestamps).
We do have a limit of one GETADDR per connection, but a spy can disconnect and reconnect even from the same IP, and send GETADDR again and again.

Since we respond with at most 1,000 random records, depending on the AddrMan size it probably takes up to 100 requests for a spy to make sure they scraped (almost) everything.
I even have a script for that. It is totally doable within a couple of minutes.

Then, with some extra protocol knowledge a spy can infer the direct peers of the victim, and other topological stuff.

I suggest caching responses to GETADDR on a daily basis, so that an attacker gets at most 1,000 records per day and can’t track the changes in real time. I will be following up with more improvements to addr relay privacy, but this one alone is very effective. And simple!

I doubt any of the real software does reconnect to get new addrs from a given peer, so we shouldn’t be cutting anyone.
I also believe it doesn’t have any negative implications on the overall topology quality. And the records being “outdated” for at most a day doesn’t break any honest assumptions either.
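The caching scheme described above can be sketched roughly as follows. This is a minimal illustration only: `CachedResponse`, `AddrCache`, and the string-based address records are hypothetical stand-ins, not the actual Bitcoin Core types.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical sketch: one cached GETADDR response per requestor network,
// rebuilt at most once a day, so repeated requests (even after reconnects)
// see the same ~1,000 records instead of fresh random samples.
struct CachedResponse {
    std::vector<std::string> addrs;   // stand-in for CAddress records
    Clock::time_point expiry{};       // when the cache must be rebuilt
};

class AddrCache {
public:
    // `collect` stands in for AddrMan's "return up to 1,000 random records".
    template <typename Collect>
    const std::vector<std::string>& Get(int requestor_network, Collect collect,
                                        Clock::time_point now)
    {
        auto it = m_cache.find(requestor_network);
        if (it == m_cache.end() || it->second.expiry < now) {
            CachedResponse& entry = m_cache[requestor_network];
            entry.addrs = collect();                     // refresh at most daily
            entry.expiry = now + std::chrono::hours(24);
            return entry.addrs;
        }
        return it->second.addrs;                         // serve the cached copy
    }

private:
    std::map<int, CachedResponse> m_cache;
};
```

A reconnecting spy then learns nothing new within the cache window: every GETADDR inside the same 24 hours returns the identical record set.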

@naumenkogs
Member Author

cc @EthanHeilman

@fanquake fanquake added the P2P label May 17, 2020
@naumenkogs naumenkogs force-pushed the 2020-05-addr-response-caching branch 2 times, most recently from f3b3128 to 2427af6 Compare May 17, 2020 01:42
@tryphe
Contributor

tryphe commented May 17, 2020

Concept ACK

Suggestion: Instead of modifying CConnman and net.h, add all these changes into CAddrMan and add a new function CAddrMan::GetAddrCache, so the functionality is local to that class, since there's a lot of similar stuff going on in there.

Other thoughts: I wonder what implications this might have in advertising new peers, i.e. suppose the 24 hour window rolls over right before your daemon sees a new peer. Nobody would be able to get that peer from you for at least 24 hours. But I'm not sure if that's anything to be worried about, because everyone's 24 hour window will expire at different times, so some peers would always have the chance to advertise it soon.

@naumenkogs
Member Author

naumenkogs commented May 17, 2020

Suggestion: Instead of modifying CConnman and net.h, add all these changes into CAddrMan and add a new function CAddrMan::GetAddrCache, so the functionality is local to that class, since there's a lot of similar stuff going on in there.

To me it feels like CConnman is the perfect place for this cache. It's really something that makes sense only in the p2p context. We would never use this cache for any local stuff. But let's see what other reviewers think about it?

I wonder what implications this might have in advertising new peers, i.e. suppose the 24 hour window rolls over right before your daemon sees a new peer. Nobody would be able to get that peer from you for at least 24 hours. But I'm not sure if that's anything to be worried about, because everyone's 24 hour window will expire at different times, so some peers would always have the chance to advertise it soon.

True, but I think the role of this aspect in advertising new nodes is minor. Nodes don't send many GETADDRs, and one would have to be lucky to get that new node randomly.
I think the primary method of advertising new peers is them calling AdvertiseLocal and forwarding unsolicited ADDR messages. And that one remains unaffected by this change.

In addition, in the future I'm planning to suggest doing AdvertiseLocal for feeler connections or other random connections; it should help both privacy and address relay.

@naumenkogs naumenkogs force-pushed the 2020-05-addr-response-caching branch from 2427af6 to 1f469b0 Compare May 17, 2020 02:47

@practicalswift
Contributor

Concept ACK: nice simple idea - that's the best kind of idea! :)

@practicalswift
Contributor

practicalswift commented May 17, 2020

Would it make sense to have one cache per network type so that the answer cannot be used to fingerprint a node across network types?

(I know there are other ways to do such fingerprinting, but no reason to add yet another one :))

@naumenkogs
Member Author

naumenkogs commented May 17, 2020

@practicalswift good observation re: across net fingerprinting!
To be clear, my PR as it is now doesn't introduce another fingerprinting vector, but just makes it a little easier for a spy. Previously they would have to make 100 requests and compare across nodes; now they'd have to make just one.

But it would be indeed nice to not make it easier, and ideally even mitigate the fingerprinting you mention too.
For starters, we may have 2 caches: for regular (ipv4/v6) and non-regular (onion) networks, based on where the requester is located. I will put together a commit for that.
It could be further improved when we integrate addrv2.

This would be another big improvement: cross-node fingerprinting via GETADDR would then take days or months, instead of just a couple of hours at most (a spy currently still has to ask a lot of nodes to map Tor nodes to regular nodes).

@naumenkogs naumenkogs force-pushed the 2020-05-addr-response-caching branch 2 times, most recently from 9c70ee9 to 6dfa7ec Compare May 17, 2020 16:18
@maflcko
Member

maflcko commented May 21, 2020

What effect does this have on nodes that explicitly want to fan out onion addresses on the ipv4 net? Do they not exist?

@naumenkogs
Member Author

naumenkogs commented May 22, 2020

What effect does this have on nodes that explicitly want to fan out onion addresses on the ipv4 net? Do they not exist?

I don't touch that behaviour at all! I introduce two separate caches, but both of them are filled the same way as before. The separation is based on who requests (onion or regular), not the type of AddrMan records.

So, if this relay was possible before, it still is.
If we previously already had unsatisfactory onion-to-ipv4 relay, then it remains, or maybe gets a little worse. But this should be addressed in a separate PR.

@ariard ariard left a comment

On network topology, the change I can foresee is a decrease in the freshness of received addresses, and therefore a nudge for bootstrapping nodes towards connecting to older nodes? By older I mean the ones we were aware of before the AVG_ADDRESS_RESPONSE_UPDATE_INTERVAL period. I don't say this side-effect is bad, it may even be good to favor connections to "mature" nodes, but maybe it should be documented.

To me it feels like CConman is the perfect place for this cache. It's really something making sense only in the p2p context. We would never use this cache for any local stuff. But let's see what other reviewers think about it?

I think we should avoid introducing a timer in AddrMan, it makes it harder to reason about? And I agree the cache only makes sense due to the untrusted nature of queries in this context.

I doubt any of the real software does reconnect to get new addrs from a given peer, so we shouldn’t be cutting anyone.

In case of doubt, add a release note? But even if there is such software, I would qualify it as broken, or indistinguishable from a spy.

@ariard

ariard commented May 22, 2020

With regards to fingerprinting, I think we have a more signaling descriptor, i.e. the address, and changing this one would force cleaning the cache, thus it doesn't introduce a more severe one?

@naumenkogs
Member Author

On network topology, the change I can foresee is a decrease in the freshness of received addresses, and therefore a nudge for bootstrapping nodes towards connecting to older nodes?

I talk about this here.

With regards to fingerprinting, I think we have a more signaling descriptor, i.e. the address, and changing this one would force cleaning the cache, thus it doesn't introduce a more severe one?

Can you elaborate? Not sure I'm following.

@naumenkogs naumenkogs force-pushed the 2020-05-addr-response-caching branch from 328788c to 0d6ae10 Compare May 30, 2020 16:25
@naumenkogs naumenkogs force-pushed the 2020-05-addr-response-caching branch from 20de792 to 3bd67ba Compare July 30, 2020 11:39
@jnewbery
Contributor

reACK 3bd67ba

Contributor

@promag promag left a comment

Code review ACK 3bd67ba.

{
const auto current_time = GetTime<std::chrono::microseconds>();
if (m_addr_response_caches.find(requestor_network) == m_addr_response_caches.end() ||
m_addr_response_caches[requestor_network].m_update_addr_response < current_time) {
Contributor

acd6135

Use iterator instead:

auto i = m_addr_response_caches.find(requestor_network);
if (i == m_addr_response_caches.end() || i->m_update_addr_response < current_time)

Member Author

It doesn't really work because it requires "i->second" :(

if (m_addr_response_caches.find(requestor_network) == m_addr_response_caches.end() ||
m_addr_response_caches[requestor_network].m_update_addr_response < current_time) {
m_addr_response_caches[requestor_network].m_addrs_response_cache = GetAddresses();
m_addr_response_caches[requestor_network].m_update_addr_response = current_time + std::chrono::hours(21) + GetRandMillis(std::chrono::hours(6));
Contributor

acd6135

I think you could explain this expiration duration. Looks like it could be a lot less and still prevent spying?

Contributor

The idea is to prevent peers scraping our addrman. The shorter the cache duration, the less time it would take a peer to get an (almost) complete scan of addrman.

Member Author

But yeah, it's hard to justify a 24h vs. a 1h cache without a good analytical model of the whole addr relay.

Right now the cache lifetime is 0, so it takes essentially 0 seconds to scrape it. If the cache lifetime were 1h, it would take about 100h (because ~100 requests are usually needed). However, during those 100h sensitive data might already be overwritten, so an attacker might not get useful info. It feels like 100h is too short, but maybe 1000h would be fine then?

On the other hand, a shorter lifetime would keep addr records a bit more fresh (if we consider the GETADDR relay aspect), so that's the benefit/tradeoff. But again, how much does the freshness of GETADDR responses matter? Especially since with a 24h cache the records are at most 24h outdated, while we look at 1 week when considering IsTerrible(). That's why I don't think it's a big concern or a tradeoff to worry about.
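The back-of-the-envelope above (cache lifetime times the number of requests needed for a near-complete scrape) can be written down directly. Note the 100-request figure is the estimate quoted in this discussion, not a measured constant:

```cpp
// Rough model from the discussion: total scrape time grows linearly with the
// cache lifetime, since each cached window reveals at most one 1,000-record
// sample. REQUESTS_NEEDED is the ~100-request estimate mentioned above.
constexpr int REQUESTS_NEEDED = 100;

constexpr int ScrapeHours(int cache_lifetime_hours)
{
    return cache_lifetime_hours * REQUESTS_NEEDED;
}

static_assert(ScrapeHours(1) == 100, "1h cache -> ~100h to scrape");
static_assert(ScrapeHours(24) == 2400, "24h cache -> ~100 days to scrape");
```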

* with fresh timestamps (per self-announcement).
*/
struct CachedAddrResponse {
std::vector<CAddress> m_addrs_response_cache;
Contributor

acd6135

nit, no need for m_ prefix in structs (?).

Contributor

I assumed that all member variables (of classes and structs) should have the m_ prefix.

Member Author

Yeah, I've got the same impression.

Contributor

See some recently added structs. Nit anyway.

Member Author

@naumenkogs naumenkogs Aug 11, 2020

I see e.g. TxDownloadState has the m_ prefix :)

@ariard ariard left a comment

Code Review ACK 3bd67ba

Mostly minor points. I think commit messages/comments could be more explicit about assumptions and summarize the fruitful PR/IRC discussions, but that can be done as a follow-up.

return addrman.GetAddr();
std::vector<CAddress> addresses = addrman.GetAddr();
if (m_banman) {
addresses.erase(std::remove_if(addresses.begin(), addresses.end(),

nit: I think that's a slight performance regression, as you will iterate twice on the non-erased vector elements. Likely not to matter, as it's not a hot path.

Contributor

No. remove-erase removes unwanted elements in a single pass.


Previously, in net_processing, we iterated on vAddr and filtered out discouraged/banned elements before pushing them into vAddrToSend. So that was a single pass.

Now, in net.cpp we iterate once over every element of addresses, then in net_processing we iterate again on vAddr to push elements into vAddrToSend. So that's two passes over the valid set?

Contributor

Ah, ok. Yes, you're right that for peers with "addr" permissions, we'll iterate through the list twice (although the list will be max size 1000 instead of 2500, so still fewer items). However, in the common case, we'll only iterate through the cached list once, and not have to call IsDiscouraged() or IsBanned() on that list.
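For reference, the erase-remove idiom under discussion looks like this. It is a self-contained toy: ints stand in for CAddress records, and the even-number predicate stands in for the IsBanned() check.

```cpp
#include <algorithm>
#include <vector>

// std::remove_if compacts the kept elements in a single left-to-right pass;
// erase() then trims the leftover tail. The idiom itself never rescans the
// kept elements a second time.
std::vector<int> DropBanned(std::vector<int> addrs)
{
    addrs.erase(std::remove_if(addrs.begin(), addrs.end(),
                               [](int a) { return a % 2 == 0; }),  // stand-in for IsBanned()
                addrs.end());
    return addrs;
}
```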

*/
struct CachedAddrResponse {
std::vector<CAddress> m_addrs_response_cache;
std::chrono::microseconds m_update_addr_response{0};

It sounds like you compare a verb (m_update_addr_response) to a noun (current_time). IMO, m_cache_expiration_time would be more meaningful.

* to easily detect that it is the same node by comparing responses.
* The used memory equals to 1000 CAddress records (or around 32 bytes) per
* distinct Network (up to 5) we have/had an inbound peer from,
* resulting in at most ~160 KB.

You may underscore that the cache is global and not per-peer, thus being uniform to whoever is querying during the cache period. I think a per-peer cache would be worse, unless you first have a global one and announce a random subset of it.

That said, an area of research could be to tie the cache to the subnet/ASN of requestors, to nudge their connections towards a more diversified graph. Or sort/cap them by subnet. But such a mechanism could likely be a double-edged sword :)

@naumenkogs
Member Author

Thanks for the reviews!
It's currently Code review ACKs from ariard, jnewbery and promag on the up-to-date commit, several Concept ACKs before, and several tested ACKs on the version (before minor refactoring) we discussed at the review club.

The latest suggestions seem to be a couple of nits and discussion points, so for the sake of review efficiency let's not address them for now :)

Contributor

@jnewbery jnewbery left a comment

The latest suggestions seem to be a couple of nits and discussion points, so for the sake of review efficiency let's not address them for now :)

I agree. I think this is ready for merge now. Minor review points can be fixed in a follow-up.


@promag
Contributor

promag commented Aug 3, 2020

Don't forget #18991 (comment), seems it wasn't previously discussed.

@laanwj laanwj merged commit 14ceddd into bitcoin:master Aug 3, 2020
@laanwj
Member

laanwj commented Aug 3, 2020

Don't forget #18991 (comment), seems it wasn't previously discussed.

Sorry, hadn't noticed this before merging. As it's a question, I think discussion can simply continue here, and if further changes are necessary it can be in a follow-up PR?

sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Aug 3, 2020
…eaks

3bd67ba Test addr response caching (Gleb Naumenko)
cf1569e Add addr permission flag enabling non-cached addr sharing (Gleb Naumenko)
acd6135 Cache responses to addr requests (Gleb Naumenko)
7cc0e81 Remove useless 2500 limit on AddrMan queries (Gleb Naumenko)
ded742b Move filtering banned addrs inside GetAddresses() (Gleb Naumenko)


ACKs for top commit:
  jnewbery:
    reACK 3bd67ba
  promag:
    Code review ACK 3bd67ba.
  ariard:
    Code Review ACK 3bd67ba

Tree-SHA512: dfa5d03205c2424e40a3f8a41af9306227e1ca18beead3b3dda44aa2a082175bb1c6d929dbc7ea8e48e01aed0d50f0d54491caa1147471a2b72a46c3ca06b66f
@sipa
Member

sipa commented Aug 3, 2020

Random thought: there is a (rather uncommon, I suspect) scenario in which this definitely worsens node identity discoverability: if a peer has multiple listening addresses associated with one network. I don't think this is a big concern, but it's also fairly easy to address: make the cache map not indexed by just Network, but also add the local socket address (see getsockname).

@naumenkogs
Member Author

scenario in which this definitely worsens node identity discoverability: if a peer has multiple listening addresses associated with one network

You probably mean makes it easier to discover [...]. True, although again, it was already trivial (just make 100 GETADDRs, even from the same host, and compare timestamps), so I won't say this degradation is a big deal at all. I believe @ariard raised this concern somewhere above.

But yeah, it would be great to address this issue... Then it would be another good improvement over the pre-cache code. Seems like a good candidate for inclusion in a follow-up.

@sipa should we bound the number of caches in this case? If getsockname for some reason returns too many values, we shouldn't blow up memory. Just not sure how valid this concern is. The alternative is just a simple "new cache every time".
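One way to bound the number of caches, as asked above, is a hard cap on cache slots. This is only an illustrative sketch: `BoundedAddrCaches`, the cap value, and the string-based addresses are hypothetical, not what the follow-up PR actually implemented.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Key the cache by (requestor network, local socket address), per sipa's
// suggestion, and refuse to allocate new cache slots past a fixed bound so a
// surprising number of local addresses cannot blow up memory.
using CacheKey = std::pair<int, std::string>;  // (Network, local sockaddr)

class BoundedAddrCaches {
public:
    explicit BoundedAddrCaches(std::size_t max_caches) : m_max_caches(max_caches) {}

    // Returns false (caching nothing) once the slot bound is hit for a new key;
    // existing entries can still be refreshed.
    bool Put(const CacheKey& key, std::vector<std::string> addrs)
    {
        auto it = m_caches.find(key);
        if (it == m_caches.end()) {
            if (m_caches.size() >= m_max_caches) return false;
            m_caches.emplace(key, std::move(addrs));
            return true;
        }
        it->second = std::move(addrs);
        return true;
    }

    std::size_t Size() const { return m_caches.size(); }

private:
    std::size_t m_max_caches;
    std::map<CacheKey, std::vector<std::string>> m_caches;
};
```

The "new cache every time" alternative would simply skip the bound check, trading bounded memory for simpler code.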

laanwj added a commit that referenced this pull request Aug 12, 2020
…cords to addrman

37a480e [net] Add addpeeraddress RPC method (John Newbery)
ae8051b [test] Test that getnodeaddresses() can return all known addresses (John Newbery)
f26502e [addrman] Specify max addresses and pct when calling GetAddresses() (John Newbery)

Pull request description:

  Currently addrman only allows a maximum of 1000 records or 23% of all records to be returned in a call to `GetAddr()`. Relax this limit and have the client specify the max records they want. For p2p, behaviour is unchanged (but the rate limiting is set inside net_processing, where it belongs). For RPC, `getnodeaddresses` can now return the complete addrman, which is helpful for testing and monitoring.

  Also add a test-only RPC `addpeeraddress`, which adds an IP address:port to addrman. This is helpful for testing (eg #18991).

ACKs for top commit:
  naumenkogs:
    utACK 37a480e
  laanwj:
    Code review and lightly manually tested ACK 37a480e

Tree-SHA512: f86dcd410aaebaf6e9ca18ce6f23556e5e4649c1325577213d873aa09967298e65ab2dc19a72670641ae92211a923afda1fe124a82e9d2c1cad73d478ef27fdc
luke-jr added a commit to bitcoinknots/bitcoin that referenced this pull request Aug 15, 2020
luke-jr pushed a commit to bitcoinknots/bitcoin that referenced this pull request Aug 15, 2020
luke-jr pushed a commit to bitcoinknots/bitcoin that referenced this pull request Aug 15, 2020
laanwj added a commit that referenced this pull request Sep 21, 2020
0d04784 Refactor the functional test (Gleb Naumenko)
83ad65f Address nits in ADDR caching (Gleb Naumenko)
81b00f8 Add indexing ADDR cache by local socket addr (Gleb Naumenko)
42ec558 Justify the choice of ADDR cache lifetime (Gleb Naumenko)

Pull request description:

  This is a follow-up on #18991 which does 3 things:
  - improves privacy of a node listening to multiple addresses via adding cache index by local socket addr (suggested [here](#18991 (comment)))
  - documents on the choice of 24h cache lifetime
  - addresses nits from #18991

ACKs for top commit:
  jnewbery:
    utACK 0d04784
  vasild:
    ACK 0d04784
  jonatack:
    Code review ACK 0d04784

Tree-SHA512: bb65a34dd1ce2811186d3e4469bc33e8399cebaaa494ce13041c7cff23275870e4176a719f7a72f8d779c49f8b2344bf4fa1aeb3ea4e2626d5ae76514f00a750
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Sep 21, 2020
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Sep 7, 2021
Summary:
This is a backport of [[bitcoin/bitcoin#18991 | core#18991]] [1/5]
bitcoin/bitcoin@ded742b

Test Plan: `ninja all check-all`

Reviewers: #bitcoin_abc, majcosta

Reviewed By: #bitcoin_abc, majcosta

Subscribers: majcosta

Differential Revision: https://reviews.bitcoinabc.org/D10053
deadalnix pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Sep 7, 2021
Summary:
bitcoin/bitcoin@7cc0e81
This is a backport of [[bitcoin/bitcoin#18991 | core#18991]] [2/5]

Depends on D10053

Test Plan: `ninja all check-all`

Reviewers: #bitcoin_abc, majcosta

Reviewed By: #bitcoin_abc, majcosta

Differential Revision: https://reviews.bitcoinabc.org/D10054
deadalnix pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Sep 7, 2021
Summary:
Prevents a spy from scraping victim's AddrMan by
reconnecting and re-requesting addrs.
This is a backport of [[bitcoin/bitcoin#18991 | core#18991]] [3/5]
bitcoin/bitcoin@acd6135

Depends on D10054

Test Plan: `ninja all check-all`

Reviewers: #bitcoin_abc, majcosta

Reviewed By: #bitcoin_abc, majcosta

Differential Revision: https://reviews.bitcoinabc.org/D10055
deadalnix pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Sep 7, 2021
Summary:
This is a backport of [[bitcoin/bitcoin#18991 | core#18991]] [4/5]
bitcoin/bitcoin@cf1569e

Depends on D10055

Test Plan: `ninja all check-all`

Reviewers: #bitcoin_abc, majcosta

Reviewed By: #bitcoin_abc, majcosta

Differential Revision: https://reviews.bitcoinabc.org/D10056
deadalnix pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Sep 7, 2021
Summary:
This is a backport of [[bitcoin/bitcoin#18991 | core#18991]] [5/5]
bitcoin/bitcoin@3bd67ba

Depends on D10056

Backport note: The only changes from the original PR are the removal of `NODE_WITNESS` from `nServices` and the renaming of `mininode` to `p2p`

Test Plan: `ninja check-functional`

Reviewers: #bitcoin_abc, majcosta

Reviewed By: #bitcoin_abc, majcosta

Differential Revision: https://reviews.bitcoinabc.org/D10057
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Aug 16, 2022