Skip to content

Conversation

amitiuttarwar
Copy link
Contributor

For the full context & motivation of this patch, see #27213

This is joint work with mzumsande.

This PR adds functionality to AddrMan::Select to enable callers to specify a network they are interested in.

Along the way, it refactors the function to deduplicate the logic, updates the local variables to match modern conventions, adds test coverage for both the new and existing Select logic, and adds bench tests for the worst case performance of both the new and existing Select logic.

This functionality is used in the parent PR.

Extract the logic that decides whether the new or the tried table is going to
be searched to the beginning of the function.

Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
@DrahtBot
Copy link
Contributor

DrahtBot commented Mar 6, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK vasild, brunoerg, ajtowns, mzumsande

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #27213 (p2p: Diversify automatic outbound connections with respect to networks by amitiuttarwar)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@DrahtBot DrahtBot changed the title addrman: Enable selecting addresses by network addrman: Enable selecting addresses by network Mar 6, 2023
@DrahtBot DrahtBot added the P2P label Mar 6, 2023
Copy link
Contributor

@brunoerg brunoerg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

going to review soon

@amitiuttarwar amitiuttarwar force-pushed the 2023-03-select-by-network branch from fdfeac9 to 3a93327 Compare March 7, 2023 03:01
@amitiuttarwar
Copy link
Contributor Author

thanks for the review @brunoerg ! responded to all your feedback

@amitiuttarwar amitiuttarwar force-pushed the 2023-03-select-by-network branch from 3a93327 to 25a64a2 Compare March 7, 2023 22:52
Copy link
Contributor

@brunoerg brunoerg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crACK 25a64a2

Copy link
Contributor

@vasild vasild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach ACK 25a64a2

Feel free to ignore the nits below. I will try to assess the performance of the new Select(network).

Thanks for working on this!

});
}

static void AddrManSelectByNetwork(benchmark::Bench& bench)
Copy link
Contributor

@vasild vasild Mar 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is excellent to see benchmark tests added! My only concern with this PR is that it will iterate through a lot of entries in addrman before finding one from the requested network. This in theory is O(addrman size). And the benchmarks will help us assess that. So, it would be nice to compare, on the same, full addrman (e.g. 70k addresses):

  1. the speed of a regular Select()
  2. the speed of a Select(network) where the searched for address is near the end rare. In other words, worst case scenario, when ~70k addresses are iterated before finding the result.

Somehow I don't see that from the added benchmarks. I will play with them to see if that's possible (how to put an address "at the end"?, that is nonsense, there is no end).

Btw the results are unstable:

:wavy_dash: `AddrManSelectByNetwork` (Unstable with ~1.2 iters. Increase `minEpochIterations` to e.g. 12)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me elaborate why I think comparing 1. and 2. above is important (the benchmarks in this PR don't do that):

So, we will add security at the cost of making it a bit slower. To assess how much slower I tweaked the benchmark as this:

benchmark tweaks
diff --git i/src/bench/addrman.cpp w/src/bench/addrman.cpp
index 9fe50f4ec2..d1def9e520 100644
--- i/src/bench/addrman.cpp
+++ w/src/bench/addrman.cpp
@@ -12,13 +12,13 @@
 
 #include <optional>
 #include <vector>
 
 /* A "source" is a source address from which we have received a bunch of other addresses. */
 
-static constexpr size_t NUM_SOURCES = 64;
+static constexpr size_t NUM_SOURCES = 512; // fills addrman with ~55k addresses, instead of ~15k
 static constexpr size_t NUM_ADDRESSES_PER_SOURCE = 256;
 
 static NetGroupManager EMPTY_NETGROUPMAN{std::vector<bool>()};
 static constexpr uint32_t ADDRMAN_CONSISTENCY_CHECK_RATIO{0};
 
 static std::vector<CAddress> g_sources;
@@ -137,16 +137,20 @@ static void AddrManSelectByNetwork(benchmark::Bench& bench)
     CAddress i2p_address(i2p_service, NODE_NONE);
     i2p_address.nTime = Now<NodeSeconds>();
     CNetAddr source = ResolveIP("252.2.2.2");
     addrman.Add({i2p_address}, source);
 
     FillAddrMan(addrman);
-
-    bench.run([&] {
-        const auto& address = addrman.Select(/*new_only*/false, NET_I2P);
-        assert(address.first.ToStringAddrPort() == "udhdrtrcetjm5sxzskjyr5ztpeszydbh4dpl3pl4utgqqw2v4jna.b32.i2p:0");
+    fprintf(stderr, "addrman size: %zu\n", addrman.Size());
+
+    bench.minEpochIterations(300).run([&] {
+#if 0
+        addrman.Select(/*new_only=*/false);
+#else
+        addrman.Select(/*new_only=*/false, NET_I2P);
+#endif
     });
 }
 
 static void AddrManGetAddr(benchmark::Bench& bench)
 {
     AddrMan addrman{EMPTY_NETGROUPMAN, /*deterministic=*/false, ADDRMAN_CONSISTENCY_CHECK_RATIO};

The results are:

  1. Select() takes 0.18 microseconds
  2. Select(network) takes 6000 microseconds

Now, this is a big difference, but maybe that is ok. We don't Select() in a tight loop. However, it can be improved relatively easy. I observed that in 2. we visit between ~100 and ~600k bucket positions before finding a match. Given that there are 80k in addrman it means that we visit some multiple times, which is a waste of resources. This is avoided if we don't visit already visited buckets (but still visit buckets in random order). The patch below does that (on top of this PR):

don't visit already visited buckets
diff --git i/src/addrman.cpp w/src/addrman.cpp
index 1023c3cbdb..0e196050b9 100644
--- i/src/addrman.cpp
+++ w/src/addrman.cpp
@@ -16,12 +16,13 @@
 #include <streams.h>
 #include <tinyformat.h>
 #include <uint256.h>
 #include <util/check.h>
 #include <util/time.h>
 
+#include <array>
 #include <cmath>
 #include <optional>
 
 /** Over how many buckets entries with tried addresses from a single group (/16 for IPv4) are spread */
 static constexpr uint32_t ADDRMAN_TRIED_BUCKETS_PER_GROUP{8};
 /** Over how many buckets entries with new addresses originating from a single group are spread */
@@ -745,17 +746,39 @@ std::pair<CAddress, NodeSeconds> AddrManImpl::Select_(bool new_only, std::option
     } else {
         search_tried = insecure_rand.randbool();
     }
 
     const int bucket_count{search_tried ? ADDRMAN_TRIED_BUCKET_COUNT : ADDRMAN_NEW_BUCKET_COUNT};
 
+    // We want to search buckets in random order and also prefer visiting a
+    // bucket that we have not visited before instead of visiting an already
+    // visited bucket. Thus shuffle the buckets and visit that shuffled list
+    // in order.
+
+    std::array<uint16_t, std::max(ADDRMAN_TRIED_BUCKET_COUNT, ADDRMAN_NEW_BUCKET_COUNT)>
+        shuffled_bucket_indexes;
+    for (uint16_t i = 0; i < bucket_count; ++i) {
+        shuffled_bucket_indexes[i] = i;
+    }
+    Shuffle(shuffled_bucket_indexes.begin(),
+            shuffled_bucket_indexes.begin() + bucket_count,
+            insecure_rand);
+    // If we visit a bucket and find 0 matching addresses in it, mark it with
+    // this and never visit it again.
+    static constexpr uint16_t ALREADY_VISITED_AND_BORING{std::numeric_limits<uint16_t>::max()};
+
+    static_assert(ADDRMAN_TRIED_BUCKET_COUNT < ALREADY_VISITED_AND_BORING);
+    static_assert(ADDRMAN_NEW_BUCKET_COUNT < ALREADY_VISITED_AND_BORING);
+
     //  Loop through the addrman table until we find an appropriate entry
     double chance_factor = 1.0;
-    while (1) {
-        // Pick a bucket, and an initial position in that bucket.
-        int bucket = insecure_rand.randrange(bucket_count);
+    for (uint16_t b = 0;; b = (b + 1) % bucket_count) {
+        if (shuffled_bucket_indexes[b] == ALREADY_VISITED_AND_BORING) {
+            continue;
+        }
+        const size_t bucket{shuffled_bucket_indexes[b]};
         int initial_position = insecure_rand.randrange(ADDRMAN_BUCKET_SIZE);
 
         // Iterate over the positions of that bucket, starting at the initial one,
         // and looping around.
         int i;
         for (i = 0; i < ADDRMAN_BUCKET_SIZE; ++i) {
@@ -769,14 +792,19 @@ std::pair<CAddress, NodeSeconds> AddrManImpl::Select_(bool new_only, std::option
                 } else {
                     break;
                 }
             }
         }
 
-        // If the bucket is entirely empty, start over with a (likely) different one.
-        if (i == ADDRMAN_BUCKET_SIZE) continue;
+        // Start over with a different bucket if this one is entirely empty or
+        // specific network was requested and it does not contain any addresses
+        // from that network.
+        if (i == ADDRMAN_BUCKET_SIZE) {
+            shuffled_bucket_indexes[b] = ALREADY_VISITED_AND_BORING;
+            continue;
+        }
 
         // Find the entry to return.
         int position = (initial_position + i) % ADDRMAN_BUCKET_SIZE;
         int nId = LookupAddrmanEntry(search_tried, bucket, position);
         const auto it_found{mapInfo.find(nId)};
         assert(it_found != mapInfo.end());

It changes Select(network) to about 3000 microseconds (down from 6000), but that result from the benchmark is averaged between executions that take anything between 100 and 600k iterations. What is more important is that with the change above it never takes more than ~60k iterations, that is - anything between 100 and 60k. Worst case lowered ~10 times.

The optimization brings some complexity but IMO it is worth it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a nice speedup in the case where we only have a few addresses - however, due to the constant overhead of building the shuffled list of buckets in the beginning I'd expect performance to go down a bit for the case where we have many addresses to choose from. Do you see this in your benchmark?
Why does this need ALREADY_VISITED_AND_BORING? If we do a for-loop through a pre-shuffled list of buckets instead of a while loop, doesn't that already guarantee that we visit each bucket at most once?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I am understanding this thread properly, it seems like there are two concepts being discussed:

  1. changes to the bench tests
  2. optimization to AddrManImpl::Select_

RE 1, I'm slightly confused as to the desired coverage. Without the network parameter, I would expect the worst case of the current Select_ function to happen when addrman is practically empty, which is why we introduced AddrManSelectFromAlmostEmpty. On mainnet, this case would be unlikely, which is represented by AddrManSelect. With the network parameter, I would expect worst case to be represented by AddrManSelectByNetwork where theres just 1 I2P address on an addrman filled with other addresses. @vasild can you clarify what feels absent or misrepresentative?

also happy to discuss 2 further, but a question there - does it feel relevant to these patches to improve performance, or is it an orthogonal improvement (since we see similar worst cases right now)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mzumsande,

I'd expect performance to go down a bit for the case where we have many addresses to choose from. Do you see this in your benchmark?

Yes, that slows down a bit the fast Select() (any network is ok). I observed that and think it is ok because it is still super fast with or without the shuffling.

Why does this need ALREADY_VISITED_AND_BORING?

The for loop could still repeat the whole array from the start if some address was found but was skipped. ALREADY_VISITED_AND_BORING is a further optimization to completely skip that bucket on the second and further passes through the array.

@amitiuttarwar, you are right that I modified the newly added AddrManSelectByNetwork() so that, in general, it tests the same thing as AddrManSelect() which already exists in master. I did that in order to be sure that everything else is identical: the way addrman is filled, minEpochIterations(300) and the GetPort() call. Way too many times I have been bitten by chasing noise in benchmarks, so I wanted to be sure that this is the only difference in the two things I am comparing:

#if 0 // flip to 1 to test the other case
        addrman.Select(/*new_only=*/false);
#else
        addrman.Select(/*new_only=*/false, NET_I2P);
#endif

also happy to discuss 2 further, but a question there - does it feel relevant to these patches to improve performance, or is it an orthogonal improvement

I think the performance improvement would be a nice addition to the work done by this PR (or maybe even a "must have"?)

(since we see similar worst cases right now)?

Hmm? Now on master on mainnet it takes 0.18 microseconds and with this PR on mainnet it will take 6000 microseconds. That is the average case (averaged by the benchmark executing Select about 3500 times). The difference in the worst case is even more pronounced.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to asses how much is the slowdown. Use a full addrman (15k is not full) because that is what people out there are running on. If the slowdown is ok, then no further improvements are necessary on this PR or its parent.

agreed that this is the fundamental thing we are trying to evaluate - is the performance difference significant & acceptable

I will try to compare on a snapshot of an addrman from my public full node, not on artificially filled addrman from the bench. Last time I checked it had ~70k addresses and also it will have at least a bunch of addresses from each network, not just 1 address. That would be more representative.

would be awesome if you could compare based on your addrman snapshot from mainnet!

my understanding of martin's benches is its trying to observe "normal" as well as "worst case". so the examples with 1 address are aiming towards "worst case". but of course "normal" would vary widely from node to node.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is your "Test 5" doing? Does it call 20 times Select(network)?

No, it calls Select(network) for an addrman that has 20 addresses of the selected network.

I'll try to explain the idea behind my benchmarks better:
Starting point is the observation that the suggested change makes Select slower if we already know a lot of addresses of the desired type, while making it faster if addrman has very few addresses of this type. This is true for botch Select() and Select(network), which is why it is not obvious at first sight if the suggestion is an overall performance improvement or a degradation.

So my idea was to find the cutoff - how many addresses do we need such that both algorithms are equally fast - in situations where we have more addrs than this, the change would be a slowdown, when we have less it's a speedup. This cutoff turned out to be very small for Select() (3 addresses), while being a bit larger for Select(network) - it was approximately 20. It would be helpful to know how things change with a full addrman!

So if we used an addrman with counts similar to those reported by @jonatack, my results would suggest that the pre-shufffling change suggested here would be a speedup for CJDNS-specific queries, while being a slowdown for clearnet, onion, i2p and unspecific queries (all compared to this PR).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran this on a mainnet peers.dat:

addrman size total: 72947
addrman size IPv4:  47252
addrman size IPv6:  11491
addrman size Tor:   13305
addrman size I2P:   893
addrman size CJDNS: 6
[patch] How to load /tmp/peers.dat from bench_bitcoin
diff --git i/src/bench/addrman.cpp w/src/bench/addrman.cpp
index 8a5cab443f..b8ec4a4071 100644
--- i/src/bench/addrman.cpp
+++ w/src/bench/addrman.cpp
@@ -1,17 +1,22 @@
 // Copyright (c) 2020-2022 The Bitcoin Core developers
 // Distributed under the MIT software license, see the accompanying
 // file COPYING or http://www.opensource.org/licenses/mit-license.php.
 
+#include <addrdb.h>
 #include <addrman.h>
 #include <bench/bench.h>
+#include <chainparams.h>
+#include <chainparamsbase.h>
 #include <netbase.h>
 #include <netgroup.h>
 #include <random.h>
 #include <util/check.h>
+#include <util/system.h>
 #include <util/time.h>
+#include <util/translation.h>
 
 #include <optional>
 #include <vector>
 
 /* A "source" is a source address from which we have received a bunch of other addresses. */
 
@@ -21,12 +26,23 @@ static constexpr size_t NUM_ADDRESSES_PER_SOURCE = 256;
 static NetGroupManager EMPTY_NETGROUPMAN{std::vector<bool>()};
 static constexpr uint32_t ADDRMAN_CONSISTENCY_CHECK_RATIO{0};
 
 static std::vector<CAddress> g_sources;
 static std::vector<std::vector<CAddress>> g_addresses;
 
+std::unique_ptr<AddrMan> LoadAddrmanFromPeersDat()
+{
+    SelectBaseParams("main");
+    SelectParams("main");
+    gArgs.ForceSetArg("-datadir", "/tmp");
+    std::unique_ptr<AddrMan> addrman;
+    auto err = LoadAddrman(EMPTY_NETGROUPMAN, gArgs, addrman);
+    assert(!err.has_value());
+    return addrman;
+}
+
 static void CreateAddresses()
 {
     if (g_sources.size() > 0) { // already created
         return;
     }
 
@@ -69,19 +85,12 @@ static void FillAddrMan(AddrMan& addrman)
 {
     CreateAddresses();
 
     AddAddressesToAddrMan(addrman);
 }
 
-static CNetAddr ResolveIP(const std::string& ip)
-{
-    CNetAddr addr;
-    LookupHost(ip, addr, false);
-    return addr;
-}
-
 static CService ResolveService(const std::string& ip, uint16_t port = 0)
 {
     CService serv;
     Lookup(ip, serv, port, false);
     return serv;
 }
@@ -125,26 +134,24 @@ static void AddrManSelectFromAlmostEmpty(benchmark::Bench& bench)
         (void)addrman.Select();
     });
 }
 
 static void AddrManSelectByNetwork(benchmark::Bench& bench)
 {
-    AddrMan addrman{EMPTY_NETGROUPMAN, /*deterministic=*/false, ADDRMAN_CONSISTENCY_CHECK_RATIO};
-
-    // add single I2P address to new table
-    CService i2p_service;
-    i2p_service.SetSpecial("udhdrtrcetjm5sxzskjyr5ztpeszydbh4dpl3pl4utgqqw2v4jna.b32.i2p");
-    CAddress i2p_address(i2p_service, NODE_NONE);
-    i2p_address.nTime = Now<NodeSeconds>();
-    CNetAddr source = ResolveIP("252.2.2.2");
-    addrman.Add({i2p_address}, source);
-
-    FillAddrMan(addrman);
+    auto addrman = LoadAddrmanFromPeersDat();
+    std::cout << "addrman size total: " << addrman->Size() << std::endl
+              << "addrman size IPv4:  " << addrman->Size(NET_IPV4) << std::endl
+              << "addrman size IPv6:  " << addrman->Size(NET_IPV6) << std::endl
+              << "addrman size Tor:   " << addrman->Size(NET_ONION) << std::endl
+              << "addrman size I2P:   " << addrman->Size(NET_I2P) << std::endl
+              << "addrman size CJDNS: " << addrman->Size(NET_CJDNS) << std::endl;
 
     bench.run([&] {
-        (void)addrman.Select(/*new_only=*/false, NET_I2P);
+        //addrman->Select(/*new_only=*/false);
+        //addrman->Select(/*new_only=*/false, NET_I2P);
+        addrman->Select(/*new_only=*/false, NET_CJDNS);
     });
 }
 
 static void AddrManGetAddr(benchmark::Bench& bench)
 {
     AddrMan addrman{EMPTY_NETGROUPMAN, /*deterministic=*/false, ADDRMAN_CONSISTENCY_CHECK_RATIO};

Results:

  • Select(/*new_only=*/false): 0.265 microseconds, this is what master is doing before this PR

After this PR:

  • Select(/*new_only=*/false, NET_IPV4): 0.5 microseconds
  • Select(/*new_only=*/false, NET_IPV6): 1.7 microseconds
  • Select(/*new_only=*/false, NET_ONION): 3.1 microseconds
  • Select(/*new_only=*/false, NET_I2P): 27 microseconds
  • Select(/*new_only=*/false, NET_CJDNS): 1600 microseconds

Is this slowdown acceptable? Maybe yes. We don't call Select in a tight loop, so maybe even the 1600 microseconds is ok.

What happens if we mark and subsequently skip buckets we have seen to contain 0 interesting addresses, without the shuffling? I guess that should bring most of the improvement from the above patch without the downside of the shuffling. Here is the change on top of this PR:

[patch] Skip boring
diff --git i/src/addrman.cpp w/src/addrman.cpp
index cdfd079fcd..46f4de99be 100644
--- i/src/addrman.cpp
+++ w/src/addrman.cpp
@@ -747,15 +747,19 @@ std::pair<CAddress, NodeSeconds> AddrManImpl::Select_(bool new_only, std::option
     }
 
     const int bucket_count{search_tried ? ADDRMAN_TRIED_BUCKET_COUNT : ADDRMAN_NEW_BUCKET_COUNT};
 
     // Loop through the addrman table until we find an appropriate entry
     double chance_factor = 1.0;
+    std::unordered_set<int> already_visited_and_boring_buckets;
     while (1) {
         // Pick a bucket, and an initial position in that bucket.
         int bucket = insecure_rand.randrange(bucket_count);
+        if (already_visited_and_boring_buckets.count(bucket) > 0) {
+            continue;
+        }
         int initial_position = insecure_rand.randrange(ADDRMAN_BUCKET_SIZE);
 
         // Iterate over the positions of that bucket, starting at the initial one,
         // and looping around.
         int i;
         for (i = 0; i < ADDRMAN_BUCKET_SIZE; ++i) {
@@ -770,14 +774,19 @@ std::pair<CAddress, NodeSeconds> AddrManImpl::Select_(bool new_only, std::option
                 } else {
                     break;
                 }
             }
         }
 
-        // If the bucket is entirely empty, start over with a (likely) different one.
-        if (i == ADDRMAN_BUCKET_SIZE) continue;
+        // Start over with a different bucket if this one is entirely empty or
+        // specific network was requested and it does not contain any addresses
+        // from that network.
+        if (i == ADDRMAN_BUCKET_SIZE) {
+            already_visited_and_boring_buckets.insert(bucket);
+            continue;
+        }
 
         // Find the entry to return.
         int position = (initial_position + i) % ADDRMAN_BUCKET_SIZE;
         int nId = GetEntry(search_tried, bucket, position);
         const auto it_found{mapInfo.find(nId)};
         assert(it_found != mapInfo.end());

Result for NET_CJDNS where the slowdown is most pronounced: 1200 microseconds (vs 1600). Those numbers are averaged over many runs and they do not tell the whole story. Here are the distributions without "skip boring" (the height of a bar indicates how many addresses were checked before finding the result, e.g. if the bar that spans between 2500 and 3500 is tall 280, it means 280 of the runs checked 2500-3500 addresses before finding the result).

1

with "skip boring":

2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why 1.6 milliseconds to choose an address would be a concern.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simpler approach to skipping already tried buckets might be something like this:

    int n = 1 << 8; // power of 2
    int initial_pos = random(n);
    int step = random(n/2) * 2 + 1; // odd number
    int i = initial_pos;
    do {
        if (check_bucket(i)) break;
        i = (i + step) % n; // oops, try a new bucket
    } while (i != initial_pos);

That doesn't need a full shuffle, still hits every bucket, and only hits each bucket once (provided step and n are co-prime).

Based on vasild's stats above, I think this is fine to merge without any improvement here, but it might be interesting to rebench with this approach -- performance should benefit a little from needing less randomness as well as avoiding repeated work...

If we wanted to making things actually reliably fast for cjdns-like networks lost in a sea of ipv4 addresses, then having separate addrman tables for each network might be worth exploring. That would probably help prevent an attacker from blocking valid ipv4 addresses with a mass of valid-but-sybiled tor addresses, which might be worthwhile.

@amitiuttarwar amitiuttarwar force-pushed the 2023-03-select-by-network branch from 25a64a2 to 09d5145 Compare March 11, 2023 17:05
@amitiuttarwar
Copy link
Contributor Author

thank you for the in-depth review @vasild ! I have addressed all your review comments (incorporated most, left questions on a couple)

Copy link
Contributor

@vasild vasild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 09d5145

Suggestions remaining (non-blocker):

Thanks!

@DrahtBot DrahtBot requested a review from brunoerg March 15, 2023 14:36
amitiuttarwar and others added 10 commits March 17, 2023 17:59
Unused until later commit.

Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
in preparation for consolidating the logic for searching the new and tried
tables, generalize the call paths for both

Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
-BEGIN VERIFY SCRIPT-
sed -i 's/fChanceFactor/chance_factor/g' src/addrman.cpp
sed -i 's/nBucketPos/initial_position/g' src/addrman.cpp
sed -i 's/nBucket/bucket/g' src/addrman.cpp src/addrman_impl.h
sed -i 's/newOnly/new_only/g' src/addrman.cpp src/addrman_impl.h src/addrman.h src/test/addrman_tests.cpp
-END VERIFY SCRIPT-

Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
Add an optional parameter to the addrman Select function that allows callers to
specify which network the returned address should be on. Ensure that the proper
table is selected with different cases of whether the new or tried table has
network addresses that match.

Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
this adds coverage for the 7 different cases of which table should be selected
when the network is specified. the different cases are the result of new_only
being true or false and whether there are network addresses on both, neither,
or one of new vs tried tables. the only case not covered is when new_only is
false and the only network addresses are on the new table.

Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
Co-authored-by: Vasil Dimov <vd@FreeBSD.org>
if an addr matching the network requirements is only on the new table and
select is invoked with new_only = false, ensure that the code selects the new
table.

in order to test this case, we use a non deterministic addrman. this means we
cannot have more than one address in any addrman table, or risk sporadic
failures when the second address happens to conflict.

if the code chose a table at random, the test would fail 50% of the time

Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
to evaluate the worst case performance with the network parameter passed
through, fill the new table with addresses then add a singular I2P address to
retrieve

Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
the addrman select function will demonstrate it's worst case performance when
it is almost empty, because it might have to linearly search several buckets.
add a bench test to cover this case

Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
@amitiuttarwar amitiuttarwar force-pushed the 2023-03-select-by-network branch from 09d5145 to 17e7054 Compare March 18, 2023 01:04
@amitiuttarwar
Copy link
Contributor Author

two small updates:

  • added assertions for bounds checking in GetEntry
  • added the suggested test


// Iterate over the positions of that bucket, starting at the initial one,
// and looping around.
int i;
Copy link
Contributor

@brunoerg brunoerg Mar 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a suggestion to make the code a lit bit cleaner in case you have to touch it again, perhaps we don't need to create position and nId again after the loop.

diff --git a/src/addrman.cpp b/src/addrman.cpp
index cdfd079fc..30ce2cadc 100644
--- a/src/addrman.cpp
+++ b/src/addrman.cpp
@@ -757,10 +757,10 @@ std::pair<CAddress, NodeSeconds> AddrManImpl::Select_(bool new_only, std::option
 
         // Iterate over the positions of that bucket, starting at the initial one,
         // and looping around.
-        int i;
+        int i, position, node_id;
         for (i = 0; i < ADDRMAN_BUCKET_SIZE; ++i) {
-            int position = (initial_position + i) % ADDRMAN_BUCKET_SIZE;
-            int node_id = GetEntry(search_tried, bucket, position);
+            position = (initial_position + i) % ADDRMAN_BUCKET_SIZE;
+            node_id = GetEntry(search_tried, bucket, position);
             if (node_id != -1) {
                 if (network.has_value()) {
                     const auto it{mapInfo.find(node_id)};
@@ -777,9 +777,7 @@ std::pair<CAddress, NodeSeconds> AddrManImpl::Select_(bool new_only, std::option
         if (i == ADDRMAN_BUCKET_SIZE) continue;
 
         // Find the entry to return.
-        int position = (initial_position + i) % ADDRMAN_BUCKET_SIZE;
-        int nId = GetEntry(search_tried, bucket, position);
-        const auto it_found{mapInfo.find(nId)};
+        const auto it_found{mapInfo.find(node_id)};
         assert(it_found != mapInfo.end());
         const AddrInfo& info{it_found->second};

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in #27745

Copy link
Contributor

@vasild vasild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 17e7054

Maybe other reviewers would be interested in the performance discussion:
#27214 (comment)

@DrahtBot DrahtBot requested a review from brunoerg March 25, 2023 05:15
Copy link
Contributor

@brunoerg brunoerg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-ACK 17e7054

Copy link
Contributor

@ajtowns ajtowns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 17e7054

* @param[in] new_only Whether to only select addresses from the new table. Passing `true` returns
* an address from the new table or an empty pair. Passing `false` will return an
* address from either the new or tried table (it does not guarantee a tried entry).
* @param[in] network Select only addresses of this network (nullopt = all)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth documenting that passing in a network here rather than nullopt can make the search significantly slower (though still fast)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in #27745

if (node_id != -1) {
if (network.has_value()) {
const auto it{mapInfo.find(node_id)};
assert(it != mapInfo.end());
Copy link
Contributor

@ajtowns ajtowns Apr 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want an unconditional assertion failure here, vs say:

if (Assume(it != mapInfo.end()) && it->second.GetNetwork() == *network) break;

(maybe a similar question for GetEntry)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in #27745

bool new_selected{false};
bool tried_selected{false};

while (!new_selected || !tried_selected) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like having:

int counter = 256;
while (--counter > 0 && (!new_selected || !tried_selected)) { ... }
BOOST_CHECK(new_selected);
BOOST_CHECK(tried_selected);

would be better here than an infinite loop leading to CI timeout on error. A one-in-2**256 chance of failure is already the security of bitcoin as a whole, after all; and if our randomness generator is biassed enough that it fails more than that, it's probably better that the test case fails anyway? That we use the deterministic addrman presumably means any failures here will always be deterministic, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in #27745

Copy link
Contributor

@mzumsande mzumsande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review ACK 17e7054

I reviewed the PR again - although I'm a coauthor, so not sure about the etiquette of ack'ing.

I think there is a small bug in the non-deterministic test that could make it fail on rare occasions and should be fixed - here or in a follow-up.

* @param[in] new_only Whether to only select addresses from the new table.
* @param[in] new_only Whether to only select addresses from the new table. Passing `true` returns
* an address from the new table or an empty pair. Passing `false` will return an
* address from either the new or tried table (it does not guarantee a tried entry).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: mentioning the "empty pair" only in the first case makes it sound as if this wasn't one of the possible outcomes in the second case (but it is).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in #27745

@amitiuttarwar
Copy link
Contributor Author

thanks for the reviews! I'll address the outstanding comments in a followup

@fanquake fanquake added this to the 26.0 milestone Apr 18, 2023
@fanquake
Copy link
Member

This will be merged after branch-off. It would be good to either get the follow up opened in advance, so we can avoid any time between merging, and the potential for intermittent CI failures, or, if that change is straight-forward enough, you could push an additional commit here, leaving the rest of the branch as-is.

@amitiuttarwar amitiuttarwar force-pushed the 2023-03-select-by-network branch from 03e996c to 17854ae Compare April 19, 2023 02:07
@amitiuttarwar
Copy link
Contributor Author

added an extra commit to ensure the non-deterministic test will not fail intermittently. will incorporate the outstanding feedback separately

@amitiuttarwar amitiuttarwar force-pushed the 2023-03-select-by-network branch 2 times, most recently from 5da828b to 17e7054 Compare April 19, 2023 17:35
@achow101 achow101 merged commit 3a93957 into bitcoin:master Apr 20, 2023
@amitiuttarwar
Copy link
Contributor Author

opened #27506 for the test fix

@maflcko maflcko removed the CI failed label Apr 21, 2023
fanquake added a commit to bitcoin-core/gui that referenced this pull request Apr 21, 2023
10a354f test: prevent intermittent failures (Amiti Uttarwar)

Pull request description:

  Follow up to #27214 - add an address to the tried table before the new table to make sure a new table collision is not possible.

ACKs for top commit:
  mzumsande:
    Code review ACK 10a354f - the fix is what I suggested [here](bitcoin/bitcoin#27214 (comment)) and should make these intermittent failures impossible.

Tree-SHA512: 24099f02e1915395130065af0ef6a2a1893955d222517d156d928765541d9c427da00172a9b5a540163f4d6aae93ca3882e8267eeb35ecc595d42178abc6191c
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Apr 21, 2023
17e7054 doc: clarify new_only param for Select function (Amiti Uttarwar)
b0010c8 bench: test select for a new table with only one address (Amiti Uttarwar)
9b91aae bench: add coverage for addrman select with network parameter (Amiti Uttarwar)
22a4d14 test: increase coverage of addrman select (without network) (Amiti Uttarwar)
a98e542 test: add addrman test for special case (Amiti Uttarwar)
5c8b4ba tests: add addrman_select_by_network test (Amiti Uttarwar)
6b22928 addrman: add functionality to select by network (Amiti Uttarwar)
26c3bf1 scripted-diff: rename local variables to match modern conventions (Amiti Uttarwar)
4880641 refactor: consolidate select logic for new and tried tables (Amiti Uttarwar)
ca2a9c5 refactor: generalize select logic (Amiti Uttarwar)
052fbcd addrman: Introduce helper to generalize looking up an addrman entry (Amiti Uttarwar)
9bf078f refactor: update Select_ function (Amiti Uttarwar)

Pull request description:

  For the full context & motivation of this patch, see bitcoin#27213

  This is joint work with mzumsande.

  This PR adds functionality to `AddrMan::Select` to enable callers to specify a network they are interested in.

  Along the way, it refactors the function to deduplicate the logic, updates the local variables to match modern conventions, adds test coverage for both the new and existing `Select` logic, and adds bench tests for the worst case performance of both the new and existing `Select` logic.

  This functionality is used in the parent PR.

ACKs for top commit:
  vasild:
    ACK 17e7054
  brunoerg:
    re-ACK 17e7054
  ajtowns:
    ACK 17e7054
  mzumsande:
    Code Review ACK 17e7054

Tree-SHA512: e99d1ce0c44a15601a3daa37deeadfc9d26208a92969ecffbea358d57ca951102d759734ccf77eacd38db368da0bf5b6fede3cd900d8a77b3061f4adc54e52d8
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Apr 21, 2023
10a354f test: prevent intermittent failures (Amiti Uttarwar)

Pull request description:

  Follow up to bitcoin#27214 - add an address to the tried table before the new table to make sure a new table collision is not possible.

ACKs for top commit:
  mzumsande:
    Code review ACK 10a354f - the fix is what I suggested [here](bitcoin#27214 (comment)) and should make these intermittent failures impossible.

Tree-SHA512: 24099f02e1915395130065af0ef6a2a1893955d222517d156d928765541d9c427da00172a9b5a540163f4d6aae93ca3882e8267eeb35ecc595d42178abc6191c
achow101 added a commit that referenced this pull request Jun 30, 2023
cd8ef5b test: ensure addrman test is finite (Amiti Uttarwar)
b9f1e86 addrman: change asserts to Assumes (Amiti Uttarwar)
7687707 doc: update `Select` function description (Amiti Uttarwar)
2b6bd12 refactor: de-duplicate lookups (Amiti Uttarwar)

Pull request description:

  this PR addresses outstanding review comments from #27214

ACKs for top commit:
  achow101:
    ACK cd8ef5b
  mzumsande:
    Code Review ACK cd8ef5b
  brunoerg:
    crACK cd8ef5b

Tree-SHA512: 669f67904263e3f51c39b175eabf5fa1b1e7b6841e889656afec33d0bd93fb446de9403f0a91b186ddeaf29498c8938484a0547b1188256c4e7c90db6f30bb55
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jul 1, 2023
cd8ef5b test: ensure addrman test is finite (Amiti Uttarwar)
b9f1e86 addrman: change asserts to Assumes (Amiti Uttarwar)
7687707 doc: update `Select` function description (Amiti Uttarwar)
2b6bd12 refactor: de-duplicate lookups (Amiti Uttarwar)

Pull request description:

  this PR addresses outstanding review comments from bitcoin#27214

ACKs for top commit:
  achow101:
    ACK cd8ef5b
  mzumsande:
    Code Review ACK cd8ef5b
  brunoerg:
    crACK cd8ef5b

Tree-SHA512: 669f67904263e3f51c39b175eabf5fa1b1e7b6841e889656afec33d0bd93fb446de9403f0a91b186ddeaf29498c8938484a0547b1188256c4e7c90db6f30bb55
@bitcoin bitcoin locked and limited conversation to collaborators May 23, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants