-
Notifications
You must be signed in to change notification settings - Fork 37.7k
Several changes to mruset #6064
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The HMAC for address inventory hashing is absolute overkill, but was easy to write. I doubt the performance matters. |
Rebased on top of @gavinandresen's #6066 (so that I don't need the address hasher here). Removed the boost::unordered_set usage for now, as it is incompatible with older boost versions (which don't have unordered_set::reserve). |
{ | ||
set.clear(); | ||
queue.clear(); | ||
std::vector<iterator>(nMaxSizeIn, set.end()).swap(order); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm probably just an old fuddy-duddy...
... but I find the "use swap to assign" idiom hard to review.
Why not just:
order.assign(nMaxSizeIn, set.end());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's horrible, but it's the only way to make a vector release its memory, afaik.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK too. Though this is only important if we expect the size to change (become smaller) dynamically.
I'm not sure I like having clear()
double as set-maximum-size method. I'd prefer the maximum size to be passed to the constructor and be static over the lifetime of the object. If that's not possible, let's add a special resize()
call for resizing (that happens to clear the structure too).
Code review ACK-- note I changed the CRollingBloomFilter implementation commit in #6066 . Measured heap usage on OSX for a 1,000-entry mruset : 72,192 bytes. I'm running a node with these changes now. |
Rebased on updated #6066, and addressed a nit. |
@gavinandresen I measure around a 50 byte overhead per mruset entry now. Before, it was 50 bytes + double storage of the elements. |
Heading in src/mruset.h should be: |
@sandakersmann Why? It's only worked on in 2012 and 2015. |
@sipa Last update was in 2014. |
@sandakersmann Ok, fixed. |
@@ -40,6 +40,17 @@ nFlags(nFlagsIn) | |||
{ | |||
} | |||
|
|||
// Private constructor used by CRollingBloomFilter | |||
CBloomFilter::CBloomFilter(unsigned int nElements, double nFPRate, unsigned int nTweakIn) : | |||
vData((unsigned int)(-1 / LN2SQUARED * nElements * log(nFPRate)) / 8), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: indentation
For when you need to keep track of the last N items you've seen, and can tolerate some false-positives. Rebased-by: Pieter Wuille <pieter.wuille@gmail.com>
ab42414
to
2fceed4
Compare
Made several changes:
I think there is a significant improvement possible to the size of the rolling bloom filter, by using a different tweak for multiple filters, and checking several filters for a 'contains'. I'll check the math and write a separate PR for that if found useful. |
Use a probabilistic bloom filter to keep track of which addresses we think we have given our peers, instead of a list. This uses much less memory, at the cost of sometimes failing to relay an address to a peer-- worst case if the bloom filter happens to be as full as it gets, 1-in-1,000. Measured memory usage of a full mruset setAddrKnown: 650Kbytes Constant memory usage of CRollingBloomFilter addrKnown: 37Kbytes. This will also help heap fragmentation, because the 37K of storage is allocated when a CNode is created (when a connection to a peer is established) and then there is no per-item-remembered memory allocation. I plan on testing by restarting a full node with an empty peers.dat, running a while with -debug=addrman and -debug=net, and making sure that the 'addr' message traffic out is reasonable. (suggestions for better tests welcome)
@sipa: There's a good survey paper by Sasu Tarkoma et al: "Theory and Practice of Bloom Filters for Distributed Systems." Which pointed me to: https://home.kookmin.ac.kr/~mkyoon/publications/2010TKDE.pdf |
ACK |
@sipa sounds great to me, although let's do that in a later pull, this is already a great improvement compared to the current state, so let's get this reviewed and merged. |
Tested ACK |
6f41b3e [QA] Missing mempool sync in pos_coldStaking and zc_publicspends tests (random-zebra) efaf727 net: correctly initialize nMinPingUsecTime (Wladimir J. van der Laan) 61c8ffe Do not add random inbound peers to addrman. (Gregory Maxwell) e6a1726 Added feeler connections increasing good addrs in the tried table. (Ethan Heilman) 070b6fb Actually only use filterInventoryKnown with MSG_TX inventory messages. (Gregory Maxwell) 9ac6b28 Only use filterInventoryKnown with MSG_TX inventory messages. (Patick Strateman) 01273db Rename setInventoryKnown filterInventoryKnown (Patick Strateman) 4f11eb2 Remove mruset as it is no longer used. (Gregory Maxwell) 2e3b05c Replace setInventoryKnown with a rolling bloom filter. (Gregory Maxwell) 409aa83 Replace trickle nodes with per-node/message Poisson delays (Pieter Wuille) 93e8c46 Move recentRejects initialization to top of InitBlockIndex (Wladimir J. van der Laan) 9a59420 Keep track of recently rejected transactions (Peter Todd) 34ee777 Only use randomly created nonces in CRollingBloomFilter. (Pieter Wuille) 338d346 Make CRollingBloomFilter set nTweak for you (Peter Todd) dcd15bc Reuse vector hashing code for uint256 (Pieter Wuille) 3230143 Add uint256 support to CRollingBloomFilter (Peter Todd) 128d644 Better mruset unit test (Pieter Wuille) 89740ed Use ring buffer of set iterators instead of deque of copies in mruset (Pieter Wuille) 14c88ee Replace mruset setAddrKnown with CRollingBloomFilter addrKnown (Gavin Andresen) e0bebbd Rolling bloom filter class (Gavin Andresen) 7c03bd5 Add correct bool combiner for net signals (Pieter Wuille) 5cb5fd6 Stop exporting ConnectNode (Fuzzbawls) 819295d Stop using ConnectNode in layer 2 code (Fuzzbawls) 851b6b4 net: No need to export DumpBanlist (Cory Fields) 4486d4e net: make Ban/Unban/ClearBan functionality consistent (Cory Fields) a5e278d net: Drop CNodeRef for AttemptToEvictConnection (Cory Fields) 9fd357d net: use the exposed GetNodeSignals() rather than g_signals directly (Cory Fields) 7962bcc net: remove unused set (Cory Fields) fabf358 Use network group instead of CNetAddr in final pass to select node to disconnect (Patrick Strateman) 7f030fe Fix comment (Patrick Strateman) 18af800 Acquire cs_vNodes before changing refrence counts (Patrick Strateman) 7aa827f CNodeRef copy constructor and assignment operator (Patrick Strateman) b3f95e7 Return false early if vEvictionCandidates is empty (Patrick Strateman) 85886c9 Better support for nodes with non-standard nMaxConnections (Patrick Strateman) 9c9e55b RAII wrapper for CNode* (Patrick Strateman) e92780d Add comments to AttemptToEvictConnection (Patrick Strateman) 0ca7ce3 Prefer to disconnect peers in favor of whitelisted peers (Patrick Strateman) a1c4aaf AttemptToEvictConnection (Patrick Strateman) aa7ce9b Record nMinPingUsecTime (Patrick Strateman) fd7bab0 Refactor: Move failure conditions to the top of AcceptConnection (Patrick Strateman) fcb732b Refactor: Bail early in AcceptConnection (Patrick Strateman) 411766d Refactor: AcceptConnection (Patrick Strateman) Pull request description: This is a culmination of several upstream PRs touching the P2P/Networking code, and resulting in a state just prior to the P2P/Network encapsulation, which will be it's own PR. Backported upstream PRs included here: - bitcoin#5859 - bitcoin#6064 - bitcoin#6374 - bitcoin#6498 - bitcoin#6636 - bitcoin#7133 - bitcoin#7125 - bitcoin#7906 - bitcoin#8282 - bitcoin#8594 ACKs for top commit: random-zebra: Great job. ACK 6f41b3e and merging... furszy: Have been running it the past days and all is looking good, ACK 6f41b3e . Tree-SHA512: 1cc4b1271516f000a06141b5b069f3ee00f3eb77056e40d2c021c484f749d9d8db2b76ce490f63572372705b646fad342666f6f90ca5fc69abcacf7b207d058f
Including:
This reduces the memory consumption of a maximal setAddrKnown by +- half.