-
Notifications
You must be signed in to change notification settings - Fork 37.7k
Replace cluster linearization algorithm with SFL #32545
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32545. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsNo conflicts as of last run. |
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
3b7477b
to
b920e76
Compare
1c6bb72
to
df589da
Compare
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
7d5e4dc
to
23072f2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
@@ -539,492 +555,651 @@ class LinearizationChunking | |||
} | |||
}; | |||
|
|||
/** Class encapsulating the state needed to find the best remaining ancestor set. | |||
/** Class to represent the internal state of the spanning-forest linearization algorithm. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Appreciate the excellent doxygen documentation here.
9ee20ca
to
ba7464a
Compare
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
55931c3
to
47bdf8f
Compare
64-bit userspace (AArch64) $ file bitcoind
bitcoind: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (GNU/Linux), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, BuildID[sha1]=7e059ec01f7460042910ca4ed15270382269c9d5, for GNU/Linux 3.7.0, with debug_info, not stripped additional details$ getconf LONG_BIT
64
$ dpkg --print-architecture
arm64
$ uname -m
aarch64 |
Rebased on top of #30605. |
8b3968f
to
505bc96
Compare
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
…ature) This replaces the existing LIMO linearization algorithm (which internally uses ancestor set finding and candidate set finding) with the much more performant spanning-forest linearization algorithm. See https://delvingbitcoin.org/t/spanning-forest-cluster-linearization/1419
…nup) This removes the candidate set finding classes, as well as related tests and benchmarks for them.
…imization) This avoids the need for a loop over all parents of a transaction while walking a chunk, and removes the need to store the set of parent dependencies explicitly.
This introduces the notion of gain to the SFL algorithm. Given a chunk c, an active dependency d in it, and the chunks (t, b) that c would split into if d were deactivated, the gain is defined as either (they are equivalent): (feerate(t) - feerate(b)) * size(t) * size(b) fee(t) * size(b) - fee(b) * size(t) It happens to also be equal to these: (feerate(t) - feerate(c)) * size(t) * size(c) fee(t) * size(c) - fee(c) * size(t) Its relevance is that this metric is proportional to a lower bound on the area under the fee-size diagram which would be gained IF a deactivation of d does not result in a self-merge of t and b again. This commit adds logic to find, within each chunk, the dependency with the highest gain. In benchmarks, this appears to be a very good heuristic for deciding which splits are worth making.
…r (optimization) This reduces the number of allocations required inside the SFL algorithm, and works because the number of dependencies per transaction is at most n-1. To minimize the memory usage from this pre-allocation (which might impact memory locality), change the data type of DepIdx from uint32_t to uint8_t or uint16_t when possible.
…ptimization) Within the per-transaction child dependency list, keep the active ones before all inactive ones. This improves the complexity over iterating over active dependencies from O(m) to O(n), as at most n-1 dependencies can be active within any given chunk at any given time.
This distributes the work over the various chunks fairly, and simultaneously avoids retrying chunks over and over again which are already known to be optimal.
Out of an abundance of caution that adversarially-constructed clusters might reliably result in bad chunk split decisions with the maximum-gain strategy, make every third consecutive attempt to split the same chunk use a random strategy instead.
We do not need to actually keep track of whether a dependency is active or not; it is implied by whether or not it appears within the active prefix of its parent's child_deps, and its child's parent_deps. Just remove setting and checking it.
This adds a rough estimate of algorithm runtime, so it can be interrupted if no solution is found in time. Due to inherent differences between platforms, this will not be extremely accurate, but it is preferable over directly measuring time for consistency.
After the normal optimization process finishes, and finds an optimal spanning forest, run a second process (while computation budget remains) to split chunks into minimal equal-feerate chunks. As a side-effect, this also guarantees that the optimal chunk order is deterministic.
Part of cluster mempool: #30289. Based on #30605.
This replaces the cluster linearization algorithm introduced in #30126 and #30286 (a combination of LIMO with candidate-set search), with a completely different algorithm: spanning-forest linearization, which appears to have much better performance for hard clusters. See this post for a comparison between various linearization algorithms, and this post for benchmarks comparing them. Replaying historical mempool data on it shows that it can effectively linearize every observed cluster up to 64 transactions optimally within tens of microseconds, though pathological examples can be created which take longer.
The algorithm is effectively a very specialized version of the simplex algorithm to the problem of finding high-feerate topological subsets of clusters, but modified to find all consecutive such subsets concurrently rather than just the first one. See the post above for how it is related.
It represents the cluster as partitioned into a set of chunks, each with a spanning tree of its internal dependencies connecting the transactions. Randomized improvements are made by selecting dependencies to add and remove to these spanning trees, merging and splitting chunks, until no more improvements are possible. Like simplex, it does not necessarily make progress in every step, and thus has no upper bound on its runtime, but randomization makes long runtimes very unlikely, and additionally makes it hard to adversarially construct clusters in which the algorithm reliably makes bad choices.