-
Notifications
You must be signed in to change notification settings - Fork 37.7k
truc: optimize the in package relation calculation #33062
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33062. ReviewsSee the guideline for information on the review process. ConflictsNo conflicts as of last run. |
/** Helper for PackageTRUCChecks: Returns a vector containing the indices of transactions (within | ||
* package) that are direct parents of ptx. */ | ||
std::vector<size_t> FindInPackageParents(const Package& package, const CTransactionRef& ptx) | ||
std::vector<std::vector<size_t>> BuildInPackageRelations(const Package& package) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you consider using DepGraph instead of a matrix of indices (it's basically the same thing but more compact)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you consider using DepGraph instead of a matrix of indices (it's basically the same thing but more compact)?
Sure, I'll look into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@glozow Hi glozow, seems DepGraph stores all ancestors of a tx, while here:
if (mempool_ancestors.size() + in_package_parents.size() + 1 > TRUC_ANCESTOR_LIMIT)
shows PackageTRUCChecks()
want the parent (the direct ancestor).
Using DepGraph changes the semantics of this function. Any hints about the above code: Do we actually want mempool_ancestors.size() + in_package_parents.size()
or mempool_ancestors.size + in_package_ancestors.size()
here ?
Using the Bitset directly is an alternative way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwiw I don't think this needs to be direct parents, it's just implemented this way because the limit is 2.
As a more general note - thanks for working on this and I don't want to be discouraging, but I'm not sure that a marginal improvement to this small function is worth this much effort. I see you have some other PRs that might be more impactful, so maybe focus on those? Maybe we can come back to this in the future if it helps with other things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwiw I don't think this needs to be direct parents, it's just implemented this way because the limit is 2.
As a more general note - thanks for working on this and I don't want to be discouraging, but I'm not sure that a marginal improvement to this small function is worth this much effort. I see you have some other PRs that might be more impactful, so maybe focus on those? Maybe we can come back to this in the future if it helps with other things.
Sure, thanks.
In PackageTRUCChecks(), we calculate the in-package-parents for an in package tx, which results in the total time complexisity being O(n^2), n is the number of Txs in the package. Let's precompute the overall relationships between all transactions to make it be O(n). Signed-off-by: Hao Xu <hao.xu@linux.dev>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that this isn't a bottleneck, optimizing it without a benchmark or an easily reproducible usecase seems like it's a solution begging for a problem - I speak from personal experience, that's how I started as well :)
I left a few comments while reviewing anyway, but without a concept ack on the idea itself from those who are more familiar with it, it likely won't get merged.
if (&(*tx) == &(*ptx)) break; | ||
if (possible_parents.count(tx->GetHash())) { | ||
in_package_parents.push_back(i); | ||
pindexes.clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the reason for reusing the set here? This will just create a dependency between the loops. If you have some benchmarks, it would be good to double-check, but this might be simpler (and maybe even faster):
pindexes.clear(); | |
std::unordered_set<size_t> pindexes; | |
pindexes.reserve(tx->vin.size()); // TODO or whatever size you expect |
} | ||
std::copy(pindexes.begin(), pindexes.end(), std::back_inserter(relations[i])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could probably use the newer version here:
std::copy(pindexes.begin(), pindexes.end(), std::back_inserter(relations[i])); | |
std::ranges::copy(pindexes, std::back_inserter(relations[i])); |
I haven't checked the usages, but why are we storing these in a vector
in the first place if they have a uniqueness constraint?
if (possible_parents.count(input.prevout.hash)) | ||
pindexes.insert(possible_parents[input.prevout.hash]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C++20 allows ::contains
(+ nit: braces):
if (possible_parents.count(input.prevout.hash)) | |
pindexes.insert(possible_parents[input.prevout.hash]); | |
if (possible_parents.contains(input.prevout.hash)) { | |
pindexes.insert(possible_parents[input.prevout.hash]); | |
} |
Or even better, to avoid double hashing (via existence check first, and getting the value, second), we might as well:
if (possible_parents.count(input.prevout.hash)) | |
pindexes.insert(possible_parents[input.prevout.hash]); | |
if (const auto it{possible_parents.find(input.prevout.hash)}; it != possible_parents.end()) { | |
pindexes.insert(it->second); | |
} |
|
||
/** Helper for PackageTRUCChecks: Returns a vector containing the indices of transactions (within | ||
* package) that are direct parents of ptx. */ | ||
std::vector<size_t> FindInPackageParents(const size_t tx_index, const auto& relations) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't usually auto
parameters and not sure why we're copying the result:
std::vector<size_t> FindInPackageParents(const size_t tx_index, const auto& relations) | |
inline const std::vector<size_t>& FindInPackageParents(size_t tx_index, const std::vector<std::vector<size_t>>& relations) | |
{ | |
return relations[tx_index]; | |
} |
Thanks, l0rinc. I'll leave this patch alone for now. |
In PackageTRUCChecks(), we calculate the in-package-parents for an in package tx, which results in the total time complexisity being O(n^2), n is the number of Txs in the package. Let's precompute the overall relationships between all transactions to make it be O(n).