wallet: Prefer full destination groups in coin selection #17824
Conversation
Conflicts: Reviewers, this pull request conflicts with the following ones. If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
The reported issue seems like things are not working as they should for sure, but I don't see how this code fixes it. Ultimately, if I want to send 10.5 bitcoin to you, and I have 500 UTXOs of 1 BTC each all sending to A, the grouping part should result in 50 groups of 10 random UTXOs each (each group worth 10 BTC), and two of these should be selected when sending, and ALL of the outputs should be marked as used afterwards. Your code seems to change this behavior to only result in one 10 BTC group per destination, which means the above send will result in "insufficient funds".
I investigated this further. See #17603 (comment)
Thanks, that's what I found as well, I should have described it in greater detail.

My code does handle this case correctly. You describe the behavior that would occur if I had implemented it as @achow101 suggests. The "full groups" (10 outputs) are already pushed into `groups`.

However, your comment made me realize another shortcoming of the current code: if there are 11 outputs of 1 BTC and we want to send 10.5 BTC, then we see the insufficient funds error because we are "forgetting" the incomplete group. To fix this I would suggest implementing the following: the last, incomplete group is pushed into one of the full groups, resulting in one of the groups being up to 19 outputs large. The worst-case scenario that I see: we want to send a very small amount from the destination and we have exactly 19 outputs; then we would use all 19 outputs. I think this should be ok? I will wait for your feedback before pushing the code for it.

See adapted tests I ran with the current code for the examples discussed above: https://gist.github.com/fjahr/d56b68e58b4275ef52cd5f4eb50b1d4e
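A minimal Python sketch of the merge idea described in this comment (hypothetical helper, not the wallet's C++ code; it assumes same-destination outputs are chunked with `OUTPUT_GROUP_MAX_ENTRIES = 10` and the trailing partial chunk is folded into a full one):

```python
OUTPUT_GROUP_MAX_ENTRIES = 10

def group_outputs(values, max_entries=OUTPUT_GROUP_MAX_ENTRIES):
    """Group same-destination output values, merging the leftover
    partial group into the last full group (so one group may grow
    up to 2 * max_entries - 1 outputs)."""
    groups = [values[i:i + max_entries] for i in range(0, len(values), max_entries)]
    if len(groups) > 1 and len(groups[-1]) < max_entries:
        groups[-2].extend(groups.pop())  # fold partial group into a full one
    return groups

# 11 reused outputs of 1 BTC form one 11-output group, so sending 10.5 BTC succeeds.
print([len(g) for g in group_outputs([1] * 11)])   # [11]
# The worst case from the comment: 19 outputs all end up in a single group.
print([len(g) for g in group_outputs([1] * 19)])   # [19]
```

With 25 outputs this would yield groups of 10 and 15, i.e. the partial group never survives on its own once a full group exists.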
I'm not sure this is the right approach here. It seems like marking the output groups as 'partial' when there are >10 outputs would address this. A simple approach would be to explicitly treat <max groups for multi-group destinations as "0-conf". The current code would then actively try to avoid using them except as a last resort.

Edit: I didn't test, but I assume your example with "not working" above is for your patch, not the current master.
I would do something like this:

```diff
diff --git a/src/wallet/wallet.cpp b/src/wallet/wallet.cpp
index 3954f6626..3c88ff9c0 100644
--- a/src/wallet/wallet.cpp
+++ b/src/wallet/wallet.cpp
@@ -3976,6 +3976,7 @@ bool CWalletTx::IsImmatureCoinBase() const
 std::vector<OutputGroup> CWallet::GroupOutputs(const std::vector<COutput>& outputs, bool single_coin) const {
     std::vector<OutputGroup> groups;
     std::map<CTxDestination, OutputGroup> gmap;
+    std::set<CTxDestination> gfull;
     CTxDestination dst;
     for (const auto& output : outputs) {
         if (output.fSpendable) {
@@ -3990,6 +3991,7 @@ std::vector<OutputGroup> CWallet::GroupOutputs(const std::vector<COutput>& outpu
             if (gmap[dst].m_outputs.size() >= OUTPUT_GROUP_MAX_ENTRIES) {
                 groups.push_back(gmap[dst]);
                 gmap.erase(dst);
+                gfull.insert(dst);
             }
             gmap[dst].Insert(input_coin, output.nDepth, output.tx->IsFromMe(ISMINE_ALL), ancestors, descendants);
         } else {
@@ -3998,7 +4000,16 @@ std::vector<OutputGroup> CWallet::GroupOutputs(const std::vector<COutput>& outpu
         }
     }
     if (!single_coin) {
-        for (const auto& it : gmap) groups.push_back(it.second);
+        for (auto& it : gmap) {
+            auto& group = it.second;
+            if (gfull.count(it.first) > 0 && group.m_outputs.size() < OUTPUT_GROUP_MAX_ENTRIES) {
+                // make this unattractive as we want coin selection to avoid it if possible
+                group.m_depth = 0;
+                group.m_from_me = false;
+                group.m_ancestors = 9999;
+            }
+            groups.push_back(group);
+        }
     }
     return groups;
 }
```
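To illustrate, here is a rough Python model of the diff's semantics (names and structure are mine, not the wallet code): leftover groups of destinations that already produced a full group are kept, but tagged so the eligibility filters only consider them as a last resort.

```python
MAX = 10  # stand-in for OUTPUT_GROUP_MAX_ENTRIES

def group_with_penalty(values, max_entries=MAX):
    """Return (outputs, penalized) pairs per group. A leftover group is
    penalized only if its destination already produced a full group."""
    groups = []
    pending, saw_full = [], False
    for v in values:
        pending.append(v)
        if len(pending) == max_entries:
            groups.append((pending, False))
            pending, saw_full = [], True
    if pending:
        groups.append((pending, saw_full))
    return groups

# 23 reused outputs: two full groups, plus a penalized 3-output leftover.
print([(len(g), bad) for g, bad in group_with_penalty([1] * 23)])
# [(10, False), (10, False), (3, True)]
```

A destination with fewer than `MAX` outputs in total never gets penalized, matching the `gfull` check in the diff.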
> Edit: I didn't test, but I assume your example with "not working" above is for your patch, not the current master.

Yeah, that's right.

> I would do something like this
That is a good idea, I did not think of that possibility before. But I still see a problem there: if flags like `-walletrejectlongchains` or `-txconfirmtarget` are set, then we will have the same issue as with my code described above. The incomplete group will be ignored and the user may get an insufficient funds error when there are actually enough funds available. And since `DEFAULT_TX_CONFIRM_TARGET` is set to 6, I think this would happen for every wallet that has not set this value to 0. If only using `group.m_ancestors = 9999;`, at least this will work for the default wallet configuration. I am not sure how widely `-walletrejectlongchains` is used.
Honestly, I am not sure yet which solution I prefer. The upsides of my described approach:

1. No matter what the configuration of the wallet is, there will not be a false "Insufficient funds" error.
2. I think it does a better job of using funds from the same address together.

An example to explain 2.:

- A wallet has 11 outputs of 1 BTC each at `reused_address`
- It also has 1 output of 1 BTC at `not_reused_address`
- The user wants to send 10.5 BTC and uses `avoid_reuse`
- With the `m_ancestors` approach, the wallet would send 10 BTC from `reused_address` and 1 BTC from `not_reused_address`, indicating these addresses could be controlled by the same user
- With my suggestion, there would be an 11 BTC group only from `reused_address` selected for the inputs

This stands against the downsides of my approach:

- At worst we are using `2 * max - 1` inputs when `max` could have been enough.
- It is a more complicated change than yours.
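A small Python sketch of this trade-off (hypothetical group values and a greedy stand-in for the real coin selection; only meant to show which destinations each proposal touches):

```python
MAX = 10
target = 10.5

def greedy_select(groups, target):
    """Pick non-penalized groups first, largest value first, until covered."""
    picked, total = [], 0.0
    for grp in sorted(groups, key=lambda g: (g["penalized"], -g["value"])):
        if total >= target:
            break
        picked.append(grp)
        total += grp["value"]
    return picked

# m_ancestors approach: [10 BTC reused], [1 BTC reused, penalized], [1 BTC fresh]
by_penalty = [
    {"dest": "reused", "value": 10.0, "penalized": False},
    {"dest": "reused", "value": 1.0,  "penalized": True},
    {"dest": "fresh",  "value": 1.0,  "penalized": False},
]
# merge approach: the leftover is folded into the full group -> [11 BTC reused]
by_merge = [
    {"dest": "reused", "value": 11.0, "penalized": False},
    {"dest": "fresh",  "value": 1.0,  "penalized": False},
]
for name, groups in (("penalty", by_penalty), ("merge", by_merge)):
    picked = greedy_select(groups, target)
    print(name, sorted({g["dest"] for g in picked}))
# penalty ['fresh', 'reused']  -- links the two addresses on-chain
# merge ['reused']             -- spends from one destination only
```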
Let me know if I am missing something. I am not sure how much weight each of these arguments carries. I think the perfect solution would only be possible by making a distinction between <max groups and max groups in coin selection, and that seems like overkill given the impact of the issue.

I am pushing an updated version implementing your solution with only `m_ancestors` and an improved test, and hope to get some more feedback. But I am happy to make further changes.
You're probably right that it would not end up using the final group even if it could. I think that's okay, in this case. As for the
Force-pushed from a0434b3 to ceeab6c.
Rebased and implemented using
6fc554f wallet: Reset reused transactions cache (Fabian Jahr) Pull request description: Fixes #17603 (together with #17824) `getbalances` is using the cache within `GetAvailableCredit` under certain conditions [here](https://github.com/bitcoin/bitcoin/blob/35fff5be60e853455abc24713481544e91adfedb/src/wallet/wallet.cpp#L1826). For a wallet with `avoid_reuse` activated this can lead to inconsistent reporting of `used` transactions/balances between `getbalances` and `listunspent` as pointed out in #17603. When an address is reused before the first transaction is spending from this address, the cache is not updated even after the transaction is sent. This means the remaining outputs at the reused address are not showing up as `used` in `getbalances`. With this change, any newly incoming transaction belonging to the wallet marks all the other outputs at the same address as dirty. ACKs for top commit: kallewoof: Code review re-ACK 6fc554f promag: ACK 6fc554f. achow101: Re-ACK 6fc554f meshcollider: Code review ACK 6fc554f Tree-SHA512: c4cad2c752176d16d77b4a4202291d20baddf9f27250896a40274d74a6945e0f6b34be04c2f9b1b2e756d3ac669b794969df8f82a98e0b16f10e92f276649ea2
Concept ACK. Review club notes and meeting logs of both sessions are here: https://bitcoincore.reviews/17824. Some of the suggestions from the sessions:

(Also, there seemed to be some agreement in the sessions to consider raising
```cpp
unsigned int limit_ancestor_count;
unsigned int limit_descendant_count;
chain().getPackageLimits(limit_ancestor_count, limit_descendant_count);
size_t max_ancestors = (size_t)std::max<int64_t>(1, limit_ancestor_count);
```
I'm aware this is moved code, but any idea why `max<int64_t>`?
I don't know. I did some digging: the type was introduced here, and prior to that a `uint64` was first introduced here in this context. I can only speculate that since this is a value from user input it was supposed to be as forgiving as possible if a ridiculously high number is provided, but even that does not make too much sense to me.
This may have been a way to safeguard against a user-defined value that was negative, but the switch to `getPackageLimits` removed that safeguard. Not sure if that was wise or not, but that's a separate question.
Force-pushed from 55c4446 to bf4a7ba.
Thanks for all the feedback! I should have addressed all the points here and also what was discussed in the review club. I also tried to improve the naming of this PR/commit and expanded the commit message.
Code review ACK 9dcafb0
(Sorry about the horribly named test functions, btw. Thanks for fixing them.)
ACK 9dcafb0, code review and built/tested on the branch and rebased on master. Tried different values in the new tests to verify they hold with different edge cases.

A few comments below, but please ignore them (to not lose review) unless you need to retouch this. Thank you for improving the first commit message -- it's very good now; two nits if you have to retouch: s/use/user/ and s/smallet/smaller/. Finally, consider updating the PR description with information from the excellent first commit message.
```cpp
// number of entries, to protect against inadvertently creating
// a too-large transaction when using -avoidpartialspends to
// prevent breaking consensus or surprising users with a very
// high amount of fees.
```
Suggest: s/a very high amount of fees./high fees./

Note this feedback by @xekyo mitigating the high-fee motivation here: https://bitcoincore.reviews/17824.html#l-329
```cpp
for (auto& it : gmap) {
    auto& group = it.second;
    if (full_groups.count(it.first) > 0) {
        // Make this unattractive as we want coin selection to avoid it if possible
```
Could perhaps expand on this documentation, like in your commit message, along the lines of:

```diff
-                // Make this unattractive as we want coin selection to avoid it if possible
+                // By assigning it one less than the maximum number of ancestors
+                // allowed for this wallet, we move this smaller subgroup to the
+                // bottom of the priority list for the coin selection algorithm.
```
```cpp
        // Make this unattractive as we want coin selection to avoid it if possible
        group.m_ancestors = max_ancestors - 1;
    }
    groups.push_back(group);
```
Do you think it might be better to construct rather than copy?

```diff
-                groups.push_back(it->second);
+                groups.emplace_back(it->second);
                 it->second = OutputGroup{};
-                full_groups.insert(dst);
+                full_groups.emplace(dst);
@@ -4208,7 +4208,7 @@ std::vector<OutputGroup> CWallet::GroupOutputs(const std::vector<COutput>& outpu
                 // Make this unattractive as we want coin selection to avoid it if possible
                 group.m_ancestors = max_ancestors - 1;
             }
-            groups.push_back(group);
+            groups.emplace_back(group);
```
When a wallet uses avoid_reuse and has a large number of outputs in a single destination, it groups these outputs in OutputGroups that are no larger than OUTPUT_GROUP_MAX_ENTRIES. The goal is to spend as many outputs as possible from the destination while not breaking consensus due to a huge number of inputs and also not surprising the user with high fees.

If there are n outputs in a destination and n > OUTPUT_GROUP_MAX_ENTRIES, then this results in one or many groups of size OUTPUT_GROUP_MAX_ENTRIES and possibly one group of size < OUTPUT_GROUP_MAX_ENTRIES.

Prior to this commit, the coin selection in the case where n > OUTPUT_GROUP_MAX_ENTRIES was skewed towards the one group of size < OUTPUT_GROUP_MAX_ENTRIES, if it exists and the amount to be spent by the transaction is smaller than the aggregate value of that group. The reason is that the coin selection decides between the different groups based on fees, and mostly the smaller group will cause smaller fees.

The behavior that users of the avoid_reuse flag seek is that the full groups of size OUTPUT_GROUP_MAX_ENTRIES get used first. This commit implements this by pretending that the small group has a large number of ancestors (one smaller than the maximum allowed for this wallet). This dumps the small group to the bottom of the list of priorities in the coin selection algorithm.
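The reordering this commit message describes can be sketched in Python (a simplified model with assumed round limits, not the wallet's actual retry logic): coin selection retries with successively looser eligibility filters, so a leftover group carrying `m_ancestors = max_ancestors - 1` only becomes eligible in the final round.

```python
def first_eligible(groups, max_ancestors):
    """Return sizes of the groups in the first (strictest) round with any match.
    The round limits (0, 1, max) are illustrative stand-ins for the wallet's
    progressively looser eligibility filters."""
    for limit in (0, 1, max_ancestors):
        eligible = [g["size"] for g in groups if g["ancestors"] <= limit]
        if eligible:
            return eligible
    return []

max_ancestors = 25
groups = [
    {"size": 10, "ancestors": 0},                  # full group, no penalty
    {"size": 4,  "ancestors": max_ancestors - 1},  # penalized leftover group
]
print(first_eligible(groups, max_ancestors))  # [10]: the full group wins the early round
```

Only if no full group can cover the target would a later, looser round reach the leftover group.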
Rebased, no code changes.
Re-ACK a2324e4
ACK a2324e4
ACK a2324e4
Tested ACK a2324e4 (verified the new test fails on master without this change)
a2324e4 test: Improve naming and logging of avoid_reuse tests (Fabian Jahr) 1abbdac wallet: Prefer full destination groups in coin selection (Fabian Jahr)

Pull request description: Fixes bitcoin#17603 (together with bitcoin#17843)

In the case of destination groups of >10 outputs existing in a wallet with `avoid_reuse` enabled, the grouping algorithm is adding left-over outputs as an "incomplete" group to the list of groups even when a full group has already been added. This leads to the strange behavior that if there are >10 outputs for a destination, the transaction spending from that destination will effectively use `len(outputs) % 10` as inputs. From the original PR and the code comment I understand the correct behavior should be the usage of 10 outputs. I opted for minimal changes in the current code; although there may be optimizations possible for cases with >20 outputs on a destination, this sounds like too much of an edge case right now.

ACKs for top commit: jonatack: Re-ACK a2324e4 achow101: ACK a2324e4 kallewoof: ACK a2324e4 meshcollider: Tested ACK a2324e4 (verified the new test fails on master without this change)

Tree-SHA512: 4743779c5d469fcd16df5baf166024b1d3c8eaca151df1e8281b71df62b29541cf7bfee3f8ab48d83e3b34c9256e53fd38a7b146a54c79f9caa44cce3636971a
Summary: * wallet: Prefer full destination groups in coin selection

When a wallet uses avoid_reuse and has a large number of outputs in a single destination, it groups these outputs in OutputGroups that are no larger than OUTPUT_GROUP_MAX_ENTRIES. The goal is to spend as many outputs as possible from the destination while not breaking consensus due to a huge number of inputs and also not surprising the user with high fees. If there are n outputs in a destination and n > OUTPUT_GROUP_MAX_ENTRIES, then this results in one or many groups of size OUTPUT_GROUP_MAX_ENTRIES and possibly one group of size < OUTPUT_GROUP_MAX_ENTRIES. Prior to this commit, the coin selection in the case where n > OUTPUT_GROUP_MAX_ENTRIES was skewed towards the one group of size < OUTPUT_GROUP_MAX_ENTRIES, if it exists and the amount to be spent by the transaction is smaller than the aggregate value of that group. The reason is that the coin selection decides between the different groups based on fees, and mostly the smaller group will cause smaller fees. The behavior that users of the avoid_reuse flag seek is that the full groups of size OUTPUT_GROUP_MAX_ENTRIES get used first. This commit implements this by pretending that the small group has a large number of ancestors (one smaller than the maximum allowed for this wallet). This dumps the small group to the bottom of the list of priorities in the coin selection algorithm.

* test: Improve naming and logging of avoid_reuse tests

This is a backport of Core [[bitcoin/bitcoin#17824 | PR17824]]

Test Plan: ninja all check-all

Reviewers: #bitcoin_abc, jasonbcox Reviewed By: #bitcoin_abc, jasonbcox Subscribers: jasonbcox Differential Revision: https://reviews.bitcoinabc.org/D7834
e6fe1c3 rpc: Improve avoidpartialspends and avoid_reuse documentation (Fabian Jahr) 8f07307 wallet: Increase OUTPUT_GROUP_MAX_ENTRIES to 100 (Fabian Jahr) Pull request description: Follow-up to #17824. This increases OUTPUT_GROUP_MAX_ENTRIES to 100 which means that OutputGroups will now be up to 100 outputs large, up from previously 10. The main motivation for this change is that during the PR review club on #17824 [several participants signaled](https://bitcoincore.reviews/17824.html#l-339) that 100 might be a better value here. I think fees should be manageable for users but more importantly, users should know what they can expect when using the wallet with this configuration, so I also tried to clarify the documentation on `-avoidpartialspends` and `avoid_reuse` a bit. If there are other additional ways how or docs where users can be made aware of the potential consequences of using these parameters, please let me know. Another small upside is that [there seem to be a high number of batching transactions with 100 and 200 inputs](https://miro.medium.com/max/3628/1*sZ5eaBSbsJsHx-J9iztq2g.png)([source](https://medium.com/@hasufly/an-analysis-of-batching-in-bitcoin-9bdf81a394e0)) giving these transactions a bit of a larger anonymity set, although that is probably a very weak argument. ACKs for top commit: jnewbery: ACK e6fe1c3 Xekyo: retACK e6fe1c3 rajarshimaitra: tACK `e6fe1c3` achow101: ACK e6fe1c3 glozow: code review ACK e6fe1c3 Tree-SHA512: 79685c58bafa64ed8303b0ecd616fce50fc9a2b758aa79833e4ad9f15760e09ab60c007bc16ab4cbc4222e644cfd154f1fa494b0f3a5d86faede7af33a6f2826
Fixes #17603 (together with #17843)

In the case of destination groups of >10 outputs existing in a wallet with `avoid_reuse` enabled, the grouping algorithm is adding left-over outputs as an "incomplete" group to the list of groups even when a full group has already been added. This leads to the strange behavior that if there are >10 outputs for a destination, the transaction spending from that destination will effectively use `len(outputs) % 10` as inputs.

From the original PR and the code comment I understand the correct behavior should be the usage of 10 outputs. I opted for minimal changes in the current code; although there may be optimizations possible for cases with >20 outputs on a destination, this sounds like too much of an edge case right now.
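The `len(outputs) % 10` effect can be reproduced with a toy model (assumptions mine: the real wallet compares fee-adjusted group values, which this reduces to "the group with fewer inputs costs less in fees and so is preferred"):

```python
MAX = 10  # stand-in for OUTPUT_GROUP_MAX_ENTRIES

def pick_group(groups, target):
    """Choose the group with the fewest inputs that still covers the target;
    fewer inputs roughly means a smaller fee, which is what skewed selection."""
    eligible = [g for g in groups if sum(g) >= target]
    return min(eligible, key=len) if eligible else None

outputs = [1.0] * 25  # 25 reused outputs of 1 BTC at one destination
groups = [outputs[i:i + MAX] for i in range(0, len(outputs), MAX)]  # sizes 10, 10, 5
print(len(pick_group(groups, 3.0)))  # 5 -> the len(outputs) % 10 leftover group wins
```

The fix inverts this preference, so the two full 10-output groups would be tried before the 5-output leftover.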