Skip to content

Conversation

fjahr
Copy link
Contributor

@fjahr fjahr commented Dec 29, 2019

Fixes #17603 (together with #17843)

In the case of destination groups of >10 outputs existing in a wallet with avoid_reuse enabled, the grouping algorithm is adding left-over outputs as an "incomplete" group to the list of groups even when a full group has already been added. This leads to the strange behavior that if there are >10 outputs for a destination the transaction spending from that will effectively use len(outputs) % 10 as inputs for that transaction.

From the original PR and the code comment I understand the correct behavior should be the usage of 10 outputs. I opted for minimal changes in the current code although there maybe optimizations possible for cases with >20 outputs on a destination this sounds like too much of an edge case right now.

@fanquake fanquake requested a review from instagibbs December 29, 2019 01:32
@fanquake fanquake assigned achow101 and unassigned achow101 Dec 29, 2019
@fanquake fanquake requested a review from achow101 December 29, 2019 01:34
@DrahtBot
Copy link
Contributor

DrahtBot commented Dec 29, 2019

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@kallewoof
Copy link
Contributor

The reported issue seems like things are not working as they should for sure, but I don't see how this code fixes it. Ultimately, if I want to send 10.5 bitcoin to you, and I have 500 UTXO:s of 1 BTC each all sending to A, the grouping part should result in 50 groups of 10 random UTXO:s each (each group worth 10 BTC), and two of these should be selected when sending, and ALL of the outputs should be marked as used afterwards.

Your code seems to change this behavior to only result in one 10 BTC group per destination, which means the above send will result in "insufficient funds".

@kallewoof
Copy link
Contributor

I investigated this further. See #17603 (comment)

@fjahr
Copy link
Contributor Author

fjahr commented Dec 31, 2019

I investigated this further. See #17603 (comment)

Thanks, that's what I found as well, I should have described it in greater detail.

The reported issue seems like things are not working as they should for sure, but I don't see how this code fixes it. Ultimately, if I want to send 10.5 bitcoin to you, and I have 500 UTXO:s of 1 BTC each all sending to A, the grouping part should result in 50 groups of 10 random UTXO:s each (each group worth 10 BTC), and two of these should be selected when sending, and ALL of the outputs should be marked as used afterwards.

Your code seems to change this behavior to only result in one 10 BTC group per destination, which means the above send will result in "insufficient funds".

My code does handle this case correctly. You describe the behavior that would occur if I had implemented it as @achow101 suggests. The "full groups" (10 outputs) are already pushed into groups earlier and only the "incomplete" group of <10 outputs would be added later. So we are collecting all the full groups but skip the last one if it is not full.

However, your comment made me realize another shortcoming of the current code: If there are 11 outputs of 1 BTC and we want to send 10.5 BTC then we see the insufficient funds error because we are "forgetting" the incomplete group. To fix this I would suggest implementing the following: the last, incomplete group, is pushed into one of the full groups, resulting in one of the groups being up to 19 outputs large. "Worst case" scenario that I see: we want to send a very small amount from the destination and we have exactly 19 outputs, then we would use all the 19 outputs. I think this should be ok? I will wait for your feedback before pushing the code for it.

See adapted tests I ran with the current code for the examples discussed above: https://gist.github.com/fjahr/d56b68e58b4275ef52cd5f4eb50b1d4e

@fjahr fjahr changed the title wallet: Fix coin selection for destination groups >10 wallet: Improve coin selection for destination groups >10 Jan 2, 2020
@kallewoof
Copy link
Contributor

kallewoof commented Jan 3, 2020

I'm not sure this is the right approach here. It seems like marking the output groups as 'partial' when there are >10 outputs would address this. A simple approach would be to explicitly think of <max groups for multi-group destinations as "0-conf". The current code would then actively try to avoid using it except as a last resort.

Edit: I didn't test, but I assume your example with "not working" above is for your patch, not the current master.

@kallewoof
Copy link
Contributor

kallewoof commented Jan 3, 2020

I would do something like this

diff --git a/src/wallet/wallet.cpp b/src/wallet/wallet.cpp
index 3954f6626..3c88ff9c0 100644
--- a/src/wallet/wallet.cpp
+++ b/src/wallet/wallet.cpp
@@ -3976,6 +3976,7 @@ bool CWalletTx::IsImmatureCoinBase() const
 std::vector<OutputGroup> CWallet::GroupOutputs(const std::vector<COutput>& outputs, bool single_coin) const {
     std::vector<OutputGroup> groups;
     std::map<CTxDestination, OutputGroup> gmap;
+    std::set<CTxDestination> gfull;
     CTxDestination dst;
     for (const auto& output : outputs) {
         if (output.fSpendable) {
@@ -3990,6 +3991,7 @@ std::vector<OutputGroup> CWallet::GroupOutputs(const std::vector<COutput>& outpu
                 if (gmap[dst].m_outputs.size() >= OUTPUT_GROUP_MAX_ENTRIES) {
                     groups.push_back(gmap[dst]);
                     gmap.erase(dst);
+                    gfull.insert(dst);
                 }
                 gmap[dst].Insert(input_coin, output.nDepth, output.tx->IsFromMe(ISMINE_ALL), ancestors, descendants);
             } else {
@@ -3998,7 +4000,16 @@ std::vector<OutputGroup> CWallet::GroupOutputs(const std::vector<COutput>& outpu
         }
     }
     if (!single_coin) {
-        for (const auto& it : gmap) groups.push_back(it.second);
+        for (auto& it : gmap) {
+            auto& group = it.second;
+            if (gfull.count(it.first) > 0 && group.m_outputs.size() < OUTPUT_GROUP_MAX_ENTRIES) {
+                // make this unattractive as we want coin selection to avoid it if possible
+                group.m_depth = 0;
+                group.m_from_me = false;
+                group.m_ancestors = 9999;
+            }
+            groups.push_back(group);
+        }
     }
     return groups;
 }

Copy link
Contributor Author

@fjahr fjahr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: I didn't test, but I assume your example with "not working" above is for your patch, not the current master.

Yeah, that's right.

I would do something like this

That is a good idea, I did not think of that possibility before. But I still see a problem there: if there are flags activated like -walletrejectlongchains or -txconfirmtarget then we will have the same issue as with my code described above. The incomplete group will be ignored and the user may get an insufficient funds error when there are actually enough funds available. And since DEFAULT_TX_CONFIRM_TARGET is set to 6 I think this would happen for every wallet that has not set this value to 0. If only using group.m_ancestors = 9999; at least this will work for the default wallet configuration. I am not sure how widely -walletrejectlongchains is used.

Honestly, I am not sure yet which solution I prefer. The upsides of my described approach:

  1. No matter what the configuration of the wallet is, there will not be a false "Insufficient funds" error.
  2. I think it does a better job of using funds from the same address together.

An example to explain 2.:

  • A wallet has 11 outputs of 1 BTC each at reused_address
  • It also has 1 output of 1 BTC at not_reused_address
  • The user wants to send 10.5 BTC and uses avoid_reuse
  • With the m_ancestors approach, the wallet would send 10 BTC from reused_address and 1 BTC from not_reused_address, indicating these addresses could be controlled by the same user
  • With my suggestion, there would be 11 BTC group only from reused_address selected for the inputs

This stands against the downsides of my approach:

  1. At worst we are using 2 * max - 1 inputs when max could have been enough.
  2. It is a more complicated change than yours.

Let me know if I am missing something. I am not sure how much weight each of these arguments carry. I think the perfect solution would only be possible by making a distinction between <max groups and max groups in coin selection and that seems like overkill given the impact of the issue.

I am pushing an updated version implementing your solution with only m_ancestors and and an improved test and hope to get some more feedback. But I am happy to make further changes.

@kallewoof
Copy link
Contributor

You're probably right that it would not end up using the final group even if it could. I think that's okay, in this case. As for the -walletrejectlongchains issue, you would replace 9999 with max_ancestors-1 where max_ancestors is defined as in the coin selection code.

@fjahr fjahr force-pushed the i17603 branch 3 times, most recently from a0434b3 to ceeab6c Compare January 8, 2020 21:52
@fjahr
Copy link
Contributor Author

fjahr commented Jan 8, 2020

Rebased and implemented using max_ancestors - 1 base on feedback.

meshcollider added a commit that referenced this pull request Jan 15, 2020
6fc554f wallet: Reset reused transactions cache (Fabian Jahr)

Pull request description:

  Fixes #17603 (together with #17824)

  `getbalances` is using the cache within `GetAvailableCredit` under certain conditions [here](https://github.com/bitcoin/bitcoin/blob/35fff5be60e853455abc24713481544e91adfedb/src/wallet/wallet.cpp#L1826). For a wallet with `avoid_reuse` activated this can lead to inconsistent reporting of `used` transactions/balances between `getbalances` and `listunspent` as pointed out in #17603. When an address is reused before the first transaction is spending from this address, the cache is not updated even after the transaction is sent. This means the remaining outputs at the reused address are not showing up as `used` in `getbalances`.

  With this change, any newly incoming transaction belonging to the wallet marks all the other outputs at the same address as dirty.

ACKs for top commit:
  kallewoof:
    Code review re-ACK 6fc554f
  promag:
    ACK 6fc554f.
  achow101:
    Re-ACK 6fc554f
  meshcollider:
    Code review ACK 6fc554f

Tree-SHA512: c4cad2c752176d16d77b4a4202291d20baddf9f27250896a40274d74a6945e0f6b34be04c2f9b1b2e756d3ac669b794969df8f82a98e0b16f10e92f276649ea2
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jan 15, 2020
6fc554f wallet: Reset reused transactions cache (Fabian Jahr)

Pull request description:

  Fixes bitcoin#17603 (together with bitcoin#17824)

  `getbalances` is using the cache within `GetAvailableCredit` under certain conditions [here](https://github.com/bitcoin/bitcoin/blob/35fff5be60e853455abc24713481544e91adfedb/src/wallet/wallet.cpp#L1826). For a wallet with `avoid_reuse` activated this can lead to inconsistent reporting of `used` transactions/balances between `getbalances` and `listunspent` as pointed out in bitcoin#17603. When an address is reused before the first transaction is spending from this address, the cache is not updated even after the transaction is sent. This means the remaining outputs at the reused address are not showing up as `used` in `getbalances`.

  With this change, any newly incoming transaction belonging to the wallet marks all the other outputs at the same address as dirty.

ACKs for top commit:
  kallewoof:
    Code review re-ACK 6fc554f
  promag:
    ACK 6fc554f.
  achow101:
    Re-ACK 6fc554f
  meshcollider:
    Code review ACK 6fc554f

Tree-SHA512: c4cad2c752176d16d77b4a4202291d20baddf9f27250896a40274d74a6945e0f6b34be04c2f9b1b2e756d3ac669b794969df8f82a98e0b16f10e92f276649ea2
@jonatack
Copy link
Member

Concept ACK. Can this PR have a review club tag to help people know there is a resource?

Reviewers, review club notes and meeting logs of both sessions are here: https://bitcoincore.reviews/17824

Some of the suggestions from the sessions:

  • the PR title could use an improvement
  • the commit message could be a bit more elaborate
  • the changes could be better documented

(Also, there seemed to be some agreement in the sessions to consider raising OUTPUT_GROUP_MAX_ENTRIES.)

unsigned int limit_ancestor_count;
unsigned int limit_descendant_count;
chain().getPackageLimits(limit_ancestor_count, limit_descendant_count);
size_t max_ancestors = (size_t)std::max<int64_t>(1, limit_ancestor_count);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm aware this is moved code, but any idea why max<int64_t>?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. I did some digging and the type was introduced here and prior to that a uint64 was first introduced here in this context. I can only speculate that since this is a value from user input it was supposed to be as forgiving as possible if a ridiculously high number is provided, but even that does not make too much sense to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may have been a way to safeguard against a user defined value that was negative, but the switch to getPackageLimits removed that safeguard. Not sure if that was wise or not, but that's a separate question.

@fjahr fjahr force-pushed the i17603 branch 2 times, most recently from 55c4446 to bf4a7ba Compare March 24, 2020 10:50
@fjahr fjahr changed the title wallet: Improve coin selection for destination groups >10 wallet: Prefer full destination groups in coin selection Mar 24, 2020
@fjahr
Copy link
Contributor Author

fjahr commented Mar 24, 2020

Thanks for all the feedback! I should have addressed all the points here and also what was discussed in the review club. Also I tried to improve the naming of this PR/commit and expanded the commit message.

Copy link
Contributor

@kallewoof kallewoof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 9dcafb0

(Sorry about the horribly named test functions, btw. Thanks for fixing them.)

Copy link
Member

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 9dcafb0 code review and built/tested on the branch and rebased on master. Tried different values in the new tests to verify they hold with different edge cases.

A few comments below but please ignore to not lose review unless you need to retouch this. Thank you for improving the first commit message -- it's very good now; two nits if you have to retouch: s/use/user/ and s/smallet/smaller/. Finally, consider updating the PR description with information from the excellent first commit message.

// number of entries, to protect against inadvertently creating
// a too-large transaction when using -avoidpartialspends to
// prevent breaking consensus or surprising users with a very
// high amount of fees.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest: s/a very high amount of fees./high fees./

note this feedback by @xekyo mitigating the high fee motivation here: https://bitcoincore.reviews/17824.html#l-329

for (auto& it : gmap) {
auto& group = it.second;
if (full_groups.count(it.first) > 0) {
// Make this unattractive as we want coin selection to avoid it if possible
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could perhaps expand on this documentation, like in your commit message, along the lines of:

-                // Make this unattractive as we want coin selection to avoid it if possible
+                // By assigning it one less than the maximum number of ancestors
+                // allowed for this wallet, we move this smaller subgroup to the
+                // bottom of the priority list for the coin selection algorithm.

// Make this unattractive as we want coin selection to avoid it if possible
group.m_ancestors = max_ancestors - 1;
}
groups.push_back(group);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it might be better to construct rather than copy?

-                        groups.push_back(it->second);
+                        groups.emplace_back(it->second);
                         it->second = OutputGroup{};
-                        full_groups.insert(dst);
+                        full_groups.emplace(dst);
@@ -4208,7 +4208,7 @@ std::vector<OutputGroup> CWallet::GroupOutputs(const std::vector<COutput>& outpu
                 // Make this unattractive as we want coin selection to avoid it if possible
                 group.m_ancestors = max_ancestors - 1;
             }
-            groups.push_back(group);
+            groups.emplace_back(group);

@fjahr
Copy link
Contributor Author

fjahr commented Apr 2, 2020

@jonatack Thanks for your comments! For now, I am saving the ACKs and included these cleanups in my followup PR #18418

fjahr added 2 commits April 14, 2020 15:02
When a wallet uses avoid_reuse and has a large number of outputs in
a single destination, it groups these outputs in OutputGroups that
are no larger than OUTPUT_GROUP_MAX_ENTRIES. The goal is to spend
as many outputs as possible from the destination while not breaking
consensus due to a huge number of inputs and also not surprise the
use with high fees. If there are n outputs in a destination and
n > OUTPUT_GROUP_MAX_ENTRIES then this results in one or many groups
of size OUTPUT_GROUP_MAX_ENTRIES and possibly one group of size
< OUTPUT_GROUP_MAX_ENTRIES.

Prior to this commit the coin selection in the case where
n > OUTPUT_GROUP_MAX_ENTRIES was skewed towards the one group of
size < OUTPUT_GROUP_MAX_ENTRIES if it exists and the amount to be
spent by the transaction is smaller than the aggregate of those
of the group size < OUTPUT_GROUP_MAX_ENTRIES. The reason is that
the coin selection decides between the different groups based on
fees and mostly the smaller group will cause smaller fees.

The behavior that users of the avoid_reuse flag seek is that the
full groups of size OUTPUT_GROUP_MAX_ENTRIES get used first. This
commit implements this by pretending that the small group has
a large number of ancestors (one smallet than the maximum allowed
for this wallet). This dumps the small group to the bottom of the
list of priorities in the coin selection algorithm.
@fjahr
Copy link
Contributor Author

fjahr commented Apr 14, 2020

Rebased, no code changes.

@jonatack
Copy link
Member

Re-ACK a2324e4

Copy link
Contributor

@kallewoof kallewoof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK a2324e4

@achow101
Copy link
Member

ACK a2324e4

Copy link
Contributor

@meshcollider meshcollider left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested ACK a2324e4 (verified the new test fails on master without this change)

@meshcollider meshcollider merged commit c189bfd into bitcoin:master Apr 17, 2020
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Apr 17, 2020
…election

a2324e4 test: Improve naming and logging of avoid_reuse tests (Fabian Jahr)
1abbdac wallet: Prefer full destination groups in coin selection (Fabian Jahr)

Pull request description:

  Fixes bitcoin#17603 (together with bitcoin#17843)

  In the case of destination groups of >10 outputs existing in a wallet with `avoid_reuse` enabled, the grouping algorithm is adding left-over outputs as an "incomplete" group to the list of groups even when a full group has already been added. This leads to the strange behavior that if there are >10 outputs for a destination the transaction spending from that will effectively use `len(outputs) % 10` as inputs for that transaction.

  From the original PR and the code comment I understand the correct behavior should be the usage of 10 outputs. I opted for minimal changes in the current code although there maybe optimizations possible for cases with >20 outputs on a destination this sounds like too much of an edge case right now.

ACKs for top commit:
  jonatack:
    Re-ACK a2324e4
  achow101:
    ACK a2324e4
  kallewoof:
    ACK a2324e4
  meshcollider:
    Tested ACK a2324e4 (verified the new test fails on master without this change)

Tree-SHA512: 4743779c5d469fcd16df5baf166024b1d3c8eaca151df1e8281b71df62b29541cf7bfee3f8ab48d83e3b34c9256e53fd38a7b146a54c79f9caa44cce3636971a
deadalnix pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Oct 9, 2020
Summary:
 * wallet: Prefer full destination groups in coin selection

When a wallet uses avoid_reuse and has a large number of outputs in
a single destination, it groups these outputs in OutputGroups that
are no larger than OUTPUT_GROUP_MAX_ENTRIES. The goal is to spend
as many outputs as possible from the destination while not breaking
consensus due to a huge number of inputs and also not surprise the
use with high fees. If there are n outputs in a destination and
n > OUTPUT_GROUP_MAX_ENTRIES then this results in one or many groups
of size OUTPUT_GROUP_MAX_ENTRIES and possibly one group of size
< OUTPUT_GROUP_MAX_ENTRIES.

Prior to this commit the coin selection in the case where
n > OUTPUT_GROUP_MAX_ENTRIES was skewed towards the one group of
size < OUTPUT_GROUP_MAX_ENTRIES if it exists and the amount to be
spent by the transaction is smaller than the aggregate of those
of the group size < OUTPUT_GROUP_MAX_ENTRIES. The reason is that
the coin selection decides between the different groups based on
fees and mostly the smaller group will cause smaller fees.

The behavior that users of the avoid_reuse flag seek is that the
full groups of size OUTPUT_GROUP_MAX_ENTRIES get used first. This
commit implements this by pretending that the small group has
a large number of ancestors (one smallet than the maximum allowed
for this wallet). This dumps the small group to the bottom of the
list of priorities in the coin selection algorithm.

 * test: Improve naming and logging of avoid_reuse tests

This is a backport of Core [[bitcoin/bitcoin#17824 | PR17824]]

Test Plan:
  ninja all check-all

Reviewers: #bitcoin_abc, jasonbcox

Reviewed By: #bitcoin_abc, jasonbcox

Subscribers: jasonbcox

Differential Revision: https://reviews.bitcoinabc.org/D7834
sidhujag pushed a commit to syscoin-core/syscoin that referenced this pull request Nov 10, 2020
6fc554f wallet: Reset reused transactions cache (Fabian Jahr)

Pull request description:

  Fixes bitcoin#17603 (together with bitcoin#17824)

  `getbalances` is using the cache within `GetAvailableCredit` under certain conditions [here](https://github.com/bitcoin/bitcoin/blob/35fff5be60e853455abc24713481544e91adfedb/src/wallet/wallet.cpp#L1826). For a wallet with `avoid_reuse` activated this can lead to inconsistent reporting of `used` transactions/balances between `getbalances` and `listunspent` as pointed out in bitcoin#17603. When an address is reused before the first transaction is spending from this address, the cache is not updated even after the transaction is sent. This means the remaining outputs at the reused address are not showing up as `used` in `getbalances`.

  With this change, any newly incoming transaction belonging to the wallet marks all the other outputs at the same address as dirty.

ACKs for top commit:
  kallewoof:
    Code review re-ACK 6fc554f
  promag:
    ACK 6fc554f.
  achow101:
    Re-ACK 6fc554f
  meshcollider:
    Code review ACK 6fc554f

Tree-SHA512: c4cad2c752176d16d77b4a4202291d20baddf9f27250896a40274d74a6945e0f6b34be04c2f9b1b2e756d3ac669b794969df8f82a98e0b16f10e92f276649ea2
fanquake added a commit that referenced this pull request May 26, 2021
e6fe1c3 rpc: Improve avoidpartialspends and avoid_reuse documentation (Fabian Jahr)
8f07307 wallet: Increase OUTPUT_GROUP_MAX_ENTRIES to 100 (Fabian Jahr)

Pull request description:

  Follow-up to #17824.

  This increases OUTPUT_GROUP_MAX_ENTRIES to 100 which means that OutputGroups will now be up to 100 outputs large, up from previously 10. The main motivation for this change is that during the PR review club on #17824 [several participants signaled](https://bitcoincore.reviews/17824.html#l-339) that 100 might be a better value here.

  I think fees should be manageable for users but more importantly, users should know what they can expect when using the wallet with this configuration, so I also tried to clarify the documentation on `-avoidpartialspends` and `avoid_reuse` a bit. If there are other additional ways how or docs where users can be made aware of the potential consequences of using these parameters, please let me know. Another small upside is that [there seem to be a high number of batching transactions with 100 and 200 inputs](https://miro.medium.com/max/3628/1*sZ5eaBSbsJsHx-J9iztq2g.png)([source](https://medium.com/@hasufly/an-analysis-of-batching-in-bitcoin-9bdf81a394e0)) giving these transactions a bit of a larger anonymity set, although that is probably a very weak argument.

ACKs for top commit:
  jnewbery:
    ACK e6fe1c3
  Xekyo:
    retACK e6fe1c3
  rajarshimaitra:
    tACK `e6fe1c3`
  achow101:
    ACK e6fe1c3
  glozow:
    code review ACK e6fe1c3

Tree-SHA512: 79685c58bafa64ed8303b0ecd616fce50fc9a2b758aa79833e4ad9f15760e09ab60c007bc16ab4cbc4222e644cfd154f1fa494b0f3a5d86faede7af33a6f2826
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

partial spend avoidance makes partial spends and getbalances doesn't notice
9 participants