Skip to content

Conversation

fjahr
Copy link
Contributor

@fjahr fjahr commented Mar 24, 2020

Follow-up to #17824.

This increases OUTPUT_GROUP_MAX_ENTRIES to 100 which means that OutputGroups will now be up to 100 outputs large, up from previously 10. The main motivation for this change is that during the PR review club on #17824 several participants signaled that 100 might be a better value here.

I think fees should be manageable for users but more importantly, users should know what they can expect when using the wallet with this configuration, so I also tried to clarify the documentation on -avoidpartialspends and avoid_reuse a bit. If there are other additional ways how or docs where users can be made aware of the potential consequences of using these parameters, please let me know. Another small upside is that there seem to be a high number of batching transactions with 100 and 200 inputs(source) giving these transactions a bit of a larger anonymity set, although that is probably a very weak argument.

@jonatack
Copy link
Member

Concept ACK

@laanwj laanwj added the Wallet label Mar 24, 2020
@DrahtBot
Copy link
Contributor

DrahtBot commented Mar 24, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@jonatack
Copy link
Member

Will review when #17824 is merged and this is rebased.

Some context for reviewers: https://bitcoincore.reviews/17824.html#l-327 (from here to the end of the discussion)

@bitcoin bitcoin deleted a comment from Angelika1920 Apr 2, 2020
@fjahr
Copy link
Contributor Author

fjahr commented Apr 2, 2020

Added some minor cleanups from #17824

@fjahr
Copy link
Contributor Author

fjahr commented Apr 14, 2020

Rebased, no code changes.

@fjahr
Copy link
Contributor Author

fjahr commented Apr 17, 2020

Rebased, since #17824 was merged.

@fjahr
Copy link
Contributor Author

fjahr commented Jul 30, 2020

Rebased

@@ -43,7 +43,7 @@ const WalletInitInterface& g_wallet_init_interface = WalletInit();
void WalletInit::AddWalletOptions(ArgsManager& argsman) const
{
argsman.AddArg("-addresstype", strprintf("What type of addresses to use (\"legacy\", \"p2sh-segwit\", or \"bech32\", default: \"%s\")", FormatOutputType(DEFAULT_ADDRESS_TYPE)), ArgsManager::ALLOW_ANY, OptionsCategory::WALLET);
argsman.AddArg("-avoidpartialspends", strprintf("Group outputs by address, selecting all or none, instead of selecting on a per-output basis. Privacy is improved as an address is only used once (unless someone sends to it after spending from it), but may result in slightly higher fees as suboptimal coin selection may result due to the added limitation (default: %u (always enabled for wallets with \"avoid_reuse\" enabled))", DEFAULT_AVOIDPARTIALSPENDS), ArgsManager::ALLOW_ANY, OptionsCategory::WALLET);
argsman.AddArg("-avoidpartialspends", strprintf("Group outputs by public key, selecting many (possibly all) or none, instead of selecting on a per-output basis. Privacy is improved as public keys are mostly swept with fewer transactions and outputs are aggregated in clean change addresses. It may result in higher fees due to less optimal coin selection caused by this added limitation and possibly a larger-than-necessary number of inputs being used. Always enabled for wallets with \"avoid_reuse\" enabled, otherwise default: %u.", DEFAULT_AVOIDPARTIALSPENDS), ArgsManager::ALLOW_ANY, OptionsCategory::WALLET);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it by scriptPubKey? It seems to me that "address" might be more accurate / intuitive for users than public key?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. UTXOs could be sent to multiple addresses corresponding to the same pubkey, and they'd be grouped into different Output Groups.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A scriptPubKey can also have multiple address types and those would be grouped in the same output group, which was the motivation to make this more precise if I remember correctly. So spk would be the correct term to use here IMO. But it seems there is a lot of support for not making a change and spk might be too technical here, so I have reverted this bit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A scriptPubKey can also have multiple address types

I don't believe this is true. A scriptPubKey will encode deterministically to a single address (base58 for P2PKH and P2SH, bech32 for P2WPKH and P2WSH, bech32m for segwit v1+).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this code changed since I opened this PR then. It used to be an issue, see #17605. As sipa mentions there it was never an issue on descriptor wallets but I didn't expect that behavior was changed on legacy wallets as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, this was about marking as reused, not the grouping. I guess this is what made me do the doc change in the first place but I was just confused.

@@ -342,14 +342,14 @@ def test_full_destination_group_is_preferred(self):
txid = self.nodes[1].sendtoaddress(address=ret_addr, amount=0.5)
inputs = self.nodes[1].getrawtransaction(txid, 1)["vin"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I added a line to grab the amount and fee for this transaction here, i.e.

txinfo = self.nodes[1].gettransaction(txid)
print("Paid fee of {} when sending {}".format(txinfo["fee"], txinfo["amount"]))

It looks like the fee would be 0.00136966BTC 😱 I thought it might get caught by maxapsfee but it went through. Am I missing something else that would catch this case?

I understand this scenario isn't typical, but is this maybe... not the best default option for users? 0.00137 is like $66, and they could be paying this fee on a 0.001BTC transaction?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a problem (or at least not worse than any alternative). The high fees are intrinsic in the UTXOs that are already owned by the wallet. The fact that they get realized when making this 0.001BTC payment is no better/worse than if they were realized when making a subsequent payment for 100BTC, or if they were used to make a subsequent payment of 0.00001 BTC.

The only way this could be improved is if our coin selection algorithm favored using relatively few coins when the prevailing fee rate is high, and favored using relatively many coins when the prevailing fee rate is low. I don't believe we have such logic in our wallet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(typed this before github showed johns anwer so there is some duplication here I think) It's not great in theory. But I think we can expect that users know what they are getting into here. The users have reused addresses a lot (willingly or unwillingly) and realized that this is a threat to their privacy. Then they have opted in to avoid reuse which warns users that this can create higher fees via high numbers of inputs. And there isn't even much of a risk here if somebody sets this option just for the fun of it. As long as there is no address reuse in that wallet, avoid reuse also won't kick in.

Because this is a privacy feature that users have opted into it seems the better trade-off to me that someone is surprised by a higher fee rather than being surprised that they could only consolidate 10 outputs in one transaction when they had 30 outputs in that destination. And then they have to further watch out to no compromise themselves when consolidating the remaining 20 outputs. What users are getting aside from privacy protection is a consolidation transaction. While the fees might be high on this first tx it will hopefully also create a change output that is larger and will be much cheaper to spend later on.

maxapsfee is only relevant if avoid partial spends is not activated. If avoid partial spends is not set, the wallet will still try to avoid partial spends and prefer to use those coins that avoid partial spends of those that don't if the premium is below the threshold.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh thanks for explaining this. I can see how a user turning -avoidpartialspends on would be consciously choosing to opt for privacy over lower short-term fees.

maxapsfee is only relevant if avoid partial spends is not activated. If avoid partial spends is not set, the wallet will still try to avoid partial spends and prefer to use those coins that avoid partial spends of those that don't if the premium is below the threshold.

Ohh my bad, thanks for being patient with me!

@@ -446,7 +446,7 @@ static RPCHelpMan sendtoaddress()
{"estimate_mode", RPCArg::Type::STR, RPCArg::Default{"unset"}, std::string() + "The fee estimate mode, must be one of (case insensitive):\n"
" \"" + FeeModes("\"\n\"") + "\""},
{"avoid_reuse", RPCArg::Type::BOOL, RPCArg::Default{true}, "(only available if avoid_reuse wallet flag is set) Avoid spending from dirty addresses; addresses are considered\n"
"dirty if they have previously been used in a transaction."},
"dirty if they have previously been used in a transaction. If true, this also activates avoidpartialspends, grouping outputs by their addresses."},
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can mention avoidpartialspends in single quotes or a complete sentence to describe what it does: If true, it will avoid partial spends and group unspent outputs by address.

@ghost
Copy link

ghost commented May 18, 2021

Reading the comments about increasing OUTPUT_GROUP_MAX_ENTRIES from 10 to 100: https://bitcoincore.reviews/17824.html

nit: I think it will be better if we have MIN, MAX and DEFAULT in such cases and allow user to set anything between MIN and MAX. For OUTPUT_GROUP_MAX_ENTRIES:

OUTPUT_GROUP_DEFAULT_ENTRIES:10
OUTPUT_GROUP_MIN_ENTRIES: 10
OUTPUT_GROUP_MAX_ENTRIES: 100

Mentioned it as 'nit' because use of constants in Bitcoin Core is already discussed in few PRs: #21815 (comment) (4)

Increasing it from 10 to 100 is not an issue, real issues are with coin selection however that is out of scope for this PR. Had one related question which was answered by Andrew Chow: https://bitcoin.stackexchange.com/q/106418/

Other issue:

Tx: aa6986b774b1a7bdf45c2d4df43161996fa4caf00725889fb91ca79a3bd8fd9c has 14 inputs out of which 9 are associated with one address: tb1qyqd9p9d2rc5a9v6cphucn999ljfcvdhymthypy

avoidpartialspends=1 in bitcoin.conf and tried sending 0.051 BTC

I am not sure why other inputs were used because sum of inputs associated with tb1qyqd9p9d2rc5a9v6cphucn999ljfcvdhymthypy = 0.004 + 0.0076 + 0.01 + 0.009 + 0.007 + 0.0045 + 0.0066 + 0.0055 + 0.005 = 0.0592 (Assume these are 90 instead of 9)

Maybe because it will create a small change? But avoidpartialspends=1 is mentioned as an option that improves privacy. In this case user is associating himself with extra 5 addresses which could have been avoided in this transaction.

@jnewbery
Copy link
Contributor

ACK e6fe1c3

@michaelfolkson
Copy link

@prayank23: Please try to focus on the PR in question. Additional features can be added in a separate PR e.g. user configurable max. The StackExchange post is interesting but I would argue not relevant to the intent of this PR. Relay connections are certainly not relevant.

@ghost
Copy link

ghost commented May 19, 2021

The StackExchange post is interesting but I would argue not relevant to the intent of this PR.

To review this PR you need to test avoidpartialspends, it didn't work as I expected it to be.

Relay connections are certainly not relevant.

Never mentioned anything about relay. The link of other PR is about use of different constants in Bitcoin Core. It is relevant because value of OUTPUT_GROUP_MAX_ENTRIES was 10 earlier and 100 after this PR with not enough reasons to justify those numbers which affects Bitcoin Core users unaware of these things.

@michaelfolkson
Copy link

It is relevant because value of OUTPUT_GROUP_MAX_ENTRIES was 10 earlier and 100 after this PR with not enough reasons to justify those numbers which affects Bitcoin Core users unaware of these things.

I agree with you that the rationale (upsides and downsides to this change) for increasing from 10 to 100 seems weakly explained/explored and hence it is hard to assess Concept ACK/NACK. As @xekyo said in the PR review club:

05:43 Regarding Q4, I find the 10 limit a bit arbitrary, but 100 is just as arbitrary.

It seems we only have gut feel of wallet devs as a reason for Concept ACKing. Ideally we would have more than that but we can explore in the PR review club later.

I think how constants are set generally (e.g. relay connections) are a much broader subject best not explored in the comments of this PR.

@glozow
Copy link
Member

glozow commented May 19, 2021

I agree with you that the rationale (upsides and downsides to this change) for increasing from 10 to 100 seems weakly explained/explored and hence it is hard to assess Concept ACK/NACK.

@michaelfolkson I've had a think about this and my rough framework for the soundness of a OUTPUT_GROUP_MAX_ENTRIES is

  • high enough so that, if a user turned on avoid_reuse and that triggered -avoidpartialspends, transactions would probably sweep all outputs belonging to an address (because they'd be in a bit of a pickle otherwise).
  • low enough to not be a footgun (fee-wise) for users that turn on -avoidpartialspends for funsies.

It seems to me that 10 is too low for this and weakens the utility of -avoidpartialspends. I found the graph in the PR description helpful in understanding why 100 might be reasonable.

I think it will be better if we have MIN, MAX and DEFAULT in such cases and allow user to set anything between MIN and MAX. For OUTPUT_GROUP_MAX_ENTRIES

@prayank An OUTPUT_GROUP_MIN_ENTRIES doesn’t make much sense to me. The default is just 1 if they don’t want to -avoidpartialspends. Perhaps it could be useful to let the user pick a maximum size if they want to limit how much they’re paying for the consolidation, but I think setting a maximum fee using -maxtxfee would capture their intent better.

@murchandamus
Copy link
Contributor

OUTPUT_GROUP_MAX_ENTRIES is only relevant when users are privacy conscious enough to enable avoid partial spend, but then on the other hand also heavily reuse addresses. It seems unlikely that this constellation occurs so frequently that it would be worth adding and maintaining configuration options for it.

@murchandamus
Copy link
Contributor

retACK e6fe1c3

@rajarshimaitra
Copy link
Contributor

tACK e6fe1c3

Verified the tests are checking against intended behaviour.

No strong feeling over the value of OUTPUT_GROUP_MAX_ENTRIES.

@glozow 's rationale here #18418 (comment) makes sense to me.

But in the end this is also as arbitrary as 10. Would be better if this rationale could be included in the PR description. Was a little hard to find reasoning behind why it was decided to change it.

@ghost
Copy link

ghost commented May 21, 2021

OUTPUT_GROUP_MAX_ENTRIES is only relevant when users are privacy conscious enough to enable avoid partial spend, but then on the other hand also heavily reuse addresses. It seems unlikely that this constellation occurs so frequently

I was thinking about possible examples in which a user has saved avoidpartialspends=1 in bitcoin.conf but still reusing addresses. Two examples that I could think of:

  1. User read about avoidpartialspends being good for privacy, saves it in bitcoin.conf assuming it will manage everything for the user, forgets about reuse after sometime being careless or receives donation to one of his address regularly or whitelisted one address on an exchange and use it regularly for withdrawals or one bitcoin address used for payouts in some website/app or some other mistake.
  2. Forced address reuse attack: https://en.bitcoin.it/wiki/Privacy#Forced_address_reuse

In both cases its difficult to know if 100 will be good enough for all the users. Maybe it works for most of them.

Perhaps it could be useful to let the user pick a maximum size if they want to limit how much they’re paying for the consolidation, but I think setting a maximum fee using -maxtxfee would capture their intent better.

Agree. It will be useful if users could decide this and save in bitcoin.conf. Using maxtxfee is just a workaround which I had also suggested to one user who had issues with more inputs being selected by coin selection algorithm: #20598 (comment) which will abort the transaction.

Since we had discussed about other wallets in PR review club meeting, I was curious if they use any such max limit:

Electrum: if any coin is spent from a user address, all coins are.

Wasabi: Treat same-scriptpubkey coins as one big coin. If group has an unconfirmed, then the whole group is unconfirmed.

Joinmarket : To prevent forced address reuse attacks, (sometimes called dust attacks, but that's a poor name), any deposit to an already-used address is freezed by default.

Samourai: Not sure but I tested with 15 UTXOs for same address and they were all added as inputs for the transaction. I am assuming it is same as Electrum or Wasabi.

@achow101
Copy link
Member

ACK e6fe1c3

@ghost
Copy link

ghost commented May 25, 2021

Concept NACK

  • It should not just be 100 but 'all` same-scriptpubkey coins
  • If there needs to be value for it, let the users decide with some option in bitcoin.conf

Reason: Privacy. There are few other related issues as well which are mentioned in #22018

@michaelfolkson
Copy link

@prayank23: I suspect that increasing OUTPUT_GROUP_MAX_ENTRIES to unlimited (no max) would get some NACKs based on the discussions at the PR review clubs and @glozow's "low enough to not be a footgun (fee-wise) for users that turn on -avoidpartialspends"

And an option in bitcoin.conf would have greater scope and more complexity than this PR though anyone is free to open that PR before or after this PR is merged.

Hence I think this PR is a choice between the status quo (10) or 100 rather than the two choices you outline as your preferences. And there seems to be consensus amongst the wallet devs that 100 is better than 10. Hence Concept ACK, Approach ACK on this PR.

You have to set some defaults. You can't have users needing to decide before they get started on a large number of variables where many of them will have no idea what they represent. Hence a new default of 100 with a possible future PR to make it an option seems the best path forward.

@ghost
Copy link

ghost commented May 25, 2021

Hence I think this PR is a choice between the status quo (10) or 100 rather than the two choices you outline as your preferences. And there seems to be consensus amongst the wallet devs that 100 is better than 10.

10 or 100, both are bad for privacy. Increasing it from 10 to 100 is worse if you consider attacker will be able to know more about the version of core used by victim.

I suspect that increasing OUTPUT_GROUP_MAX_ENTRIES to unlimited (no max) would get some NACKs

Then we should mention the privacy issues in docs, trade-offs and reasons for not improving privacy.

@michaelfolkson
Copy link

michaelfolkson commented May 25, 2021

10 or 100, both are bad for privacy. Increasing it from 10 to 100 is worse if you consider attacker can now know more about the version of core used by victim.

I'm pretty sure keeping the version of Core private is near the bottom of the privacy priority wishlist. Otherwise we would make no improvements and add no features to Core that leak the Core version being used. The general advice appears to be that users should run a recent version (if not the most recent version) of Core. That doesn't leak anything regarding your transactions or your net worth.

Then we should mention the privacy issues in docs, trade-offs and reasons for not improving privacy.

I think everyone supports improving the docs.

edit: This discussion should move to IRC and away from this PR I think. Your Concept NACK has been logged with a rationale, other reviewers can assess whether they agree with it or not.

Copy link
Member

@glozow glozow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code review ACK e6fe1c3

@fanquake fanquake merged commit ecddd12 into bitcoin:master May 26, 2021
@fanquake
Copy link
Member

cc @meshcollider @instagibbs

sidhujag pushed a commit to syscoin/syscoin that referenced this pull request May 27, 2021
gwillen pushed a commit to ElementsProject/elements that referenced this pull request Jun 1, 2022
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Aug 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.