-
Notifications
You must be signed in to change notification settings - Fork 37.7k
Add a "tx output spender" index #24539
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
3f7cdb6
to
aa9f963
Compare
Concept ACK. It would be good know more about specific use-cases for this, if you can provide some links or more description. But the implementation seems very simple, so I don't think adding this would be a maintenance burden. |
@ ryan, I think the idea is that there is one coin that is owned by more than one entity. There may be several (conflicting) pre-signed txs spending it and as soon as it is spent, all entities with ownership in it want to check which pre-signed tx spent the coin.
|
Lightning channels are basically trees of unpublished transactions:
There is also a penalty scheme built into these txs: if you cheat, you need to wait before you can spend your outputs but your channel peer can spend them immediately. And LN is also transitioning to a model where these txs pay minimum fees and can be "bumped" with RBF/CPFP. => it's very useful for LN nodes to detect when transactions have been spent, but also how they've been spent and react accordingly (by extracting info from the spending txs for example), and walk up chains of txs that spend each other. As described above, there are tools in bitcoin core that we could use and would help a bit, and Lightning nodes could also create their own indexes independently or use electrum servers, block explorers, but this would make running LN nodes connected to your own bitcoin node much easier. |
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/24539. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsNo conflicts as of last run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK - I can see why this could be helpful for lightning, the code is very similar to the txindex code.
Some general thoughts:
getmempooltxspendingprevout
in #24408 takes a list of outpoints,gettxospender
takes a single outpoint. Would it make sense to have the same format here (I'd guess a common use case would be to check both RPCs with the same outpoint(s)?)- Taking this one step further,
getrawtransaction
queries transaction data from the mempool or from blocks (if txindex is enabled) in one RPC. Would it make sense to have something similar for outpoints, instead of two separate RPCs? - I think that the indexed data will not be deleted in case of a reorg (but possibly overwritten if an outpoint would be spent in a different tx as a result of a reorg). That means that a result from
gettxospender
could be stale. Would it make sense to delete entries or should we at least mention this possibility in the rpc doc? - What's the reason for the dependency on -txindex? As far as I can see it's not technical, so would knowing the fact that an outpoint was used in a tx (even if we can't lookup the details of that tx further) be sufficient for some use cases?
aa9f963
to
1b82b7e
Compare
Bitcoin supports batched RPC request so from a functional p.o.v it's equivalent to passing a list of outpoints but it's a bit easier to reason about imho. For
Maybe, let's see where the discussion in #24408 but to me it feels cleaner to have separate calls.
We would almost always want to get the actual tx that spends our outpoint. My reasoning was that relying on |
1b82b7e
to
3e97658
Compare
3e97658
to
e021b6b
Compare
Concept ACK, nice |
e021b6b
to
a6cb2c6
Compare
Rebased on #24408 |
I understand that adding new indexes must be justified. This one is simple and quite efficient now, the code is compact, isolated and easy to understand, and imho there are more and more apps/protocols that work by creating/updating transactions that are not published that would benefit from such an index. For applications that use Bitcoin Core's wallet (this is is our case,) and benefit from all the features that have been and are being developed (fee management, fee bumping, coin selection, package relay, descriptors, ...), adding a custom tool to build an external index would be sub-optimal from a dev and also an operational point of view, and people may prefer to use this PR instead. |
Related to: #9641 |
00ea901
to
40ce7d1
Compare
40ce7d1
to
618c5f0
Compare
Would be good to get a PR with 9 Concept ACKs and no significant concerns raised moving again. Can you rebase it? A recent PR (#32694) changed how we populate the block info struct. You'd now want to set |
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
618c5f0
to
f3e5a7f
Compare
Rebased on 1be688f |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the rebase :)
In the pull request description you are saying that the keys are 64 bytes, but I guess you mean 64 bits?
Did you split this work into two commits to gather feedback on tradeoffs between the two approaches? The space savings and the ability to immediately return a transaction is nice, but it is also not compatible with pruning (and still requires notices about that in init.cpp). Some lightning implementations do support pruned nodes, I guess they won't be able to leverage this extra index in that case?
Can you give a typical data query flow for both variants that you would expect from your use case in lightning? Is the read performance still acceptable for your use case? My guess is the extra read cost is compensated by not having to do an extra roundtrip to fetch the full transaction from the node.
I was also wondering if it were possible to combine this with #32541 to avoid the pruning limitation by writing the position in a block instead, but could not come up with a useful way to combine the two to that end.
bool AllowPrune() const override { return false; } | ||
|
||
protected: | ||
bool CustomAppend(const interfaces::BlockInfo& block) override; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So yes technically txospenderindex does not require txindex but could it really be usable in practice without it
I'm not following here. How does the txindex
help in this scenario?
src/rpc/mempool.cpp
Outdated
@@ -608,6 +609,11 @@ static RPCHelpMan gettxspendingprevout() | |||
}, | |||
}, | |||
}, | |||
{"options", RPCArg::Type::OBJ, RPCArg::Optional::OMITTED, "", | |||
{ | |||
{"mempool_only", RPCArg::Type::BOOL, RPCArg::DefaultHint{"true if txospenderindex unavailable, otherwise false"}, "If false and empool lacks a relevant spend, use txospenderindex (throws an exception if not available)."}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit (type): s/empool/mempool/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed in d5980f3
src/init.cpp
Outdated
@@ -513,6 +515,7 @@ void SetupServerArgs(ArgsManager& argsman, bool can_listen_ipc) | |||
argsman.AddArg("-shutdownnotify=<cmd>", "Execute command immediately before beginning shutdown. The need for shutdown may be urgent, so be careful not to delay it long (if the command doesn't require interaction with the server, consider having it fork into the background).", ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS); | |||
#endif | |||
argsman.AddArg("-txindex", strprintf("Maintain a full transaction index, used by the getrawtransaction rpc call (default: %u)", DEFAULT_TXINDEX), ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS); | |||
argsman.AddArg("-txospenderindex", strprintf("Maintain a transaction output spender index, used by the gettxospender rpc call (default: %u)", DEFAULT_TXOSPENDERINDEX), ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you mean used by the gettxspendingprevout
call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed in d5980f3
throw JSONRPCError(RPC_MISC_ERROR, strprintf("No spending tx for the outpoint %s:%d in mempool, and txospenderindex is unavailable.", prevout.hash.GetHex(), prevout.n)); | ||
} else { | ||
UniValue warnings(UniValue::VARR); | ||
warnings.push_back("txospenderindex is unavailable."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want to warn in this case? After all the current behaviour does not change. Is there a scenario where the warning might be useful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may help users figure out they've made a mistake in their setup/configuration ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I just wasn't sure if this is really useful. There are many other ways something could be misconfigured. We'd also be warning here now on what continues to be just normal working behaviour. Clients should be querying getindexinfo
first anway in my opinion, but then again, I am also not sure what exactly the precedent is here. Maybe add a note to pass "mempool_only":true
to silence the warning?
src/rpc/mempool.cpp
Outdated
@@ -652,6 +675,8 @@ static RPCHelpMan gettxspendingprevout() | |||
prevouts.emplace_back(txid, nOutput); | |||
} | |||
|
|||
const bool f_txospenderindex_ready = !mempool_only.value_or(false) && g_txospenderindex && g_txospenderindex->BlockUntilSyncedToCurrentChain(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: We usually don't prefix bool values with f_
anymore and maybe this could be better scoped after the else if (g_txospenderindex)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done in d5980f3. Moving this does not work (maybe a conflict with the mempool lock ?) I'll investigate.
// BlockUntilSyncedToCurrentChain should return false before txospenderindex is started. | ||
BOOST_CHECK(!txospenderindex.BlockUntilSyncedToCurrentChain()); | ||
|
||
BOOST_REQUIRE(txospenderindex.StartBackgroundSync()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be more robust and preferable for a unit test to just call Sync()
directly.
assert_equal(result, [ {'txid' : confirmed_utxo['txid'], 'vout' : 0, 'spendingtxid' : tx1["txid"]} ]) | ||
result = self.nodes[0].gettxspendingprevout([ {'txid' : tx1['txid'], 'vout' : 0} ]) | ||
assert_equal(result, [ {'txid' : tx1['txid'], 'vout' : 0, 'spendingtxid' : tx2["txid"]} ]) | ||
# replace tx1 with tx3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused why you are testing two reorgs here. What is not covered in the first one, that is covered in this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct but I wanted to have 2 explicit tests: one when a spending tx is replaced, the other when it is cancelled.
src/rpc/mempool.cpp
Outdated
@@ -690,12 +697,18 @@ static RPCHelpMan gettxspendingprevout() | |||
const CTransaction* spendingTx = mempool.GetConflictTx(prevout); | |||
if (spendingTx != nullptr) { | |||
o.pushKV("spendingtxid", spendingTx->GetHash().ToString()); | |||
if (return_spending_tx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you'd want to check return_spending_tx && *return_spending_tx
here, though maybe just use a boolean directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I don't understand this ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way I read it you are currently just checking if the field is set, not if it is set to true. At least providing "return_spending_tx":false in the cli still provides the full transaction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had forgotten this one , fixed in f2a496c
src/index/txospenderindex.cpp
Outdated
#include <uint256.h> | ||
#include <validation.h> | ||
|
||
// LeveLDB key prefix. We only have one key for now but it will make it easier to add others if needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LeveLDB -> LevelDB [typo in comment “LeveLDB key prefix”]
f3e5a7f
to
d5980f3
Compare
I merged them into a single commit, the first approach requires too much disk space. This index would mostly be useful to fairly large nodes (like ours :)) which would not use pruning. |
I think that we can simplify collision handling a bit and allow pruning if the index data is stored using the following schema:
instead of storing a list of positions per It supports fast lookup of a outpoint spender by using LevelDB prefix scan (similar to how it's done in electrs and bindex). When pruning a block, the |
Thanks! I used your suggestion in 0970256 and it is cleaner than using a list of tx positions. |
0970256
to
9b750be
Compare
bool TxoSpenderIndex::CustomAppend(const interfaces::BlockInfo& block) | ||
{ | ||
std::vector<std::pair<COutPoint, CDiskTxPos>> items; | ||
items.reserve(block.data->vtx.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: items
size will be the total number of this block's inputs (not its transactions' count), right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, I did not want to loop twice over the block's transactions to compute the actual number of spent inputs.
src/index/txospenderindex.cpp
Outdated
|
||
bool TxoSpenderIndex::CustomRemove(const interfaces::BlockInfo& block) | ||
{ | ||
std::vector<std::pair<COutPoint, CDiskTxPos>> items; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks similar to the code above - maybe refactor into a helper function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done in done in 42d3aa8
src/index/txospenderindex.cpp
Outdated
|
||
struct DBKey { | ||
uint64_t hash; | ||
CDiskTxPos pos; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using FlatFilePos
here (which contains 2 int
s) instead of CDiskTxPos
(which contains also the block's offset, taking an additional int
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can I find the tx without the block's offset ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest saving the transaction position on disk (without its block offset):
bool TxoSpenderIndex::CustomAppend(const interfaces::BlockInfo& block)
{
std::vector<std::pair<COutPoint, CDiskTxPos>> items;
items.reserve(block.data->vtx.size());
FlatFilePos pos{block.file_number, block.data_pos};
pos.nPos += GetSizeOfCompactSize(block.data->vtx.size());
for (const auto& tx : block.data->vtx) {
if (!tx->IsCoinBase()) {
for (const auto& input : tx->vin) {
items.emplace_back(input.prevout, pos);
}
}
pos.nPos += ::GetSerializeSize(TX_WITH_WITNESS(*tx));
}
return WriteSpenderInfos(items);
}
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again it's simpler like this, done in 42d3aa8
src/index/txospenderindex.cpp
Outdated
} | ||
CBlockHeader header; | ||
try { | ||
file >> header; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the header's data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No we just need to skip over it, done in 42d3aa8
42d3aa8
to
3158e03
Compare
Adds an outpoint -> txid index, which can be used to find which transactions spent a given output. The index key is the siphash of the spent outpoint (8 bytes), keyed with a random key that is created when the index is created (to avoid collision attacks). The index value is a list of tx positions (9 bytes unless there are collisions which should be extremely rare). This index is extremely useful for Lightning and more generally for layer-2 protocol that rely on chains of unpublished transactions. If enabled, this index will be used by `gettxspendingprevout` when it does not find a spending transaction in the mempool.
…x position) As suggested by @romanz, we now use a composite key with 2 parts: hash(spent outpoint) and tx position, with an empty value. To find the spending tx for a given outpoint, we do a prefix search (prefix being the hash of the provided outpoint), and for all keys that match this prefix we load the tx at the position specified in the key and return it if does spend the provided outpoint. This makes handling reorgs much simpler (we just erase the keys computed from the removed block).
3158e03
to
4e3df36
Compare
This PR adds a new "tx output spender" index, which allows users to query which tx spent a given outpoint with the
gettxspendingprevout
RPC call that was added by #24408.Such an index would be extremely useful for Lightning, and probably for most layer-2 protocols that rely on chains of unpublished transactions.
UPDATE: this PR is ready for review and issues have been addressed:
-txindex
anymoreThis index is implemented as a simple k/v database (like other indexes):
siphash(spent outpoint)
(64 bytes)txindex
(variable size, max is 12 bytes).The spending tx can optionally be returned by
gettxspendingprevout
(even it-txindex is not set
).