Skip to content

Add a "tx output spender" index #24539

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

sstone
Copy link
Contributor

@sstone sstone commented Mar 11, 2022

This PR adds a new "tx output spender" index, which allows users to query which tx spent a given outpoint with the gettxspendingprevout RPC call that was added by #24408.

Such an index would be extremely useful for Lightning, and probably for most layer-2 protocols that rely on chains of unpublished transactions.

UPDATE: this PR is ready for review and issues have been addressed:

This index is implemented as a simple k/v database (like other indexes):

  • keys are siphash(spent outpoint) (64 bytes)
  • values are list of spending tx positions( in case there are collisions which should be exceptionally uncommon)., using the same encoding as txindex (variable size, max is 12 bytes).

The spending tx can optionally be returned by gettxspendingprevout (even it -txindex is not set).

@sstone sstone force-pushed the add-txospender-index branch from 3f7cdb6 to aa9f963 Compare March 11, 2022 21:08
@ryanofsky
Copy link
Contributor

Concept ACK. It would be good know more about specific use-cases for this, if you can provide some links or more description. But the implementation seems very simple, so I don't think adding this would be a maintenance burden.

@maflcko
Copy link
Member

maflcko commented Mar 12, 2022

@ ryan, I think the idea is that there is one coin that is owned by more than one entity. There may be several (conflicting) pre-signed txs spending it and as soon as it is spent, all entities with ownership in it want to check which pre-signed tx spent the coin.
If they use Bitcoin Core to monitor the coin, they may use:

  • A watch-only wallet (which requires a file on disk and BDB/Sqlite)
  • Walk the raw chain and raw mempool themselves (According to rpc: add rpc to get mempool txs spending specific prevouts #24408 (comment) this is doable for the chain, but may not be for the mempool)
  • Use an index. A blockfilterindex might speed up scanning the chain, whereas this index directly gives the answer.

@sstone
Copy link
Contributor Author

sstone commented Mar 12, 2022

Lightning channels are basically trees of unpublished transactions:

  • a commit tx that spends a shared onchain "funding" UTXO
  • HTLC txs that spend the commit tx (there can be competing HTLC txs that spend the same commit tx output: one for successful payments, one for timed out payments)

There is also a penalty scheme built into these txs: if you cheat, you need to wait before you can spend your outputs but your channel peer can spend them immediately.

And LN is also transitioning to a model where these txs pay minimum fees and can be "bumped" with RBF/CPFP.

=> it's very useful for LN nodes to detect when transactions have been spent, but also how they've been spent and react accordingly (by extracting info from the spending txs for example), and walk up chains of txs that spend each other.

As described above, there are tools in bitcoin core that we could use and would help a bit, and Lightning nodes could also create their own indexes independently or use electrum servers, block explorers, but this would make running LN nodes connected to your own bitcoin node much easier.

@michaelfolkson
Copy link

Concept ACK

I think the use case is clear and if others think the maintenance burden is reasonable there doesn't seem to be an obvious downside.

This PR can be viewed as an "extension" to #24408

Seems like review should be focused on #24408 first but assuming that gets merged, Concept ACK for this.

@DrahtBot
Copy link
Contributor

DrahtBot commented Mar 12, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/24539.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK ryanofsky, michaelfolkson, mzumsande, glozow, aureleoules, achow101, TheCharlatan, hodlinator
Approach ACK w0xlt

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

No conflicts as of last run.

Copy link
Contributor

@mzumsande mzumsande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK - I can see why this could be helpful for lightning, the code is very similar to the txindex code.

Some general thoughts:

  • getmempooltxspendingprevout in #24408 takes a list of outpoints, gettxospender takes a single outpoint. Would it make sense to have the same format here (I'd guess a common use case would be to check both RPCs with the same outpoint(s)?)
  • Taking this one step further, getrawtransaction queries transaction data from the mempool or from blocks (if txindex is enabled) in one RPC. Would it make sense to have something similar for outpoints, instead of two separate RPCs?
  • I think that the indexed data will not be deleted in case of a reorg (but possibly overwritten if an outpoint would be spent in a different tx as a result of a reorg). That means that a result from gettxospender could be stale. Would it make sense to delete entries or should we at least mention this possibility in the rpc doc?
  • What's the reason for the dependency on -txindex? As far as I can see it's not technical, so would knowing the fact that an outpoint was used in a tx (even if we can't lookup the details of that tx further) be sufficient for some use cases?

@sstone sstone force-pushed the add-txospender-index branch from aa9f963 to 1b82b7e Compare March 14, 2022 10:27
@sstone
Copy link
Contributor Author

sstone commented Mar 14, 2022

* `getmempooltxspendingprevout` in #24408 takes a list of outpoints, `gettxospender` takes a single outpoint. Would it make sense to have the same format here (I'd guess a common use case would be to check both RPCs with the same outpoint(s)?)

Bitcoin supports batched RPC request so from a functional p.o.v it's equivalent to passing a list of outpoints but it's a bit easier to reason about imho.
I think that the use case for getmempooltxspendingprevout is a bit different: you want to bump fees for a set of txs that are related to the same channel and want to understand who is spending what. You may also have txs with multiple inputs, and would pass them all together to getmempooltxspendingprevout.

For gettxospender the use case is trying to understand why channels have been closed and react if needed, if we could pass in multiple outpoints they would likely be unrelated.

* Taking this one step further, `getrawtransaction` queries transaction data from the mempool or from blocks (if txindex is enabled) in one RPC. Would it make sense to have something similar for outpoints, instead of two separate RPCs?

Maybe, let's see where the discussion in #24408 but to me it feels cleaner to have separate calls.

* I think that the indexed data will not be deleted in case of a reorg (but possibly overwritten if an outpoint would be spent in a different tx as a result of a reorg). That means that a result from `gettxospender` could be stale. Would it make sense to delete entries or should we at least mention this possibility in the rpc doc?

* What's the reason for the dependency on -txindex? As far as I can see it's not technical, so would knowing the fact that an outpoint was used in a tx (even if we can't lookup the details of that tx further) be sufficient for some use cases?

We would almost always want to get the actual tx that spends our outpoint. My reasoning was that relying on -txindex makes gettxospender robust and consistent in the case of reorgs and is also simpler and easier to understand that storing block positions instead of txids.

@glozow
Copy link
Member

glozow commented Apr 1, 2022

Concept ACK, nice

@sstone
Copy link
Contributor Author

sstone commented Apr 25, 2022

Rebased on #24408

@sstone
Copy link
Contributor Author

sstone commented Oct 6, 2024

Concept: I am curious about the justification for including this Index in Bitcoin Core vs implementing it as a third party index, especially since it is not required by anything inside of Core itself. It keeps the index in close sync with the chain state and de-duplicates effort among L2-implementations, but adds maintenance burden in Core.

I understand that adding new indexes must be justified. This one is simple and quite efficient now, the code is compact, isolated and easy to understand, and imho there are more and more apps/protocols that work by creating/updating transactions that are not published that would benefit from such an index.

For applications that use Bitcoin Core's wallet (this is is our case,) and benefit from all the features that have been and are being developed (fee management, fee bumping, coin selection, package relay, descriptors, ...), adding a custom tool to build an external index would be sub-optimal from a dev and also an operational point of view, and people may prefer to use this PR instead.

@willcl-ark
Copy link
Member

Related to: #9641

@TheCharlatan
Copy link
Contributor

Would be good to get a PR with 9 Concept ACKs and no significant concerns raised moving again. Can you rebase it? A recent PR (#32694) changed how we populate the block info struct. You'd now want to set CustomOptions for the index to get the required block data to the index, without having to call ReadBlock in your new TxSpenderIndex class.

@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed.
Task previous releases, depends DEBUG: https://github.com/bitcoin/bitcoin/runs/36404454050
LLM reason (✨ experimental): The CI failure is caused by a compilation error due to an incorrect override in txospenderindex.h.

Hints

Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

@sstone sstone force-pushed the add-txospender-index branch from 618c5f0 to f3e5a7f Compare June 18, 2025 16:44
@sstone
Copy link
Contributor Author

sstone commented Jun 19, 2025

Would be good to get a PR with 9 Concept ACKs and no significant concerns raised moving again. Can you rebase it? A recent PR (#32694) changed how we populate the block info struct. You'd now want to set CustomOptions for the index to get the required block data to the index, without having to call ReadBlock in your new TxSpenderIndex class.

Rebased on 1be688f

Copy link
Contributor

@TheCharlatan TheCharlatan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the rebase :)

In the pull request description you are saying that the keys are 64 bytes, but I guess you mean 64 bits?

Did you split this work into two commits to gather feedback on tradeoffs between the two approaches? The space savings and the ability to immediately return a transaction is nice, but it is also not compatible with pruning (and still requires notices about that in init.cpp). Some lightning implementations do support pruned nodes, I guess they won't be able to leverage this extra index in that case?

Can you give a typical data query flow for both variants that you would expect from your use case in lightning? Is the read performance still acceptable for your use case? My guess is the extra read cost is compensated by not having to do an extra roundtrip to fetch the full transaction from the node.

I was also wondering if it were possible to combine this with #32541 to avoid the pruning limitation by writing the position in a block instead, but could not come up with a useful way to combine the two to that end.

bool AllowPrune() const override { return false; }

protected:
bool CustomAppend(const interfaces::BlockInfo& block) override;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So yes technically txospenderindex does not require txindex but could it really be usable in practice without it

I'm not following here. How does the txindex help in this scenario?

@@ -608,6 +609,11 @@ static RPCHelpMan gettxspendingprevout()
},
},
},
{"options", RPCArg::Type::OBJ, RPCArg::Optional::OMITTED, "",
{
{"mempool_only", RPCArg::Type::BOOL, RPCArg::DefaultHint{"true if txospenderindex unavailable, otherwise false"}, "If false and empool lacks a relevant spend, use txospenderindex (throws an exception if not available)."},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (type): s/empool/mempool/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed in d5980f3

src/init.cpp Outdated
@@ -513,6 +515,7 @@ void SetupServerArgs(ArgsManager& argsman, bool can_listen_ipc)
argsman.AddArg("-shutdownnotify=<cmd>", "Execute command immediately before beginning shutdown. The need for shutdown may be urgent, so be careful not to delay it long (if the command doesn't require interaction with the server, consider having it fork into the background).", ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
#endif
argsman.AddArg("-txindex", strprintf("Maintain a full transaction index, used by the getrawtransaction rpc call (default: %u)", DEFAULT_TXINDEX), ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
argsman.AddArg("-txospenderindex", strprintf("Maintain a transaction output spender index, used by the gettxospender rpc call (default: %u)", DEFAULT_TXOSPENDERINDEX), ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you mean used by the gettxspendingprevout call?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed in d5980f3

throw JSONRPCError(RPC_MISC_ERROR, strprintf("No spending tx for the outpoint %s:%d in mempool, and txospenderindex is unavailable.", prevout.hash.GetHex(), prevout.n));
} else {
UniValue warnings(UniValue::VARR);
warnings.push_back("txospenderindex is unavailable.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want to warn in this case? After all the current behaviour does not change. Is there a scenario where the warning might be useful?

Copy link
Contributor Author

@sstone sstone Jul 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may help users figure out they've made a mistake in their setup/configuration ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I just wasn't sure if this is really useful. There are many other ways something could be misconfigured. We'd also be warning here now on what continues to be just normal working behaviour. Clients should be querying getindexinfo first anway in my opinion, but then again, I am also not sure what exactly the precedent is here. Maybe add a note to pass "mempool_only":true to silence the warning?

@@ -652,6 +675,8 @@ static RPCHelpMan gettxspendingprevout()
prevouts.emplace_back(txid, nOutput);
}

const bool f_txospenderindex_ready = !mempool_only.value_or(false) && g_txospenderindex && g_txospenderindex->BlockUntilSyncedToCurrentChain();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We usually don't prefix bool values with f_ anymore and maybe this could be better scoped after the else if (g_txospenderindex).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in d5980f3. Moving this does not work (maybe a conflict with the mempool lock ?) I'll investigate.

// BlockUntilSyncedToCurrentChain should return false before txospenderindex is started.
BOOST_CHECK(!txospenderindex.BlockUntilSyncedToCurrentChain());

BOOST_REQUIRE(txospenderindex.StartBackgroundSync());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be more robust and preferable for a unit test to just call Sync() directly.

assert_equal(result, [ {'txid' : confirmed_utxo['txid'], 'vout' : 0, 'spendingtxid' : tx1["txid"]} ])
result = self.nodes[0].gettxspendingprevout([ {'txid' : tx1['txid'], 'vout' : 0} ])
assert_equal(result, [ {'txid' : tx1['txid'], 'vout' : 0, 'spendingtxid' : tx2["txid"]} ])
# replace tx1 with tx3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused why you are testing two reorgs here. What is not covered in the first one, that is covered in this one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct but I wanted to have 2 explicit tests: one when a spending tx is replaced, the other when it is cancelled.

@@ -690,12 +697,18 @@ static RPCHelpMan gettxspendingprevout()
const CTransaction* spendingTx = mempool.GetConflictTx(prevout);
if (spendingTx != nullptr) {
o.pushKV("spendingtxid", spendingTx->GetHash().ToString());
if (return_spending_tx) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you'd want to check return_spending_tx && *return_spending_tx here, though maybe just use a boolean directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I don't understand this ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I read it you are currently just checking if the field is set, not if it is set to true. At least providing "return_spending_tx":false in the cli still provides the full transaction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had forgotten this one , fixed in f2a496c

#include <uint256.h>
#include <validation.h>

// LeveLDB key prefix. We only have one key for now but it will make it easier to add others if needed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LeveLDB -> LevelDB [typo in comment “LeveLDB key prefix”]

@sstone sstone force-pushed the add-txospender-index branch from f3e5a7f to d5980f3 Compare July 1, 2025 17:58
@sstone
Copy link
Contributor Author

sstone commented Jul 1, 2025

In the pull request description you are saying that the keys are 64 bytes, but I guess you mean 64 bits?
Yes 64 bits.

Did you split this work into two commits to gather feedback on tradeoffs between the two approaches? The space savings and the ability to immediately return a transaction is nice, but it is also not compatible with pruning (and still requires notices about that in init.cpp). Some lightning implementations do support pruned nodes, I guess they won't be able to leverage this extra index in that case?

I merged them into a single commit, the first approach requires too much disk space. This index would mostly be useful to fairly large nodes (like ours :)) which would not use pruning.

@romanz
Copy link
Contributor

romanz commented Jul 6, 2025

I think that we can simplify collision handling a bit and allow pruning if the index data is stored using the following schema:

  • key: siphash(spent_outpoint) + serialize(tx_disk_position)
  • value: b""

instead of storing a list of positions per siphash(spent_outpoint).

It supports fast lookup of a outpoint spender by using LevelDB prefix scan (similar to how it's done in electrs and bindex).

When pruning a block, the key can be recomputed and deleted from the index DB (since the key containing the tx_disk_position should be unique).

@sstone
Copy link
Contributor Author

sstone commented Jul 15, 2025

I think that we can simplify collision handling a bit and allow pruning if the index data is stored using the following schema:

  • key: siphash(spent_outpoint) + serialize(tx_disk_position)
  • value: b""

instead of storing a list of positions per siphash(spent_outpoint).

It supports fast lookup of a outpoint spender by using LevelDB prefix scan (similar to how it's done in electrs and bindex).

When pruning a block, the key can be recomputed and deleted from the index DB (since the key containing the tx_disk_position should be unique).

Thanks! I used your suggestion in 0970256 and it is cleaner than using a list of tx positions.

@sstone sstone force-pushed the add-txospender-index branch from 0970256 to 9b750be Compare July 15, 2025 16:10
bool TxoSpenderIndex::CustomAppend(const interfaces::BlockInfo& block)
{
std::vector<std::pair<COutPoint, CDiskTxPos>> items;
items.reserve(block.data->vtx.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: items size will be the total number of this block's inputs (not its transactions' count), right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, I did not want to loop twice over the block's transactions to compute the actual number of spent inputs.


bool TxoSpenderIndex::CustomRemove(const interfaces::BlockInfo& block)
{
std::vector<std::pair<COutPoint, CDiskTxPos>> items;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks similar to the code above - maybe refactor into a helper function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in done in 42d3aa8


struct DBKey {
uint64_t hash;
CDiskTxPos pos;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using FlatFilePos here (which contains 2 ints) instead of CDiskTxPos (which contains also the block's offset, taking an additional int).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can I find the tx without the block's offset ?

Copy link
Contributor

@romanz romanz Jul 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest saving the transaction position on disk (without its block offset):

bool TxoSpenderIndex::CustomAppend(const interfaces::BlockInfo& block)
{
    std::vector<std::pair<COutPoint, CDiskTxPos>> items;
    items.reserve(block.data->vtx.size());

    FlatFilePos pos{block.file_number, block.data_pos};
    pos.nPos += GetSizeOfCompactSize(block.data->vtx.size());
    for (const auto& tx : block.data->vtx) {
        if (!tx->IsCoinBase()) {
            for (const auto& input : tx->vin) {
                items.emplace_back(input.prevout, pos);
            }
        }
        pos.nPos += ::GetSerializeSize(TX_WITH_WITNESS(*tx));
    }

    return WriteSpenderInfos(items);
}

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again it's simpler like this, done in 42d3aa8

}
CBlockHeader header;
try {
file >> header;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the header's data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No we just need to skip over it, done in 42d3aa8

@sstone sstone force-pushed the add-txospender-index branch 2 times, most recently from 42d3aa8 to 3158e03 Compare July 24, 2025 14:22
sstone and others added 2 commits July 24, 2025 17:57
Adds an outpoint -> txid index, which can be used to find which transactions spent a given output.
The index key is the siphash of the spent outpoint (8 bytes), keyed with a random key that is created when the index is created (to avoid collision attacks).
The index value is a list of tx positions (9 bytes unless there are collisions which should be extremely rare).

This index is extremely useful for Lightning and more generally for layer-2 protocol that rely on chains of unpublished transactions.
If enabled, this index will be used by `gettxspendingprevout` when it does not find a spending transaction in the mempool.
…x position)

As suggested by @romanz, we now use a composite key with 2 parts: hash(spent outpoint) and tx position, with an empty value.
To find the spending tx for a given outpoint, we do a prefix search (prefix being the hash of the provided outpoint), and for all keys that match this prefix
we load the tx at the position specified in the key and return it if does spend the provided outpoint.
This makes handling reorgs much simpler (we just erase the keys computed from the removed block).
@sstone sstone force-pushed the add-txospender-index branch from 3158e03 to 4e3df36 Compare July 24, 2025 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.