@theStack theStack commented Dec 24, 2024

This PR slightly modifies the dumptxoutset RPC to allow writing the UTXO set dump into a named pipe, so that the output data can be consumed by another process (see #31373). Combined with the utxo_to_sqlite.py tool (introduced in #27432), this makes it possible to create a UTXO set in SQLite3 format on the fly, which becomes a one-liner with the newly introduced script dump_to_sqlite.sh. E.g. for signet:

$ ./contrib/utxo-tools/dump_to_sqlite.sh "./build/src/bitcoin-cli -signet" ~/utxos.sqlite3
UTXO Snapshot for Signet at block hash 000000ddc3b251483cf1ebb23e2750ba..., contains 5705634 coins
1048576 coins converted [18.38%], 4.474s passed since start
2097152 coins converted [36.76%], 8.793s passed since start
3145728 coins converted [55.13%], 13.146s passed since start
4194304 coins converted [73.51%], 17.478s passed since start
5242880 coins converted [91.89%], 21.832s passed since start
{
  "coins_written": 5705634,
  "base_hash": "000000ddc3b251483cf1ebb23e2750ba2490701d0c547241b247a9beb85498d0",
  "base_height": 227678,
  "path": "/tmp/tmp.MFHEVqetv0/utxos.fifo",
  "txoutset_hash": "f29e524c999487cbd0cfca5201dce67c2c5e5c5eb115c63ad48c2239f23eea4c",
  "nchaintx": 8272649
}
TOTAL: 5705634 coins written to /home/thestack/utxos.sqlite3, snapshot height is 227678.

Note that the dumptxoutset RPC calculates a UTXO set hash as a first step before any data is emitted, so especially on mainnet it takes quite a while until the conversion starts and any progress becomes visible.

The new script is quite minimal and PoC-y at this point. Some potential improvement ideas:

  • better error handling (e.g. detect whether bitcoin-cli exists, clean up the tmpdir if bitcoin-cli execution fails, etc.)
  • allow passing through the rollback option (currently we always dump at the current height, i.e. with the "latest" parameter)
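
The flow described above (create a FIFO in a temporary directory, have one side write the dump into it while the other side consumes it, then clean up) can be sketched roughly as follows. This is a minimal Python illustration and not the contents of dump_to_sqlite.sh: a stand-in producer thread plays the role of the dumptxoutset RPC, the function names `fake_dump` and `stream_through_fifo` are hypothetical, and it assumes a POSIX system where `os.mkfifo` is available.

```python
import os
import tempfile
import threading


def fake_dump(path: str, payload: bytes) -> None:
    """Hypothetical stand-in for the RPC: stream the data into the FIFO."""
    with open(path, "wb") as f:  # blocks until a reader has opened the FIFO
        f.write(payload)


def stream_through_fifo(payload: bytes) -> bytes:
    """Create a FIFO in a temporary directory, write into it, read it back."""
    # TemporaryDirectory removes the directory (and the FIFO inside it)
    # when the block exits, including when an exception is raised.
    with tempfile.TemporaryDirectory() as tmpdir:
        fifo_path = os.path.join(tmpdir, "utxos.fifo")
        os.mkfifo(fifo_path)
        producer = threading.Thread(target=fake_dump, args=(fifo_path, payload))
        producer.start()
        try:
            with open(fifo_path, "rb") as f:  # blocks until the writer opens
                received = f.read()  # returns once the writer closes its end
        finally:
            producer.join()
    return received
```

Using a context manager for the temporary directory is one way to address the "clean up the tmpdir on failure" idea from the list above; a shell script would more likely use a `trap` handler for the same purpose.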

DrahtBot commented Dec 24, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31560.

Reviews

See the guideline for information on the review process.

Type         Reviewers
Concept ACK  naiyoma
Stale ACK    tdb3, virtu

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #32983 (rpc: refactor: use string_view in Arg/MaybeArg by stickies-v)
  • #32621 (contrib: utxo_to_sqlite.py: add option to store txid/spk as BLOBs by theStack)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@theStack theStack force-pushed the 202412-dumptxoutset-allow_write_to_named_pipe branch from 7b39be0 to 427d17e Compare December 24, 2024 02:52
@DrahtBot

🚧 At least one of the CI tasks failed.
Debug: https://github.com/bitcoin/bitcoin/runs/34820327020

Hints

Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

@theStack theStack force-pushed the 202412-dumptxoutset-allow_write_to_named_pipe branch 2 times, most recently from 66f1a0e to 43fff2e Compare December 24, 2024 03:21

@tdb3 tdb3 left a comment

Approach ACK

Great feature!

Did some manual sanity testing on mainnet:

  • Used dumptxoutset to create a dump (with a node synced to block 876,186), utxo_to_sqlite.py to convert it to a sqlite file, and opened/parsed it in python. Conversion seemed successful; the correct number of coins was present in the table
  • Used dump_to_sqlite.sh to do the same with a fifo (but with a node synced to block 200,000), then opened/parsed it in python. Conversion seemed successful; the correct number of coins was present in the table

Left a few relatively small comments.
May circle back and review utxo_to_sqlite.py in more detail as time allows.

@theStack

@tdb3 @romanz: Thanks for your reviews, much appreciated! Note that the first two commits, which introduce the utxo_to_sqlite.py tool (+test), are part of the base PR #27432, so further comments on those changes would be a better fit there in the future. I took all of your suggestions, updated #27432, and rebased this PR on top of it again accordingly.

@tdb3 tdb3 left a comment

ACK 59df848

Also re-ran tests in #31560 (review)

luke-jr commented Jan 7, 2025

Can we make the exists/is_fifo/open atomic somehow? Seems liable to have a race here someday...

theuni commented Jan 8, 2025

Not opposed to this, but it seems like a good use-case for a kernel util :)

@TheCharlatan @stickies-v @josibake

luke-jr pushed a commit to luke-jr/bitcoin that referenced this pull request Jan 15, 2025
This allows external tooling (e.g. converters) to consume the output
directly, rather than having to write the dump to disk first and then
read it from there again.

Github-Pull: bitcoin#31560
Rebased-From: baa2f17

luke-jr commented Jan 15, 2025

eg luke-jr@56ee485

@theStack theStack force-pushed the 202412-dumptxoutset-allow_write_to_named_pipe branch 2 times, most recently from 334232b to 53217bd Compare January 18, 2025 04:34
@DrahtBot

🚧 At least one of the CI tasks failed.
Debug: https://github.com/bitcoin/bitcoin/runs/35809942510

theStack commented Jan 18, 2025

> Can we make the exists/is_fifo/open atomic somehow? Seems liable to have a race here someday...
>
> eg luke-jr@56ee485

@luke-jr: Thanks, applied this to 61cef65. Note that I had to introduce a fs::exists helper for file_status, as direct usage of std::filesystem::exists was prohibited by the linter (which failed with "Direct use of std::filesystem may be dangerous and buggy. Please include <util/fs.h> and use the fs:: namespace, which has unsafe filesystem functions marked as deleted.").
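
To illustrate the race concern: calling "exists" and "is_fifo" separately performs two independent filesystem lookups, and the path can change between them. One way to narrow that window is to query the file status once and derive both answers from the same result. The following is a Python sketch of that idea only; it is an assumption that the referenced commit works along these lines, and the function name `classify_dump_target` is hypothetical.

```python
import os
import stat


def classify_dump_target(path: str) -> str:
    """Return 'missing', 'fifo' or 'other' based on a single stat() call."""
    try:
        st = os.stat(path)  # one lookup; both checks below use this result
    except FileNotFoundError:
        return "missing"
    return "fifo" if stat.S_ISFIFO(st.st_mode) else "other"
```

Note that this only makes the existence and type checks consistent with each other. A gap between the status query and the later open call remains, so it reduces the race rather than eliminating it.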

@theStack

Rebased on master (now that #27432 has been merged 🎉).

luke-jr pushed a commit to bitcoinknots/bitcoin that referenced this pull request Feb 22, 2025
This allows external tooling (e.g. converters) to consume the output
directly, rather than having to write the dump to disk first and then
read it from there again.

Co-authored-by: Luke Dashjr <luke-jr+git@utopios.org>

Github-Pull: bitcoin#31560
Rebased-From: 61cef65

virtu commented Mar 21, 2025

ACK 4c8e9b4

Tested on Linux and macOS.

Nice work. I'm looking into integrating some UTXO stats on my website, so it's nice to see people working on conveniently extracting this data.

@DrahtBot DrahtBot requested a review from tdb3 March 21, 2025 14:39
@yancyribbens

Interesting - I read the link above on "named pipe" and it seems the main benefit of a named pipe is performance: one can skip writing to disk and the churn of creating a new file. I'm not sure I see the perf benefit of a named pipe here though.

yancyribbens commented Mar 22, 2025

I guess a follow-up question is how big the dump is. If it's very large, then that would be a good motivation for a named pipe.

luke-jr pushed a commit to luke-jr/bitcoin that referenced this pull request Jun 6, 2025
This allows external tooling (e.g. converters) to consume the output
directly, rather than having to write the dump to disk first and then
read it from there again.

Co-authored-by: Luke Dashjr <luke-jr+git@utopios.org>

Github-Pull: bitcoin#31560
Rebased-From: 61cef65
luke-jr pushed a commit to luke-jr/bitcoin that referenced this pull request Jun 6, 2025
theStack and others added 3 commits July 4, 2025 17:17
This allows external tooling (e.g. converters) to consume the output
directly, rather than having to write the dump to disk first and then
read it from there again.

Co-authored-by: Luke Dashjr <luke-jr+git@utopios.org>
@theStack theStack force-pushed the 202412-dumptxoutset-allow_write_to_named_pipe branch from 4c8e9b4 to 145dc34 Compare July 4, 2025 15:38

theStack commented Jul 4, 2025

Rebased on master (due to conflict with #29307, commit a69c409).

@yancyribbens: Sorry, I must have missed your comments back then.

> Interesting - I read the link above on "named pipe" and it seems the main benefit of a named pipe is performance: one can skip writing to disk and the churn of creating a new file. I'm not sure I see the perf benefit of a named pipe here though.

Yes, feeding the data directly into the consuming process is obviously faster than first writing the whole dump to disk (which also takes additional temporary space) and then reading it all back from there. Avoiding disk I/O would be possible in theory with any other IPC mechanism too, but named pipes have the advantage that the changes needed to the producer and consumer applications (in this case, bitcoind and utxo_to_sqlite.py) are minimal, as named pipes mostly behave like regular files.
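
As a small illustration of "mostly behave like regular files": a consumer that only reads sequentially and never seeks can be pointed at either a regular file or a FIFO without changes. The function below is a hypothetical Python sketch, not code from utxo_to_sqlite.py.

```python
def total_bytes(path: str, chunk_size: int = 4096) -> int:
    """Count the bytes readable from `path`, reading strictly sequentially."""
    total = 0
    with open(path, "rb") as f:  # works for a regular file or a named pipe
        # No seek() or file-size lookup is used, since FIFOs support neither.
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total
```

The main differences a consumer has to tolerate are that a FIFO cannot be rewound and that opening it blocks until the other side has opened it as well.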

> I guess a follow-up question is how big the dump is. If it's very large, then that would be a good motivation for a named pipe.

As of today on mainnet, at block height 903978, the UTXO set dump file has a size of ~8.95 GiB. (As a side note, it seems like the UTXO set size has decreased lately, I remember the dump being over 10 GiB already a few months ago.)

luke-jr pushed a commit to bitcoinknots/bitcoin that referenced this pull request Jul 17, 2025
This allows external tooling (e.g. converters) to consume the output
directly, rather than having to write the dump to disk first and then
read it from there again.

Co-authored-by: Luke Dashjr <luke-jr+git@utopios.org>

Github-Pull: bitcoin#31560
Rebased-From: 61cef65
luke-jr pushed a commit to bitcoinknots/bitcoin that referenced this pull request Jul 17, 2025

@naiyoma naiyoma left a comment

Concept ACK

briefly tested on regtest:


./contrib/utxo-tools/dump_to_sqlite.sh "./build/src/bitcoin-cli -regtest" ~/utxos_03_regtest.sqlite3
UTXO Snapshot for Regtest at block hash 1859fa04bf13269212d2cbcc406c91b6..., contains 1358 coins
{
  "coins_written": 1358,
  "base_hash": "1859fa04bf13269212d2cbcc406c91b6d11decbc39b97de79ae75566f5b2e826",
  "base_height": 1338,
  "path": "/tmp/tmp.NbBipQ3mhb/utxos.fifo",
  "txoutset_hash": "3bdfcb5a529096de8e73b516005d4263b7924a6c5f81e0823e9ec904126bbf4d",
  "nchaintx": 1364
}
TOTAL: 1358 coins written to /home/ubuntu/utxos_03_regtest.sqlite3, snapshot height is 1338.

Verified UTXO count matches:

sqlite3 ~/utxos_03_regtest.sqlite3 "SELECT COUNT(*) FROM utxos;"
1358
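
The same count check can be done from Python with the standard `sqlite3` module, along the lines of the "opened/parsed in python" testing mentioned earlier in the thread. This is a minimal sketch: the table name `utxos` is taken from the query above, while the helper name `count_utxos` is hypothetical.

```python
import sqlite3


def count_utxos(db_path: str) -> int:
    """Return the number of rows in the `utxos` table of the given database."""
    con = sqlite3.connect(db_path)
    try:
        (count,) = con.execute("SELECT COUNT(*) FROM utxos").fetchone()
    finally:
        con.close()
    return count
```

The result can then be compared against the "coins_written" value reported by dumptxoutset.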
