rpc: allow writing UTXO set to a named pipe, introduce dump_to_sqlite.sh script #31560
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Code Coverage & Benchmarks: For details see https://corecheck.dev/bitcoin/bitcoin/pulls/31560.
Reviews: See the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
Conflicts: Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
Force-pushed from 7b39be0 to 427d17e.
🚧 At least one of the CI tasks failed. Hints: Try to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here if you need help tracking down a confusing failure.
Force-pushed from 66f1a0e to 43fff2e.
Approach ACK
Great feature!
Did some manual sanity testing on mainnet:
- Used dumptxoutset to create a dump (with a node synced to block 876,186), utxo_to_sqlite.py to convert it to a SQLite file, and opened/parsed it in Python. Conversion seemed successful; the correct number of coins was present in the table.
- Used dump_to_sqlite.sh to do the same with a FIFO (but with a node synced to block 200,000), then opened/parsed it in Python. Conversion seemed successful; the correct number of coins was present in the table.
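A minimal sketch of the coin-count check described above, using Python's standard sqlite3 module. It assumes the converted database has a table named utxos (the table queried in the regtest check later in this thread); the helper names count_utxos and check_conversion are hypothetical, and the expected count is supplied by the caller.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

def count_utxos(db_path: str) -> int:
    """Return the number of rows in the utxos table of a converted dump."""
    # Read-only URI: a mistyped path raises instead of creating an empty DB.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as con:
        (count,) = con.execute("SELECT COUNT(*) FROM utxos").fetchone()
    return count

def check_conversion(db_path: str, expected_coins: int) -> None:
    """Raise ValueError unless the table holds exactly expected_coins rows."""
    actual = count_utxos(db_path)
    if actual != expected_coins:
        raise ValueError(f"expected {expected_coins} coins, found {actual}")
```

The expected value could be taken from the coins_written field of the dumptxoutset result, as in the regtest output posted further down.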
Left a few relatively small comments.
May circle back and review utxo_to_sqlite.py in more detail as time allows.
Force-pushed from 43fff2e to 59df848.
@tdb3 @romanz: Thanks for your reviews, much appreciated! Note that the first two commits, which introduce the utxo-to-sqlite.py tool (+test), are part of the base PR #27432, so further comments on those changes would fit better there in the future. I took all of your suggestions, updated #27432, and rebased this PR on top of it accordingly.
ACK 59df848
Also re-ran tests in #31560 (review)
Can we make the exists/is_fifo/open atomic somehow? Seems liable to have a race here someday...
Not opposed to this, but it seems like a good use-case for a kernel util :)
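One common way to close the check-then-open race raised above is to open the path first and then inspect the already-opened descriptor with fstat(), rather than calling exists/is_fifo on the path and opening it afterwards. The Python sketch below illustrates that idea with POSIX calls; it is not the PR's C++ code, and the function name open_dump_output and its refuse-to-overwrite policy are assumptions made for illustration.

```python
import errno
import os
import stat

def open_dump_output(path: str) -> tuple[int, bool]:
    """Open path for writing without a check-then-open race.

    Returns (fd, is_fifo). An existing FIFO is accepted, a missing path is
    created exclusively, and any other existing file is rejected.
    """
    try:
        # Opening an existing FIFO for writing blocks until a reader opens it.
        fd = os.open(path, os.O_WRONLY)
    except FileNotFoundError:
        # O_EXCL makes creation fail if someone created the path meanwhile.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        return fd, False
    # Decide based on what was actually opened, not on an earlier lookup.
    if stat.S_ISFIFO(os.fstat(fd).st_mode):
        return fd, True
    os.close(fd)
    raise FileExistsError(errno.EEXIST, "refusing to overwrite existing file", path)
```

Because the type check runs on the descriptor, swapping the path for a different file between the check and the open no longer matters; the remaining window (path appears after FileNotFoundError) is covered by O_EXCL.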
This allows external tooling (e.g. converters) to consume the output directly, rather than having to write the dump to disk first and then read it from there again. Github-Pull: bitcoin#31560 Rebased-From: baa2f17
Force-pushed from 334232b to 53217bd.
@luke-jr: Thanks, applied this to 61cef65. Note that I had to introduce a |
Force-pushed from 53217bd to 4c8e9b4.
Rebased on master (now that #27432 has been merged 🎉).
This allows external tooling (e.g. converters) to consume the output directly, rather than having to write the dump to disk first and then read it from there again. Co-authored-by: Luke Dashjr <luke-jr+git@utopios.org> Github-Pull: bitcoin#31560 Rebased-From: 61cef65
ACK 4c8e9b4 Tested on Linux and macOS. Nice work. I'm looking into integrating some UTXO stats on my website, so it's nice to see people working on conveniently extracting this data.
Interesting: I read the "named pipe" link above, and it seems the main benefit of named pipes is performance, since one can skip writing to disk and the churn of creating a new file. I'm not sure I see the performance benefit of a named pipe here, though.
I guess a follow-up question is how big the dump is. If it's very large, that would be a good motivation for a named pipe.
Github-Pull: bitcoin#31560 Rebased-From: cfda1d1
This allows external tooling (e.g. converters) to consume the output directly, rather than having to write the dump to disk first and then read it from there again. Co-authored-by: Luke Dashjr <luke-jr+git@utopios.org>
Force-pushed from 4c8e9b4 to 145dc34.
Rebased on master (due to conflict with #29307, commit a69c409). @yancyribbens: Sorry, I must have missed your comments back then.
Yes, feeding the data directly into the consuming process is obviously faster than first writing the whole dump to disk (which also takes additional temporary space) and then reading it all back from there. Avoiding disk I/O would in theory be possible with any other IPC mechanism too, but named pipes have the advantage that the changes needed to the producer and consumer applications (in this case, bitcoind and
As of today on mainnet, at block height 903978, the UTXO set dump file has a size of ~8.95 GiB. (As a side note, the UTXO set size seems to have decreased lately; I remember the dump being over 10 GiB a few months ago.)
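To make the disk-space argument concrete: with a named pipe the consumer processes data while the producer is still writing it, so the ~8.95 GiB dump never has to be stored as a file; the FIFO node exists in the filesystem, but the data passes through a kernel buffer. The toy Python sketch below stands in for bitcoind (a writer thread) and the converter (an incremental reader); the function names and chunk size are hypothetical, and os.mkfifo makes it POSIX-only.

```python
import os
import tempfile
import threading

CHUNK = 64 * 1024  # read size; arbitrary for this sketch

def produce(fifo_path: str, chunks: list[bytes]) -> None:
    """Stand-in for the dumping process: write chunks into the pipe."""
    with open(fifo_path, "wb") as out:  # blocks until a reader opens the FIFO
        for chunk in chunks:
            out.write(chunk)

def consume(fifo_path: str) -> int:
    """Stand-in for the converter: process data incrementally, return bytes seen."""
    total = 0
    with open(fifo_path, "rb") as src:  # blocks until a writer opens the FIFO
        while block := src.read(CHUNK):  # b"" only once the writer has closed
            total += len(block)  # a real consumer would parse coins here
    return total

def stream_through_fifo(chunks: list[bytes]) -> int:
    """Pipe chunks from a writer thread to an incremental reader via a FIFO."""
    with tempfile.TemporaryDirectory() as tmp:
        fifo_path = os.path.join(tmp, "utxos.fifo")
        os.mkfifo(fifo_path)
        writer = threading.Thread(target=produce, args=(fifo_path, chunks))
        writer.start()
        total = consume(fifo_path)
        writer.join()
        return total
```

For example, stream_through_fifo([b"x" * 100_000, b"y" * 5]) moves more data than a typical 64 KiB pipe buffer holds, which only works because both sides run concurrently.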
Concept ACK
briefly tested on regtest:
./contrib/utxo-tools/dump_to_sqlite.sh "./build/src/bitcoin-cli -regtest" ~/utxos_03_regtest.sqlite3
UTXO Snapshot for Regtest at block hash 1859fa04bf13269212d2cbcc406c91b6..., contains 1358 coins
{
"coins_written": 1358,
"base_hash": "1859fa04bf13269212d2cbcc406c91b6d11decbc39b97de79ae75566f5b2e826",
"base_height": 1338,
"path": "/tmp/tmp.NbBipQ3mhb/utxos.fifo",
"txoutset_hash": "3bdfcb5a529096de8e73b516005d4263b7924a6c5f81e0823e9ec904126bbf4d",
"nchaintx": 1364
}
TOTAL: 1358 coins written to /home/ubuntu/utxos_03_regtest.sqlite3, snapshot height is 1338.
Verified UTXO count matches:
sqlite3 ~/utxos_03_regtest.sqlite3 "SELECT COUNT(*) FROM utxos;"
1358
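For readers curious about the moving parts behind the one-liner above, here is a rough Python rendering of the flow its output suggests: create a temporary directory holding a FIFO (compare the /tmp/tmp.NbBipQ3mhb/utxos.fifo path in the RPC result), start the converter reading from it, then run the dump command writing into it. This is a sketch, not the actual dump_to_sqlite.sh; the run_via_fifo helper and its "{fifo}" placeholder convention are invented here, and the exact bitcoin-cli and converter arguments are left to the caller.

```python
import os
import subprocess
import tempfile

def run_via_fifo(producer: list[str], consumer: list[str]) -> None:
    """Run producer and consumer connected through a temporary named pipe.

    Every occurrence of "{fifo}" in an argument is replaced by the FIFO
    path (a placeholder convention invented for this sketch).
    """
    with tempfile.TemporaryDirectory() as tmp:
        fifo = os.path.join(tmp, "utxos.fifo")
        os.mkfifo(fifo)

        def subst(cmd: list[str]) -> list[str]:
            return [arg.replace("{fifo}", fifo) for arg in cmd]

        # Start the reader first; it blocks in open() until a writer appears.
        reader = subprocess.Popen(subst(consumer))
        try:
            # The writer runs to completion; closing its end gives the reader EOF.
            subprocess.run(subst(producer), check=True)
        except BaseException:
            # A producer that fails before opening the FIFO would leave the
            # reader blocked forever, so stop it before re-raising.
            reader.kill()
            reader.wait()
            raise
        if reader.wait() != 0:
            raise subprocess.CalledProcessError(reader.returncode, reader.args)
```

With the real tools, the producer would be a bitcoin-cli dumptxoutset call given the FIFO path and the consumer the SQLite converter given the FIFO and the output database; check the script itself for the exact invocation.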
This PR slightly modifies the dumptxoutset RPC to allow writing the UTXO set dump into a named pipe, so that the output data can be consumed by another process (see #31373). Using this together with the utxo-to-sqlite.py tool (introduced in #27432), a UTXO set in SQLite3 format can be created on the fly, which becomes a one-liner with the newly introduced script dump_to_sqlite.sh. E.g. for signet:
Note that the dumptxoutset RPC calculates a UTXO set hash as a first step before any data is emitted, so especially on mainnet it takes quite a while until the conversion starts and anything visibly happens.
The new script is quite minimal and PoC-y at this point; there are some potential improvement ideas: