luke-jr
Member

@luke-jr luke-jr commented May 8, 2024

Debug option to control LevelDB file sizes. Since LevelDB seems to always overshoot, "max" didn't seem fitting.

Intended to be followed up with a change to the default in a rebased #30039.

@DrahtBot
Contributor

DrahtBot commented May 8, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/30059.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK tdb3

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #17783 (common: Disallow calling IsArgSet() on ALLOW_LIST options by ryanofsky)
  • #17581 (refactor: Remove settings merge reverse precedence code by ryanofsky)
  • #17580 (refactor: Add ALLOW_LIST flags and enforce usage in CheckArgFlags by ryanofsky)
  • #17493 (util: Forbid ambiguous multiple assignments in config file by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@DrahtBot DrahtBot mentioned this pull request May 8, 2024
@laanwj
Member

laanwj commented May 9, 2024

I'm broadly on board with making this an option (separately from changing the default in #30039). Assuming there are legit reasons for varying this based on the kind of server, operating system, amount of RAM, kind of disk, etc., and there isn't a single sweet spot.

Contributor

@tdb3 tdb3 left a comment

Concept ACK.

Thank you. Seems reasonable to include this as a debug option, and increases flexibility over the single value chosen for PR #30039. Left an inline comment/question.

Ran a quick sanity check:
With dbfilesize=64 set (along with txindex=1), performed an IBD on Signet from a node on the same LAN, then stopped and restarted bitcoind. Saw chainstate and indexes/txindex files that were close to 64MB (blocks/index for signet was less than 64MB, but still had a file over 2MB).

@@ -448,6 +449,11 @@ void SetupServerArgs(ArgsManager& argsman)
argsman.AddArg("-datadir=<dir>", "Specify data directory", ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
argsman.AddArg("-dbbatchsize", strprintf("Maximum database write batch size in bytes (default: %u)", nDefaultDbBatchSize), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::OPTIONS);
argsman.AddArg("-dbcache=<n>", strprintf("Maximum database cache size <n> MiB (%d to %d, default: %d). In addition, unused mempool memory is shared for this cache (see -maxmempool).", nMinDbCache, nMaxDbCache, nDefaultDbCache), ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
argsman.AddArg("-dbfilesize",
strprintf("Target size of files within databases, in MiB (%u to %u, default: %u).",
Contributor

The text states a range of 1MiB to 1024MiB, but these values didn't seem to be enforced (e.g. 0.5, -3, and 1025 were not rejected). Is the intent here to provide guidance for the user (i.e. tell the user to choose 1 to 1024 MiB) rather than enforce a specific range of values?

Member Author

LevelDB itself enforces it. Would it make sense to check it redundantly on our end to control error behaviour? (But then we have to remember to keep it in sync with LevelDB... but maybe just having the docs necessitates that)
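
For illustration only, a redundant check on our side could look roughly like the sketch below. The constant names and the `DbFileSizeBytes` helper are hypothetical and not part of this PR; the 1 and 1024 bounds are simply the values printed in the help text, and would have to be kept in sync with whatever LevelDB accepts.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical bounds, mirroring the range printed in the help text (1 to 1024 MiB).
constexpr int64_t MIN_DB_FILE_SIZE_MIB{1};
constexpr int64_t MAX_DB_FILE_SIZE_MIB{1024};

// Hypothetical helper: converts a -dbfilesize value (in MiB) to bytes.
// Returns std::nullopt when the value is outside the documented range,
// so the caller can decide how to report the error.
std::optional<uint64_t> DbFileSizeBytes(int64_t size_mib)
{
    if (size_mib < MIN_DB_FILE_SIZE_MIB || size_mib > MAX_DB_FILE_SIZE_MIB) {
        return std::nullopt;
    }
    return static_cast<uint64_t>(size_mib) << 20; // MiB -> bytes
}
```

With this shape, init code could turn a `std::nullopt` into an init error instead of relying on the database layer's behaviour for out-of-range values.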

Contributor

Yeah, redundant checking of range (beyond LevelDB) seems a bit much for a debug option (which is used by a more niche userbase). The range in the help message seems to be more of an example of a usable range rather than a recommendation of range or a strictly enforced range. The default value would be the de facto recommendation.

If we're set on continuing to have LevelDB handle the range rather than Core, it seems like it would make sense to inform the user in the help message that the range is an example. This could help prevent a developer from wasting time on a PR that misses the intent (e.g. could envision a PR starting with: ...dbfilesize specifies a range of allowable values, but doesn't enforce this range. This PR fixes this...).

It might also help to clarify to the user which files this option impacts.

This is a bit of a nit of a nit, so feel free to disregard it. Example:

diff --git a/src/init.cpp b/src/init.cpp
index 5cc513e18c9..f4be033f1c1 100644
--- a/src/init.cpp
+++ b/src/init.cpp
@@ -450,7 +450,7 @@ void SetupServerArgs(ArgsManager& argsman)
     argsman.AddArg("-dbbatchsize", strprintf("Maximum database write batch size in bytes (default: %u)", nDefaultDbBatchSize), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::OPTIONS);
     argsman.AddArg("-dbcache=<n>", strprintf("Maximum database cache size <n> MiB (%d to %d, default: %d). In addition, unused mempool memory is shared for this cache (see -maxmempool).", nMinDbCache, nMaxDbCache, nDefaultDbCache), ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
     argsman.AddArg("-dbfilesize",
-                   strprintf("Target size of files within databases, in MiB (%u to %u, default: %u).",
+                   strprintf("Target size of files within databases (chainstate, indexes), in MiB (example: %u to %u, default: %u).",
                              1, 1024,
                              DEFAULT_DB_FILE_SIZE),
                    ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::OPTIONS);

@sipa
Member

sipa commented May 17, 2024

I think it would be good to understand what the trade-offs are for this before considering making it configurable (as in: understand in what situations we'd want to recommend people use lower vs higher values). I assume lower values mean more open files/IO and less outdated data on disk, and higher values mean fewer compactions. Depending on whether those are the only effects, and to what extent these things are affected by the value, maybe we can get away with a default value that's a few % of the expected total db size.

FWIW, RocksDB uses a default of 64 MiB, and has an option (default off) to make higher database-level files become larger (e.g.: level L files would target 64 * (2^L) MiB).
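
To make the per-level arithmetic concrete, here is a minimal sketch of the formula quoted above (base size times 2^L). `TargetFileSizeMiB` is a hypothetical helper written for this comment, not a RocksDB or LevelDB API.

```cpp
#include <cstdint>

// Hypothetical helper illustrating the formula above:
// target file size for level L = base_mib * 2^L, in MiB.
// Assumes level is small enough that the shift does not overflow.
constexpr uint64_t TargetFileSizeMiB(uint64_t base_mib, unsigned level)
{
    return base_mib << level;
}
```

With a 64 MiB base, level 0 files would target 64 MiB, level 1 files 128 MiB, and level 3 files 512 MiB.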

Contributor

@l0rinc l0rinc left a comment

I've rebased this and run a few benchmarks with different values.
Will do a few more measurements, but so far I'm leaning toward finding a better default value instead of making this configurable.
See: #30039 (comment)

@luke-jr luke-jr force-pushed the dbfilesize_param branch from 9160cc0 to 1c2c840 Compare April 5, 2025 06:26
@l0rinc
Contributor

l0rinc commented Apr 5, 2025

Given that we couldn't find any measurable difference for bigger values, do you still think it makes sense to make this configurable? And if so, what's the intended usecase?

@luke-jr
Member Author

luke-jr commented Apr 5, 2025

Can't predict the behaviours of all storage devices and filesystems. This would enable users to tweak it and see if another configuration works better for them.

@laanwj
Member

laanwj commented Apr 7, 2025

> Given that we couldn't find any measurable difference for bigger values, do you still think it makes sense to make this configurable? And if so, what's the intended usecase?

No strong opinion, but I veer toward "no": we should not be adding unnecessary knobs, and beyond this size the comparative overhead of handling files is negligible. Maybe there comes a future time when this size needs to be increased again, but it's not something end-users are going to play with.

6 participants