improve MallocUsage() accuracy #28531
The results of the old and new versions of this function should differ only slightly; it would be bad if the new one gave very different results, because node operators might find too much memory being consumed, or (not as bad) not enough. To manually test this, I ran
Changing to draft because CI on a couple of the platforms failed the new test, so the physical memory calculation will need to be different on those platforms; I'll try to figure that out.
I can reproduce the same results with glibc 2.31 and 2.38, in 32-bit and 64-bit. I don't think it needs to be perfect on all platforms; I think your new calculation is fine. I played a bit with a reproducer, here is mine: https://godbolt.org/z/s971sbnhG I refactored your
Force-pushed to 04f3fbe, rebase only. (CI is still expected to fail; I'm working on that now.)
@martinus
On earlier runs, some other tests failed, but I'm not sure whether those are spurious failures. I'm not sure what to do now; any thoughts or suggestions are welcome!
@LarryRuane There is no reason to assume that the memory allocation overhead on those platforms, with substantially different C libraries, would match the formula we inferred on x86_64 and ARM Linux. If our goal is actually accurate memory usage calculations on all systems, we will need to derive a formula for each supported system separately.
@sipa - I don't think the test failures could be due to differences in memory allocation overhead across platforms; the actual memory allocation overhead isn't being tested at all. To do that would probably require a test that uses

I don't mind giving up and closing this PR, but as @martinus said in an earlier comment, this calculation doesn't need to be perfect, just better. I think this test failure must be due to differences in

[UPDATE: the following discussion about

I thought I had it figured out; I verified with the debugger that, as the test adds the 10k entries to the two maps, it has to repeatedly grow the bucket array. The allocation sizes (in bytes) for the bucket array that I saw here on Ubuntu were: 104, 232, 472, 1026, 2056, 4328, 8872, 18856, 40696, 82184...

The tradeoff one makes when using the pool resource allocator is that allocations which are freed, but whose sizes are never allocated again, are in effect leaked (until the resource is destroyed). The pool resource works great when freeing and allocating the same sizes (rounded up to the alignment, usually 8 bytes) repeatedly. This is, of course, the case for the individual map entries. So the latest force-push simply calls

It's difficult when CI fails on platforms that I don't have access to. Do others have a solution to that? Can I install macOS on Ubuntu or Windows 10 (I have both) as a VM? Maybe I can set up my Windows laptop to build and run the debugger, but that seems like a lot of work.
@LarryRuane The problem here is that the

The inconsistency now comes when using

So, with your new MallocUsage the numbers are most likely more correct, but this seems to trigger the incorrect underestimation for
@martinus - thanks, that makes perfect sense! I hadn't thought of the possibility of the nodes having two pointers instead of one (doubly instead of singly linked list). Seems like a bad design, but anyway. I don't think there would be much benefit to fixing
@@ -101,6 +110,7 @@ def send_batch(fee):
test_framework.log.debug("Check that mempoolminfee is larger than minrelaytxfee")
assert_equal(node.getmempoolinfo()['minrelaytxfee'], Decimal('0.00001000'))
assert_greater_than(node.getmempoolinfo()['mempoolminfee'], Decimal('0.00001000'))
test_framework.sync_mempools()
https://cirrus-ci.com/task/4825274484785152?logs=ci#L3624
in sync_mempools
[23:01:19.556] raise AssertionError("Mempool sync timed out after {}s:{}".format(
[23:01:19.556] AssertionError: Mempool sync timed out after 2400s:
// Fill to just beyojnd the cache size limit.
that size -> than size [incorrect preposition]
beyojnd -> beyond [typographical error]
Thanks for getting back to this change.
We should synchronize it with CCoinsMap, which already claims to account for an "overhead of 1 or 2 pointers".
Also, the test should pass before the change too, so it would be helpful to add it as a separate commit preceding the change, making it easy to see how it behaves before and after.
The test still contains a lot of repetition; please see my suggestion on how to compact it a bit more.
There are still uncovered parts of the code; I'd feel more comfortable if all modified code parts were exercised by tests.
@LarryRuane I think you should probably be the author of the commit, given that you've already added @theStack as a co-author
src/memusage.h
Outdated
}

template<typename X, typename Y, typename Z>
static inline size_t DynamicUsage(const std::unordered_map<X, Y, Z>& m)
{
return MallocUsage(sizeof(unordered_node<std::pair<const X, Y> >)) * m.size() + MallocUsage(sizeof(void*) * m.bucket_count());
return MallocUsage(sizeof(unordered_node<std::pair<const X, Y> >)) * m.size() + MallocUsage(2 * sizeof(void*) * m.bucket_count());
I really dislike comments when code could be used, i.e. when I see sizeof(void*), it doesn't tell me anything about what we're actually measuring ... 2 * sizeof(void*) is even worse.
I was thinking of showing exactly what we're measuring here, something like:
// Empirically, an std::unordered_map node has two pointers (likely
// forward and backward pointers) on some platforms (Windows and macOS),
// so be conservative in estimating memory usage by assuming this is
// the case for all platforms.
template <typename X>
struct unordered_node : private X
{
void* next_ptr;
void* prev_ptr;
};
// The memory used by an unordered_set or unordered_map is the sum of the
// sizes of the individual nodes (which are separately allocated) plus
// the size of the bucket array (which is a single allocation).
// Empirically, each element of the bucket array consists of two pointers
// on some platforms (Windows and macOS), so be conservative.
template <typename X, typename Y>
static size_t DynamicUsage(const std::unordered_set<X, Y>& s)
{
return MallocUsage(sizeof(unordered_node<X>)) * s.size() +
MallocUsage((sizeof(unordered_node<X>::next_ptr) + sizeof(unordered_node<X>::prev_ptr)) * s.bucket_count());
}
template <typename X, typename Y, typename Z>
static size_t DynamicUsage(const std::unordered_map<X, Y, Z>& m)
{
return MallocUsage(sizeof(unordered_node<std::pair<const X, Y>>)) * m.size() +
MallocUsage((sizeof(unordered_node<std::pair<const X, Y>>::next_ptr) + sizeof(unordered_node<std::pair<const X, Y>>::prev_ptr)) * m.bucket_count());
}
Also note that the current test doesn't exercise these modified lines at all, so we're kind of in the dark here...
src/memusage.h
Outdated
};

// The memory used by an unordered_set or unordered map is the sum of the
// The memory used by an unordered_set or unordered map is the sum of the
// The memory used by an unordered_set or unordered_map is the sum of the
@@ -212,7 +221,9 @@ static inline size_t DynamicUsage(const std::unordered_map<Key,
size_t estimated_list_node_size = MallocUsage(sizeof(void*) * 3);
size_t usage_resource = estimated_list_node_size * pool_resource->NumAllocatedChunks();
size_t usage_chunks = MallocUsage(pool_resource->ChunkSizeBytes()) * pool_resource->NumAllocatedChunks();
return usage_resource + usage_chunks + MallocUsage(sizeof(void*) * m.bucket_count());
// Empirically, each element of the bucket array has two pointers on some platforms (Windows and macOS).
Instead of empirically, can we add documentation or source code links here?
chainstate.GetCoinsCacheSizeState(MAX_COINS_CACHE_BYTES, /*max_mempool_size_bytes=*/0),
CoinsCacheSizeState::OK);
}
BOOST_CHECK_LT(i, 80'000);
I really dislike that we're exposing the loop variable just to see if we finished without transitioning. We seem to exit at worst around 32415 for me locally; maybe we can reduce the attempt count and extract it to a variable:
constexpr size_t MAX_ATTEMPTS{50'000}; // runaway-loop safety cap
And we can add a worst-case condition for the very last iteration which should always be LARGE
, e.g.:
// OK → LARGE
for (size_t i{1}; i <= MAX_ATTEMPTS; ++i) {
AddTestCoin(m_rng, view);
auto state{chainstate.GetCoinsCacheSizeState(MAX_COINS_CACHE_BYTES, max_mempool_size_bytes)};
if (i == MAX_ATTEMPTS || view.DynamicMemoryUsage() >= full_cap * 90 / 100) {
BOOST_CHECK_EQUAL(state, CoinsCacheSizeState::LARGE);
break;
}
BOOST_CHECK_EQUAL(state, CoinsCacheSizeState::OK);
}
constexpr unsigned int COIN_SIZE = is_64_bit ? 80 : 64;
// An empty coins cache still allocates one PoolResource 'chunk', which is 256 KiB, and there
// is some overhead. As a sanity check, an empty coins cache should be only slightly larger.
BOOST_CHECK_LT(view.DynamicMemoryUsage(), 256 * 1024 * 100 / 98);
I find the 100/98 a weird way to express a percentage - if we simply want to check that it's roughly 256 KiB, we could check their ratio:

BOOST_CHECK_LT(view.DynamicMemoryUsage(), 256 * 1024 * 100 / 98);
BOOST_CHECK_LT(view.DynamicMemoryUsage() / (256 * 1024.0), 1.1);
BOOST_CHECK_EQUAL(
chainstate.GetCoinsCacheSizeState(MAX_COINS_CACHE_BYTES, /*max_mempool_size_bytes=*/ 1 << 19),
chainstate.GetCoinsCacheSizeState(MAX_COINS_CACHE_BYTES, /*max_mempool_size_bytes=*/MAX_MEMPOOL_CACHE_BYTES),
it makes sense to add the /*max_mempool_size_bytes=*/ comment for primitives - but we don't usually do it for self-explanatory variables:

chainstate.GetCoinsCacheSizeState(MAX_COINS_CACHE_BYTES, /*max_mempool_size_bytes=*/MAX_MEMPOOL_CACHE_BYTES),
chainstate.GetCoinsCacheSizeState(MAX_COINS_CACHE_BYTES, MAX_MEMPOOL_CACHE_BYTES),
BOOST_CHECK_EQUAL(
chainstate.GetCoinsCacheSizeState(MAX_COINS_CACHE_BYTES, /*max_mempool_size_bytes=*/0),
CoinsCacheSizeState::CRITICAL);

// Passing non-zero max mempool usage (512 KiB) should allow us more headroom.
// Repeat (continuing with the existing cache) but with a non-zero max mempool;
hmmm, we're repeating the same steps but with different data?
Can we separate the data from the algorithm and do the iteration in something like:
for (size_t max_mempool_size_bytes : {size_t{0}, MAX_MEMPOOL_CACHE_BYTES}) {
// The steps from OK to CRITICAL
}
For simplicity, here's the full `getcoinscachesizestate` test
// Copyright (c) 2019-present The Bitcoin Core developers
// Distributed under the MIT software license, see the accompanying
// file COPYING or http://www.opensource.org/licenses/mit-license.php.
#include <sync.h>
#include <test/util/coins.h>
#include <test/util/random.h>
#include <test/util/setup_common.h>
#include <validation.h>
#include <boost/test/unit_test.hpp>
BOOST_FIXTURE_TEST_SUITE(validation_flush_tests, TestingSetup)
//! Verify that Chainstate::GetCoinsCacheSizeState() switches from OK→LARGE→CRITICAL
//! at the expected utilization thresholds, first with *no* mempool head-room,
//! then with additional mempool head-room.
BOOST_AUTO_TEST_CASE(getcoinscachesizestate)
{
Chainstate& chainstate{m_node.chainman->ActiveChainstate()};
LOCK(::cs_main);
CCoinsViewCache& view = chainstate.CoinsTip();
BOOST_CHECK_LT(view.DynamicMemoryUsage() / (256 * 1024.0), 1.1);
constexpr size_t MAX_COINS_CACHE_BYTES{8'000'000}; // ~8 MB cache size for the test
constexpr size_t MAX_MEMPOOL_CACHE_BYTES{4'000'000}; // ~4 MB extra head-room
constexpr size_t MAX_ATTEMPTS{50'000}; // runaway-loop safety cap
// Run the same growth-path twice: first with 0 head-room, then with extra head-room
for (size_t max_mempool_size_bytes : {size_t{0}, MAX_MEMPOOL_CACHE_BYTES}) {
BOOST_CHECK_EQUAL(chainstate.GetCoinsCacheSizeState(MAX_COINS_CACHE_BYTES, max_mempool_size_bytes), CoinsCacheSizeState::OK);
const size_t full_cap = MAX_COINS_CACHE_BYTES + max_mempool_size_bytes;
// OK → LARGE
for (size_t i{1}; i <= MAX_ATTEMPTS; ++i) {
AddTestCoin(m_rng, view);
auto state{chainstate.GetCoinsCacheSizeState(MAX_COINS_CACHE_BYTES, max_mempool_size_bytes)};
if (i == MAX_ATTEMPTS || view.DynamicMemoryUsage() >= full_cap * 90 / 100) {
BOOST_CHECK_EQUAL(state, CoinsCacheSizeState::LARGE);
break;
}
BOOST_CHECK_EQUAL(state, CoinsCacheSizeState::OK);
}
// LARGE → CRITICAL
for (size_t i{1}; i <= MAX_ATTEMPTS; ++i) {
AddTestCoin(m_rng, view);
auto state{chainstate.GetCoinsCacheSizeState(MAX_COINS_CACHE_BYTES, max_mempool_size_bytes)};
if (i == MAX_ATTEMPTS || view.DynamicMemoryUsage() > full_cap) {
BOOST_CHECK_EQUAL(state, CoinsCacheSizeState::CRITICAL);
break;
}
BOOST_CHECK_EQUAL(state, CoinsCacheSizeState::LARGE);
}
}
for (int i{0}; i < 1'000; ++i) {
AddTestCoin(m_rng, view);
BOOST_CHECK_EQUAL(chainstate.GetCoinsCacheSizeState(), CoinsCacheSizeState::OK);
}
// CRITICAL → OK via Flush
BOOST_CHECK_EQUAL(chainstate.GetCoinsCacheSizeState(MAX_COINS_CACHE_BYTES, /*mempool=*/0), CoinsCacheSizeState::CRITICAL);
view.SetBestBlock(m_rng.rand256());
BOOST_CHECK(view.Flush());
BOOST_CHECK_EQUAL(chainstate.GetCoinsCacheSizeState(MAX_COINS_CACHE_BYTES, /*mempool=*/0), CoinsCacheSizeState::OK);
}
BOOST_AUTO_TEST_SUITE_END()
I have extracted the suggested test to a separate PR: #33021
@@ -212,7 +221,9 @@ static inline size_t DynamicUsage(const std::unordered_map<Key,
size_t estimated_list_node_size = MallocUsage(sizeof(void*) * 3);
size_t usage_resource = estimated_list_node_size * pool_resource->NumAllocatedChunks();
size_t usage_chunks = MallocUsage(pool_resource->ChunkSizeBytes()) * pool_resource->NumAllocatedChunks();
return usage_resource + usage_chunks + MallocUsage(sizeof(void*) * m.bucket_count());
// Empirically, each element of the bucket array has two pointers on some platforms (Windows and macOS).
size_t usage_bucket_array = MallocUsage(2 * sizeof(void*) * m.bucket_count());
https://github.com/bitcoin/bitcoin/blob/master/src/coins.h#L219 already claims to account for 1-2 pointers, do we need any other change to synchronize it with the current PR?
I understand the difficulties of guessing the correct memory usage of such an app (especially when the user cannot even provide total memory usage requirements, just

For reference, the commits I used are:

The PR introduces a ~2% slowdown:

COMMITS="3f83c744ac28b700090e15b5dda2260724a56f49 4228018ace848adcafd197776cbb4afc2400bf16"; \
STOP=888888; DBCACHE=450; \
CC=gcc; CXX=g++; \
BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
(echo ""; for c in $COMMITS; do git fetch -q origin $c && git log -1 --pretty='%h %s' $c || exit 1; done; echo "") && \
hyperfine \
--sort command \
--runs 1 \
--export-json "$BASE_DIR/rdx-$(sed -E 's/(\w{8})\w+ ?/\1-/g;s/-$//'<<<"$COMMITS")-$STOP-$DBCACHE-$CC.json" \
--parameter-list COMMIT ${COMMITS// /,} \
--prepare "killall bitcoind; rm -f $DATA_DIR/debug.log; git checkout {COMMIT}; git clean -fxd; git reset --hard && \
cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release && ninja -C build bitcoind && \
./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=10000 -printtoconsole=0; sleep 10" \
--cleanup "cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
"COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=$DBCACHE -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0"
Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -dbcache=450 (COMMIT = 3f83c744ac28b700090e15b5dda2260724a56f49)
Time (abs ≡): 19573.674 s [User: 35453.443 s, System: 2822.880 s]
Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -dbcache=450 (COMMIT = 4228018ace848adcafd197776cbb4afc2400bf16)
Time (abs ≡): 20031.162 s [User: 36761.112 s, System: 2992.418 s]

Relative speed comparison:
1.00 COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -dbcache=450 (COMMIT = 3f83c744ac28b700090e15b5dda2260724a56f49)
1.02 COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -dbcache=450 (COMMIT = 4228018ace848adcafd197776cbb4afc2400bf16)

To understand the performance and memory implications, I ran the same benchmark with Massif profiling, revealing that after this PR we're using ~4% less total memory now:

3f83c74 has 790MB peak
4228018 has 760MB peak

It's not immediately obvious whether this is more accurate than before, so I've checked the dbcache-related memory usages, which we can see in the logs as:

3f83c74 - 414,449,664B for nodes + 47,738,776B for buckets = ~440MiB dbcache usage
4228018 - 371,195,904B for nodes + 47,738,776B for buckets = ~400MiB dbcache usage

Which reveals that the new

The actual memory allocated for the bucket arrays of the cache appears identical at peak times. I would expect clang to behave slightly differently, but it takes a week to measure these, so I would like some explanations first before I spend more time re-measuring the scenarios, since it seems that MallocUsage was already surprisingly accurate.
Co-authored-by: Sebastian Falbesoner <sebastian.falbesoner@gmail.com>
Co-authored-by: Pieter Wuille <pieter@wuille.net>
Co-authored-by: Martin Leitner-Ankerl <martin.ankerl@gmail.com>
Co-authored-by: l0rinc <pap.lorinc@gmail.com>

# Changes to test/functional/test_framework/mempool_util.py:

## fill_mempool() should call sync_mempools() before returning

We saw a case where a test (p2p_1p1c_network.py) called raise_network_minfee(), which called fill_mempool() using node 0. Then raise_network_minfee() returned, and the test called rescan_utxos(), which called getrawmempool() using a different node (node 1) followed by getrawtransaction() on each returned transaction, and the test asserted because a transaction was not found. This was caused by the timing window between the call to getrawmempool() and fetching the individual transactions; the transactions were still being propagated on the P2P network. During this window, a transaction (returned by getrawmempool()) was evicted (the mempool is close to full during this test), and did not exist in the mempool by the time it was attempted to be fetched.

It might make more sense for rescan_utxos() to call sync_mempools() just before calling getrawmempool(), but it can't, because rescan_utxos() is part of the MiniWallet class, which doesn't have access to test_framework (but that could probably be changed).

## ensure that `fill_mempool` leaves some room in mempool

Without this change, fill_mempool() may leave the mempool very close to its memory size limit (-maxmempool). This can cause tests to incorrectly fail when they submit another transaction expecting it to succeed. Note that without this change, the same test that fails on one platform may succeed on another, because their memory allocation accounting algorithms (how they calculate memory usage, that is, MallocUsage()) may be slightly different.
This commit is temporary because it makes one of the new tests fail intentionally, so we can gather some information from CI across the platforms. (Second attempt; fixes CI error: private field not used.)
#include <boost/test/unit_test.hpp>

BOOST_AUTO_TEST_SUITE(util_malloc_usage_tests)
@LarryRuane, most of these tests will be executed on your own fork as well (except a few Cirrus fuzzers).
You can freely experiment there - see for example my attempts at l0rinc#20
I didn't know that; it will prevent cluttering this thread with that temporary stuff, thanks! I'll do that from now on (I want to do a few more similar experiments).
yeah, I also have a few experiments where I'm not sure about the state of CI so I push to my local fork first - though the remaining CIs still surprise me sometimes after upstream push :)
I wonder whether it would be feasible to run a little runtime self-calibration at startup to find the malloc overhead parameters, so that they would be correct on all platforms. It would make calls to
As @LarryRuane noticed, the new address-diff test sometimes disagrees with

I had to read a bit to understand why - I never needed the gory details until now. 🙂 It seems to me that macOS's nano allocator, and the allocators used by TSan/ASan, store per-chunk metadata in shadow tables rather than inline with the user block. For example, this might be related to what we're seeing: https://github.com/aosm/libmalloc/blob/master/src/nano_malloc.c#L181. So maybe we could specialize the tests by platform/compiler/arch/OS (without duplicating

I personally would avoid the runtime self-calibration path - rather, abstracting away our findings and generalizing them based on self-contained test behavior sounds more predictable to me.

If we decide to keep the current settings (even though my massif memory allocation measurements, which should actually measure everything instead of just hoping that every call is instrumented, do show it to be less accurate), and if we're still over-counting in the end, we can probably increase the default
Could you turn this into a draft while CI is red?
After the changes in bitcoin#25325 `getcoinscachesizestate` always ended the test early, see: https://maflcko.github.io/b-c-cov/test_bitcoin.coverage/src/test/validation_flush_tests.cpp.gcov.html The test revival was extracted from a related PR where it was discovered, see: bitcoin#28531 (comment) Co-authored-by: Larry Ruane <larryruane@gmail.com>
…eSizeState` switches from OK→LARGE→CRITICAL

554befd test: revive `getcoinscachesizestate` (Lőrinc)
64ed0fa refactor: modernize `LargeCoinsCacheThreshold` (Lőrinc)
1b40dc0 refactor: extract `LargeCoinsCacheThreshold` from `GetCoinsCacheSizeState` (Lőrinc)

Pull request description:

After the changes in #25325 `getcoinscachesizestate` [always ended the test early](https://maflcko.github.io/b-c-cov/test_bitcoin.coverage/src/test/validation_flush_tests.cpp.gcov.html#L65):

| File | Line Rate | Line Total | Line Hit | Branch Rate | Branch Total | Branch Hit |
|----------------------------|-----------:|-----------:|---------:|------------:|-------------:|-----------:|
| validation_flush_tests.cpp | **31.5 %** | 54 | 17 | 22.3 % | 242 | 54 |

The test revival was [extracted from a related PR](#28531 (comment)) where it was [discovered](#28531 (comment)).

ACKs for top commit:
achow101: ACK 554befd
LarryRuane: ACK 554befd
w0xlt: ACK 554befd

Tree-SHA512: f5057254de8fb3fa627dd20fee6818cfadeb2e9f629f9972059ad7b32e01fcd7dc9922eff9da2d363b36a9f0954d9bc1c4131d47b2a9c6cc348d9864953b91be
The MallocUsage() function takes an allocation size as an argument and returns the amount of physical memory consumed, which is greater due to memory allocator overhead and alignment. It was first added in 2015 (first commit of #6102), but its accuracy has degraded as memory allocation libraries have evolved. It's used where it's important that large data structures, such as the coins cache and mempool, use a predictable, configurable (limited) amount of physical memory (see the -dbcache and -maxmempool configuration options), as well as in a few other places.

sipa figured out a concise, efficient expression that this function can use, and that's what's implemented here.
Also add a unit test, which is more helpful than usual in this case since platforms, operating systems, and libraries vary significantly in this area.