
Conversation

dergoegge
Member

Measuring performance of components that do not meaningfully impact the performance of critical aspects of the system (e.g. block validation or mining) seems extraneous.

Maintaining extraneous benchmark tests incentivizes non-impactful improvements. This frequently comes up with new contributors and they can't be blamed for thinking that e.g. hex parsing should be made faster if a benchmark test exists.

@DrahtBot
Contributor

DrahtBot commented Oct 25, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31153.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK stickies-v, fanquake, brunoerg, mzumsande
Approach NACK davidgumberg

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #30987 (Don't zero-after-free DataStream: Faster IBD on some configurations by davidgumberg)
  • #28792 (Embed default ASMap as binary dump header file by fjahr)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@DrahtBot DrahtBot added the Tests label Oct 25, 2024
Contributor

@stickies-v stickies-v left a comment

Concept ACK. Perhaps also worth updating doc/benchmarking.md with a brief section on which benchmarks are typically useful to add, and which benchmarks should rather be ephemeral (e.g. provided on a separate branch, supplementary to a PR)?

Member

@jonatack jonatack left a comment

Not sure about some of these. People can be asked to add benchmarks for performance improvements or to verify that a change to existing code doesn't degrade performance, and having existing benchmarks to refer to may make that easier. Seems odd to throw out work like #22284.

Are any of the benchmarks called by outside sites that track their results over time?

@@ -1,64 +0,0 @@
// Copyright (c) 2020-2022 The Bitcoin Core developers
Member

@jonatack jonatack Oct 25, 2024

Not sure about removing the logging benchmarks. Logging can be very high frequency and is also used for logging lock contention. Even if that is more likely to occur for developers, it may be useful or valuable to be able to track this.

@fanquake
Member

Concept ACK - agree with stickies-v re updating the doc.

@brunoerg
Contributor

Concept ACK

@instagibbs
Member

Perhaps also worth updating doc/benchmarking.md with a brief section on which benchmarks are typically useful to add, and which benchmarks should rather be ephemeral (e.g. provided on a separate branch, supplementary to a PR)?

I think this documentation update would be helpful to get in even without this PR.

@dergoegge
Member Author

People can be asked to add benchmarks for performance improvements or to verify that a change to existing code doesn't degrade performance

Imo, performance improvements (and accompanying benchmarks) should only be made to code that makes critical things slower than they need to be. Otherwise we're just improving performance for the sake of doing it, which is not something we have the resources for. We should direct our resources with intention, and the benchmark suite can define that scope.

For example, should we spend our time making hex parsing 500x faster, or shaving 10% off IBD? I'm arguing for scoping our tests to the latter (and other critical things, e.g. mining) so that we don't get distracted.

Are any of the benchmarks called by outside sites that track their results over time?

https://corecheck.dev sort of does but it's quite noisy.

@dergoegge
Member Author

Will add the doc changes next week

@maflcko
Member

maflcko commented Oct 28, 2024

Hex parsing is used in mining via DecodeHexBlk, no?

Also, some of the benchmarks do not exist to be optimized, but merely to give a feeling and to avoid a regression where an algorithm is swapped to go from O(n) to O(n^2). So in a sense, they are a bit of a unit test.

In any case, updating the docs should be uncontroversial and useful going forward.

@jarolrod
Member

but merely to give a feeling and to avoid a regression where an algorithm is swapped to go from O(n) to O(n^2). So in a sense, they are a bit of a unit test.

This is a good reason to keep.

I'm arguing for scoping our tests to the latter (and other critical things e.g. mining) so that we don't get distracted.

And this is an important statement.

We don't necessarily need to throw away work that has previously been a useful metric in issues (#17501) to achieve the goal of new benchmark contributions being focused on specific areas.

@dergoegge
Member Author

throw away work that has previously been a useful metric in issues (#17501)

The benchmarks weren't mentioned in the issue you linked or the PR that fixed the issue?

@jarolrod
Member

throw away work that has previously been a useful metric in issues (#17501)

The benchmarks weren't mentioned in the issue you linked or the PR that fixed the issue?

I said it's a useful metric, as in given that context, would it not be nice to have the benchmark?

@dergoegge
Member Author

as in given that context

I see. No I don't think they would have been useful in that context, because all the base58 benchmarks run with inputs of the same size, so they couldn't have found that particular issue and therefore were also not useful in verifying the fix.

practicalswift mentioned a fuzz test in the PR, which is likely how he found the issue (speculating).

@jarolrod
Member

as in given that context

I see. No I don't think they would have been useful in that context, because all the base58 benchmarks run with inputs of the same size, so they couldn't have found that particular issue and therefore were also not useful in verifying the fix.

practicalswift mentioned a fuzz test in the PR, which is likely how he found the issue (speculating).

I'm not intending to argue, and as I said, your main point in the PR is valid.

If a bench didn't exist at the time of the issue, I imagine a review comment would have mentioned that it would be nice to have one, as a metric. I don't think that's controversial.

@dergoegge
Member Author

Added the doc commit, ready for further review

@davidgumberg
Contributor

davidgumberg commented Oct 30, 2024

Concept ACK, Approach NACK

I think there should be a burden on PRs that claim to improve performance to demonstrate a user-impacting scenario where the improvements are relevant, and I agree that having benchmarks that test workloads that will never impact users leads to PRs being opened and review being spent on things that don't deserve attention.

I think a similar burden exists for this PR to demonstrate or at least describe a rationale for why these removed benchmarks don't test behavior that would impact users.

Member

@maflcko maflcko left a comment

No strong opinion on the removals. I think the doc update is valuable and could be extended. Simply documenting that the removed benchmarks are potentially not valuable when looked at too narrowly could be less controversial and still achieve the essence of the goal of this pull request.

@kevkevinpal
Contributor

Not yet sure on this PR; we already have priority_level for benchmarks.

All the benchmarks being removed seem to be benchmark::PriorityLevel::HIGH; would it make sense to move them down a level instead?

This frequently comes up with new contributors and they can't be blamed for thinking that e.g. hex parsing should be made faster if a benchmark test exists.

I understand this is one of the motivations for this change. Does this happen often for benchmarks that are marked benchmark::PriorityLevel::LOW?
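(For reference, a minimal sketch of what moving a benchmark down a level could look like. The HexParseBench name and its input are made up for illustration; the BENCHMARK macro, benchmark::Bench, and PriorityLevel are assumed to match src/bench/bench.h at the time of writing.)

```cpp
#include <bench/bench.h>
#include <util/strencodings.h>

#include <string>
#include <vector>

// Hypothetical benchmark: measure hex parsing of a fixed 1 KiB input.
static void HexParseBench(benchmark::Bench& bench)
{
    const std::string hex(2048, 'a'); // 1024 bytes once decoded
    bench.run([&] {
        const std::vector<unsigned char> bytes = ParseHex(hex);
        ankerl::nanobench::doNotOptimizeAway(bytes);
    });
}

// The second argument is the priority level; lowering it keeps the benchmark
// in the tree but excludes it from runs that only execute HIGH-priority
// benchmarks (e.g. the default CI invocation).
BENCHMARK(HexParseBench, benchmark::PriorityLevel::LOW);
```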

Contributor

@mzumsande mzumsande left a comment

Concept ACK

I wouldn't oppose deleting some benchmarks, but I also like the idea of using priorities. That, combined with the doc, should be sufficient to deter micro-optimizations in non-critical code.


Benchmarks are ill-suited for testing of denial of service issues as they are
restricted to the same input set (introducing bias). [Fuzz
tests](/doc/fuzzing.md) are better suited for this purpose, as they are much
Contributor

Not sure I follow this point: under "denial of service" I would understand attacks that overwhelm the target with repeated requests (which are often but not necessarily identical) and as a result may just slow it down / make it unresponsive instead of outright crashing it. While I agree that benchmarks don't really find these, I wouldn't see fuzz tests designed to find crashes as the best way to test these issues (instead of integration tests with some sort of stress)?

Member Author

Fuzz testing does not only find crashes; it finds slow-running inputs as well. Fuzz engines usually terminate an input's execution if it takes longer than N seconds to run, which is referred to as a "timeout". See #28812 for examples.

Depending on what the harness is testing, a slow input might indicate a DoS issue (e.g. infinite loop, quadratic behavior, ...).
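(To make that concrete, a minimal, hypothetical harness sketch is below. The example_parser name and the parsing step are invented; the FUZZ_TARGET macro and FuzzedDataProvider are assumed from src/test/fuzz, and the timeout behaviour described is libFuzzer's -timeout flag.)

```cpp
#include <test/fuzz/FuzzedDataProvider.h>
#include <test/fuzz/fuzz.h>

#include <string>

// Hypothetical harness: feed arbitrary bytes to some parsing routine. The fuzz
// engine (e.g. libFuzzer, run with -timeout=N) aborts and reports any input
// whose execution exceeds the timeout, which is how quadratic blow-ups or
// near-infinite loops surface even when nothing crashes.
FUZZ_TARGET(example_parser)
{
    FuzzedDataProvider provider(buffer.data(), buffer.size());
    const std::string input = provider.ConsumeRemainingBytesAsString();
    // ... call the code under test with `input` and discard the result ...
    (void)input;
}
```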

@dergoegge dergoegge force-pushed the 2024-10-nuke-benchmarks branch from fcfb1d8 to 2fd73cc Compare November 4, 2024 14:56
@dergoegge
Member Author

I'm not sure if using the priority levels is a good alternative to deleting the benches. Afaict, the priority level was introduced to avoid running slow benchmarks in the CI, as opposed to indicating how "important" they are. As a result, we're not catching issues with these tests until someone executes them (e.g. #29061 (comment)) and I'd prefer not to expand on that.

@dergoegge dergoegge force-pushed the 2024-10-nuke-benchmarks branch from 2fd73cc to 8b19137 Compare November 13, 2024 14:08
@maflcko
Member

maflcko commented Jan 20, 2025

Not sure why it was closed. It seemed like a useful doc update to me, but maybe I am missing something. Feel free to ACK/NACK #31690, which is just the doc update from here cherry-picked.
