
Conversation

dergoegge
Member

Measuring performance of components that do not meaningfully impact the performance of critical aspects of the system (e.g. block validation or mining) seems extraneous.

Maintaining extraneous benchmark tests incentivizes non-impactful improvements. This frequently comes up with new contributors and they can't be blamed for thinking that e.g. hex parsing should be made faster if a benchmark test exists.

@DrahtBot
Contributor

DrahtBot commented Oct 25, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31153.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK stickies-v, fanquake, brunoerg, mzumsande
Approach NACK davidgumberg

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #30987 (Don't zero-after-free DataStream: Faster IBD on some configurations by davidgumberg)
  • #28792 (Embed default ASMap as binary dump header file by fjahr)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@DrahtBot DrahtBot added the Tests label Oct 25, 2024
Contributor

@stickies-v stickies-v left a comment

Concept ACK. Perhaps also worth updating doc/benchmarking.md with a brief section on which benchmarks are typically useful to add, and which benchmarks should rather be ephemeral (e.g. provided on a separate branch, supplementary to a PR)?

Member

@jonatack jonatack left a comment

Not sure about some of these. People can be asked to add benchmarks for performance improvements or to verify that a change to existing code doesn't degrade performance, and having existing benchmarks to refer to may make that easier. Seems odd to throw out work like #22284.

Are any of the benchmarks called by outside sites that track their results over time?

@@ -1,64 +0,0 @@
// Copyright (c) 2020-2022 The Bitcoin Core developers
Member

@jonatack jonatack Oct 25, 2024

Not sure about removing the logging benchmarks. Logging can be very high frequency and is also used for logging lock contention. Even if that is more likely to occur for developers, it may be useful or valuable to be able to track this.

@fanquake
Member

Concept ACK - agree with stickies-v re updating the doc.

@brunoerg
Contributor

Concept ACK

@instagibbs
Member

Perhaps also worth updating doc/benchmarking.md with a brief section on which benchmarks are typically useful to add, and which benchmarks should rather be ephemeral (e.g. provided on a separate branch, supplementary to a PR)?

I think this documentation update would be helpful to get in even without this PR.

@dergoegge
Member Author

People can be asked to add benchmarks for performance improvements or to verify that a change to existing code doesn't degrade performance

Imo, performance improvements (and accompanying benchmarks) should only be made to code that makes critical things slower than they need to be. Otherwise we're just improving performance for the sake of doing it, which is not something we have the resources for. We should direct our resources with intention, and the benchmark suite can define that scope.

For example, should we spend our time making hex parsing 500x faster, or shaving 10% off IBD? I'm arguing for scoping our tests to the latter (and other critical things, e.g. mining) so that we don't get distracted.

Are any of the benchmarks called by outside sites that track their results over time?

https://corecheck.dev sort of does but it's quite noisy.

@dergoegge
Member Author

Will add the doc changes next week

@maflcko
Member

maflcko commented Oct 28, 2024

Hex parsing is used in mining via DecodeHexBlk, no?

Also, some of the benchmarks do not exist to be optimized, but merely to give a feeling and to avoid a regression where an algorithm is swapped to go from O(n) to O(n^2). So in a sense, they are a bit of a unit test.

In any case, updating the docs should be uncontroversial and useful going forward.

@jarolrod
Member

but merely to give a feeling and to avoid a regression where an algorithm is swapped to go from O(n) to O(n^2). So in a sense, they are a bit of a unit test.

This is a good reason to keep.

I'm arguing for scoping our tests to the latter (and other critical things e.g. mining) so that we don't get distracted.

And this is an important statement.

We don't necessarily need to throw away work that has previously been a useful metric in issues (#17501) to achieve the goal of new benchmark contributions being focused on specific areas.

@dergoegge
Member Author

throw away work that has previously been a useful metric in issues (#17501)

The benchmarks weren't mentioned in the issue you linked or the PR that fixed the issue?

@jarolrod
Member

throw away work that has previously been a useful metric in issues (#17501)

The benchmarks weren't mentioned in the issue you linked or the PR that fixed the issue?

I said it's a useful metric, as in given that context, would it not be nice to have the benchmark?

@dergoegge
Member Author

as in given that context

I see. No I don't think they would have been useful in that context, because all the base58 benchmarks run with inputs of the same size, so they couldn't have found that particular issue and therefore were also not useful in verifying the fix.

practicalswift mentioned a fuzz test in the PR, which is likely how he found the issue (speculating).

@jarolrod
Member

as in given that context

I see. No I don't think they would have been useful in that context, because all the base58 benchmarks run with inputs of the same size, so they couldn't have found that particular issue and therefore were also not useful in verifying the fix.

practicalswift mentioned a fuzz test in the PR, which is likely how he found the issue (speculating).

I'm not intending to argue, and as I said, your main point in the PR is valid.

If a bench didn't exist at the time of the issue, I imagine a review comment would have mentioned that it would be nice to have one, as a metric. I don't think that's controversial.

@dergoegge
Member Author

Added the doc commit, ready for further review

@davidgumberg
Contributor

davidgumberg commented Oct 30, 2024

Concept ACK, Approach NACK

I think there should be a burden on PRs that claim to improve performance to demonstrate a user-impacting scenario where the improvements are relevant, and I agree that having benchmarks that test workloads that will never impact users leads to PRs being opened and review being spent on things that don't deserve attention.

I think a similar burden exists for this PR to demonstrate or at least describe a rationale for why these removed benchmarks don't test behavior that would impact users.

Member

@maflcko maflcko left a comment

No strong opinion on the removals. I think the doc update is valuable and could be extended. Simply documenting that the removed benchmarks are potentially not valuable when looked at too narrowly could be less controversial and still achieve the essence of the goal of this pull request.

@kevkevinpal
Contributor

Not yet sure on this PR; we already have priority_level for benchmarks.

All the benchmarks being removed seem to be benchmark::PriorityLevel::HIGH; would it make sense to move them down a level instead?

This frequently comes up with new contributors and they can't be blamed for thinking that e.g. hex parsing should be made faster if a benchmark test exists.

I understand this is one of the motivations for this change. Does this happen often for benchmarks that are marked benchmark::PriorityLevel::LOW?
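(For reference, a minimal sketch of what moving a benchmark down a level could look like. The HexParseBench name and its input are made up for illustration; the BENCHMARK macro, benchmark::Bench, and PriorityLevel are assumed to match src/bench/bench.h at the time of writing.)

```cpp
#include <bench/bench.h>
#include <util/strencodings.h>

#include <string>
#include <vector>

// Hypothetical benchmark: measure hex parsing of a fixed 1 KiB input.
static void HexParseBench(benchmark::Bench& bench)
{
    const std::string hex(2048, 'a'); // 1024 bytes once decoded
    bench.run([&] {
        const std::vector<unsigned char> bytes = ParseHex(hex);
        ankerl::nanobench::doNotOptimizeAway(bytes);
    });
}

// The second argument is the priority level; lowering it keeps the benchmark
// in the tree but excludes it from runs that only execute HIGH-priority
// benchmarks (e.g. the default CI invocation).
BENCHMARK(HexParseBench, benchmark::PriorityLevel::LOW);
```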

Contributor

@mzumsande mzumsande left a comment

Concept ACK

I wouldn't oppose deleting some benchmarks, but I also like the idea of using priorities. That, combined with the doc, should be sufficient to deter micro-optimizations in non-critical code.


Benchmarks are ill-suited for testing of denial of service issues as they are
restricted to the same input set (introducing bias). [Fuzz
tests](/doc/fuzzing.md) are better suited for this purpose, as they are much
Contributor

Not sure I follow this point: under "denial of service" I would understand attacks that overwhelm the target with repeated requests (which are often but not necessarily identical) and as a result may just slow it down / make it unresponsive instead of outright crashing it. While I agree that benchmarks don't really find these, I wouldn't see fuzz tests designed to find crashes as the best way to test these issues (instead of integration tests with some sort of stress)?

Member Author

Fuzz testing does not only find crashes; it finds slow-running inputs as well. Fuzz engines usually terminate an input's execution if it takes longer than N seconds to run, which is referred to as a "timeout". See #28812 for examples.

Depending on what the harness is testing, a slow input might indicate a DoS issue (e.g. infinite loop, quadratic behavior, ...).
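(To make that concrete, a minimal, hypothetical harness sketch is below. The example_parser name and the parsing step are invented; the FUZZ_TARGET macro and FuzzedDataProvider are assumed from src/test/fuzz, and the timeout behaviour described is libFuzzer's -timeout flag.)

```cpp
#include <test/fuzz/FuzzedDataProvider.h>
#include <test/fuzz/fuzz.h>

#include <string>

// Hypothetical harness: feed arbitrary bytes to some parsing routine. The fuzz
// engine (e.g. libFuzzer, run with -timeout=N) aborts and reports any input
// whose execution exceeds the timeout, which is how quadratic blow-ups or
// near-infinite loops surface even when nothing crashes.
FUZZ_TARGET(example_parser)
{
    FuzzedDataProvider provider(buffer.data(), buffer.size());
    const std::string input = provider.ConsumeRemainingBytesAsString();
    // ... call the code under test with `input` and discard the result ...
    (void)input;
}
```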

@dergoegge dergoegge force-pushed the 2024-10-nuke-benchmarks branch from fcfb1d8 to 2fd73cc Compare November 4, 2024 14:56
@dergoegge
Member Author

I'm not sure if using the priority levels is a good alternative to deleting the benches. Afaict, the priority level was introduced to avoid running slow benchmarks in the CI, as opposed to indicating how "important" they are. As a result, we're not catching issues with these tests until someone executes them (e.g. #29061 (comment)) and I'd prefer not to expand on that.

@dergoegge dergoegge force-pushed the 2024-10-nuke-benchmarks branch from 2fd73cc to 8b19137 Compare November 13, 2024 14:08
@maflcko
Member

maflcko commented Jan 20, 2025

Not sure why it was closed. It seemed like a useful doc update to me, but maybe I am missing something. Feel free to ACK/NACK #31690, which is just the doc update from here cherry-picked.
