bench: Remove various extraneous benchmarks #31153
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Code Coverage & Benchmarks: for details see https://corecheck.dev/bitcoin/bitcoin/pulls/31153.
Reviews: see the guideline for information on the review process. If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
Conflicts: reviewers, this pull request conflicts with the following ones. If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
Concept ACK. Perhaps also worth updating doc/benchmarking.md
with a brief section on which benchmarks are typically useful to add, and which benchmarks should rather be ephemeral (e.g. provided on a separate branch, supplementary to a PR)?
Not sure about some of these. People can be asked to add benchmarks for performance improvements or to verify that a change to existing code doesn't degrade performance, and having existing benchmarks to refer to may make that easier. Seems odd to throw out work like #22284.
Are any of the benchmarks called by outside sites that track their results over time?
@@ -1,64 +0,0 @@
// Copyright (c) 2020-2022 The Bitcoin Core developers
Not sure about removing the logging benchmarks. Logging can be very high frequency and is also used to log lock contention. Even if that is more likely to occur for developers, it may be useful or valuable to be able to track this.
Concept ACK - agree with stickies-v re updating the doc.
Concept ACK
I think this documentation update would be helpful to get in even without this PR.
Imo, performance improvements (and accompanying benchmarks) should only be made to code that makes critical things slower than they need to be. Otherwise we're just improving performance for the sake of doing it, which is not something we have the resources for. We should aim our resources with intention, and the benchmarks can act as the scope. For example, should we spend our time making hex parsing 500x faster or maybe shave 10% from IBD? I'm arguing for scoping our tests to the latter (and other critical things e.g. mining) so that we don't get distracted.
https://corecheck.dev sort of does, but it's quite noisy.
Will add the doc changes next week.
Hex parsing is used in mining (via …). Also, some of the benchmarks do not exist to be optimized, but merely to give a feeling for the cost and to avoid a regression where an algorithm is accidentally swapped for a slower one. In any case, updating the docs should be uncontroversial and useful going forward.
This is a good reason to keep.
And this is an important statement. We don't necessarily need to throw away work that has previously been a useful metric in issues (#17501) to achieve the goal of new benchmark contributions being focused on specific areas.
The benchmarks weren't mentioned in the issue you linked or in the PR that fixed the issue?
I said it's a useful metric, as in: given that context, would it not be nice to have the benchmark?
I see. No, I don't think they would have been useful in that context, because all the base58 benchmarks run with inputs of the same size, so they couldn't have found that particular issue and therefore were also not useful in verifying the fix. practicalswift mentioned a fuzz test in the PR, which is likely how he found the issue (speculating).
I'm not intending to argue, and as I said your main point in the PR is valid. If a bench didn't exist at the time of the issue, I imagine a review comment would have mentioned it would be nice to have, as a metric. I don't think that's controversial.
Added the doc commit, ready for further review |
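The thread above leans on benchmarks as regression guards. For readers unfamiliar with the framework, here is a minimal sketch of how such a benchmark is written and registered; the BENCHMARK(name, priority) macro and benchmark::PriorityLevel are assumptions based on the current src/bench framework, and XorWorkload/ExampleXorBench are hypothetical names used only for illustration.

```cpp
// Minimal sketch of a benchmark in the style of src/bench/ (assumptions noted above).
#include <bench/bench.h>
#include <bench/nanobench.h>

#include <cstdint>
#include <vector>

// Hypothetical placeholder standing in for a real hot path under test.
static void XorWorkload(std::vector<uint8_t>& data)
{
    for (auto& b : data) b ^= 0x5a;
}

static void ExampleXorBench(benchmark::Bench& bench)
{
    std::vector<uint8_t> data(1 << 20, 0x00); // setup happens once, outside the timed loop
    bench.run([&] {
        XorWorkload(data);
        ankerl::nanobench::doNotOptimizeAway(data.data()); // keep the work observable
    });
}

BENCHMARK(ExampleXorBench, benchmark::PriorityLevel::HIGH);
```

Only the lambda passed to bench.run is measured, so a regression in the workload shows up directly in the reported timings across runs.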
Concept ACK, Approach NACK. I think there should be a burden on PRs that claim to improve performance to demonstrate a user-impacting scenario where the improvements are relevant, and I agree that having benchmarks that test workloads that will never impact users leads to PRs being opened and review being spent on things that don't deserve attention. I think a similar burden exists for this PR to demonstrate, or at least describe, a rationale for why these removed benchmarks don't test behavior that would impact users.
No strong opinion on the removals. I think the doc update is valuable and could be extended. Simply declaring the removed benchmarks as potentially not valuable when looked at too narrowly (rather than removing them) could be less controversial and still achieve the essence of the goal of this pull request.
Not yet sure on this PR; we already have … . All the benchmarks being removed seem to be … .
I understand this is one of the motivations for this change; does this happen often for benchmarks that are marked …?
Concept ACK
I wouldn't oppose deleting some benchmarks, but I also like the idea of using priorities. That, combined with the doc, should be sufficient to deter micro-optimizations in non-critical code.
Benchmarks are ill-suited for testing of denial of service issues as they are
restricted to the same input set (introducing bias). [Fuzz
tests](/doc/fuzzing.md) are better suited for this purpose, as they are much
Not sure I follow this point: under "denial of service" I would understand attacks that overwhelm the target with repeated requests (which are often but not necessarily identical) and as a result may just slow it down / make it unresponsive instead of outright crashing it - while I agree that benchmarks don't really find these, I wouldn't see fuzz tests designed to find crashes as the best way to test these issues (instead of integration tests with some sort of stress)?
Fuzz testing does not only find crashes; it finds slow-running inputs as well. Fuzz engines usually terminate an input's execution if it takes longer than N seconds to run, usually referred to as a "timeout". See #28812 for examples.
Depending on what the harness is testing, a slow input might indicate a DoS issue (e.g. infinite loop, quadratic behavior, ...).
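As a rough illustration of the timeout point, here is a sketch in the style of the harnesses under src/test/fuzz/; the FUZZ_TARGET macro and its implicit buffer argument are assumptions about the current fuzz framework, and SlowDecode is a deliberately quadratic toy function, not real code from the tree.

```cpp
// Sketch of a fuzz harness whose slow inputs would be reported as timeouts.
#include <test/fuzz/fuzz.h>

#include <cstdint>
#include <string>
#include <vector>

// Toy "decoder" that erases from the front of the string in a loop:
// an O(n) erase inside an O(n) loop gives O(n^2) behavior overall.
static std::vector<uint8_t> SlowDecode(std::string s)
{
    std::vector<uint8_t> out;
    while (!s.empty()) {
        out.push_back(static_cast<uint8_t>(s.front()));
        s.erase(s.begin());
    }
    return out;
}

FUZZ_TARGET(slow_decode_example)
{
    // The fuzz engine supplies inputs of arbitrary size and content, so the
    // quadratic blowup above surfaces as a "timeout" on large inputs rather
    // than as a crash.
    const std::string input(buffer.begin(), buffer.end());
    (void)SlowDecode(input);
}
```

Because input sizes vary across fuzz iterations, size-dependent slowdowns are exposed even though no single fixed-size benchmark input would catch them.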
Force-pushed from fcfb1d8 to 2fd73cc.
I'm not sure if using the priority levels is a good alternative to deleting the benches. Afaict, the priority level was introduced to avoid running slow benchmarks in the CI, as opposed to indicating how "important" they are. As a result, we're not catching issues with these tests until someone executes them (e.g. #29061 (comment)) and I'd prefer not to expand on that.
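To make the trade-off being discussed concrete, this is roughly what marking a benchmark as low priority looks like; the BENCHMARK(name, priority) macro and the LOW level are assumptions based on the current src/bench/bench.h, and NonCriticalBench is a hypothetical example.

```cpp
// Sketch of registering a benchmark under the LOW priority level (assumptions noted above).
#include <bench/bench.h>

static void NonCriticalBench(benchmark::Bench& bench)
{
    bench.run([] {
        // Non-critical placeholder workload, for illustration only.
        volatile int acc = 0;
        for (int i = 0; i < 1000; ++i) acc = acc + i;
    });
}

// Registered, but per the comment above not exercised by CI, so breakage can
// go unnoticed until someone runs it explicitly.
BENCHMARK(NonCriticalBench, benchmark::PriorityLevel::LOW);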
Force-pushed from 2fd73cc to 8b19137.
Not sure why it was closed. It seemed like a useful doc update to me, but maybe I am missing something. Feel free to ACK/NACK #31690, which is just the doc update from here cherry-picked.
Measuring performance of components that do not meaningfully impact the performance of critical aspects of the system (e.g. block validation or mining) seems extraneous.
Maintaining extraneous benchmark tests incentivizes non-impactful improvements. This frequently comes up with new contributors, and they can't be blamed for thinking that e.g. hex parsing should be made faster if a benchmark test exists.