util: improve FindByte() performance #19690

LarryRuane · 2020-08-10T03:49:28Z

This PR is strictly a performance improvement; there is no functional change. The CBufferedFile::FindByte() method searches for the next occurrence of the given byte in the file. Currently, this is done by explicitly inspecting each byte in turn. This PR takes advantage of std::find() to do the same more efficiently, improving its CPU runtime by a factor of about 25 in typical use.

LarryRuane · 2020-08-10T03:50:05Z

This PR was suggested by @hebasto in #16981 (comment) (thank you!)

promag

The performance gain looks substantial - didn't verify.

promag · 2020-08-10T08:09:41Z

src/streams.h

-            nReadPos++;
+            size_t n = vchBuf.size() - start;
+            if (n > nSrcPos - nReadPos)
+                n = nSrcPos - nReadPos;


nit, join with above line.

hebasto · 2020-08-10T11:10:25Z

@LarryRuane

... improving its CPU runtime by a factor of about 25 in typical use.

How was it measured?

laanwj · 2020-08-10T11:38:51Z

Any idea why this is so much faster? As far as I know, there is no faster algorithm to look for the first occurrence of a single byte in a memory array than a linear iteration over it. I'd expect std::find of a byte to simply unroll into a loop.

LarryRuane · 2020-08-10T14:38:55Z

How was it measured?

I ran:

time src/test/test_bitcoin --run_test=streams_tests/streams_findbyte

with master (1m20s) and with master + PR (3s).

Any idea why this is so much faster?

I think you're correct that there's no faster way than a loop with runtime proportional to the number of bytes to scan, but I assume std::find() on a char vector is highly optimized, probably using memchr() or memcmp(), which are implemented in assembly language. Also, the master version does a few things each byte (testing nReadPos == nSrcPos, remainder calculation (%), incrementing nReadPos) that the PR does once for each large run of bytes.

I just noticed that the repetition count on the test is set to a large number (50000000) and I meant to reduce it for the commit (3 seconds is too long to add to the unit test suite). I'll reduce that number in force push in a minute. This test doesn't really need to be in this PR (FindByte()'s functionality is tested very well in another test), but it helps reviewers verify the performance improvement.
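The point about doing work once per run of bytes, rather than once per byte, can be sketched as follows. This is a minimal illustration and not the PR's code: `RingFind` and its parameters are hypothetical names, and the buffer is a plain `std::vector` standing in for an already-filled circular buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a filled circular buffer; not the real CBufferedFile.
// Returns how many bytes are skipped, starting at logical position `read_pos`,
// before `ch` is found among the `avail` readable bytes, or `avail` if absent.
std::size_t RingFind(const std::vector<std::uint8_t>& buf, std::size_t read_pos,
                     std::size_t avail, std::uint8_t ch)
{
    assert(!buf.empty() && avail <= buf.size());
    std::size_t offset = read_pos % buf.size(); // one modulus, outside the loop
    std::size_t scanned = 0;
    while (scanned < avail) {
        // Longest contiguous run: limited by the physical end of the buffer
        // and by the amount of data still unscanned.
        const std::size_t run = std::min(buf.size() - offset, avail - scanned);
        const auto first = buf.begin() + offset;
        const auto last = first + run;
        const auto it = std::find(first, last, ch);
        if (it != last) return scanned + static_cast<std::size_t>(it - first);
        scanned += run;
        offset += run;
        if (offset == buf.size()) offset = 0; // wrap by comparison, not by %
    }
    return avail;
}
```

Per iteration of the outer loop, the bounds check and wraparound handling happen once, and `std::find` scans the whole contiguous run. Whether the standard library vectorizes that scan is implementation-dependent.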

LarryRuane · 2020-08-10T14:46:52Z

Force-pushed a small fix to the test, so it doesn't take 3 seconds to run.

laanwj · 2020-08-10T16:43:57Z

but I assume std::find() on a char vector is highly optimized, probably using memchr() or memcmp(), which are implemented in assembly language

That's true, it's possible to optimize that with assembly language (definitely with specific instruction sets).

It still surprises me, because you'd expect the I/O (reading the data from disk) to dominate greatly in block importing, not looking for a character already in memory! It just seems out of proportion.

glozow · 2020-08-10T19:19:00Z

src/test/streams_tests.cpp

+    fwrite(&b, 1, 1, file);
+    rewind(file);
+    CBufferedFile bf(file, fileSize * 2, fileSize, 0, 0);
+    for (int rep = 0; rep < 100; ++rep) {


I don't fully understand how the performance increase is so significant, but why not a bench if you're worried about burdening the unit tests? I tried to do this but I must be doing something wrong because I can't seem to reproduce the speedup. 😞

The performance impact may be very compiler/architecture/stdlib dependent. I'm kind of surprised std::find has optimizations beyond the naive loop implementation in the first place on some platforms, so I certainly wouldn't be surprised if others don't have it.

@gzhao408, thank you, I wasn't aware of bench. I suspect the iteration count, 100, is far too low and the difference is swamped by the noise. I increased the iteration count to 10 million (1e7) and it showed the expected difference: master 1,659.30 ns/op, PR 52.66 ns/op (a ratio of about 31).

I just force-pushed (diff) to remove the unit test (which was only for benchmarking, not really testing anything) and cherry-pick Gloria's benchmark. I added one more commit to increase the iteration count.

LarryRuane · 2020-08-10T20:08:50Z

it still surprises me because you'd expect the I/O, to read the data from disk, to dominate greatly in the block importing.

FindByte() only reads from disk by calling Fill() (when the buffer is empty), which is rare. In this test, Fill() gets called only once, the first time FindByte() runs, because I wanted to isolate the modified part of the code.

sipa · 2020-08-11T00:42:37Z

@gmaxwell pointed out to me why this is so much faster: it's not that std::find is amazing, but that the original code (which I wrote in 2012, it seems!) is doing a modulus operation for every character (which is often orders of magnitude slower than the byte comparison or addition/subtraction).

Thinking about this a bit more high level: the end goal is just to scan quickly for the 4-byte network magic in a file. If this is relevant for performance (and it seems it may be), it may be better to have a function that does exactly that in CBufferedFile, rather than a search for one byte + memcmp. std::search is probably what you want.

LarryRuane · 2020-08-12T14:10:48Z

Here's a version that's very close in implementation to master that eliminates the % operation on every character:

    void FindByte(char ch) {
        size_t start = nReadPos % vchBuf.size();
        while (true) {
            if (nReadPos == nSrcPos)
                Fill();
            if (vchBuf[start] == ch)
                break;
            nReadPos++;
            start++;
            if (start >= vchBuf.size()) start = 0;
        }
    }

and that does make a significant difference; bench_bitcoin reports 314 ns/op for this version. (My laptop isn't set up for very accurate CPU benchmarking, but my results are pretty consistent, varying by less than 1% across multiple runs.) That's about 5.3 times as fast as master (1,659.30 ns/op, as mentioned above). The PR (52.66 ns/op) is still about 6 times faster than this version. So this version is right about in the middle (in terms of ratios) between master and this PR.

I do like @sipa's suggestion to generalize FindByte() to search for a given sequence of bytes (maybe called FindBytes() or Search(), suggestions welcome); that's a much nicer interface. I'll push a commit to do that within the next few hours.

LarryRuane · 2020-08-13T05:35:58Z

I just added a new commit (25413ab, can squash later) to implement @sipa's suggestion to add a method to CBufferedFile to find a sequence of bytes, rather than just one byte, as FindByte() does. This simplifies LoadExternalBlockFile(); the overall code base isn't simpler, but it encapsulates complexity nicely within the CBufferedFile class.

It didn't work out to use std::search() because that function requires the bytes being searched to be consecutive in memory -- that works if the entire file is read into contiguous memory. But CBufferedFile implements a circular buffer. The bytes we're searching for might be partially in and partially out of memory (bytes are read on demand), and even if all are in memory, they may be split between the end of the buffer and the beginning (because of wraparound), so trying to use std::search gets complicated. FindByte() doesn't have these problems since it's looking for a single byte.
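The wraparound problem described above can be illustrated with a small sketch. It is not the PR's `Search()`; `RingSearch` is a hypothetical helper that assumes all `avail` bytes are already in memory, and the byte values in the usage note are only example data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not the PR's Search(): looks for `pattern` in the `avail`
// logical bytes of a circular buffer that begin at physical index `start`.
// Returns the logical offset of the first match, or `avail` if there is none.
std::size_t RingSearch(const std::vector<std::uint8_t>& buf, std::size_t start,
                       std::size_t avail, const std::vector<std::uint8_t>& pattern)
{
    assert(!buf.empty() && start < buf.size() && avail <= buf.size());
    if (pattern.empty()) return 0;
    if (pattern.size() > avail) return avail;
    for (std::size_t pos = 0; pos + pattern.size() <= avail; ++pos) {
        std::size_t idx = start + pos;
        if (idx >= buf.size()) idx -= buf.size(); // map logical to physical index
        std::size_t matched = 0;
        while (matched < pattern.size() && buf[idx] == pattern[matched]) {
            ++matched;
            if (++idx == buf.size()) idx = 0; // a match may straddle the physical end
        }
        if (matched == pattern.size()) return pos;
    }
    return avail;
}
```

With `buf = {0xb4, 0xd9, 0, 0, 0, 0, 0xf9, 0xbe}` and `start = 3`, the four-byte pattern `f9 be b4 d9` begins at logical offset 3 but is split across the physical end of the vector, so a single contiguous-range search over `buf` would miss it. The index mapping is what makes the general case more involved than the single-byte case.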

adamjonas · 2020-08-14T00:39:41Z

Looks like one of the sanitizers is finding an implicit-integer-sign-change problem in the latest update:

node0 stderr streams.h:857:22: runtime error: implicit conversion from type 'unsigned char' of value 250 (8-bit, unsigned) to type 'char' changed the value to -6 (8-bit, signed)
    #0 0x55f6d4cc6285 in CBufferedFile::Search(unsigned char const*, unsigned long) /tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/./streams.h:857:22
    #1 0x55f6d4c967a0 in LoadExternalBlockFile(CChainParams const&, _IO_FILE*, FlatFilePos*) /tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/validation.cpp:4682:24
    #2 0x55f6d45071f6 in ThreadImport(ChainstateManager&, std::vector<boost::filesystem::path, std::allocator<boost::filesystem::path> >) /tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/init.cpp:705:13
    #3 0x55f6d4506814 in AppInitMain(util::Ref const&, NodeContext&)::$_10::operator()() const /tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/init.cpp:1853:96
    #4 0x55f6d4506814 in std::_Function_handler<void (), AppInitMain(util::Ref const&, NodeContext&)::$_10>::_M_invoke(std::_Any_data const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/std_function.h:300:2
    #5 0x55f6d4585b29 in std::function<void ()>::operator()() const /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/std_function.h:688:14
    #6 0x55f6d45181bb in void TraceThread<std::function<void ()> >(char const*, std::function<void ()>) /tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/./util/system.h:438:9
    #7 0x55f6d4506447 in void std::__invoke_impl<void, void (*)(char const*, std::function<void ()>), char const*, AppInitMain(util::Ref const&, NodeContext&)::$_10>(std::__invoke_other, void (*&&)(char const*, std::function<void ()>), char const*&&, AppInitMain(util::Ref const&, NodeContext&)::$_10&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/invoke.h:60:14
    #8 0x55f6d450616a in std::__invoke_result<void (*)(char const*, std::function<void ()>), char const*, AppInitMain(util::Ref const&, NodeContext&)::$_10>::type std::__invoke<void (*)(char const*, std::function<void ()>), char const*, AppInitMain(util::Ref const&, NodeContext&)::$_10>(void (*&&)(char const*, std::function<void ()>), char const*&&, AppInitMain(util::Ref const&, NodeContext&)::$_10&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/invoke.h:95:14
    #9 0x55f6d4505db2 in void std::thread::_Invoker<std::tuple<void (*)(char const*, std::function<void ()>), char const*, AppInitMain(util::Ref const&, NodeContext&)::$_10> >::_M_invoke<0ul, 1ul, 2ul>(std::_Index_tuple<0ul, 1ul, 2ul>) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:244:13
    #10 0x55f6d4505db2 in std::thread::_Invoker<std::tuple<void (*)(char const*, std::function<void ()>), char const*, AppInitMain(util::Ref const&, NodeContext&)::$_10> >::operator()() /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:251:11
    #11 0x55f6d4505db2 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(char const*, std::function<void ()>), char const*, AppInitMain(util::Ref const&, NodeContext&)::$_10> > >::_M_run() /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:195:13
    #12 0x7f20f2f18cb3  (/lib/x86_64-linux-gnu/libstdc++.so.6+0xd6cb3)
    #13 0x7f20f3317608 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x9608)
    #14 0x7f20f2bf5102 in clone (/lib/x86_64-linux-gnu/libc.so.6+0x122102)

SUMMARY: UndefinedBehaviorSanitizer: implicit-integer-sign-change streams.h:857:22
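The report above says that the value 250 became -6 (that is, 250 - 256) when an `unsigned char` was implicitly converted to a signed `char`. One way to avoid that kind of implicit conversion, sketched here with a hypothetical helper `SameByte` (not taken from the PR's fix), is to make the conversion explicit and compare in the unsigned domain:

```cpp
#include <cstdint>

// Hypothetical helper: compare a stored `char` with an unsigned byte value
// without relying on an implicit unsigned-to-signed conversion.
bool SameByte(char stored, std::uint8_t wanted)
{
    // The explicit cast maps a negative char such as -6 back to 250.
    return static_cast<std::uint8_t>(stored) == wanted;
}
```

Because the cast is explicit, the sanitizer's implicit-conversion check has nothing to flag, and the comparison behaves the same whether plain `char` is signed or unsigned on the platform.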

LarryRuane · 2020-08-14T06:05:58Z

Thanks, @adamjonas, force-pushed a fix for that signed-unsigned CI failure, added another unit test, some improvements to Search(), some clang-format-diff.py cleanups.

DrahtBot · 2020-08-20T20:44:32Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	stickies-v, achow101
Stale ACK	laanwj, sipa, hebasto, john-moffett

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

No conflicts as of last run.

elichai · 2020-08-30T10:09:09Z

src/bench/streams_findbyte.cpp

+static void FindByte(benchmark::Bench& bench)
+{
+    // Setup
+    FILE* file = fsbridge::fopen("streams_tmp", "w+b");


Could we maybe use something like memfd_create(2) for the benchmark, to decrease I/O noise?
Anyone knows if there is a windows equivalent or a higher abstraction in boost::filesystem?

fmemopen(3) also looks interesting

Maybe this can give us that? https://www.boost.org/doc/libs/1_72_0/libs/iostreams/doc/classes/mapped_file.html

Thanks for the suggestions, but I don't think this matters, because the file IO (read) occurs only on the very first benchmark loop iteration; after that, the data is in memory and there is no IO at all. Each iteration repositions the stream pointer to zero (bf.SetPos(0)) and then searches forward for the value 1 (bf.FindByte(1)), which is 200 bytes away. But after the first iteration, all of these bytes are in memory, and we're just moving the position between 0 and 200.

The reason for the 200, by the way, is that with random data, searching for a random byte value the average distance would be half of 256. But the data in blk.dat files (which is the only use of this code currently) isn't quite random; zero and 0xff occur more often (and we never search for those values). So I guessed that 200 is close to a typical distance that this function would move through before finding the requested byte. Maybe I should explain these points in a comment in bench/streams_findbyte.cpp.
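As a rough check of the distance estimate, here is the calculation under the simplifying assumption that bytes are independent and uniformly distributed (which, as noted above, is not quite true of blk.dat data). Let D be the number of bytes skipped before the first match; the symbols D and p are introduced here only for illustration.

```latex
\begin{align*}
p &= \tfrac{1}{256}, \qquad \Pr[D = k] = (1-p)^{k}\, p \quad (k = 0, 1, 2, \dots) \\
\mathbb{E}[D] &= \frac{1-p}{p} = 255 \\
\Pr[D \le k] &= 1 - (1-p)^{k+1} \ge \tfrac{1}{2}
  \iff k + 1 \ge \frac{\ln 2}{\ln(256/255)} \approx 177.1
\end{align*}
```

Under this model the mean distance is 255 bytes and the median is 177 bytes, so the benchmark's choice of 200 falls between the two.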

laanwj · 2020-11-20T04:08:34Z

Code review ACK 8f2e8c2, this looks like a better abstraction, and it's an improvement not to do a modulus for every byte.

LarryRuane · 2020-12-04T21:33:10Z

Force-pushed to rationalize the commits, so I think it's in a good state for merging. I reordered the commit that adds the benchmark test, b576994, to be first, so it's easy for reviewers to checkout that commit and run the new benchmark test without the improvements:

src/bench/bench_bitcoin -filter=FindByte

I re-ran those tests just now, and on my laptop, the ns/op without the PR is 1,840, and with the PR it's 55 (an improvement of more than a factor of 33).

sipa · 2020-12-04T22:33:26Z

Code review ACK 566c2e2. I think it'd be good to move the refactoring to FindByte in the last commit to a separate commit.

LarryRuane · 2020-12-05T01:14:18Z

Force-pushed to implement latest review suggestion, changes are more cleanly separated among the commits (no code changes overall), thanks @sipa.

hebasto

Concept ACK.

src/bench/streams_findbyte.cpp

src/test/streams_tests.cpp

LarryRuane · 2023-03-02T18:17:20Z

@john-moffett - Good idea to bring back the earlier commit (the conditional branch instruction theory makes sense); I just restored (force-pushed) it as you suggested. On my x86 (ns/op, lower is better):

master: 307
previous version of this PR (08cf6a9): 135
current version (using std::find): 34

achow101 · 2023-03-15T16:13:10Z

ACK dacd331

stickies-v

Approach ACK dacd331

I'm now seeing bench performance improvement on my M1 again, from ~150ns/op -> ~85ns/op.

stickies-v · 2023-03-15T18:47:05Z

src/streams.h

@@ -744,13 +744,22 @@ class CBufferedFile
    //! search for a given byte in the stream, and remain positioned on it
    void FindByte(uint8_t ch)
    {
+        const std::byte byte{static_cast<std::byte>(ch)};


Given that there is only a single non-test callsite of FindByte(), would it make sense to just update the fn signature to take std::byte directly?

stickies-v · 2023-03-15T18:47:43Z

src/streams.h

@@ -744,13 +744,22 @@ class CBufferedFile
    //! search for a given byte in the stream, and remain positioned on it
    void FindByte(uint8_t ch)
    {
+        const std::byte byte{static_cast<std::byte>(ch)};
+        size_t buf_offset = m_read_pos % vchBuf.size();


nit: and a bunch more of those

Suggested change

size_t buf_offset = m_read_pos % vchBuf.size();

size_t buf_offset{m_read_pos % vchBuf.size()};

stickies-v · 2023-03-16T11:44:27Z

src/streams.h

@@ -744,13 +744,22 @@ class CBufferedFile
    //! search for a given byte in the stream, and remain positioned on it


I think some of the rationale of this implementation should be in the docs so future contributors don't simplify the code again to unintentionally undo the performance gains, e.g. why the modulo operator is kept outside of the while loop seems quite important and non-trivial?

Unrelated to this PR, but I think it would also be helpful to improve the docs to specify that if ch is not found, std::ios_base::failure is thrown (from Fill()). It's an essential part of the interface imo.

Perhaps worth improving on this behaviour, and have FindByte() throw its own error, by wrapping the Fill() command in a try/catch? Orthogonal to this PR, though. (And I also don't like that we're catching a general std::exception for a FindByte() failure, but again, orthogonal.)
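The idea of having `FindByte()` raise its own, documented error could look roughly like the sketch below. Everything here is a toy stand-in: `ToyReader` and `ByteNotFound` are hypothetical names, and the toy `Fill()` always fails, which only mimics the end-of-file case described above.

```cpp
#include <cstddef>
#include <cstdint>
#include <ios>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical error type for "byte not found before end of data".
struct ByteNotFound : std::runtime_error {
    ByteNotFound() : std::runtime_error("FindByte: byte not found") {}
};

// Toy stand-in for a buffered reader (not CBufferedFile).
class ToyReader
{
    std::vector<std::uint8_t> m_data;
    std::size_t m_pos{0};

    // Mimics a refill that fails at end of data.
    void Fill() { throw std::ios_base::failure("ToyReader::Fill: end of file"); }

public:
    explicit ToyReader(std::vector<std::uint8_t> data) : m_data(std::move(data)) {}

    std::size_t GetPos() const { return m_pos; }

    //! Search for `ch` and remain positioned on it.
    //! @throws ByteNotFound if the data ends before `ch` is found.
    void FindByte(std::uint8_t ch)
    {
        while (true) {
            if (m_pos == m_data.size()) {
                try {
                    Fill();
                } catch (const std::ios_base::failure&) {
                    throw ByteNotFound{}; // translate into the documented error
                }
            }
            if (m_data[m_pos] == ch) return;
            ++m_pos;
        }
    }
};
```

A caller can then catch `ByteNotFound` specifically instead of a general `std::exception`, and the function-level comment states the failure behaviour without requiring a read of the implementation.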

LarryRuane · 2023-03-20T04:08:19Z

Force-pushed for review comments (thanks, @stickies-v), verified benchmark performance is unchanged (ns/op with PR: 34.76, without PR: 302). Summary of force-push changes:

change FindByte() argument type from uint8_t to std::byte
add "exception" comment before call to Fill()
add comment suggesting to avoid mod (%) operator within loop
changed assignment statements to use more modern braces syntax

LarryRuane · 2023-03-20T20:08:30Z

Force-pushed again to fix CI failures.

stickies-v

ACK 0fe832c

stickies-v · 2023-03-21T17:07:52Z

src/streams.h

Instead of changing the callsites of FindByte(), how about adding a uint8_t overload? I think it keeps the implementation clean, but since it can easily be argued that uint8_t also is a byte, this keeps the callsites straightforward and reduces the diff.

git diff

diff --git a/src/bench/streams_findbyte.cpp b/src/bench/streams_findbyte.cpp
index 175564fe9..7b2e65da2 100644
--- a/src/bench/streams_findbyte.cpp
+++ b/src/bench/streams_findbyte.cpp
@@ -20,7 +20,7 @@ static void FindByte(benchmark::Bench& bench)
     bench.run([&] {
         bf.SetPos(0);
-        bf.FindByte(std::byte(1));
+        bf.FindByte(1);
     });
     // Cleanup
diff --git a/src/streams.h b/src/streams.h
index 2558bd830..9280fa013 100644
--- a/src/streams.h
+++ b/src/streams.h
@@ -763,6 +763,8 @@ public:
             if (buf_offset >= vchBuf.size()) buf_offset = 0;
         }
     }
+
+    void FindByte(uint8_t byte) { return FindByte(static_cast<std::byte>(byte)); }
 };
 #endif // BITCOIN_STREAMS_H
diff --git a/src/test/fuzz/buffered_file.cpp b/src/test/fuzz/buffered_file.cpp
index 2f7ce60c7..67cac8fa4 100644
--- a/src/test/fuzz/buffered_file.cpp
+++ b/src/test/fuzz/buffered_file.cpp
@@ -53,7 +53,7 @@ FUZZ_TARGET(buffered_file)
                     return;
                 }
                 try {
-                    opt_buffered_file->FindByte(std::byte(fuzzed_data_provider.ConsumeIntegral<uint8_t>()));
+                    opt_buffered_file->FindByte(fuzzed_data_provider.ConsumeIntegral<uint8_t>());
                 } catch (const std::ios_base::failure&) {
                 }
             },
diff --git a/src/test/streams_tests.cpp b/src/test/streams_tests.cpp
index 79bc7b7c0..1db5b61f1 100644
--- a/src/test/streams_tests.cpp
+++ b/src/test/streams_tests.cpp
@@ -462,7 +462,7 @@ BOOST_AUTO_TEST_CASE(streams_buffered_file_rand)
                 size_t find = currentPos + InsecureRandRange(8);
                 if (find >= fileSize)
                     find = fileSize - 1;
-                bf.FindByte(std::byte(find));
+                bf.FindByte(find);
                 // The value at each offset is the offset.
                 BOOST_CHECK_EQUAL(bf.GetPos(), find);
                 currentPos = find;
diff --git a/src/validation.cpp b/src/validation.cpp
index a79b81add..b42b39861 100644
--- a/src/validation.cpp
+++ b/src/validation.cpp
@@ -4438,7 +4438,7 @@ void Chainstate::LoadExternalBlockFile(
         try {
             // locate a header
             unsigned char buf[CMessageHeader::MESSAGE_START_SIZE];
-            blkdat.FindByte(std::byte(params.MessageStart()[0]));
+            blkdat.FindByte(params.MessageStart()[0]);
             nRewind = blkdat.GetPos() + 1;
             blkdat >> buf;
             if (memcmp(buf, params.MessageStart(), CMessageHeader::MESSAGE_START_SIZE)) {

stickies-v · 2023-03-21T17:08:54Z

src/streams.h

+            if (m_read_pos == nSrcPos) {
+                // No more bytes available; read from the file into the buffer,
+                // setting nSrcPos to one beyond the end of the new data.
+                // Throws exception if end-of-file reached.


Given that it's part of the interface, I think this needs to be documented on the function level so devs wanting to use FindByte know how it behaves when the byte isn't found - they shouldn't need to dive into the implementation.

stickies-v · 2023-03-21T17:12:42Z

src/streams.h

    {
+        // For best performance, avoid mod operation within the loop.


nit

Suggested change

// For best performance, avoid mod operation within the loop.

// The modulus operation is much more expensive than byte
// comparison and addition, so we keep it out of the loop to
// improve performance (see #19690 for discussion).

achow101 · 2023-04-21T18:35:00Z

ACK 0fe832c

john-moffett

ACK 0fe832c

achow101 · 2023-04-21T22:45:08Z

Silent merge conflict with master:

../../../src/bench/streams_findbyte.cpp:7:10: fatal error: fs.h: No such file or directory
    7 | #include <fs.h>
      |          ^~~~~~
compilation terminated.

Avoid use of the expensive mod operator (%) when calculating the buffer offset. No functional difference. Co-authored-by: Hennadii Stepanov <32963518+hebasto@users.noreply.github.com>

LarryRuane · 2023-05-05T12:04:36Z

Force pushed rebase to fix hidden merge conflict, thanks @achow101

stickies-v

re-ACK 72efc26

Verified that the only difference is to include <util/fs.h> instead of <fs.h> (introduced by 00e9b97)

% git range-diff HEAD~2 0fe832c4a4b2049fdf967bca375468d5ac285563 HEAD
1:  5842d92c8 ! 1:  604df63f6 [bench] add streams findbyte
    @@ src/bench/streams_findbyte.cpp (new)
     +
     +#include <bench/bench.h>
     +
    -+#include <fs.h>
    ++#include <util/fs.h>
     +#include <streams.h>
     +
     +static void FindByte(benchmark::Bench& bench)
2:  0fe832c4a = 2:  72efc2643 util: improve streams.h:FindByte() performance

achow101 · 2023-05-10T21:40:35Z

re-ACK 72efc26

72efc26 util: improve streams.h:FindByte() performance (Larry Ruane)
604df63 [bench] add streams findbyte (gzhao408)

Pull request description:

This PR is strictly a performance improvement; there is no functional change. The `CBufferedFile::FindByte()` method searches for the next occurrence of the given byte in the file. Currently, this is done by explicitly inspecting each byte in turn. This PR takes advantage of `std::find()` to do the same more efficiently, improving its CPU runtime by a factor of about 25 in typical use.

ACKs for top commit:
achow101: re-ACK 72efc26
stickies-v: re-ACK 72efc26

Tree-SHA512: ddf0bff335cc8aa34f911aa4e0558fa77ce35d963d602e4ab1c63090b4a386faf074548daf06ee829c7f2c760d06eed0125cf4c34e981c6129cea1804eb3b719

fanquake added the Utils/log/libs label Aug 10, 2020

promag reviewed Aug 10, 2020

LarryRuane force-pushed the FindByte-speedup branch from ab412ec to a31aa32 Compare August 10, 2020 14:45

glozow reviewed Aug 10, 2020

LarryRuane force-pushed the FindByte-speedup branch from a31aa32 to 8b07e17 Compare August 10, 2020 21:33

LarryRuane force-pushed the FindByte-speedup branch from 25413ab to 8f2e8c2 Compare August 14, 2020 06:00

DrahtBot mentioned this pull request Aug 21, 2020

Improve runtime performance of --reindex #16981

Merged

This was referenced Aug 28, 2020

multiprocess: Add bitcoin-gui -ipcconnect option #19461

Draft

multiprocess: Add bitcoin-wallet -ipcconnect option #19460

Draft

elichai reviewed Aug 30, 2020

LarryRuane force-pushed the FindByte-speedup branch from 8f2e8c2 to 566c2e2 Compare December 4, 2020 21:25

LarryRuane force-pushed the FindByte-speedup branch from 566c2e2 to 134de90 Compare December 5, 2020 01:10

hebasto requested changes Dec 5, 2020

LarryRuane force-pushed the FindByte-speedup branch from 08cf6a9 to dacd331 Compare March 2, 2023 18:08

DrahtBot requested review from hebasto, laanwj and sipa March 15, 2023 16:13

stickies-v reviewed Mar 16, 2023

LarryRuane force-pushed the FindByte-speedup branch from dacd331 to 52804a5 Compare March 20, 2023 04:05

LarryRuane force-pushed the FindByte-speedup branch 2 times, most recently from acef167 to 0fe832c Compare March 20, 2023 17:03

stickies-v approved these changes Mar 22, 2023

DrahtBot requested a review from achow101 March 22, 2023 13:38

DrahtBot removed the request for review from achow101 April 21, 2023 18:35

john-moffett approved these changes Apr 21, 2023

glozow and others added 2 commits May 5, 2023 06:03

[bench] add streams findbyte

604df63

util: improve streams.h:FindByte() performance

72efc26

LarryRuane force-pushed the FindByte-speedup branch from 0fe832c to 72efc26 Compare May 5, 2023 12:03

stickies-v approved these changes May 5, 2023

DrahtBot requested review from achow101 and john-moffett May 5, 2023 13:16

DrahtBot removed the request for review from achow101 May 10, 2023 21:40

achow101 merged commit 3ff67f7 into bitcoin:master May 10, 2023

bitcoin locked and limited conversation to collaborators May 9, 2024
