Skip to content

Conversation

maflcko
Copy link
Member

@maflcko maflcko commented Jul 10, 2023

This allows AutoFile to roll an XOR pattern while reading or writing to the underlying file.

This is needed for #28052, but can also be used in any other place.

Also, there are tests, so I've split this up from the larger pull to make review easier, hopefully.

@DrahtBot
Copy link
Contributor

DrahtBot commented Jul 10, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK Crypt-iQ, willcl-ark, jamesob
Concept ACK sipa

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #26606 (wallet: Implement independent BDB parser by achow101)
  • #26596 (wallet: Migrate legacy wallets to descriptor wallets without requiring BDB by achow101)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@DrahtBot DrahtBot changed the title streams: Add XorFile streams: Add XorFile Jul 10, 2023
@maflcko maflcko changed the title streams: Add XorFile util: Add XorFile Jul 10, 2023
@DrahtBot DrahtBot changed the title util: Add XorFile util: Add XorFile Jul 10, 2023
Copy link
Contributor

@jamesob jamesob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

In general looks good, but I think you could really cut down on the amount of code here if you just made XorFile a subclass of CAutoFile, and made whatever small adjustments and modernizations necessary to CAutoFile to get that to work.

E.g. XorFile::ignore() seems like essentially the same code as CAutoFile::ignore(), just with some std:: prefixes. Seems like code we don't want to maintain two copies of.

src/streams.cpp Outdated
void XorFile::write(Span<const std::byte> src)
{
if (!m_file) throw std::ios_base::failure("XorFile::write: file handle is nullptr");
std::array<std::byte, 4096> buf;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fa9325a

Might be interesting to see how statically allocating a buffer (say, as a member of XorFile) and reusing it might affect performance; may avoid some reallocations. Not sure how frequently this is actually called though. I expect this probably won't make a difference though, because I guess XorFile::write() will be called on the order of once per block.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea, but my blind guess would be that calling std::fwrite takes longer than anything else in this function. 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't run benchmarks against statically allocating a buffer, but it may be useful since AutoFile::write is called many times per block. Basically every time the pattern CAutoFile << x is used which happens for every element of the block header, each transaction input, output, etc. I also ran a small unit test through perf where there were many writes and found that std::ftell dominated with std::fwrite close behind.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be good to have a heat-map (Percent of time spent in each line) for this function for IBD, or a unit test that writes a block.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also ran a small unit test through perf where there were many writes and found that std::ftell dominated with std::fwrite close behind

For me fwrite dominated over ftell, but the most time was spent on XOR:

ROUTINE ====================== AutoFile::write in src/streams.cpp
    49    100 Total samples (flat / cumulative)
     .      .   40:         nSize -= nNow;
     .      .   41:     }
     .      .   42: }
     .      .   43: 
     .      .   44: void AutoFile::write(Span<const std::byte> src)
---
     .      .   45: {
     .      .   46:     if (!m_file) throw std::ios_base::failure("AutoFile::write: file handle is nullptr");
     .      .   47:     if (m_xor.empty()) {
     .      .   48:         if (std::fwrite(src.data(), 1, src.size(), m_file) != src.size()) {
     .      .   49:             throw std::ios_base::failure("AutoFile::write: write failed");
     .      .   50:         }
     .      .   51:     } else {
     .      .   52:         std::array<std::byte, 4096> buf;
     .      .   53:         while (src.size() > 0) {
     .      .   54:             auto buf_now{Span{buf}.first(std::min<size_t>(src.size(), buf.size()))};
     .      2   55:             std::copy(src.begin(), src.begin() + buf_now.size(), buf_now.begin());
     .     16   56:             const auto current_pos{std::ftell(m_file)};
    49     49   57:             util::Xor(buf_now, m_xor, current_pos);
     .     33   58:             if (current_pos < 0 || std::fwrite(buf_now.data(), 1, buf_now.size(), m_file) != buf_now.size()) {
     .      .   59:                 throw std::ios_base::failure{"XorFile::write: failed"};
     .      .   60:             }
     .      .   61:             src = src.subspan(buf_now.size());
     .      .   62:         }
     .      .   63:     }
     .      .   64: }
---

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it must have been a fluke because when I synced up to a little past block ~200K, fwrite was in the lead and ftell and xor were about tied at ~3% each of the total time spent. I wrote a test here Crypt-iQ@bf924d6 that requires the benchmarks to be compiled but is not a benchmark. I made a patch in the following commit (Crypt-iQ@be07dbc) that reduced the number of ftell calls to 1 for AutoFile and it gave a noticeable performance improvement. I think it's ok to track the file position this way, but I might be wrong

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it depends on the data written. My benchmark was for 1MB written. With 33 bytes written, ftell is the dominant:

ROUTINE ====================== AutoFile::write in src/streams.cpp
     2     28 Total samples (flat / cumulative)
     .      .   40:         nSize -= nNow;
     .      .   41:     }
     .      .   42: }
     .      .   43: 
     .      .   44: void AutoFile::write(Span<const std::byte> src)
---
     .      .   45: {
     .      .   46:     if (!m_file) throw std::ios_base::failure("AutoFile::write: file handle is nullptr");
     .      .   47:     if (m_xor.empty()) {
     .      .   48:         if (std::fwrite(src.data(), 1, src.size(), m_file) != src.size()) {
     .      .   49:             throw std::ios_base::failure("AutoFile::write: write failed");
     .      .   50:         }
     .      .   51:     } else {
     .      .   52:         std::array<std::byte, 4096> buf;
     .      .   53:         while (src.size() > 0) {
     .      .   54:             auto buf_now{Span{buf}.first(std::min<size_t>(src.size(), buf.size()))};
     .      .   55:             std::copy(src.begin(), src.begin() + buf_now.size(), buf_now.begin());
     .     23   56:             const auto current_pos{std::ftell(m_file)};
     1      1   57:             util::Xor(buf_now, m_xor, current_pos);
     1      4   58:             if (current_pos < 0 || std::fwrite(buf_now.data(), 1, buf_now.size(), m_file) != buf_now.size()) {
     .      .   59:                 throw std::ios_base::failure{"XorFile::write: failed"};
     .      .   60:             }
     .      .   61:             src = src.subspan(buf_now.size());
     .      .   62:         }
     .      .   63:     }
     .      .   64: }
---

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case the bottleneck doesn't seem to be std::array<std::byte, 4096> buf;, so I think this thread can be closed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I didn't see allocations come up at all in perf

@maflcko maflcko changed the title util: Add XorFile util: Teach AutoFile how to XOR Jul 12, 2023
MarcoFalke added 2 commits July 12, 2023 09:59
The shift operators will call the write or read member function, which
already does the check. Also, call sites are free to directly call
::(Un)Serialize(s, obj) to skip this check, so removing it increases
consistency.
@maflcko
Copy link
Member Author

maflcko commented Jul 12, 2023

I think you could really cut down on the amount of code here

Thanks, rewritten from scratch to reduce the lines of code.

No need to artificially bloat the code and waste space.
@maflcko maflcko force-pushed the 2307-xor-file- branch 2 times, most recently from fa6b8c6 to fadb325 Compare July 12, 2023 13:34
Copy link
Contributor

@jamesob jamesob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK fadb325 (jamesob/ackr/28060.2.MarcoFalke.util_add_xorfile)

Looks good! More concise change now that we're just adding XOR capability to existing classes.

Show signature data

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

ACK fadb32556f64a72377c82be71a207195e6fcf68b ([`jamesob/ackr/28060.2.MarcoFalke.util_add_xorfile`](https://github.com/jamesob/bitcoin/tree/ackr/28060.2.MarcoFalke.util_add_xorfile))

Looks good! More concise change now that we're just adding XOR capability to existing classes.
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEGNRVI1NPYuZCSIrGepNdrbLETwUFAmSu17QACgkQepNdrbLE
TwWM2Q/5AQGR5Uv5aFQSPvjtd9ZbcPyYvDzwSjkFImCnCrVGeTlHux6rsS9SzL8y
gybawiRHUTHjLCO8j5ItvM7/stmyq4UiTk8QCOQL39dGJgBVjV+/pQnCqPlMSU1d
KtObCvXlVc5jWgr7OdHRDtDb5oZi1jc+bcr5ReVCggTdpcvjHxkxLSjRXEFQoeQi
kY/wa5yvids1kyCotiScB+zvc3FwVKj3/lWpPWYkpXdTxmqAjpmgweQlBqq90a2a
I5D0oI1fyerVzJrYLi5EQfY8PQZBhUDjJSzyMTpxr0EqXSRGWCDl5Zm4m0Mxo+8Q
B+Wymo/Dvf196ugDBbUWnQM/dG4tqQRZAB3U1VHwBUemb+fZgDRBeeyAs6/aIHik
YwUR8amJM08ClAJUfSaNXL/OouMqOoMds5tXwMM+Z/2msAgpdXx7c6ifGFaMQm5S
voElavRD55OLFn9UYKZHusF3EmeTN094pWkwMCeuFnnW9+Rpar5m0nvNuH7T/viI
xdFM2V/ypzkDtyDg21ZrR+OcqlGIsj0x8BB7rzs1JaxeoyO6Zshj7RW1JCPTJtBq
P2KnEiD8DoFL9gTuL3TtnT/5oIJ3Quqb8A8GnU4amzjOcUyg9mg3CwMuWuh17odO
LIckwWfQZlo7wzzOUXwE4i/Xj0ClbyNPgzroOzcXz12WdSsAQWw=
=SQKV
-----END PGP SIGNATURE-----

Show platform data

Tested on Linux-6.2.10-arch1-1-x86_64-with-glibc2.37

Configured with ./configure 'BDB_LIBS=-L/home/james/src/bitcoin/db4/lib -ldb_cxx-4.8' BDB_CFLAGS=-I/home/james/src/bitcoin/db4/include 'CXXFLAGS=-fPIE -pipe -O2 -g -Wthread-safety-analysis -Wall -Werror=sign-compare ' --disable-gui-tests --disable-fuzz-binary --enable-wallet --enable-debug --with-daemon --enable-natpmp-default CXX=/usr/bin/clang++ CC=/usr/bin/clang --disable-shared --with-pic --enable-benchmark=no --enable-module-recovery --enable-module-schnorrsig PYTHONPATH= --disable-shared --with-pic --enable-benchmark=no --enable-module-recovery --disable-module-ecdh --no-create --no-recursion

Compiled with /usr/bin/ccache /usr/bin/clang++ -std=c++17 -mavx -mavx2 -mpclmul -fPIE -pipe -O2 -g -Wthread-safety-analysis -Wall -Werror=sign-compare  -O0 -g3 -ftrapv -fdebug-prefix-map=$(abs_top_srcdir)=.  -Wstack-protector -fstack-protector-all -fcf-protection=full -fstack-clash-protection -msse4.1 -msse4.2 -msse4 -msha  i

Compiler version: clang version 15.0.7

@sipa
Copy link
Member

sipa commented Jul 12, 2023

(Slight tangent, not necessarily for this PR)

I was wondering if it wouldn't be better to XOR with a proper (but non-cryptographic) RNG output instead of a fixed byte pattern, as it'd avoid repetitive patterns remaining in the file, or weak keys that could accidentally be selected.

E.g. Xoshiro256++ is extremely fast (multiple GiB per second), see https://prng.di.unimi.it/. However, it's nontrivial to seek in, which would be needed here. I got a bit carried away and wrote code to do that, but it's a bit slower and more complex than what I'd like (~9 microseconds for seeking anywhere within the first 128 MiB of output, ~14 microseconds for the first 4 GiB).

An alternative is concatenating multiple Xoshiro256++ outputs together, e.g. 4 KiB from Xoshiro256++(key=SipHash(key || 0)), 4 KiB from Xoshiro++(key=SipHash(key || 1)), ... etc; that'd offer faster, simpler seeking.

@maflcko
Copy link
Member Author

maflcko commented Jul 12, 2023

Interesting. Right now, it is trivial to undo the XOR in python with something like:

        def util_xor(data):
            with open("xor.key", "rb") as xor_f:
                key = deser_string(xor_f)
            for i in range(len(data)):
                data[i] ^= key[i % len(key)]
            return bytes(data)

If it requires users to instead implement a PRNG in python, they (who want to read the dat files for whatever reason) would most likely disable the XOR feature. (Probably an edge-case, so not sure if it matters, but I wanted to mention it)

@jamesob
Copy link
Contributor

jamesob commented Jul 12, 2023

If the motivating reason for this feature is to prevent systematized, spurious detection of things in blocks (whether via antivirus software or human), isn't it sufficient to keep the XOR key very simple? Especially, as Marco says, anything more sophisticated would complicate consuming blockfiles from external software.

@sipa
Copy link
Member

sipa commented Jul 12, 2023

It's fair that this is perhaps not a real concern right now; I don't know. I find it slightly unsatisfying that we can't break any 8-byte repetitive pattern (see e.g. the picture here for a visual illustration), while computationally speaking doing so should not be any concern. I bring it up here and now, because trying to support both RNG and constant XORing later would be more code than just RNG.

Regarding external software... I'm not sure. It wouldn't surprise me that literally everyone who is interested in reading these files will stick to supporting only non-XOR regardless of whether it's with a constant or an RNG. If not, can easily have Python code at least to read through the XORing (which we may want for functional tests anyway), but for other languages things would involve more complexity.

@maflcko
Copy link
Member Author

maflcko commented Jul 12, 2023

I could write the 128MiB XOR data to the key file in full. Complexity would be 2x, because each std::fread will be done twice (once to read the XOR key section, and once to read the XOR'd data section)?

Copy link
Member

@sipa sipa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK

MarcoFalke added 2 commits July 13, 2023 11:50
* Add m_ prefix to the std::FILE member variable
* Add std namespace where possible, to avoid confusion with member
  functions of the same name.
* Add AutoFile::feof() member function, to be used in place of
  std::feof(AutoFile::Get())
* Simplify fclose() in terms of release()
* Fix typo in the error message in the ignore member function.
New code can call the method without having first to retrieve the raw
FILE* pointer via Get().

Also, move implementation to the cpp file. Can be reviewed with:
--color-moved=dimmed-zebra  --color-moved-ws=ignore-all-space
@maflcko
Copy link
Member Author

maflcko commented Jul 13, 2023

I find it slightly unsatisfying that we can't break any 8-byte repetitive pattern.

I just find it difficult to think about without a use-case. The only thing it may affect is that it makes transparent on-disk compression harder, but 8-byte XOR already makes that hard, so I don't think this is an argument (for or against) either?

Copy link
Contributor

@jamesob jamesob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reACK faebf00

Trivial rephrasing to avoid FILE release looking like a multiplication:

 diff --git a/src/streams.h b/src/streams.h
-index 03df20b5db..fc2fe4d9f8 100644
+index 03df20b5db..5ff952be76 100644
 --- a/src/streams.h
 +++ b/src/streams.h
 @@ -13,6 +13,7 @@
@@ -315,7 +315,7 @@
 -            file = nullptr;
 -        }
 -        return retval;
-+        if (std::FILE * rel{release()}) return std::fclose(rel);
++        if (auto rel{release()}) return std::fclose(rel);
 +        return 0;
      }

src/streams.cpp Outdated
while (src.size() > 0) {
auto buf_now{Span{buf}.first(std::min<size_t>(src.size(), buf.size()))};
std::copy(src.begin(), src.begin() + buf_now.size(), buf_now.begin());
const auto current_pos{std::ftell(m_file)};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can move this std::ftell to right before the loop and then use the result from std::fwrite to increment current_pos each loop iteration since from https://en.cppreference.com/w/cpp/io/c/fwrite: "The file position indicator for the stream is advanced by the number of characters written.". I ran some tests and it gives a performance improvement if src.size() is ever > 4096 bytes (though I'm not sure that is ever the case currently with block serialization hitting this function incrementally?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am hoping to nuke ftell completely, but that would require removing the Get member function

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the calls to ftell in WriteBlockToDisk and UndoWriteToDisk can be removed since the file position is known before the call and the number of bytes written is also known in the success case (in the failure case the writes will throw), so it should just be possible to increment the initial file position used in OpenBlockFile / OpenUndoFile ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean I am happy to make the change, but it seems brittle if someone calls std::fseek after the constructor?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm good point, I didn't think about that -- it does seem like those functions are pretty contained since they open the file at the start and then close it at exit, but it does seem brittle

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'll pick up #28006 again and remove Get?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

happy to review if you do

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Just for context for the follow-up) I am not sure how "clean" the code in #28006 is. Maybe an easier alternative might be to remove all calls to Get and instead wrap the calls that acted on FILE* into a lambda (which acts on FILE*) and pass the lambda to an Exec() member function. The Exec() function would then also reset the cached ftell state and force a call to ftell the next time it is needed. On top of requiring that the constructor takes a FILE*&& (taking ownership), this should be a relatively easy to review complete fix.

@maflcko
Copy link
Member Author

maflcko commented Jul 19, 2023

Pushed a small change to call std::ftell less often if a large data span is written. In a follow-up, changes can be made to call it less often when small data spans are written.

@Crypt-iQ
Copy link
Contributor

crACK fa633aa

@DrahtBot DrahtBot requested a review from jamesob July 20, 2023 19:56
@maflcko maflcko requested a review from josibake July 21, 2023 06:43
@willcl-ark
Copy link
Member

Lightly tested ACK fa633aa

Sidenote: reading the discussions on performance I decided to try and modify the Xor function to use AVX2 SIMD (mainly so I could learn more about it myself) and managed to get something working, but couldn't get it to run faster than the current implementation in this pull.

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                9.71 |      103,017,293.91 |    0.1% |      0.45 | `Xor4MBNoSIMD`
|               10.17 |       98,347,322.00 |    0.3% |      0.47 | `Xor4MBSIMD`

Of course there's a strong likelihood I did not implement it properly here... But it also seems likely that it's probably not worth the hassle as compilers seem to be quite good at optimising simpler code to take advantage of this stuff these days AFAIU.

@maflcko
Copy link
Member Author

maflcko commented Aug 1, 2023

[ @willcl-ark You could try to inspect the binary after compilation to see how the compiler optimized it (and what difference there is to your version) ]

@maflcko
Copy link
Member Author

maflcko commented Aug 1, 2023

@jamesob Mind doing a re-ACK (trivial). Otherwise I wonder if anything is left to be done here? Is this waiting on someone's NACK or ACK?

Copy link
Contributor

@jamesob jamesob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reACK fa633aa (jamesob/ackr/28060.4.MarcoFalke.util_add_xorfile)

Tiny diff since last review; moving ftell out of a loop.

Range-diff
6:  faebf00dbf ! 6:  fa633aa690 streams: Teach AutoFile how to XOR
    @@ src/streams.cpp: void AutoFile::ignore(size_t nSize)
     +            throw std::ios_base::failure("AutoFile::write: write failed");
     +        }
     +    } else {
    ++        auto current_pos{std::ftell(m_file)};
    ++        if (current_pos < 0) throw std::ios_base::failure("AutoFile::write: ftell failed");
     +        std::array<std::byte, 4096> buf;
     +        while (src.size() > 0) {
     +            auto buf_now{Span{buf}.first(std::min<size_t>(src.size(), buf.size()))};
     +            std::copy(src.begin(), src.begin() + buf_now.size(), buf_now.begin());
    -+            const auto current_pos{std::ftell(m_file)};
     +            util::Xor(buf_now, m_xor, current_pos);
    -+            if (current_pos < 0 || std::fwrite(buf_now.data(), 1, buf_now.size(), m_file) != buf_now.size()) {
    ++            if (std::fwrite(buf_now.data(), 1, buf_now.size(), m_file) != buf_now.size()) {
     +                throw std::ios_base::failure{"XorFile::write: failed"};
     +            }
     +            src = src.subspan(buf_now.size());
    ++            current_pos += buf_now.size();
     +        }
          }
      }
Show signature data

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

reACK fa633aa6906f3b130b691568bcd20b2b76bb1cbb ([`jamesob/ackr/28060.4.MarcoFalke.util_add_xorfile`](https://github.com/jamesob/bitcoin/tree/ackr/28060.4.MarcoFalke.util_add_xorfile))

Tiny diff since last review; moving `ftell` out of a loop.


<details><summary>Range-diff</summary>

6:  faebf00dbf ! 6:  fa633aa690 streams: Teach AutoFile how to XOR
    @@ src/streams.cpp: void AutoFile::ignore(size_t nSize)
     +            throw std::ios_base::failure("AutoFile::write: write failed");
     +        }
     +    } else {
    ++        auto current_pos{std::ftell(m_file)};
    ++        if (current_pos < 0) throw std::ios_base::failure("AutoFile::write: ftell failed");
     +        std::array<std::byte, 4096> buf;
     +        while (src.size() > 0) {
     +            auto buf_now{Span{buf}.first(std::min<size_t>(src.size(), buf.size()))};
     +            std::copy(src.begin(), src.begin() + buf_now.size(), buf_now.begin());
    -+            const auto current_pos{std::ftell(m_file)};
     +            util::Xor(buf_now, m_xor, current_pos);
    -+            if (current_pos < 0 || std::fwrite(buf_now.data(), 1, buf_now.size(), m_file) != buf_now.size()) {
    ++            if (std::fwrite(buf_now.data(), 1, buf_now.size(), m_file) != buf_now.size()) {
     +                throw std::ios_base::failure{"XorFile::write: failed"};
     +            }
     +            src = src.subspan(buf_now.size());
    ++            current_pos += buf_now.size();
     +        }
          }
      }

</details>
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEGNRVI1NPYuZCSIrGepNdrbLETwUFAmTJHG8ACgkQepNdrbLE
TwUnxQ/+OXTeOMe54Moioc+djEr3OAHJqw+9Yg5mzNt+SZYb45lxkkAECzL42z+6
C32oarHZRt3pK3ZAQgKrU2TXNoNnn2gxvCJ5SOcjCafARkZCf4LgZzFdEMFJXQCN
VONFBo7kZb9A7wpEp2fU8cLg5/ASOeW19VLUxHpo9FI5XAKlUEUHT1JbJ3jfiYfW
LusvOYrleegDMZlo4zrrqTBymUVYzRjMYJb7cGHu2dl11xVXWOcHNiYiHRmk7g3O
QTAd5jWM55+BGGCht7F3cmhEtdk8mfLznPg6+GFNFgKTNyV2dxYbbQerUCsEGkOO
xv5Omo5lIigVafi9MNjOJ6z4auWHmjQV8Uid1IwAVs1FJEtwA5WAQivTa8pie8sA
wLOgRZhjMABfz+pDO+5WrLMqCgagGQyG276CiEUtMo7NvXdt6lE4YsQ0d1v/BLAZ
LGZYufdn5KX6PxyLVaHluFiEKJE164OZjWSNyyqePZBNJqyElYE9UrQ5uEPoCMXs
UrvxK7IhGIwvkUBcRH68W8z1+RQ53d4jWNZfmoCLZJzhKDAv06m6xbF4HCCarHi7
3QtL9N7dBlNUBSMgZwTcQt7LZmg+QMYgyr4WvML84H1pv8dlIfx5hQ3uH0AJ+Jrl
Pcma+vHwhAa5UQSwHH5N11ANnvw1IXIftC6eHBcQApfWUmAwmKc=
=73oh
-----END PGP SIGNATURE-----

Show platform data

Tested on Linux-6.4.6-arch1-1-x86_64-with-glibc2.37

Configured with ./configure 'BDB_LIBS=-L/home/james/src/bitcoin/db4/lib -ldb_cxx-4.8' BDB_CFLAGS=-I/home/james/src/bitcoin/db4/include 'CXXFLAGS=-fPIE -pipe -O2 -g -Wthread-safety-analysis -Wall -Werror=sign-compare ' --disable-gui-tests --disable-fuzz-binary --enable-wallet --enable-debug --with-daemon --enable-natpmp-default CXX=/usr/bin/clang++ CC=/usr/bin/clang --disable-shared --with-pic --enable-benchmark=no --enable-module-recovery --disable-module-ecdh PYTHONPATH= --disable-shared --with-pic --enable-benchmark=no --enable-module-recovery --disable-module-ecdh --no-create --no-recursion

Compiled with /usr/bin/ccache /usr/bin/clang++ -std=c++17 -mavx -mavx2 -mpclmul -fPIE -pipe -O2 -g -Wthread-safety-analysis -Wall -Werror=sign-compare  -O0 -g3 -ftrapv -fdebug-prefix-map=$(abs_top_srcdir)=.  -Wstack-protector -fstack-protector-all -fcf-protection=full -fstack-clash-protection -msse4.1 -msse4.2 -msse4 -msha  i

Compiler version: clang version 15.0.7

@fanquake fanquake merged commit ceda819 into bitcoin:master Aug 1, 2023
@maflcko maflcko deleted the 2307-xor-file- branch August 1, 2023 16:46
@willcl-ark
Copy link
Member

[ @willcl-ark You could try to inspect the binary after compilation to see how the compiler optimized it (and what difference there is to your version) ]

I did take a rough look on my machine (and on godbolt), but I don't think I have the necessary super-powers needed to really dissect why my (very naive and likely badly implemented) impl. didn't pay off.

From what I can see, it doesn't appear that clang16 is auto-SIMD-ing your version, but my SIMD version looks quite instruction-heavy by comparison 🤷🏼

Increasing the bench to 100MB does see the SIMD version going marginally faster, but nothing to get excited about...:

$ ./src/bench/bench_bitcoin --filter="Xor.*"
Warning, results might be unstable:
* CPU frequency scaling enabled: CPU 0 between 800.0 and 3,800.0 MHz
* CPU governor is 'powersave' but should be 'performance'
* Turbo is enabled, CPU frequency will fluctuate

Recommendations
* Use 'pyperf system tune' before benchmarking. See https://github.com/psf/pyperf

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                8.92 |      112,111,802.67 |    0.5% |      0.01 | `Xor`
|                9.74 |      102,639,832.40 |    0.0% |     11.24 | `Xor100MBNoSIMD`
|                8.62 |      116,041,297.12 |    0.1% |      9.94 | `Xor100MBSIMD`

sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Aug 9, 2023
fa633aa streams: Teach AutoFile how to XOR (MarcoFalke)
000019e Add AutoFile::detail_fread member function (MarcoFalke)
fa7724b refactor: Modernize AutoFile (MarcoFalke)
fa8d227 doc: Remove comments that just repeat what the code does (MarcoFalke)
fafe2ca refactor: Remove redundant file check from AutoFile shift operators (MarcoFalke)
9999a49 Extract util::Xor, Add key_offset option, Add bench (MarcoFalke)

Pull request description:

  This allows `AutoFile` to roll an XOR pattern while reading or writing to the underlying file.

  This is needed for bitcoin#28052, but can also be used in any other place.

  Also, there are tests, so I've split this up from the larger pull to make review easier, hopefully.

ACKs for top commit:
  Crypt-iQ:
    crACK fa633aa
  willcl-ark:
    Lightly tested ACK fa633aa
  jamesob:
    reACK fa633aa ([`jamesob/ackr/28060.4.MarcoFalke.util_add_xorfile`](https://github.com/jamesob/bitcoin/tree/ackr/28060.4.MarcoFalke.util_add_xorfile))

Tree-SHA512: 6d66cad0a089a096d3f95e4f2b28bce80b349d4b76f53d09dc9a0bea4fc1b7c0652724469c37971ba27728c7d46398a4c1d289c252af4c0f83bb2fcbc6f8e90b
@bitcoin bitcoin locked and limited conversation to collaborators Jul 31, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants