-
Notifications
You must be signed in to change notification settings - Fork 37.7k
util: Teach AutoFile how to XOR #28060
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
In general looks good, but I think you could really cut down on the amount of code here if you just made XorFile
a subclass of CAutoFile
, and made whatever small adjustments and modernizations necessary to CAutoFile
to get that to work.
E.g. XorFile::ignore()
seems like essentially the same code as CAutoFile::ignore()
, just with some std::
prefixes. Seems like code we don't want to maintain two copies of.
src/streams.cpp
Outdated
void XorFile::write(Span<const std::byte> src) | ||
{ | ||
if (!m_file) throw std::ios_base::failure("XorFile::write: file handle is nullptr"); | ||
std::array<std::byte, 4096> buf; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be interesting to see how statically allocating a buffer (say, as a member of XorFile
) and reusing it might affect performance; may avoid some reallocations. Not sure how frequently this is actually called though. I expect this probably won't make a difference though, because I guess XorFile::write()
will be called on the order of once per block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea, but my blind guess would be that calling std::fwrite
takes longer than anything else in this function. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't run benchmarks against statically allocating a buffer, but it may be useful since AutoFile::write
is called many times per block. Basically every time the pattern CAutoFile << x
is used which happens for every element of the block header, each transaction input, output, etc. I also ran a small unit test through perf
where there were many writes and found that std::ftell
dominated with std::fwrite
close behind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be good to have a heat-map (Percent of time spent in each line) for this function for IBD, or a unit test that writes a block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also ran a small unit test through perf where there were many writes and found that std::ftell dominated with std::fwrite close behind
For me fwrite dominated over ftell, but the most time was spent on XOR:
ROUTINE ====================== AutoFile::write in src/streams.cpp
49 100 Total samples (flat / cumulative)
. . 40: nSize -= nNow;
. . 41: }
. . 42: }
. . 43:
. . 44: void AutoFile::write(Span<const std::byte> src)
---
. . 45: {
. . 46: if (!m_file) throw std::ios_base::failure("AutoFile::write: file handle is nullptr");
. . 47: if (m_xor.empty()) {
. . 48: if (std::fwrite(src.data(), 1, src.size(), m_file) != src.size()) {
. . 49: throw std::ios_base::failure("AutoFile::write: write failed");
. . 50: }
. . 51: } else {
. . 52: std::array<std::byte, 4096> buf;
. . 53: while (src.size() > 0) {
. . 54: auto buf_now{Span{buf}.first(std::min<size_t>(src.size(), buf.size()))};
. 2 55: std::copy(src.begin(), src.begin() + buf_now.size(), buf_now.begin());
. 16 56: const auto current_pos{std::ftell(m_file)};
49 49 57: util::Xor(buf_now, m_xor, current_pos);
. 33 58: if (current_pos < 0 || std::fwrite(buf_now.data(), 1, buf_now.size(), m_file) != buf_now.size()) {
. . 59: throw std::ios_base::failure{"XorFile::write: failed"};
. . 60: }
. . 61: src = src.subspan(buf_now.size());
. . 62: }
. . 63: }
. . 64: }
---
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it must have been a fluke because when I synced up to a little past block ~200K, fwrite
was in the lead and ftell
and xor were about tied at ~3% each of the total time spent. I wrote a test here Crypt-iQ@bf924d6 that requires the benchmarks to be compiled but is not a benchmark. I made a patch in the following commit (Crypt-iQ@be07dbc) that reduced the number of ftell
calls to 1 for AutoFile
and it gave a noticeable performance improvement. I think it's ok to track the file position this way, but I might be wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it depends on the data written. My benchmark was for 1MB written. With 33 bytes written, ftell
is the dominant:
ROUTINE ====================== AutoFile::write in src/streams.cpp
2 28 Total samples (flat / cumulative)
. . 40: nSize -= nNow;
. . 41: }
. . 42: }
. . 43:
. . 44: void AutoFile::write(Span<const std::byte> src)
---
. . 45: {
. . 46: if (!m_file) throw std::ios_base::failure("AutoFile::write: file handle is nullptr");
. . 47: if (m_xor.empty()) {
. . 48: if (std::fwrite(src.data(), 1, src.size(), m_file) != src.size()) {
. . 49: throw std::ios_base::failure("AutoFile::write: write failed");
. . 50: }
. . 51: } else {
. . 52: std::array<std::byte, 4096> buf;
. . 53: while (src.size() > 0) {
. . 54: auto buf_now{Span{buf}.first(std::min<size_t>(src.size(), buf.size()))};
. . 55: std::copy(src.begin(), src.begin() + buf_now.size(), buf_now.begin());
. 23 56: const auto current_pos{std::ftell(m_file)};
1 1 57: util::Xor(buf_now, m_xor, current_pos);
1 4 58: if (current_pos < 0 || std::fwrite(buf_now.data(), 1, buf_now.size(), m_file) != buf_now.size()) {
. . 59: throw std::ios_base::failure{"XorFile::write: failed"};
. . 60: }
. . 61: src = src.subspan(buf_now.size());
. . 62: }
. . 63: }
. . 64: }
---
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case the bottleneck doesn't seem to be std::array<std::byte, 4096> buf;
, so I think this thread can be closed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I didn't see allocations come up at all in perf
The shift operators will call the write or read member function, which already does the check. Also, call sites are free to directly call ::(Un)Serialize(s, obj) to skip this check, so removing it increases consistency.
Thanks, rewritten from scratch to reduce the lines of code. |
No need to artificially bloat the code and waste space.
fa6b8c6
to
fadb325
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK fadb325 (jamesob/ackr/28060.2.MarcoFalke.util_add_xorfile
)
Looks good! More concise change now that we're just adding XOR capability to existing classes.
Show signature data
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
ACK fadb32556f64a72377c82be71a207195e6fcf68b ([`jamesob/ackr/28060.2.MarcoFalke.util_add_xorfile`](https://github.com/jamesob/bitcoin/tree/ackr/28060.2.MarcoFalke.util_add_xorfile))
Looks good! More concise change now that we're just adding XOR capability to existing classes.
-----BEGIN PGP SIGNATURE-----
iQIzBAEBCgAdFiEEGNRVI1NPYuZCSIrGepNdrbLETwUFAmSu17QACgkQepNdrbLE
TwWM2Q/5AQGR5Uv5aFQSPvjtd9ZbcPyYvDzwSjkFImCnCrVGeTlHux6rsS9SzL8y
gybawiRHUTHjLCO8j5ItvM7/stmyq4UiTk8QCOQL39dGJgBVjV+/pQnCqPlMSU1d
KtObCvXlVc5jWgr7OdHRDtDb5oZi1jc+bcr5ReVCggTdpcvjHxkxLSjRXEFQoeQi
kY/wa5yvids1kyCotiScB+zvc3FwVKj3/lWpPWYkpXdTxmqAjpmgweQlBqq90a2a
I5D0oI1fyerVzJrYLi5EQfY8PQZBhUDjJSzyMTpxr0EqXSRGWCDl5Zm4m0Mxo+8Q
B+Wymo/Dvf196ugDBbUWnQM/dG4tqQRZAB3U1VHwBUemb+fZgDRBeeyAs6/aIHik
YwUR8amJM08ClAJUfSaNXL/OouMqOoMds5tXwMM+Z/2msAgpdXx7c6ifGFaMQm5S
voElavRD55OLFn9UYKZHusF3EmeTN094pWkwMCeuFnnW9+Rpar5m0nvNuH7T/viI
xdFM2V/ypzkDtyDg21ZrR+OcqlGIsj0x8BB7rzs1JaxeoyO6Zshj7RW1JCPTJtBq
P2KnEiD8DoFL9gTuL3TtnT/5oIJ3Quqb8A8GnU4amzjOcUyg9mg3CwMuWuh17odO
LIckwWfQZlo7wzzOUXwE4i/Xj0ClbyNPgzroOzcXz12WdSsAQWw=
=SQKV
-----END PGP SIGNATURE-----
Show platform data
Tested on Linux-6.2.10-arch1-1-x86_64-with-glibc2.37
Configured with ./configure 'BDB_LIBS=-L/home/james/src/bitcoin/db4/lib -ldb_cxx-4.8' BDB_CFLAGS=-I/home/james/src/bitcoin/db4/include 'CXXFLAGS=-fPIE -pipe -O2 -g -Wthread-safety-analysis -Wall -Werror=sign-compare ' --disable-gui-tests --disable-fuzz-binary --enable-wallet --enable-debug --with-daemon --enable-natpmp-default CXX=/usr/bin/clang++ CC=/usr/bin/clang --disable-shared --with-pic --enable-benchmark=no --enable-module-recovery --enable-module-schnorrsig PYTHONPATH= --disable-shared --with-pic --enable-benchmark=no --enable-module-recovery --disable-module-ecdh --no-create --no-recursion
Compiled with /usr/bin/ccache /usr/bin/clang++ -std=c++17 -mavx -mavx2 -mpclmul -fPIE -pipe -O2 -g -Wthread-safety-analysis -Wall -Werror=sign-compare -O0 -g3 -ftrapv -fdebug-prefix-map=$(abs_top_srcdir)=. -Wstack-protector -fstack-protector-all -fcf-protection=full -fstack-clash-protection -msse4.1 -msse4.2 -msse4 -msha i
Compiler version: clang version 15.0.7
(Slight tangent, not necessarily for this PR) I was wondering if it wouldn't be better to XOR with a proper (but non-cryptographic) RNG output instead of a fixed byte pattern, as it'd avoid repetitive patterns remaining in the file, or weak keys that could accidentally be selected. E.g. Xoshiro256++ is extremely fast (multiple GiB per second), see https://prng.di.unimi.it/. However, it's nontrivial to seek in, which would be needed here. I got a bit carried away and wrote code to do that, but it's a bit slower and more complex than what I'd like (~9 microseconds for seeking anywhere within the first 128 MiB of output, ~14 microseconds for the first 4 GiB). An alternative is concatenating multiple Xoshiro256++ outputs together, e.g. 4 KiB from Xoshiro256++(key=SipHash(key || 0)), 4 KiB from Xoshiro++(key=SipHash(key || 1)), ... etc; that'd offer faster, simpler seeking. |
Interesting. Right now, it is trivial to undo the XOR in python with something like: def util_xor(data):
with open("xor.key", "rb") as xor_f:
key = deser_string(xor_f)
for i in range(len(data)):
data[i] ^= key[i % len(key)]
return bytes(data) If it requires users to instead implement a PRNG in python, they (who want to read the dat files for whatever reason) would most likely disable the XOR feature. (Probably an edge-case, so not sure if it matters, but I wanted to mention it) |
If the motivating reason for this feature is to prevent systematized, spurious detection of things in blocks (whether via antivirus software or human), isn't it sufficient to keep the XOR key very simple? Especially, as Marco says, anything more sophisticated would complicate consuming blockfiles from external software. |
It's fair that this is perhaps not a real concern right now; I don't know. I find it slightly unsatisfying that we can't break any 8-byte repetitive pattern (see e.g. the picture here for a visual illustration), while computationally speaking doing so should not be any concern. I bring it up here and now, because trying to support both RNG and constant XORing later would be more code than just RNG. Regarding external software... I'm not sure. It wouldn't surprise me that literally everyone who is interested in reading these files will stick to supporting only non-XOR regardless of whether it's with a constant or an RNG. If not, can easily have Python code at least to read through the XORing (which we may want for functional tests anyway), but for other languages things would involve more complexity. |
I could write the 128MiB XOR data to the key file in full. Complexity would be 2x, because each |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK
* Add m_ prefix to the std::FILE member variable * Add std namespace where possible, to avoid confusion with member functions of the same name. * Add AutoFile::feof() member function, to be used in place of std::feof(AutoFile::Get()) * Simplify fclose() in terms of release() * Fix typo in the error message in the ignore member function.
New code can call the method without having first to retrieve the raw FILE* pointer via Get(). Also, move implementation to the cpp file. Can be reviewed with: --color-moved=dimmed-zebra --color-moved-ws=ignore-all-space
I just find it difficult to think about without a use-case. The only thing it may affect is that it makes transparent on-disk compression harder, but 8-byte XOR already makes that hard, so I don't think this is an argument (for or against) either? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reACK faebf00
Trivial rephrasing to avoid FILE
release looking like a multiplication:
diff --git a/src/streams.h b/src/streams.h
-index 03df20b5db..fc2fe4d9f8 100644
+index 03df20b5db..5ff952be76 100644
--- a/src/streams.h
+++ b/src/streams.h
@@ -13,6 +13,7 @@
@@ -315,7 +315,7 @@
- file = nullptr;
- }
- return retval;
-+ if (std::FILE * rel{release()}) return std::fclose(rel);
++ if (auto rel{release()}) return std::fclose(rel);
+ return 0;
}
src/streams.cpp
Outdated
while (src.size() > 0) { | ||
auto buf_now{Span{buf}.first(std::min<size_t>(src.size(), buf.size()))}; | ||
std::copy(src.begin(), src.begin() + buf_now.size(), buf_now.begin()); | ||
const auto current_pos{std::ftell(m_file)}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can move this std::ftell
to right before the loop and then use the result from std::fwrite
to increment current_pos
each loop iteration since from https://en.cppreference.com/w/cpp/io/c/fwrite: "The file position indicator for the stream is advanced by the number of characters written.". I ran some tests and it gives a performance improvement if src.size()
is ever > 4096 bytes (though I'm not sure that is ever the case currently with block serialization hitting this function incrementally?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am hoping to nuke ftell
completely, but that would require removing the Get
member function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the calls to ftell
in WriteBlockToDisk
and UndoWriteToDisk
can be removed since the file position is known before the call and the number of bytes written is also known in the success case (in the failure case the writes will throw), so it should just be possible to increment the initial file position used in OpenBlockFile
/ OpenUndoFile
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean I am happy to make the change, but it seems brittle if someone calls std::fseek
after the constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm good point, I didn't think about that -- it does seem like those functions are pretty contained since they open the file at the start and then close it at exit, but it does seem brittle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'll pick up #28006 again and remove Get
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
happy to review if you do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Just for context for the follow-up) I am not sure how "clean" the code in #28006 is. Maybe an easier alternative might be to remove all calls to Get
and instead wrap the calls that acted on FILE*
into a lambda (which acts on FILE*
) and pass the lambda to an Exec()
member function. The Exec()
function would then also reset the cached ftell
state and force a call to ftell
the next time it is needed. On top of requiring that the constructor takes a FILE*&&
(taking ownership), this should be a relatively easy to review complete fix.
Pushed a small change to call |
crACK fa633aa |
Lightly tested ACK fa633aa Sidenote: reading the discussions on performance I decided to try and modify the Xor function to use AVX2 SIMD (mainly so I could learn more about it myself) and managed to get something working, but couldn't get it to run faster than the current implementation in this pull.
Of course there's a strong likelihood I did not implement it properly here... But it also seems likely that it's probably not worth the hassle as compilers seem to be quite good at optimising simpler code to take advantage of this stuff these days AFAIU. |
[ @willcl-ark You could try to inspect the binary after compilation to see how the compiler optimized it (and what difference there is to your version) ] |
@jamesob Mind doing a re-ACK (trivial). Otherwise I wonder if anything is left to be done here? Is this waiting on someone's NACK or ACK? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reACK fa633aa (jamesob/ackr/28060.4.MarcoFalke.util_add_xorfile
)
Tiny diff since last review; moving ftell
out of a loop.
Range-diff
6: faebf00dbf ! 6: fa633aa690 streams: Teach AutoFile how to XOR
@@ src/streams.cpp: void AutoFile::ignore(size_t nSize)
+ throw std::ios_base::failure("AutoFile::write: write failed");
+ }
+ } else {
++ auto current_pos{std::ftell(m_file)};
++ if (current_pos < 0) throw std::ios_base::failure("AutoFile::write: ftell failed");
+ std::array<std::byte, 4096> buf;
+ while (src.size() > 0) {
+ auto buf_now{Span{buf}.first(std::min<size_t>(src.size(), buf.size()))};
+ std::copy(src.begin(), src.begin() + buf_now.size(), buf_now.begin());
-+ const auto current_pos{std::ftell(m_file)};
+ util::Xor(buf_now, m_xor, current_pos);
-+ if (current_pos < 0 || std::fwrite(buf_now.data(), 1, buf_now.size(), m_file) != buf_now.size()) {
++ if (std::fwrite(buf_now.data(), 1, buf_now.size(), m_file) != buf_now.size()) {
+ throw std::ios_base::failure{"XorFile::write: failed"};
+ }
+ src = src.subspan(buf_now.size());
++ current_pos += buf_now.size();
+ }
}
}
Show signature data
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
reACK fa633aa6906f3b130b691568bcd20b2b76bb1cbb ([`jamesob/ackr/28060.4.MarcoFalke.util_add_xorfile`](https://github.com/jamesob/bitcoin/tree/ackr/28060.4.MarcoFalke.util_add_xorfile))
Tiny diff since last review; moving `ftell` out of a loop.
<details><summary>Range-diff</summary>
6: faebf00dbf ! 6: fa633aa690 streams: Teach AutoFile how to XOR
@@ src/streams.cpp: void AutoFile::ignore(size_t nSize)
+ throw std::ios_base::failure("AutoFile::write: write failed");
+ }
+ } else {
++ auto current_pos{std::ftell(m_file)};
++ if (current_pos < 0) throw std::ios_base::failure("AutoFile::write: ftell failed");
+ std::array<std::byte, 4096> buf;
+ while (src.size() > 0) {
+ auto buf_now{Span{buf}.first(std::min<size_t>(src.size(), buf.size()))};
+ std::copy(src.begin(), src.begin() + buf_now.size(), buf_now.begin());
-+ const auto current_pos{std::ftell(m_file)};
+ util::Xor(buf_now, m_xor, current_pos);
-+ if (current_pos < 0 || std::fwrite(buf_now.data(), 1, buf_now.size(), m_file) != buf_now.size()) {
++ if (std::fwrite(buf_now.data(), 1, buf_now.size(), m_file) != buf_now.size()) {
+ throw std::ios_base::failure{"XorFile::write: failed"};
+ }
+ src = src.subspan(buf_now.size());
++ current_pos += buf_now.size();
+ }
}
}
</details>
-----BEGIN PGP SIGNATURE-----
iQIzBAEBCgAdFiEEGNRVI1NPYuZCSIrGepNdrbLETwUFAmTJHG8ACgkQepNdrbLE
TwUnxQ/+OXTeOMe54Moioc+djEr3OAHJqw+9Yg5mzNt+SZYb45lxkkAECzL42z+6
C32oarHZRt3pK3ZAQgKrU2TXNoNnn2gxvCJ5SOcjCafARkZCf4LgZzFdEMFJXQCN
VONFBo7kZb9A7wpEp2fU8cLg5/ASOeW19VLUxHpo9FI5XAKlUEUHT1JbJ3jfiYfW
LusvOYrleegDMZlo4zrrqTBymUVYzRjMYJb7cGHu2dl11xVXWOcHNiYiHRmk7g3O
QTAd5jWM55+BGGCht7F3cmhEtdk8mfLznPg6+GFNFgKTNyV2dxYbbQerUCsEGkOO
xv5Omo5lIigVafi9MNjOJ6z4auWHmjQV8Uid1IwAVs1FJEtwA5WAQivTa8pie8sA
wLOgRZhjMABfz+pDO+5WrLMqCgagGQyG276CiEUtMo7NvXdt6lE4YsQ0d1v/BLAZ
LGZYufdn5KX6PxyLVaHluFiEKJE164OZjWSNyyqePZBNJqyElYE9UrQ5uEPoCMXs
UrvxK7IhGIwvkUBcRH68W8z1+RQ53d4jWNZfmoCLZJzhKDAv06m6xbF4HCCarHi7
3QtL9N7dBlNUBSMgZwTcQt7LZmg+QMYgyr4WvML84H1pv8dlIfx5hQ3uH0AJ+Jrl
Pcma+vHwhAa5UQSwHH5N11ANnvw1IXIftC6eHBcQApfWUmAwmKc=
=73oh
-----END PGP SIGNATURE-----
Show platform data
Tested on Linux-6.4.6-arch1-1-x86_64-with-glibc2.37
Configured with ./configure 'BDB_LIBS=-L/home/james/src/bitcoin/db4/lib -ldb_cxx-4.8' BDB_CFLAGS=-I/home/james/src/bitcoin/db4/include 'CXXFLAGS=-fPIE -pipe -O2 -g -Wthread-safety-analysis -Wall -Werror=sign-compare ' --disable-gui-tests --disable-fuzz-binary --enable-wallet --enable-debug --with-daemon --enable-natpmp-default CXX=/usr/bin/clang++ CC=/usr/bin/clang --disable-shared --with-pic --enable-benchmark=no --enable-module-recovery --disable-module-ecdh PYTHONPATH= --disable-shared --with-pic --enable-benchmark=no --enable-module-recovery --disable-module-ecdh --no-create --no-recursion
Compiled with /usr/bin/ccache /usr/bin/clang++ -std=c++17 -mavx -mavx2 -mpclmul -fPIE -pipe -O2 -g -Wthread-safety-analysis -Wall -Werror=sign-compare -O0 -g3 -ftrapv -fdebug-prefix-map=$(abs_top_srcdir)=. -Wstack-protector -fstack-protector-all -fcf-protection=full -fstack-clash-protection -msse4.1 -msse4.2 -msse4 -msha i
Compiler version: clang version 15.0.7
I did take a rough look on my machine (and on godbolt), but I don't think I have the necessary super-powers needed to really dissect why my (very naive and likely badly implemented) impl. didn't pay off. From what I can see, it doesn't appear that clang16 is auto-SIMD-ing your version, but my SIMD version looks quite instruction-heavy by comparison 🤷🏼 Increasing the bench to 100MB does see the SIMD version going marginally faster, but nothing to get excited about...:
|
fa633aa streams: Teach AutoFile how to XOR (MarcoFalke) 000019e Add AutoFile::detail_fread member function (MarcoFalke) fa7724b refactor: Modernize AutoFile (MarcoFalke) fa8d227 doc: Remove comments that just repeat what the code does (MarcoFalke) fafe2ca refactor: Remove redundant file check from AutoFile shift operators (MarcoFalke) 9999a49 Extract util::Xor, Add key_offset option, Add bench (MarcoFalke) Pull request description: This allows `AutoFile` to roll an XOR pattern while reading or writing to the underlying file. This is needed for bitcoin#28052, but can also be used in any other place. Also, there are tests, so I've split this up from the larger pull to make review easier, hopefully. ACKs for top commit: Crypt-iQ: crACK fa633aa willcl-ark: Lightly tested ACK fa633aa jamesob: reACK fa633aa ([`jamesob/ackr/28060.4.MarcoFalke.util_add_xorfile`](https://github.com/jamesob/bitcoin/tree/ackr/28060.4.MarcoFalke.util_add_xorfile)) Tree-SHA512: 6d66cad0a089a096d3f95e4f2b28bce80b349d4b76f53d09dc9a0bea4fc1b7c0652724469c37971ba27728c7d46398a4c1d289c252af4c0f83bb2fcbc6f8e90b
This allows
AutoFile
to roll an XOR pattern while reading or writing to the underlying file.This is needed for #28052, but can also be used in any other place.
Also, there are tests, so I've split this up from the larger pull to make review easier, hopefully.