-
Notifications
You must be signed in to change notification settings - Fork 37.7k
[crypto] Reduce wasted pseudorandom bytes in ChaCha20 #25354
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Yes, this behavior sounds really really strange. The key stream bytes you get shouldn't depend on which quantities you request them in? I wonder why the reference implementation does this. |
@laanwj I don't have access to DJB but I have tried to ask him upon your prompt: https://twitter.com/dhruv/status/1536354037552951296 |
Concept ACK on changing this; it much more closely matches the expectation of a stream cipher interface, especially as the specification does not say anything about skipping bytes. That said, I think that wondering about the origin of this is mostly pointless apart from historical curiosity. It's an implementation issue, which seems to have leaked into the protocol definitions of things that have been built on top (incl. RFC8439 itself, which skips the second 32 byte of the chacha20 stream in the chacha20poly1305 cipher). Any protocol relying on it obviously can't just swap out this implementation as it'd be a protocol break, but other than that, either behavior can be implemented in function of either implementation (add a buffer on top in one direction, or introduce explicit skipping steps in the other). |
It's just the API. See the comment starting with "Encryption/decryption of arbitrary length messages" in https://web.archive.org/web/20170119113129/http://www.ecrypt.eu.org/stream/svn/viewcvs.cgi/ecrypt/trunk/api/skel/ecrypt-sync.h?view=auto |
Right, that explains. It's still a bit bizarre that the function even bothers updating the position counter in the state then, but there's nothing wrong with it. |
@real-or-random Nice detective work! Thank you. |
Right. I'm happy that we found this issue before BIP324 was deployed. It would have been virtually impossible to make this optimization later. |
I wrote a fuzz test for verifying that invoking the |
Thanks, @sipa! Yes, I will incorporate it once I'm done refreshing all BIP324 PRs to the new spec in the next few days. |
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
467cacc
to
70d2a45
Compare
Rebased with master for other branches in the project |
@@ -246,7 +246,6 @@ test_fuzz_fuzz_SOURCES = \ | |||
test/fuzz/crypto_chacha20.cpp \ | |||
test/fuzz/crypto_chacha20_poly1305_aead.cpp \ | |||
test/fuzz/crypto_common.cpp \ | |||
test/fuzz/crypto_diff_fuzz_chacha20.cpp \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
diff --git a/test/sanitizer_suppressions/ubsan b/test/sanitizer_suppressions/ubsan
index 67ef512895..e698ffce53 100644
--- a/test/sanitizer_suppressions/ubsan
+++ b/test/sanitizer_suppressions/ubsan
@@ -19,7 +19,6 @@ unsigned-integer-overflow:bench/bench.h
unsigned-integer-overflow:FuzzedDataProvider.h
unsigned-integer-overflow:leveldb/
unsigned-integer-overflow:minisketch/
-unsigned-integer-overflow:test/fuzz/crypto_diff_fuzz_chacha20.cpp
implicit-integer-sign-change:*/include/boost/
implicit-integer-sign-change:*/include/c++/
implicit-integer-sign-change:*/new_allocator.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
In order to increase test coverage, I'd suggest to
- also add calls that consume more pseudorandom bytes than available from the buffered previous block; this tests that
to_use
is limited and that the the following two conditions are triggered (
https://github.com/dhruv/bitcoin/blob/70d2a4582cfbd063fc44238de9b445f546b5a514/src/crypto/chacha20.cpp#L123-L126):
...
if (prev_block_start_pos >= 64) {
prev_block_start_pos = 0;
}
if (bytes) continue;
...
- add a corresponding test for
ChaCha20::Crypt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
} | ||
|
||
ChaCha20::ChaCha20(const unsigned char* k, size_t keylen) | ||
{ | ||
SetKey(k, keylen); | ||
prev_block_start_pos = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
70d2a45: could do memset(prev_block_bytes, 0, sizeof(prev_block_bytes));
here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR
./src/bench/bench_bitcoin -filter='CHACHA20_1MB'
ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
2.89 | 346,242,492.84 | 0.4% | 20.13 | 6.62 | 3.039 | 0.08 | 0.0% | 0.03 | CHACHA20_1MB |
ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
2.88 | 346,669,699.46 | 0.6% | 20.13 | 6.59 | 3.052 | 0.08 | 0.0% | 0.03 | CHACHA20_1MB |
Master
./src/bench/bench_bitcoin -filter='CHACHA20_1MB'
ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
2.56 | 390,349,081.17 | 0.3% | 17.03 | 5.87 | 2.900 | 0.05 | 0.0% | 0.03 | CHACHA20_1MB |
ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
2.57 | 389,658,756.50 | 0.4% | 17.03 | 5.89 | 2.892 | 0.05 | 0.0% | 0.03 | CHACHA20_1MB |
is this implementation slower?
} | ||
|
||
ChaCha20::ChaCha20() | ||
{ | ||
memset(input, 0, sizeof(input)); | ||
memset(prev_block_bytes, 0, sizeof(prev_block_bytes)); | ||
prev_block_start_pos = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prev_block_start_pos
already initialized in class
} | ||
|
||
ChaCha20::ChaCha20(const unsigned char* k, size_t keylen) | ||
{ | ||
SetKey(k, keylen); | ||
prev_block_start_pos = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prev_block_start_pos
already initialized in class
You're right, there's a slowdown. However, the existing implementation is incorrect and skipping blocks will be ~20x slower for the BIP324 specification. Here are my numbers: master:
#25354 (This PR):
#26153 (Alternate implementation by @sipa):
Since the implementation by @sipa meets the non-inferiority bar on the benchmarks (corroborated by @aureleoules' comment on #26153), I'm closing this PR and will shortly rebase all my branches for BIP324 to use #26153. |
…improvements 511aa4f Add unit test for ChaCha20's new caching (Pieter Wuille) fb243d2 Improve test vectors for ChaCha20 (Pieter Wuille) 93aee8b Inline ChaCha20 32-byte specific constants (Pieter Wuille) 62ec713 Only support 32-byte keys in ChaCha20{,Aligned} (Pieter Wuille) f21994a Use ChaCha20Aligned in MuHash3072 code (Pieter Wuille) 5d16f75 Use ChaCha20 caching in FastRandomContext (Pieter Wuille) 38eaece Add fuzz test for testing that ChaCha20 works as a stream (Pieter Wuille) 5f05b27 Add xoroshiro128++ PRNG (Martin Leitner-Ankerl) 12ff724 Make unrestricted ChaCha20 cipher not waste keystream bytes (Pieter Wuille) 6babf40 Rename ChaCha20::Seek -> Seek64 to clarify multiple of 64 (Pieter Wuille) e37bcaa Split ChaCha20 into aligned/unaligned variants (Pieter Wuille) Pull request description: This is an alternative to #25354 (by my benchmarking, somewhat faster), subsumes #25712, and adds additional test vectors. It separates the multiple-of-64-bytes-only "core" logic (which becomes simpler) from a layer around which performs caching/slicing to support arbitrary byte amounts. Both have their uses (in particular, the MuHash3072 code can benefit from multiple-of-64-bytes assumptions), plus the separation results in more readable code. Also, since FastRandomContext effectively had its own (more naive) caching on top of ChaCha20, that can be dropped in favor of ChaCha20's new built-in caching. I thought about rebasing #25712 on top of this, but the changes before are fairly extensive, so redid it instead. ACKs for top commit: ajtowns: ut reACK 511aa4f dhruv: tACK crACK 511aa4f Tree-SHA512: 3aa80971322a93e780c75a8d35bd39da3a9ea570fbae4491eaf0c45242f5f670a24a592c50ad870d5fd09b9f88ec06e274e8aa3cefd9561d623c63f7198cf2c7
The current implementation of ChaCha20 advances the block even when the last pseudorandom block of 64 bytes is only partially used. This code buffers and reuses those pseudorandom bytes.
This improvement is relevant to the new design of BIP324 where:
Unfortunately, this also means that our implementation is no longer identical to the reference implementation by Daniel J Bernstein (if I had to guess though, closer to his intention for a stream cipher), and the differential fuzz test against his implementation from #22704 no longer makes sense (if it's not against the pristine version, it does not serve the original purpose)