Skip to content

Conversation

dhruv
Copy link
Contributor

@dhruv dhruv commented Jun 13, 2022

The current implementation of ChaCha20 advances the block even when the last pseudorandom block of 64 bytes is only partially used. This code buffers and reuses those pseudorandom bytes.

This improvement is relevant to the new design of BIP324 where:

  • One instance of ChaCha20(for encrypting the length) uses only 3 pseudorandom bytes per message, so throwing away 61/64 bytes would be a 95% waste.
  • There are efficiencies to be had from being able to send a vector of plaintexts to encrypt rather than creating a concatenated buffer, but that means the block cannot be advanced in between calls.

Unfortunately, this also means that our implementation is no longer identical to the reference implementation by Daniel J Bernstein (if I had to guess though, closer to his intention for a stream cipher), and the differential fuzz test against his implementation from #22704 no longer makes sense (if it's not against the pristine version, it does not serve the original purpose)

@laanwj
Copy link
Member

laanwj commented Jun 13, 2022

if I had to guess though, closer to his intention for a stream cipher

Yes, this behavior sounds really really strange. The key stream bytes you get shouldn't depend on which quantities you request them in? I wonder why the reference implementation does this.

@dhruv
Copy link
Contributor Author

dhruv commented Jun 13, 2022

@laanwj I don't have access to DJB but I have tried to ask him upon your prompt: https://twitter.com/dhruv/status/1536354037552951296

@dhruv dhruv mentioned this pull request Jun 13, 2022
@sipa
Copy link
Member

sipa commented Jun 13, 2022

Concept ACK on changing this; it much more closely matches the expectation of a stream cipher interface, especially as the specification does not say anything about skipping bytes.

That said, I think that wondering about the origin of this is mostly pointless apart from historical curiosity. It's an implementation issue, which seems to have leaked into the protocol definitions of things that have been built on top (incl. RFC8439 itself, which skips the second 32 byte of the chacha20 stream in the chacha20poly1305 cipher). Any protocol relying on it obviously can't just swap out this implementation as it'd be a protocol break, but other than that, either behavior can be implemented in function of either implementation (add a buffer on top in one direction, or introduce explicit skipping steps in the other).

@real-or-random
Copy link
Contributor

It's just the API. See the comment starting with "Encryption/decryption of arbitrary length messages" in https://web.archive.org/web/20170119113129/http://www.ecrypt.eu.org/stream/svn/viewcvs.cgi/ecrypt/trunk/api/skel/ecrypt-sync.h?view=auto

@sipa
Copy link
Member

sipa commented Jun 13, 2022

It's just the API. See the comment starting with "Encryption/decryption of arbitrary length messages" in https://web.archive.org/web/20170119113129/http://www.ecrypt.eu.org/stream/svn/viewcvs.cgi/ecrypt/trunk/api/skel/ecrypt-sync.h?view=auto

but he is NOT allowed to make additional encryption calls once he

  • has called ECRYPT_encrypt_bytes() (unless he starts a new message
  • of course).

Right, that explains. It's still a bit bizarre that the function even bothers updating the position counter in the state then, but there's nothing wrong with it.

@dhruv
Copy link
Contributor Author

dhruv commented Jun 13, 2022

@real-or-random Nice detective work! Thank you.

@laanwj
Copy link
Member

laanwj commented Jun 13, 2022

That said, I think that wondering about the origin of this is mostly pointless apart from historical curiosity. It's an implementation issue

Right. I'm happy that we found this issue before BIP324 was deployed. It would have been virtually impossible to make this optimization later.

@sipa
Copy link
Member

sipa commented Jun 13, 2022

I wrote a fuzz test for verifying that invoking the ChaCha20::Crypto() and/or ChaCha20::Keystream() function multiple times concatenated works identical to calling it once for the whole block: https://github.com/sipa/bitcoin/commits/pr25354. Feel free to steal/include/incorporate.

@dhruv
Copy link
Contributor Author

dhruv commented Jun 14, 2022

I wrote a fuzz test for verifying that invoking the ChaCha20::Crypto() and/or ChaCha20::Keystream() function multiple times concatenated works identical to calling it once for the whole block: https://github.com/sipa/bitcoin/commits/pr25354. Feel free to steal/include/incorporate.

Thanks, @sipa! Yes, I will incorporate it once I'm done refreshing all BIP324 PRs to the new spec in the next few days.

@DrahtBot
Copy link
Contributor

DrahtBot commented Jun 15, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #26153 (Reduce wasted pseudorandom bytes in ChaCha20 + various improvements by sipa)
  • #25712 (crypto: drop 16 byte key support for ChaCha20 by theStack)
  • #25698 (crypto: avoid potential buffer overread in ChaCha20::SetKey by theStack)
  • #25695 (tidy: add modernize-use-using by fanquake)
  • #25361 (BIP324: Cipher suite by dhruv)
  • #25172 (refactor: use std:: prefix for std lib funcs by fanquake)
  • #23561 (BIP324: Handshake prerequisites by dhruv)
  • #23322 ([Fuzz] Poly1305 differential fuzzing against Floodyberry's implementation by prakash1512)
  • #23233 (BIP324: Add encrypted p2p transport {de}serializer by dhruv)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@dhruv dhruv force-pushed the chacha20-partial-blocks branch from 467cacc to 70d2a45 Compare June 25, 2022 03:48
@dhruv
Copy link
Contributor Author

dhruv commented Jun 25, 2022

Rebased with master for other branches in the project

@@ -246,7 +246,6 @@ test_fuzz_fuzz_SOURCES = \
test/fuzz/crypto_chacha20.cpp \
test/fuzz/crypto_chacha20_poly1305_aead.cpp \
test/fuzz/crypto_common.cpp \
test/fuzz/crypto_diff_fuzz_chacha20.cpp \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

diff --git a/test/sanitizer_suppressions/ubsan b/test/sanitizer_suppressions/ubsan
index 67ef512895..e698ffce53 100644
--- a/test/sanitizer_suppressions/ubsan
+++ b/test/sanitizer_suppressions/ubsan
@@ -19,7 +19,6 @@ unsigned-integer-overflow:bench/bench.h
 unsigned-integer-overflow:FuzzedDataProvider.h
 unsigned-integer-overflow:leveldb/
 unsigned-integer-overflow:minisketch/
-unsigned-integer-overflow:test/fuzz/crypto_diff_fuzz_chacha20.cpp
 implicit-integer-sign-change:*/include/boost/
 implicit-integer-sign-change:*/include/c++/
 implicit-integer-sign-change:*/new_allocator.h

Copy link
Contributor

@theStack theStack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

In order to increase test coverage, I'd suggest to

            ...
            if (prev_block_start_pos >= 64) {
                prev_block_start_pos = 0;
            }
            if (bytes) continue;
            ...
  • add a corresponding test for ChaCha20::Crypt

Copy link

@vincenzopalazzo vincenzopalazzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

}

ChaCha20::ChaCha20(const unsigned char* k, size_t keylen)
{
SetKey(k, keylen);
prev_block_start_pos = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

70d2a45: could do memset(prev_block_bytes, 0, sizeof(prev_block_bytes)); here too.

Copy link
Contributor

@aureleoules aureleoules left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR

./src/bench/bench_bitcoin -filter='CHACHA20_1MB'

ns/byte byte/s err% ins/byte cyc/byte IPC bra/byte miss% total benchmark
2.89 346,242,492.84 0.4% 20.13 6.62 3.039 0.08 0.0% 0.03 CHACHA20_1MB
ns/byte byte/s err% ins/byte cyc/byte IPC bra/byte miss% total benchmark
2.88 346,669,699.46 0.6% 20.13 6.59 3.052 0.08 0.0% 0.03 CHACHA20_1MB

Master

./src/bench/bench_bitcoin -filter='CHACHA20_1MB'

ns/byte byte/s err% ins/byte cyc/byte IPC bra/byte miss% total benchmark
2.56 390,349,081.17 0.3% 17.03 5.87 2.900 0.05 0.0% 0.03 CHACHA20_1MB
ns/byte byte/s err% ins/byte cyc/byte IPC bra/byte miss% total benchmark
2.57 389,658,756.50 0.4% 17.03 5.89 2.892 0.05 0.0% 0.03 CHACHA20_1MB

is this implementation slower?

}

ChaCha20::ChaCha20()
{
memset(input, 0, sizeof(input));
memset(prev_block_bytes, 0, sizeof(prev_block_bytes));
prev_block_start_pos = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prev_block_start_pos already initialized in class

}

ChaCha20::ChaCha20(const unsigned char* k, size_t keylen)
{
SetKey(k, keylen);
prev_block_start_pos = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prev_block_start_pos already initialized in class

@dhruv
Copy link
Contributor Author

dhruv commented Sep 22, 2022

@aureleoules

You're right, there's a slowdown. However, the existing implementation is incorrect and skipping blocks will be ~20x slower for the BIP324 specification. Here are my numbers:

master:

ns/byte byte/s err% ins/byte cyc/byte IPC bra/byte miss% total benchmark
1.25 799,288,251.65 0.1% 17.33 4.74 3.654 0.05 0.0% 11.02 CHACHA20_1MB

#25354 (This PR):

ns/byte byte/s err% ins/byte cyc/byte IPC bra/byte miss% total benchmark
1.34 747,649,058.86 0.0% 19.94 5.07 3.933 0.08 0.0% 10.99 CHACHA20_1MB

#26153 (Alternate implementation by @sipa):

ns/byte byte/s err% ins/byte cyc/byte IPC bra/byte miss% total benchmark
1.25 802,562,761.09 0.3% 16.70 4.72 3.538 0.03 0.0% 10.98 CHACHA20_1MB

Since the implementation by @sipa meets the non-inferiority bar on the benchmarks (corroborated by @aureleoules' comment on #26153), I'm closing this PR and will shortly rebase all my branches for BIP324 to use #26153.

@dhruv dhruv closed this Sep 22, 2022
fanquake added a commit that referenced this pull request Feb 15, 2023
…improvements

511aa4f Add unit test for ChaCha20's new caching (Pieter Wuille)
fb243d2 Improve test vectors for ChaCha20 (Pieter Wuille)
93aee8b Inline ChaCha20 32-byte specific constants (Pieter Wuille)
62ec713 Only support 32-byte keys in ChaCha20{,Aligned} (Pieter Wuille)
f21994a Use ChaCha20Aligned in MuHash3072 code (Pieter Wuille)
5d16f75 Use ChaCha20 caching in FastRandomContext (Pieter Wuille)
38eaece Add fuzz test for testing that ChaCha20 works as a stream (Pieter Wuille)
5f05b27 Add xoroshiro128++ PRNG (Martin Leitner-Ankerl)
12ff724 Make unrestricted ChaCha20 cipher not waste keystream bytes (Pieter Wuille)
6babf40 Rename ChaCha20::Seek -> Seek64 to clarify multiple of 64 (Pieter Wuille)
e37bcaa Split ChaCha20 into aligned/unaligned variants (Pieter Wuille)

Pull request description:

  This is an alternative to #25354 (by my benchmarking, somewhat faster), subsumes #25712, and adds additional test vectors.

  It separates the multiple-of-64-bytes-only "core" logic (which becomes simpler) from a layer around which performs caching/slicing to support arbitrary byte amounts. Both have their uses (in particular, the MuHash3072 code can benefit from multiple-of-64-bytes assumptions), plus the separation results in more readable code. Also, since FastRandomContext effectively had its own (more naive) caching on top of ChaCha20, that can be dropped in favor of ChaCha20's new built-in caching.

  I thought about rebasing #25712 on top of this, but the changes before are fairly extensive, so redid it instead.

ACKs for top commit:
  ajtowns:
    ut reACK 511aa4f
  dhruv:
    tACK crACK 511aa4f

Tree-SHA512: 3aa80971322a93e780c75a8d35bd39da3a9ea570fbae4491eaf0c45242f5f670a24a592c50ad870d5fd09b9f88ec06e274e8aa3cefd9561d623c63f7198cf2c7
@bitcoin bitcoin locked and limited conversation to collaborators Sep 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
No open projects
Status: Needs review
Development

Successfully merging this pull request may close these issues.

10 participants