-
Notifications
You must be signed in to change notification settings - Fork 37.8k
Enable HW-accelerated implementations of SHA256 for MSVC builds #24773
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Concept ACK—but it does makes me wonder why AVX2 specifically? Would, say, enabling SHANI (which is better on hw that supports it) give a lot of extra work? |
I have no hardware that supports SHANI ( |
That's a valid reason. I guess the base work in this PR will make it easier for people who care, to add support the other instruction sets too? |
Exactly. |
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
tACK 70f36d8. *I'm not qualified to verify the safety of changing crypto logic. I performed an msvc build and then ran the benchmark with and without this PR and there doesn't seem to be much of a difference for On
With #24773:
|
07ddecb refactor: Use [[maybe_unused]] attribute (Hennadii Stepanov) 55e0fc8 refactor: Drop unneeded workarounds aimed to silence unused warning (Hennadii Stepanov) Pull request description: This change is required for bitcoin/bitcoin#24773 as it prevents MSVC yelling about "warning C4551: function call missing argument list". But it is useful by itself as it makes code more concise and readable. ACKs for top commit: Empact: Code review ACK 07ddecb laanwj: Code review ACK 07ddecb vincenzopalazzo: ACK bitcoin/bitcoin@07ddecb w0xlt: ACK 07ddecb Tree-SHA512: 01791855a9ba742202d5718203303af989fcb501b7cf2a24ac8d78e87487acca38f77bef264b8e27e41ad1ccf96e426725cf65bfd96ce2ac71c46b3792bed857
Rebased on top of the merged #24772. |
07ddecb refactor: Use [[maybe_unused]] attribute (Hennadii Stepanov) 55e0fc8 refactor: Drop unneeded workarounds aimed to silence unused warning (Hennadii Stepanov) Pull request description: This change is required for bitcoin#24773 as it prevents MSVC yelling about "warning C4551: function call missing argument list". But it is useful by itself as it makes code more concise and readable. ACKs for top commit: Empact: Code review ACK 07ddecb laanwj: Code review ACK 07ddecb vincenzopalazzo: ACK bitcoin@07ddecb w0xlt: ACK 07ddecb Tree-SHA512: 01791855a9ba742202d5718203303af989fcb501b7cf2a24ac8d78e87487acca38f77bef264b8e27e41ad1ccf96e426725cf65bfd96ce2ac71c46b3792bed857
Converted to a "draft" until resolving discussion in #24774. |
Closing for now, given you've closed the remaining base PR (#24774). Let me know if this is incorrect / should be reopened. |
f3c4857
to
daddd58
Compare
6e0f1d2 msvc: Optimize "Release" builds (Hennadii Stepanov) Pull request description: It is awkward not using optimization. In addition to the obvious benefits for Windows users, this PR reduces the duration of functional tests by an hour. Picked from bitcoin/bitcoin#24773. ACKs for top commit: sipsorcery: tACK 6e0f1d2. Tree-SHA512: 5aa7fd38cb1a81d58ea3206756a8099891866c82a747d3b8079cab0b2afa1f40ba53adff2f32eb233efcd1227babee80ab175e35a83678fafa8a4f63c356e5ca
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested e94ae81 on Windows 11 with VS2022. CPU: i7-4870HQ and saw big speed improvement.
On master:
$ ./src/bench_bitcoin.exe -filter=SHA256D64.*
| ns/byte | byte/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 11.57 | 86,459,102.90 | 0.0% | 0.01 | `SHA256D64_1024_AVX2 using the 'standard' SHA256 implementation`
| 11.89 | 84,074,406.67 | 0.0% | 0.01 | `SHA256D64_1024_SHANI using the 'standard' SHA256 implementation`
| 11.97 | 83,559,862.30 | 0.6% | 0.01 | `SHA256D64_1024_SSE4 using the 'standard' SHA256 implementation`
| 11.90 | 84,031,286.06 | 0.0% | 0.01 | `SHA256D64_1024_STANDARD using the 'standard' SHA256 implementation`
Using #24773
$ ./src/bench_bitcoin.exe -filter=SHA256D64.*
| ns/byte | byte/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 2.77 | 361,557,983.01 | 0.1% | 0.01 | `SHA256D64_1024_AVX2 using the 'standard,sse41(4way),avx2(8way)' SHA256 implementation`
| 4.33 | 230,699,639.18 | 0.1% | 0.01 | `SHA256D64_1024_SHANI using the 'standard,sse41(4way)' SHA256 implementation`
| 4.57 | 219,037,433.16 | 2.1% | 0.01 | `SHA256D64_1024_SSE4 using the 'standard,sse41(4way)' SHA256 implementation`
| 11.88 | 84,171,590.03 | 0.2% | 0.01 | `SHA256D64_1024_STANDARD using the 'standard' SHA256 implementation`
One tiny nit on naming, not important to hold it up.
/* Define this symbol to build in assembly routines */ | ||
#define USE_ASM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: From what I understand we are enabling the special CPU instructions using intrinsics and not inline assembly. This makes the comment a little misleading and makes the symbol unfortunately named.
Perhaps the symbol could be?
/* Define this symbol to build in assembly routines */ | |
#define USE_ASM | |
/* Define this symbol to enable CPU SHA Extensions */ | |
#define USE_SHA_EXTENTIONS |
or?
/* Define this symbol to build in assembly routines */ | |
#define USE_ASM | |
/* Define this symbol to enable CPU Extensions */ | |
#define USE_INTRINSICS |
07ddecb refactor: Use [[maybe_unused]] attribute (Hennadii Stepanov) 55e0fc8 refactor: Drop unneeded workarounds aimed to silence unused warning (Hennadii Stepanov) Pull request description: This change is required for bitcoin#24773 as it prevents MSVC yelling about "warning C4551: function call missing argument list". But it is useful by itself as it makes code more concise and readable. ACKs for top commit: Empact: Code review ACK 07ddecb laanwj: Code review ACK 07ddecb vincenzopalazzo: ACK bitcoin@07ddecb w0xlt: ACK 07ddecb Tree-SHA512: 01791855a9ba742202d5718203303af989fcb501b7cf2a24ac8d78e87487acca38f77bef264b8e27e41ad1ccf96e426725cf65bfd96ce2ac71c46b3792bed857
07ddecb refactor: Use [[maybe_unused]] attribute (Hennadii Stepanov) 55e0fc8 refactor: Drop unneeded workarounds aimed to silence unused warning (Hennadii Stepanov) Pull request description: This change is required for bitcoin#24773 as it prevents MSVC yelling about "warning C4551: function call missing argument list". But it is useful by itself as it makes code more concise and readable. ACKs for top commit: Empact: Code review ACK 07ddecb laanwj: Code review ACK 07ddecb vincenzopalazzo: ACK bitcoin@07ddecb w0xlt: ACK 07ddecb Tree-SHA512: 01791855a9ba742202d5718203303af989fcb501b7cf2a24ac8d78e87487acca38f77bef264b8e27e41ad1ccf96e426725cf65bfd96ce2ac71c46b3792bed857
07ddecb refactor: Use [[maybe_unused]] attribute (Hennadii Stepanov) 55e0fc8 refactor: Drop unneeded workarounds aimed to silence unused warning (Hennadii Stepanov) Pull request description: This change is required for bitcoin#24773 as it prevents MSVC yelling about "warning C4551: function call missing argument list". But it is useful by itself as it makes code more concise and readable. ACKs for top commit: Empact: Code review ACK 07ddecb laanwj: Code review ACK 07ddecb vincenzopalazzo: ACK bitcoin@07ddecb w0xlt: ACK 07ddecb Tree-SHA512: 01791855a9ba742202d5718203303af989fcb501b7cf2a24ac8d78e87487acca38f77bef264b8e27e41ad1ccf96e426725cf65bfd96ce2ac71c46b3792bed857
07ddecb refactor: Use [[maybe_unused]] attribute (Hennadii Stepanov) 55e0fc8 refactor: Drop unneeded workarounds aimed to silence unused warning (Hennadii Stepanov) Pull request description: This change is required for bitcoin#24773 as it prevents MSVC yelling about "warning C4551: function call missing argument list". But it is useful by itself as it makes code more concise and readable. ACKs for top commit: Empact: Code review ACK 07ddecb laanwj: Code review ACK 07ddecb vincenzopalazzo: ACK bitcoin@07ddecb w0xlt: ACK 07ddecb Tree-SHA512: 01791855a9ba742202d5718203303af989fcb501b7cf2a24ac8d78e87487acca38f77bef264b8e27e41ad1ccf96e426725cf65bfd96ce2ac71c46b3792bed857
Rebased. |
Deferring to after cmake |
3f19875 scripted-diff: Use platform-agnostic `ALWAYS_INLINE` macro (Hennadii Stepanov) e16c22f Introduce platform-agnostic `ALWAYS_INLINE` macro (Hennadii Stepanov) Pull request description: Split from bitcoin#24773 as requested in bitcoin#24773 (comment). ACKs for top commit: theuni: utACK 3f19875 fanquake: ACK 3f19875 Tree-SHA512: a19b713433bb4d3c5fff1ddb4d1413837823a400c1d46363a8181e7632b059846ba92264be1c867f35f532af90945ed20887103471b09c07623e0f3905b4098b
3f19875 scripted-diff: Use platform-agnostic `ALWAYS_INLINE` macro (Hennadii Stepanov) e16c22f Introduce platform-agnostic `ALWAYS_INLINE` macro (Hennadii Stepanov) Pull request description: Split from bitcoin#24773 as requested in bitcoin#24773 (comment). ACKs for top commit: theuni: utACK 3f19875 fanquake: ACK 3f19875 Tree-SHA512: a19b713433bb4d3c5fff1ddb4d1413837823a400c1d46363a8181e7632b059846ba92264be1c867f35f532af90945ed20887103471b09c07623e0f3905b4098b
This PR enables AVX2, SSE4.1 and x86 SHA-NI implementations of SHA256 instead of the "standard" one.
NOTE about testing. During runtime the SHA-NI implementation is available only if a CPU has the
sha_ni
flag set.Here are benchmark results on my machine with Intel Core i5 8350U (no
sha_ni
flag) + Windows 11 Pro 22H2:On another machine with Intel Core i9-12950HX (
sha_ni
flag set) + Windows 11 Home 22H2: