Skip to content

Conversation

hebasto
Copy link
Member

@hebasto hebasto commented Apr 5, 2022

This PR enables AVX2, SSE4.1 and x86 SHA-NI implementations of SHA256 instead of the "standard" one.

NOTE about testing. During runtime the SHA-NI implementation is available only if a CPU has the sha_ni flag set.

Here are benchmark results on my machine with Intel Core i5 8350U (no sha_ni flag) + Windows 11 Pro 22H2:

>.\src\bench_bitcoin.exe -filter=SHA256D64.*

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                2.79 |      357,807,381.52 |    0.2% |      0.01 | SHA256D64_1024_AVX2 using the 'standard,sse41(4way),avx2(8way)' SHA256 implementation
|                4.29 |      232,954,767.62 |    0.0% |      0.01 | SHA256D64_1024_SHANI using the 'standard,sse41(4way)' SHA256 implementation
|                4.33 |      231,031,727.38 |    0.6% |      0.01 | SHA256D64_1024_SSE4 using the 'standard,sse41(4way)' SHA256 implementation
|               11.65 |       85,836,280.29 |    0.3% |      0.01 | SHA256D64_1024_STANDARD using the 'standard' SHA256 implementation


On another machine with Intel Core i9-12950HX (sha_ni flag set) + Windows 11 Home 22H2:

>.\src\bench_bitcoin.exe -filter=SHA256.*

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               24.74 |       40,421,883.67 |    1.2% |      0.02 | `SHA256D64_1024_AVX2 using the 'standard' SHA256 implementation`
|               25.09 |       39,856,473.88 |    2.7% |      0.02 | `SHA256D64_1024_SHANI using the 'standard' SHA256 implementation`
|               25.02 |       39,965,849.49 |    1.4% |      0.02 | `SHA256D64_1024_SSE4 using the 'standard' SHA256 implementation`
|               25.06 |       39,900,152.21 |    1.4% |      0.02 | `SHA256D64_1024_STANDARD using the 'standard' SHA256 implementation`
|               17.87 |       55,953,892.74 |    4.0% |      0.01 | `SHA256_32b_AVX2 using the 'standard' SHA256 implementation`
|               17.59 |       56,839,372.12 |    2.5% |      0.01 | `SHA256_32b_SHANI using the 'standard' SHA256 implementation`
|               17.95 |       55,721,393.03 |    2.9% |      0.01 | `SHA256_32b_SSE4 using the 'standard' SHA256 implementation`
|               18.47 |       54,140,724.95 |    2.1% |      0.01 | `SHA256_32b_STANDARD using the 'standard' SHA256 implementation`
|                8.27 |      120,885,364.41 |    2.3% |      0.09 | `SHA256_AVX2 using the 'standard' SHA256 implementation`
|                8.10 |      123,519,312.24 |    0.5% |      0.09 | `SHA256_SHANI using the 'standard' SHA256 implementation`
|                8.15 |      122,720,467.32 |    1.7% |      0.09 | `SHA256_SSE4 using the 'standard' SHA256 implementation`
|                7.93 |      126,141,581.31 |    1.4% |      0.09 | `SHA256_STANDARD using the 'standard' SHA256 implementation`

  • this PR:
>.\src\bench_bitcoin.exe -filter=SHA256.*

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                2.24 |      446,126,616.75 |    1.7% |      0.01 | `SHA256D64_1024_AVX2 using the 'standard,sse41(4way),avx2(8way)' SHA256 implementation`
|                1.05 |      953,349,958.44 |    3.3% |      0.01 | `SHA256D64_1024_SHANI using the 'x86_shani(1way,2way)' SHA256 implementation`
|                2.82 |      354,952,157.43 |    1.1% |      0.01 | `SHA256D64_1024_SSE4 using the 'standard,sse41(4way)' SHA256 implementation`
|                5.45 |      183,471,444.57 |    1.6% |      0.01 | `SHA256D64_1024_STANDARD using the 'standard' SHA256 implementation`
|                4.37 |      228,701,362.34 |    1.4% |      0.01 | `SHA256_32b_AVX2 using the 'standard,sse41(4way),avx2(8way)' SHA256 implementation`
|                1.34 |      748,698,312.44 |    2.4% |      0.01 | `SHA256_32b_SHANI using the 'x86_shani(1way,2way)' SHA256 implementation`
|                4.30 |      232,450,436.16 |    2.6% |      0.01 | `SHA256_32b_SSE4 using the 'standard,sse41(4way)' SHA256 implementation`
|                4.54 |      220,072,501.29 |    3.6% |      0.01 | `SHA256_32b_STANDARD using the 'standard' SHA256 implementation`
|                2.14 |      467,093,278.53 |    4.2% |      0.02 | `SHA256_AVX2 using the 'standard,sse41(4way),avx2(8way)' SHA256 implementation`
|                0.44 |    2,279,721,873.93 |    2.9% |      0.01 | `SHA256_SHANI using the 'x86_shani(1way,2way)' SHA256 implementation`
|                2.09 |      478,148,608.59 |    2.2% |      0.02 | `SHA256_SSE4 using the 'standard,sse41(4way)' SHA256 implementation`
|                2.06 |      486,570,650.06 |    2.4% |      0.02 | `SHA256_STANDARD using the 'standard' SHA256 implementation`

@laanwj
Copy link
Member

laanwj commented Apr 6, 2022

Concept ACK—but it does makes me wonder why AVX2 specifically? Would, say, enabling SHANI (which is better on hw that supports it) give a lot of extra work?

@hebasto
Copy link
Member Author

hebasto commented Apr 6, 2022

Concept ACK—but it does makes me wonder why AVX2 specifically? Would, say, enabling SHANI (which is better on hw that supports it) give a lot of extra work?

I have no hardware that supports SHANI (

@laanwj
Copy link
Member

laanwj commented Apr 6, 2022

I have no hardware that supports SHANI (

That's a valid reason. I guess the base work in this PR will make it easier for people who care, to add support the other instruction sets too?

@hebasto
Copy link
Member Author

hebasto commented Apr 6, 2022

I have no hardware that supports SHANI (

That's a valid reason. I guess the base work in this PR will make it easier for people who care, to add support the other instruction sets too?

Exactly.

@laanwj laanwj requested a review from sipsorcery April 6, 2022 11:02
@DrahtBot
Copy link
Contributor

DrahtBot commented Apr 6, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK laanwj, theuni
Stale ACK sipsorcery

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #29774 (build: Enable fuzz binary in MSVC by hebasto)
  • #29625 (Several randomness improvements by sipa)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@sipsorcery
Copy link
Contributor

tACK 70f36d8.

*I'm not qualified to verify the safety of changing crypto logic.

I performed an msvc build and then ran the benchmark with and without this PR and there doesn't seem to be much of a difference for SHA256 on my Intel i7 CPU based PC. I'm not sure if that as expected or not?

On master:

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               12.70 |       78,766,206.15 |    2.6% |      0.14 | `SHA1`
|               16.99 |       58,841,181.77 |    1.2% |      0.19 | `SHA256`
|               46.23 |       21,630,470.66 |   11.5% |      0.04 | :wavy_dash: `SHA256D64_1024` (Unstable with ~1.0 iters. Increase `minEpochIterations` to e.g. 10)
|               36.54 |       27,365,792.76 |    1.8% |      0.01 | `SHA256_32b`
|               24.93 |       40,108,453.26 |    3.8% |      0.29 | `SHA3_256_1M`
|               11.64 |       85,924,678.43 |    1.7% |      0.13 | `SHA512`

With #24773:

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               13.79 |       72,504,223.37 |    4.9% |      0.16 | `SHA1`
|               18.06 |       55,382,638.65 |    4.6% |      0.21 | `SHA256`
|               88.30 |       11,324,888.97 |   13.0% |      0.06 | :wavy_dash: `SHA256D64_1024` (Unstable with ~1.0 iters. Increase `minEpochIterations` to e.g. 10)
|               56.46 |       17,710,386.66 |   18.7% |      0.01 | :wavy_dash: `SHA256_32b` (Unstable with ~618.0 iters. Increase `minEpochIterations` to e.g. 6180)
|               28.18 |       35,489,686.70 |    2.3% |      0.33 | `SHA3_256_1M`
|               12.35 |       80,963,792.99 |    6.3% |      0.14 | :wavy_dash: `SHA512` (Unstable with ~1.0 iters. Increase `minEpochIterations` to e.g.10)

laanwj added a commit to bitcoin-core/gui that referenced this pull request Apr 19, 2022
07ddecb refactor: Use [[maybe_unused]] attribute (Hennadii Stepanov)
55e0fc8 refactor: Drop unneeded workarounds aimed to silence unused warning (Hennadii Stepanov)

Pull request description:

  This change is required for bitcoin/bitcoin#24773 as it prevents MSVC yelling about "warning C4551: function call missing argument list".

  But it is useful by itself as it makes code more concise and readable.

ACKs for top commit:
  Empact:
    Code review ACK 07ddecb
  laanwj:
    Code review ACK 07ddecb
  vincenzopalazzo:
    ACK bitcoin/bitcoin@07ddecb
  w0xlt:
    ACK 07ddecb

Tree-SHA512: 01791855a9ba742202d5718203303af989fcb501b7cf2a24ac8d78e87487acca38f77bef264b8e27e41ad1ccf96e426725cf65bfd96ce2ac71c46b3792bed857
@hebasto
Copy link
Member Author

hebasto commented Apr 19, 2022

Rebased on top of the merged #24772.

sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Apr 19, 2022
07ddecb refactor: Use [[maybe_unused]] attribute (Hennadii Stepanov)
55e0fc8 refactor: Drop unneeded workarounds aimed to silence unused warning (Hennadii Stepanov)

Pull request description:

  This change is required for bitcoin#24773 as it prevents MSVC yelling about "warning C4551: function call missing argument list".

  But it is useful by itself as it makes code more concise and readable.

ACKs for top commit:
  Empact:
    Code review ACK 07ddecb
  laanwj:
    Code review ACK 07ddecb
  vincenzopalazzo:
    ACK bitcoin@07ddecb
  w0xlt:
    ACK 07ddecb

Tree-SHA512: 01791855a9ba742202d5718203303af989fcb501b7cf2a24ac8d78e87487acca38f77bef264b8e27e41ad1ccf96e426725cf65bfd96ce2ac71c46b3792bed857
@hebasto hebasto marked this pull request as draft May 28, 2022 13:13
@hebasto
Copy link
Member Author

hebasto commented May 28, 2022

Converted to a "draft" until resolving discussion in #24774.

@fanquake
Copy link
Member

Closing for now, given you've closed the remaining base PR (#24774). Let me know if this is incorrect / should be reopened.

@fanquake fanquake closed this Oct 13, 2022
@hebasto hebasto reopened this Nov 28, 2022
@hebasto hebasto changed the title [POC] Enable AVX2 implementation of SHA256 for MSVC builds Enable HW-accelerated implementations of SHA256 for MSVC builds Nov 28, 2022
@hebasto hebasto force-pushed the 220405-avx2 branch 2 times, most recently from f3c4857 to daddd58 Compare November 28, 2022 14:58
fanquake added a commit to bitcoin-core/gui that referenced this pull request Dec 11, 2023
6e0f1d2 msvc: Optimize "Release" builds (Hennadii Stepanov)

Pull request description:

  It is awkward not using optimization.

  In addition to the obvious benefits for Windows users, this PR reduces the duration of functional tests by an hour.

  Picked from bitcoin/bitcoin#24773.

ACKs for top commit:
  sipsorcery:
    tACK 6e0f1d2.

Tree-SHA512: 5aa7fd38cb1a81d58ea3206756a8099891866c82a747d3b8079cab0b2afa1f40ba53adff2f32eb233efcd1227babee80ab175e35a83678fafa8a4f63c356e5ca
Copy link
Contributor

@m3dwards m3dwards left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested e94ae81 on Windows 11 with VS2022. CPU: i7-4870HQ and saw big speed improvement.

On master:

$ ./src/bench_bitcoin.exe -filter=SHA256D64.*

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               11.57 |       86,459,102.90 |    0.0% |      0.01 | `SHA256D64_1024_AVX2 using the 'standard' SHA256 implementation`
|               11.89 |       84,074,406.67 |    0.0% |      0.01 | `SHA256D64_1024_SHANI using the 'standard' SHA256 implementation`
|               11.97 |       83,559,862.30 |    0.6% |      0.01 | `SHA256D64_1024_SSE4 using the 'standard' SHA256 implementation`
|               11.90 |       84,031,286.06 |    0.0% |      0.01 | `SHA256D64_1024_STANDARD using the 'standard' SHA256 implementation`

Using #24773

$ ./src/bench_bitcoin.exe -filter=SHA256D64.*

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                2.77 |      361,557,983.01 |    0.1% |      0.01 | `SHA256D64_1024_AVX2 using the 'standard,sse41(4way),avx2(8way)' SHA256 implementation`
|                4.33 |      230,699,639.18 |    0.1% |      0.01 | `SHA256D64_1024_SHANI using the 'standard,sse41(4way)' SHA256 implementation`
|                4.57 |      219,037,433.16 |    2.1% |      0.01 | `SHA256D64_1024_SSE4 using the 'standard,sse41(4way)' SHA256 implementation`
|               11.88 |       84,171,590.03 |    0.2% |      0.01 | `SHA256D64_1024_STANDARD using the 'standard' SHA256 implementation`

One tiny nit on naming, not important to hold it up.

Comment on lines +32 to +33
/* Define this symbol to build in assembly routines */
#define USE_ASM
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: From what I understand we are enabling the special CPU instructions using intrinsics and not inline assembly. This makes the comment a little misleading and makes the symbol unfortunately named.

Perhaps the symbol could be?

Suggested change
/* Define this symbol to build in assembly routines */
#define USE_ASM
/* Define this symbol to enable CPU SHA Extensions */
#define USE_SHA_EXTENTIONS

or?

Suggested change
/* Define this symbol to build in assembly routines */
#define USE_ASM
/* Define this symbol to enable CPU Extensions */
#define USE_INTRINSICS

@DrahtBot DrahtBot requested review from theuni and laanwj December 20, 2023 13:52
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jan 2, 2024
07ddecb refactor: Use [[maybe_unused]] attribute (Hennadii Stepanov)
55e0fc8 refactor: Drop unneeded workarounds aimed to silence unused warning (Hennadii Stepanov)

Pull request description:

  This change is required for bitcoin#24773 as it prevents MSVC yelling about "warning C4551: function call missing argument list".

  But it is useful by itself as it makes code more concise and readable.

ACKs for top commit:
  Empact:
    Code review ACK 07ddecb
  laanwj:
    Code review ACK 07ddecb
  vincenzopalazzo:
    ACK bitcoin@07ddecb
  w0xlt:
    ACK 07ddecb

Tree-SHA512: 01791855a9ba742202d5718203303af989fcb501b7cf2a24ac8d78e87487acca38f77bef264b8e27e41ad1ccf96e426725cf65bfd96ce2ac71c46b3792bed857
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jan 2, 2024
07ddecb refactor: Use [[maybe_unused]] attribute (Hennadii Stepanov)
55e0fc8 refactor: Drop unneeded workarounds aimed to silence unused warning (Hennadii Stepanov)

Pull request description:

  This change is required for bitcoin#24773 as it prevents MSVC yelling about "warning C4551: function call missing argument list".

  But it is useful by itself as it makes code more concise and readable.

ACKs for top commit:
  Empact:
    Code review ACK 07ddecb
  laanwj:
    Code review ACK 07ddecb
  vincenzopalazzo:
    ACK bitcoin@07ddecb
  w0xlt:
    ACK 07ddecb

Tree-SHA512: 01791855a9ba742202d5718203303af989fcb501b7cf2a24ac8d78e87487acca38f77bef264b8e27e41ad1ccf96e426725cf65bfd96ce2ac71c46b3792bed857
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jan 2, 2024
07ddecb refactor: Use [[maybe_unused]] attribute (Hennadii Stepanov)
55e0fc8 refactor: Drop unneeded workarounds aimed to silence unused warning (Hennadii Stepanov)

Pull request description:

  This change is required for bitcoin#24773 as it prevents MSVC yelling about "warning C4551: function call missing argument list".

  But it is useful by itself as it makes code more concise and readable.

ACKs for top commit:
  Empact:
    Code review ACK 07ddecb
  laanwj:
    Code review ACK 07ddecb
  vincenzopalazzo:
    ACK bitcoin@07ddecb
  w0xlt:
    ACK 07ddecb

Tree-SHA512: 01791855a9ba742202d5718203303af989fcb501b7cf2a24ac8d78e87487acca38f77bef264b8e27e41ad1ccf96e426725cf65bfd96ce2ac71c46b3792bed857
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jan 2, 2024
07ddecb refactor: Use [[maybe_unused]] attribute (Hennadii Stepanov)
55e0fc8 refactor: Drop unneeded workarounds aimed to silence unused warning (Hennadii Stepanov)

Pull request description:

  This change is required for bitcoin#24773 as it prevents MSVC yelling about "warning C4551: function call missing argument list".

  But it is useful by itself as it makes code more concise and readable.

ACKs for top commit:
  Empact:
    Code review ACK 07ddecb
  laanwj:
    Code review ACK 07ddecb
  vincenzopalazzo:
    ACK bitcoin@07ddecb
  w0xlt:
    ACK 07ddecb

Tree-SHA512: 01791855a9ba742202d5718203303af989fcb501b7cf2a24ac8d78e87487acca38f77bef264b8e27e41ad1ccf96e426725cf65bfd96ce2ac71c46b3792bed857
@hebasto
Copy link
Member Author

hebasto commented Feb 12, 2024

Rebased.

@achow101
Copy link
Member

achow101 commented Apr 9, 2024

Deferring to after cmake

@achow101 achow101 closed this Apr 9, 2024
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Oct 24, 2024
3f19875 scripted-diff: Use platform-agnostic `ALWAYS_INLINE` macro (Hennadii Stepanov)
e16c22f Introduce platform-agnostic `ALWAYS_INLINE` macro (Hennadii Stepanov)

Pull request description:

  Split from bitcoin#24773 as requested in bitcoin#24773 (comment).

ACKs for top commit:
  theuni:
    utACK 3f19875
  fanquake:
    ACK 3f19875

Tree-SHA512: a19b713433bb4d3c5fff1ddb4d1413837823a400c1d46363a8181e7632b059846ba92264be1c867f35f532af90945ed20887103471b09c07623e0f3905b4098b
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Oct 25, 2024
3f19875 scripted-diff: Use platform-agnostic `ALWAYS_INLINE` macro (Hennadii Stepanov)
e16c22f Introduce platform-agnostic `ALWAYS_INLINE` macro (Hennadii Stepanov)

Pull request description:

  Split from bitcoin#24773 as requested in bitcoin#24773 (comment).

ACKs for top commit:
  theuni:
    utACK 3f19875
  fanquake:
    ACK 3f19875

Tree-SHA512: a19b713433bb4d3c5fff1ddb4d1413837823a400c1d46363a8181e7632b059846ba92264be1c867f35f532af90945ed20887103471b09c07623e0f3905b4098b
@bitcoin bitcoin locked and limited conversation to collaborators Apr 9, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants