Skip to content

Conversation

pjbgf
Copy link
Owner

@pjbgf pjbgf commented Jan 23, 2023

The AMD64 partial optimisation does the 80 rounds of SHA1 in goasm, and also stores the compression state which is later assessed for collision in the Go code.
This provides a significant speedup over the pure Go implementation, while keeping the goasm implementation reasonably simple and well aligned with the standard Go SHA1 implementation.

Go benchmark results:

goos: linux
goarch: amd64
pkg: github.com/pjbgf/sha1cd/test
cpu: Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz
                                │       before        │                 after                      │
                                │       sec/op        │       sec/op         vs base               │
Hash8Bytes/sha1cd-16                     397.9n ± ∞ ¹          264.1n ± ∞ ¹  -33.63% (p=0.008 n=5)
Hash320Bytes/sha1cd-16                   2.130µ ± ∞ ¹          1.354µ ± ∞ ¹  -36.43% (p=0.008 n=5)
Hash1K/sha1cd-16                         5.832µ ± ∞ ¹          3.645µ ± ∞ ¹  -37.50% (p=0.008 n=5)
Hash8K/sha1cd-16                         43.56µ ± ∞ ¹          27.28µ ± ∞ ¹  -37.39% (p=0.008 n=5)
HashWithCollision/sha1cd-16              6.335µ ± ∞ ¹          4.588µ ± ∞ ¹  -27.58% (p=0.008 n=5)
¹ need >= 6 samples for confidence interval at level 0.95

                                │     before      │              after                    │
                                │       B/s       │      B/s        vs base               │
Hash8Bytes/sha1cd-16                19.18Mi ± ∞ ¹    28.90Mi ± ∞ ¹  +50.67% (p=0.008 n=5)
Hash320Bytes/sha1cd-16              143.3Mi ± ∞ ¹    225.4Mi ± ∞ ¹  +57.36% (p=0.008 n=5)
Hash1K/sha1cd-16                    167.5Mi ± ∞ ¹    267.9Mi ± ∞ ¹  +60.00% (p=0.008 n=5)
Hash8K/sha1cd-16                    179.3Mi ± ∞ ¹    286.4Mi ± ∞ ¹  +59.71% (p=0.008 n=5)
HashWithCollision/sha1cd-16         96.34Mi ± ∞ ¹   133.03Mi ± ∞ ¹  +38.08% (p=0.008 n=5)
¹ need >= 6 samples for confidence interval at level 0.95

Tests and benchmarks are now segregated by implementation. For benchmarks the comparison between Go's sha1, and the 3 different sha1cd implementations can be more easily observed:

BenchmarkHash8Bytes/sha1-16                      8186018               145.2 ns/op        55.11 MB/s           0 B/op          0 allocs/op
BenchmarkHash8Bytes/sha1cd_native-16             3309099               385.5 ns/op        20.75 MB/s         192 B/op          2 allocs/op
BenchmarkHash8Bytes/sha1cd_generic-16            2430501               495.0 ns/op        16.16 MB/s         192 B/op          2 allocs/op
BenchmarkHash8Bytes/sha1cd_cgo-16                1000000              1768 ns/op           4.53 MB/s        2688 B/op          1 allocs/op
BenchmarkHash320Bytes/sha1-16                    1932904               602.5 ns/op       531.13 MB/s           0 B/op          0 allocs/op
BenchmarkHash320Bytes/sha1cd_native-16            826519              1400 ns/op         228.58 MB/s         192 B/op          2 allocs/op
BenchmarkHash320Bytes/sha1cd_generic-16           600067              1828 ns/op         175.01 MB/s         192 B/op          2 allocs/op
BenchmarkHash320Bytes/sha1cd_cgo-16               623458              2664 ns/op         120.11 MB/s        2688 B/op          1 allocs/op
BenchmarkHash1K/sha1-16                          1186273              1010 ns/op        1013.73 MB/s           0 B/op          0 allocs/op
BenchmarkHash1K/sha1cd_native-16                  396523              2985 ns/op         343.09 MB/s         192 B/op          2 allocs/op
BenchmarkHash1K/sha1cd_generic-16                 242604              5307 ns/op         192.94 MB/s         192 B/op          2 allocs/op
BenchmarkHash1K/sha1cd_cgo-16                     309276              4273 ns/op         239.62 MB/s        2688 B/op          1 allocs/op
BenchmarkHash8K/sha1-16                           145369              8098 ns/op        1011.61 MB/s           0 B/op          0 allocs/op
BenchmarkHash8K/sha1cd_native-16                   45085             26536 ns/op         308.71 MB/s         192 B/op          2 allocs/op
BenchmarkHash8K/sha1cd_generic-16                  26968             44383 ns/op         184.57 MB/s         192 B/op          2 allocs/op
BenchmarkHash8K/sha1cd_cgo-16                      47247             23654 ns/op         346.33 MB/s        2688 B/op          1 allocs/op
BenchmarkHashWithCollision/sha1cd_native-16       249066              4769 ns/op         134.20 MB/s         192 B/op          2 allocs/op
BenchmarkHashWithCollision/sha1cd_generic-16      185652              6463 ns/op          99.02 MB/s         192 B/op          2 allocs/op
BenchmarkHashWithCollision/sha1cd_cgo-16          256411              4340 ns/op         147.46 MB/s        2688 B/op          1 allocs/op

This PR only introduces a native implementation for AMD64.

Relates to: #15

@pjbgf pjbgf force-pushed the optimisations branch 2 times, most recently from eae3603 to cc4e80e Compare February 25, 2023 10:53
The AMD64 partial optimisation does the 80 rounds of SHA1 in goasm, and also
stores the compression state which is later assessed for collision in the Go
code. This provides a significant speedup over the pure Go implementation,
while keeping the goasm implementation reasonably simple and well aligned with
the standard Go SHA1 implementation.

Go benchmark results:

goos: linux
goarch: amd64
pkg: github.com/pjbgf/sha1cd/test
cpu: Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz
                                │       before        │                 after                      │
                                │       sec/op        │       sec/op         vs base               │
Hash8Bytes/sha1cd-16                     397.9n ± ∞ ¹          264.1n ± ∞ ¹  -33.63% (p=0.008 n=5)
Hash320Bytes/sha1cd-16                   2.130µ ± ∞ ¹          1.354µ ± ∞ ¹  -36.43% (p=0.008 n=5)
Hash1K/sha1cd-16                         5.832µ ± ∞ ¹          3.645µ ± ∞ ¹  -37.50% (p=0.008 n=5)
Hash8K/sha1cd-16                         43.56µ ± ∞ ¹          27.28µ ± ∞ ¹  -37.39% (p=0.008 n=5)
HashWithCollision/sha1cd-16              6.335µ ± ∞ ¹          4.588µ ± ∞ ¹  -27.58% (p=0.008 n=5)
¹ need >= 6 samples for confidence interval at level 0.95

                                │     before      │            after                 │
                                │      B/op       │     B/op       vs base           │
Hash8Bytes/sha1cd-16                    0.0 ± ∞ ¹     112.0 ± ∞ ¹  ? (p=0.008 n=5)
Hash320Bytes/sha1cd-16                  0.0 ± ∞ ¹     112.0 ± ∞ ¹  ? (p=0.008 n=5)
Hash1K/sha1cd-16                        0.0 ± ∞ ¹     112.0 ± ∞ ¹  ? (p=0.008 n=5)
Hash8K/sha1cd-16                        0.0 ± ∞ ¹     112.0 ± ∞ ¹  ? (p=0.008 n=5)
HashWithCollision/sha1cd-16             0.0 ± ∞ ¹     112.0 ± ∞ ¹  ? (p=0.008 n=5)
¹ need >= 6 samples for confidence interval at level 0.95

                                │     before      │           after                │
                                │    allocs/op    │  allocs/op   vs base           │
Hash8Bytes/sha1cd-16                  0.000 ± ∞ ¹   1.000 ± ∞ ¹  ? (p=0.008 n=5)
Hash320Bytes/sha1cd-16                0.000 ± ∞ ¹   1.000 ± ∞ ¹  ? (p=0.008 n=5)
Hash1K/sha1cd-16                      0.000 ± ∞ ¹   1.000 ± ∞ ¹  ? (p=0.008 n=5)
Hash8K/sha1cd-16                      0.000 ± ∞ ¹   1.000 ± ∞ ¹  ? (p=0.008 n=5)
HashWithCollision/sha1cd-16           0.000 ± ∞ ¹   1.000 ± ∞ ¹  ? (p=0.008 n=5)
¹ need >= 6 samples for confidence interval at level 0.95

                                │     before      │              after                    │
                                │       B/s       │      B/s        vs base               │
Hash8Bytes/sha1cd-16                19.18Mi ± ∞ ¹    28.90Mi ± ∞ ¹  +50.67% (p=0.008 n=5)
Hash320Bytes/sha1cd-16              143.3Mi ± ∞ ¹    225.4Mi ± ∞ ¹  +57.36% (p=0.008 n=5)
Hash1K/sha1cd-16                    167.5Mi ± ∞ ¹    267.9Mi ± ∞ ¹  +60.00% (p=0.008 n=5)
Hash8K/sha1cd-16                    179.3Mi ± ∞ ¹    286.4Mi ± ∞ ¹  +59.71% (p=0.008 n=5)
HashWithCollision/sha1cd-16         96.34Mi ± ∞ ¹   133.03Mi ± ∞ ¹  +38.08% (p=0.008 n=5)
geomean                             142.8Mi          160.4Mi        +12.33%
¹ need >= 6 samples for confidence interval at level 0.95

Signed-off-by: Paulo Gomes <pjbgf@linux.com>
The tests and benchmarks are now segregate by implementation, so it
is easier to compare results and ensure regression for cgo, generic
and native implementations.

Signed-off-by: Paulo Gomes <pjbgf@linux.com>
@pjbgf pjbgf merged commit a2b84bc into main Feb 25, 2023
@pjbgf pjbgf deleted the optimisations branch February 25, 2023 12:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant