Tweak SepParserAvx512PackCmpOrMoveMaskTzcnt by moving MoveMask #294

Merged (3 commits) on Apr 19, 2025

@nietras (Owner) commented Apr 19, 2025

BEFORE

BenchmarkDotNet v0.14.0, Windows 10 (10.0.19044.3086/21H2/November2021Update)
AMD Ryzen 9 9950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK 9.0.203
  [Host]     : .NET 9.0.4 (9.0.425.16305), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-TORZYV : .NET 9.0.4 (9.0.425.16305), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

Job=Job-TORZYV  EnvironmentVariables=DOTNET_GCDynamicAdaptationMode=0  Runtime=.NET 9.0
Toolchain=net90  InvocationCount=Default  IterationTime=350ms
MaxIterationCount=15  MinIterationCount=5  WarmupCount=6
Quotes=False  Reader=String  Error=0.0153 ms
StdDev=0.0068 ms

| Method    | Scope | Rows  | Mean     | Ratio | MB | MB/s    | ns/row | Allocated | Alloc Ratio |
|---------- |------ |------ |---------:|------:|---:|--------:|-------:|----------:|------------:|
| Sep______ | Row   | 50000 | 1.629 ms |  1.00 | 29 | 17908.5 |   32.6 |   1.17 KB |        1.00 |

G_M000_IG05:                ;; offset=0x008E
       cmp      rsi, rbp
       jb       G_M000_IG39
       mov      edi, r9d
       lea      rdi, bword ptr [r10+2*rdi]
       vmovups  zmm4, zmmword ptr [rdi]
       vpackuswb zmm4, zmm4, zmmword ptr [rdi+0x40]
       vmovups  zmm5, zmmword ptr [reloc @RWD00]
       vpermq   zmm4, zmm5, zmm4
       vpcmpeqb k1, zmm4, zmm0
       vpmovm2b zmm5, k1
       vpcmpeqb k1, zmm4, zmm1
       vpmovm2b zmm16, k1
       vpcmpeqb k1, zmm4, zmm2
       vpmovm2b zmm17, k1
       vpcmpeqb k1, zmm4, zmm3
       vpmovm2b zmm4, k1
       vpternlogd zmm5, zmm4, zmm16, -2
       vpord    zmm16, zmm5, zmm17
       vpmovb2m k1, zmm16
       kmovq    r15, k1
       test     r15, r15
       je       G_M000_IG03
       vpmovb2m k1, zmm4
       kmovq    r13, k1
       lea      r12, [r15+r8]
       cmp      r13, r12
       je       G_M000_IG43

G_M000_IG06:                ;; offset=0x012A
       vpmovb2m k1, zmm5
       kmovq    rcx, k1
       cmp      rcx, r12
       je       G_M000_IG22

G_M000_IG07:                ;; offset=0x0141
       xor      ecx, ecx

AFTER

BenchmarkDotNet v0.14.0, Windows 10 (10.0.19044.3086/21H2/November2021Update)
AMD Ryzen 9 9950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK 9.0.203
  [Host]     : .NET 9.0.4 (9.0.425.16305), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-TKOEXX : .NET 9.0.4 (9.0.425.16305), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

Job=Job-TKOEXX  EnvironmentVariables=DOTNET_GCDynamicAdaptationMode=0  Runtime=.NET 9.0
Toolchain=net90  InvocationCount=Default  IterationTime=350ms
MaxIterationCount=15  MinIterationCount=5  WarmupCount=6
Quotes=False  Reader=String  Error=0.0108 ms
StdDev=0.0017 ms

| Method    | Scope | Rows  | Mean     | Ratio | MB | MB/s    | ns/row | Allocated | Alloc Ratio |
|---------- |------ |------ |---------:|------:|---:|--------:|-------:|----------:|------------:|
| Sep______ | Row   | 50000 | 1.498 ms |  1.00 | 29 | 19476.8 |   30.0 |   1.23 KB |        1.00 |

G_M000_IG05:                ;; offset=0x008E
       cmp      rsi, rbp
       jb       G_M000_IG39
       mov      edi, r9d
       lea      rdi, bword ptr [r10+2*rdi]
       vmovups  zmm4, zmmword ptr [rdi]
       vpackuswb zmm4, zmm4, zmmword ptr [rdi+0x40]
       vmovups  zmm5, zmmword ptr [reloc @RWD00]
       vpermq   zmm4, zmm5, zmm4
       vpcmpeqb k1, zmm4, zmm0
       kmovq    r15, k1
       vpcmpeqb k1, zmm4, zmm1
       kmovq    r13, k1
       vpcmpeqb k1, zmm4, zmm2
       kmovq    r12, k1
       vpcmpeqb k1, zmm4, zmm3
       kmovq    rcx, k1
       or       r15, rcx
       or       r15, r13
       or       r12, r15
       je       SHORT G_M000_IG03
       mov      r13, rcx
       lea      rcx, [r12+r8]
       cmp      r13, rcx
       je       G_M000_IG43

G_M000_IG06:                ;; offset=0x010E
       cmp      r15, rcx
       je       G_M000_IG22

G_M000_IG07:                ;; offset=0x0117
       xor      ecx, ecx
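
The difference between the two listings comes down to where the four per-character comparison results are combined. BEFORE expands each mask register back into a byte vector (`vpmovm2b`), ORs the vectors (`vpternlogd`, `vpord`), and only then converts to a scalar mask (`vpmovb2m`, `kmovq`). AFTER moves each comparison mask straight into a general-purpose register (`kmovq`) and ORs the scalars. A minimal Python model of why both orders yield the same bitmask is sketched below; the function names, the choice of four special characters, and the sample block are illustrative assumptions and are not taken from Sep's source.

```python
def compare_eq(block: bytes, ch: int) -> list[int]:
    """Model a byte-wise equality compare as a vector: 0xFF where equal, else 0x00."""
    return [0xFF if b == ch else 0x00 for b in block]


def move_mask(vec: list[int]) -> int:
    """Model a move-mask: bit i of the result is the top bit of byte i."""
    mask = 0
    for i, b in enumerate(vec):
        mask |= (b >> 7) << i
    return mask


def special_mask_before(block: bytes, chars: tuple[int, ...]) -> int:
    """BEFORE shape: OR the comparison vectors first, then take one move-mask."""
    combined = [0x00] * len(block)
    for ch in chars:
        combined = [a | b for a, b in zip(combined, compare_eq(block, ch))]
    return move_mask(combined)


def special_mask_after(block: bytes, chars: tuple[int, ...]) -> int:
    """AFTER shape: take a move-mask per comparison, then OR the scalar masks."""
    mask = 0
    for ch in chars:
        mask |= move_mask(compare_eq(block, ch))
    return mask


# Hypothetical special characters: separator, line feed, carriage return, quote.
CHARS = (ord(";"), ord("\n"), ord("\r"), ord('"'))

# A 64-byte block, matching the width of one packed ZMM register of bytes.
block = b'a;b;"c"\r\n'.ljust(64, b"x")

assert special_mask_before(block, CHARS) == special_mask_after(block, CHARS)
```

Because the move-mask only reads one bit per byte and OR acts on each bit independently, the order of the two steps does not change the result. The scalar order also leaves each per-character mask available in a register for the later comparisons, which matches the shorter `G_M000_IG06` block in the AFTER listing.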

codecov bot commented Apr 19, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.63%. Comparing base (6dcb9f7) to head (b05607f).
Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #294   +/-   ##
=======================================
  Coverage   99.63%   99.63%           
=======================================
  Files          54       54           
  Lines        4366     4366           
  Branches      507      507           
=======================================
  Hits         4350     4350           
  Misses         12       12           
  Partials        4        4           
| Flag           | Coverage Δ      |
|--------------- |---------------- |
| Debug          | 99.35% <ø> (ø)  |
| Release        | 99.73% <ø> (ø)  |
| macos-latest   | 93.40% <ø> (ø)  |
| ubuntu-latest  | 99.51% <ø> (ø)  |
| windows-latest | 99.51% <ø> (ø)  |

@nietras nietras merged commit 640f8ba into main Apr 19, 2025
34 checks passed
@nietras nietras deleted the tweak-avx512-movemask branch April 19, 2025 14:25