@nietras nietras commented Apr 19, 2025

New, better approach for AVX-512 that works around mask register code gen issues. Hits ~21 GB/s on a good day!

BenchmarkDotNet v0.14.0, Windows 10 (10.0.19044.3086/21H2/November2021Update)
AMD Ryzen 9 9950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK 9.0.203
  [Host]     : .NET 9.0.4 (9.0.425.16305), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-SEDLIK : .NET 9.0.4 (9.0.425.16305), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

Job=Job-SEDLIK  EnvironmentVariables=DOTNET_GCDynamicAdaptationMode=0  Runtime=.NET 9.0
Toolchain=net90  InvocationCount=Default  IterationTime=350ms
MaxIterationCount=15  MinIterationCount=5  WarmupCount=6
Quotes=False  Reader=String  Error=0.0242 ms
StdDev=0.0086 ms

| Method    | Scope | Rows  | Mean     | Ratio | MB | MB/s    | ns/row | Allocated | Alloc Ratio |
|---------- |------ |------ |---------:|------:|---:|--------:|-------:|----------:|------------:|
| Sep______ | Row   | 50000 | 1.408 ms |  1.00 | 29 | 20726.9 |   28.2 |   1.01 KB |        1.00 |

G_M000_IG05:                ;; offset=0x009C
       cmp      rsi, rbp
       jb       G_M000_IG39
       mov      edi, r9d
       lea      rdi, bword ptr [r10+2*rdi]
       vmovups  zmm4, zmmword ptr [rdi]
       vpmovuswb zmm4, zmm4
       vpcmpeqb ymm5, ymm4, ymm0
       vpcmpeqb ymm6, ymm4, ymm1
       vpcmpeqb ymm7, ymm4, ymm2
       vpcmpeqb ymm4, ymm4, ymm3
       vpternlogd ymm5, ymm4, ymm6, -2
       vpor     ymm6, ymm5, ymm7
       vpmovmskb r15d, ymm6
       mov      r15d, r15d
       test     r15, r15
       je       SHORT G_M000_IG03
       vpmovmskb r13d, ymm4
       mov      r13d, r13d
       lea      r12, [r15+r8]
       cmp      r13, r12
       je       G_M000_IG43

G_M000_IG06:                ;; offset=0x00F3
       vpmovmskb ecx, ymm5
       mov      ecx, ecx
       cmp      rcx, r12
       je       G_M000_IG22

G_M000_IG07:                ;; offset=0x0102
       xor      ecx, ecx
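
For readers who do not read x64 assembly, the first block of the listing above can be modelled in scalar Python. It loads 32 UTF-16 chars (`vmovups zmm4`), narrows them to bytes with unsigned saturation (`vpmovuswb`), compares against four broadcast bytes (`vpcmpeqb` on `ymm0`..`ymm3`), ORs the results (`vpternlogd` with immediate `-2`, i.e. `0xFE`, is a three-way OR), and extracts a 32-bit mask with `vpmovmskb` instead of using AVX-512 mask registers. This is only a sketch: the function names are hypothetical, and the four special characters are assumed, since the listing does not show what the registers hold.

```python
# Scalar model of one 32-char step of the disassembly above.
# All names here are illustrative and are not part of Sep's API.

# ASSUMPTION: the four broadcast registers hold separator, quote, CR and LF.
SPECIAL_CHARS = (";", '"', "\r", "\n")


def narrow_saturate(chars):
    """Model vpmovuswb: clamp each 16-bit code unit to the range 0..255."""
    return [min(ord(c), 0xFF) for c in chars]


def move_mask(lanes, value):
    """Model vpcmpeqb + vpmovmskb: bit i is set when lanes[i] == value."""
    mask = 0
    for i, lane in enumerate(lanes):
        if lane == value:
            mask |= 1 << i
    return mask


def special_char_mask(block):
    """Return a 32-bit mask with one bit per special char in the block."""
    if len(block) != 32:
        raise ValueError("block must hold exactly 32 chars")
    lanes = narrow_saturate(block)
    mask = 0
    for ch in SPECIAL_CHARS:
        # The OR below stands in for the vpternlogd/vpor combination.
        mask |= move_mask(lanes, ord(ch))
    return mask


if __name__ == "__main__":
    block = "a;b\n" + "x" * 28
    print(bin(special_char_mask(block)))
```

A zero mask corresponds to the `test r15, r15` / `je` fast path that skips a block containing no special characters. Saturation matters because a char above 0xFF clamps to 0xFF rather than wrapping around, so it cannot alias one of the ASCII special characters.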

codecov bot commented Apr 19, 2025

Codecov Report

Attention: Patch coverage is 0% with 4 lines in your changes missing coverage. Please review.

Project coverage is 99.56%. Comparing base (6dcb9f7) to head (c394d3b).
Report is 2 commits behind head on main.

| Files with missing lines              | Patch % | Lines                        |
|-------------------------------------- |--------:|----------------------------- |
| src/Sep/Internals/SepParserFactory.cs |   0.00% | 2 Missing and 2 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #295      +/-   ##
==========================================
- Coverage   99.63%   99.56%   -0.07%     
==========================================
  Files          54       54              
  Lines        4366     4369       +3     
  Branches      507      508       +1     
==========================================
  Hits         4350     4350              
- Misses         12       14       +2     
- Partials        4        5       +1     
| Flag           | Coverage Δ                  |
|--------------- |---------------------------- |
| Debug          | 99.29% <0.00%> (-0.07%) ⬇️ |
| Release        | 99.70% <0.00%> (-0.04%) ⬇️ |
| macos-latest   | 93.33% <0.00%> (-0.07%) ⬇️ |
| ubuntu-latest  | 99.45% <0.00%> (-0.07%) ⬇️ |
| windows-latest | 99.45% <0.00%> (-0.07%) ⬇️ |

@nietras nietras merged commit b364d43 into main Apr 19, 2025
33 of 34 checks passed
@nietras nietras deleted the avx512to256 branch April 19, 2025 14:36