[x86] Codegen phaddw, phaddd, and pmaddwd #6878

rootjalex · 2022-07-21T20:09:54Z

This PR lowers int16 -> int32 horizontal widening adds to pmaddwd, and pattern-matches horizontal adds to phaddw / phaddd, which are faster than the permute + padd sequence that we currently generate for such reductions (citing @abadams and llvm-mca, because my x86 machine is currently broken).

Also a fly-by move of some code that I mistakenly placed in the wrong runtime file in #6677.
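
For readers less familiar with these instructions, here is a minimal scalar sketch in Python of what pmaddwd and the 128-bit phaddw compute. It is only a reference model under the documented instruction semantics, not code from this PR; the helper names (`to_int16`, `to_int32`, `horizontal_widening_add`) are made up for illustration. It shows why multiplying by a vector of ones turns pmaddwd into an int16 -> int32 horizontal widening add.

```python
def to_int16(x):
    """Wrap a Python int to a signed 16-bit value."""
    x &= 0xFFFF
    return x - 0x10000 if x >= 0x8000 else x


def to_int32(x):
    """Wrap a Python int to a signed 32-bit value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def pmaddwd(a, b):
    """Scalar model of pmaddwd: int16 x int16 products, adjacent pairs summed as int32."""
    assert len(a) == len(b) and len(a) % 2 == 0
    return [to_int32(a[i] * b[i] + a[i + 1] * b[i + 1])
            for i in range(0, len(a), 2)]


def phaddw(a, b):
    """Scalar model of 128-bit phaddw: pairwise int16 sums of a, then of b (wrapping)."""
    assert len(a) == len(b) == 8

    def pairs(v):
        return [to_int16(v[i] + v[i + 1]) for i in range(0, 8, 2)]

    return pairs(a) + pairs(b)


def horizontal_widening_add(a):
    """Hypothetical reference: add adjacent int16 lanes without overflow."""
    return [a[i] + a[i + 1] for i in range(0, len(a), 2)]


a = [1, 2, 3, 4, 32767, 32767, -32768, -32768]
b = [10, 20, 30, 40, 50, 60, 70, 80]
ones = [1] * 8

print(pmaddwd(a, ones))  # [3, 7, 65534, -65536], same as horizontal_widening_add(a)
print(phaddw(a, b))      # [3, 7, -2, 0, 30, 70, 110, 150], the int16 sums wrap
```

The non-widening phaddw wraps on overflow, while the pmaddwd form keeps the full int32 sum, which is the reason the widening case maps to pmaddwd rather than phadd.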

steven-johnson

LGTM pending green -- I assume these changes are good against non-trunk LLVM versions too.

rootjalex · 2022-07-21T20:50:50Z

Yes, it should be fine - the phadd intrinsics have been in LLVM for over a decade (SSSE3 and AVX2).

steven-johnson · 2022-07-21T21:30:31Z

The vector and scalar versions of op_phaddw_285 disagree. Maximum error: 60123

rootjalex · 2022-07-22T15:45:35Z

The failures were due to a shuffling mistake: I forgot that AVX2 instructions like phadd operate within 128-bit lanes. Fixed in 0acae70, and I confirmed that we still produce better codegen for reductions.
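
To illustrate the lane issue, here is a small scalar model in Python of the 256-bit vphaddw, assuming the documented per-lane behaviour. It is a sketch for explanation only and is not taken from 0acae70; `fix_lanes` shows one possible reordering (equivalent to a vpermq with immediate 0xD8), which may differ from the shuffle the commit actually uses.

```python
def wrap16(x):
    """Wrap a Python int to a signed 16-bit value."""
    x &= 0xFFFF
    return x - 0x10000 if x >= 0x8000 else x


def pairwise(v):
    """Add adjacent int16 elements, wrapping on overflow."""
    return [wrap16(v[i] + v[i + 1]) for i in range(0, len(v), 2)]


def vphaddw_256(a, b):
    """Scalar model of AVX2 vphaddw on 16 x int16: each 128-bit lane is independent."""
    assert len(a) == len(b) == 16
    lo = pairwise(a[:8]) + pairwise(b[:8])   # low lane: sums of a.lo, then b.lo
    hi = pairwise(a[8:]) + pairwise(b[8:])   # high lane: sums of a.hi, then b.hi
    return lo + hi


def horizontal_add_in_order(a, b):
    """What a reduction wants: all pairwise sums of a, then all of b."""
    return pairwise(a) + pairwise(b)


def fix_lanes(v):
    """Reorder 64-bit quarters [q0, q1, q2, q3] -> [q0, q2, q1, q3]."""
    q = [v[i:i + 4] for i in range(0, 16, 4)]
    return q[0] + q[2] + q[1] + q[3]


a = list(range(16))
b = list(range(100, 116))

raw = vphaddw_256(a, b)
print(raw)
# [1, 5, 9, 13, 201, 205, 209, 213, 17, 21, 25, 29, 217, 221, 225, 229]
print(fix_lanes(raw) == horizontal_add_in_order(a, b))  # True
```

Treating the 256-bit instruction as if it were a wider version of the 128-bit one interleaves the sums of `a` and `b` per lane, which is the kind of mismatch a vector-versus-scalar correctness test would flag.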

abadams · 2022-07-22T18:46:27Z

There's a trade-off between adding new LLVM IR implementations of things, and rewriting Halide Exprs into forms that would match existing patterns. E.g. for the widening add, you could rewrite the Expr so that it matches should_use_dot_product.

My question is: Did you consider that, and does this PR fall on the right side of that line?

rootjalex · 2022-07-22T18:55:28Z

We don't have any existing patterns for phadd instructions, so I'll only respond to your question with regard to the use of pmaddwd here.
I did not consider rewriting within Halide IR, but your suggestion raises a good point: I realize that I did not provide LLVM implementations for the AVX512 dot_product instructions, which could be used similarly. I suspect that your suggestion would be more general and easier to extend, so I will change this PR to transform all of the horizontal_widening_adds into a pattern that matches the existing dot_product patterns.

rootjalex · 2022-07-22T18:57:20Z

That said, I don't think these particular patterns are suitable for rewriting to enable should_use_dot_product; instead, I will aim to produce patterns that match the VectorReduce dot_product instructions.

steven-johnson · 2022-07-26T00:00:30Z

buildbots are green -- is this ready to land?

rootjalex · 2022-07-26T03:28:18Z

I might close this PR in favor of #6884, which will make the horizontal_widening_add patterns useful on any architecture that has pmaddwd variants. I'm just waiting for feedback on whether that PR is indeed a better way of doing things.

steven-johnson · 2022-08-22T16:46:14Z

Monday Morning Review Ping -- where does this PR stand? Is it still likely to be dropped in favor of #6884?

rootjalex · 2022-08-22T17:40:23Z

Yes, I am closing this in favor of #6884. If we end up not merging #6884 for some reason, I will re-open this PR.

rootjalex added 3 commits July 21, 2022 14:41

x86 should use phaddw and phaddd

625c18b

remove accidental duplicat code

fba66fd

add simd_op_check tests

23e3d13

rootjalex requested review from steven-johnson and abadams July 21, 2022 20:10

rootjalex added the performance label Jul 21, 2022

steven-johnson approved these changes Jul 21, 2022

fix avx2 phadd shuffles

0acae70

rootjalex mentioned this pull request Jul 25, 2022

[x86] Separate vector instruction selection and CodeGen passes #6884

Open

3 tasks

rootjalex closed this Aug 22, 2022
