Skip to content

[x86] Codegen phaddw, phaddd, and pmaddwd #6878

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from
Closed

Conversation

rootjalex
Copy link
Member

@rootjalex rootjalex commented Jul 21, 2022

This PR adds support for int16 -> int32 horizontal widening adds to use pmaddwd, and pattern matches on horizontal adds to use phadd(w | d), which is faster than the permute + padd that we currently generate for such reductions (cite @abadams + llvm-mca because my x86 machine is currently broken).

Also fly-by move of some code that I incorrectly placed in the wrong runtime file in #6677 .

Copy link
Contributor

@steven-johnson steven-johnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending green -- I assume these changes are good against non-trunk LLVM versions too.

@rootjalex
Copy link
Member Author

Yes, it should be fine - the phadd intrinsics have been in LLVM for over a decade sse3 avx2.

@steven-johnson
Copy link
Contributor

The vector and scalar versions of op_phaddw_285 disagree. Maximum error: 60123

@rootjalex
Copy link
Member Author

Failures were due to a shuffling mistake, I forgot that AVX2 has the 128 bit boundaries for instructions - fixed in 0acae70, and confirmed that we still produce better codegen for reductions.

@abadams
Copy link
Member

abadams commented Jul 22, 2022

There's a trade-off between adding new llvm ir implementations of things, and rewriting Halide Exprs into forms that would match existing patterns. E.g. for the widening add, you could rewrite the Expr so that it matches should_use_dot_product.

My question is: Did you consider that, and does this PR fall on the right side of that line?

@rootjalex
Copy link
Member Author

We don't have any existing patterns for phadd instructions, so I'll only respond to your question in regards to the use of pmaddwd here.
I did not consider rewriting within Halide IR, but your suggestion raises a good point - I realize that I did not provide LLVM implementations for the AVX512 dot_product instructions, which could be similarly used. I suspect that your suggest would benefit from greater generality, and would be better suited for extensibility, so I will change this PR to transform all of the horizontal_widening_adds into the pattern that will match against the existing dot_product patterns

@rootjalex
Copy link
Member Author

I will say that I don't think these particular patterns are suitable for rewriting to enable should_use_dot_product, but instead I will aim for producing the patterns that will match the VectorReduce dot_product instructions.

@steven-johnson
Copy link
Contributor

buildbots are green -- is this ready to land?

@rootjalex
Copy link
Member Author

I might close this PR in favor of #6884 , which will make the horizontal_widening_add patterns useful on any architecture that has pmaddwd variants. Just waiting to get feedback on whether that PR is indeed a better way of doing things.

@steven-johnson
Copy link
Contributor

Monday Morning Review Ping -- where does this PR stand? Is it still likely to be dropped in favor of #6884?

@rootjalex
Copy link
Member Author

Yes, I am closing this in favor of #6884 . If we end up not merging #6884 for some reason, I will re-open this PR.

@rootjalex rootjalex closed this Aug 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants