-
Notifications
You must be signed in to change notification settings - Fork 1.1k
[x86] Codegen phaddw, phaddd, and pmaddwd #6878
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM pending green -- I assume these changes are good against non-trunk LLVM versions too.
|
Failures were due to a shuffling mistake, I forgot that AVX2 has the 128 bit boundaries for instructions - fixed in 0acae70, and confirmed that we still produce better codegen for reductions. |
There's a trade-off between adding new llvm ir implementations of things, and rewriting Halide Exprs into forms that would match existing patterns. E.g. for the widening add, you could rewrite the Expr so that it matches should_use_dot_product. My question is: Did you consider that, and does this PR fall on the right side of that line? |
We don't have any existing patterns for |
I will say that I don't think these particular patterns are suitable for rewriting to enable |
buildbots are green -- is this ready to land? |
I might close this PR in favor of #6884 , which will make the |
Monday Morning Review Ping -- where does this PR stand? Is it still likely to be dropped in favor of #6884? |
This PR adds support for
int16 -> int32
horizontal widening adds to usepmaddwd
, and pattern matches on horizontal adds to usephadd(w | d)
, which is faster than the permute + padd that we currently generate for such reductions (cite @abadams + llvm-mca because my x86 machine is currently broken).Also fly-by move of some code that I incorrectly placed in the wrong runtime file in #6677 .