-
Notifications
You must be signed in to change notification settings - Fork 1.5k
x64: generate rules to pick between SSE or AVX instructions #10942
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This adds the ability for an instruction to track an alternate version of itself. This is helpful for connecting SSE instructions to their upgraded AVX version. The logic here compares the mnemonic name minus the `v` prefix and checks that the opcodes are the same. This auto-assignment cautiously limits us to SSE instructions like `rw(xmm), r(xmm_m*)`, which have a most regular SSE-AVX relationship. To see the list of instructions that have been assigned alternates, from the `cranelift/assembler-x64/meta` directory, run: ```console $ cargo run | grep alternate ```
This change updates `gen_asm.rs` to emit an extra rule for instructions that have an AVX alternate: `x64_<inst>_or_avx`. The rule logic has the same shape as what we currently hand-write ~180 times in `inst.isle` (!).
// Automatically assign AVX alternates to SSE instructions (see | ||
// `Inst::alternate`). This allows later code generation to | ||
// instruction-select between the AVX and SSE versions of an instruction. | ||
assign_avx_alternates(&mut all); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation (below) could all go live in a separate module (?).
@@ -504,6 +504,38 @@ pub fn generate_isle_inst_decls(f: &mut Formatter, inst: &Inst) { | |||
f, | |||
"(rule ({rule_name} {param_names}) ({convert} ({raw_name} {implicit_params} {param_names})))" | |||
); | |||
|
|||
if let Some(avx_name) = &inst.alternate { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not terribly happy about dropping this here; it will work fine but at some point we may need to refactor the function this is in. It is just quite handy that all the parameter types and names have been pre-calculated for us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks -- neat that this was made to work! Some thoughts below. If this PR turns out to be the best way I'm fine with it going in but I'm curious what you think of the below...
/// opcode if one is available and the primary otherwise. | ||
pub fn opcode(&self) -> u8 { | ||
match self { | ||
Encoding::Rex(rex) => match rex.opcodes.secondary { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we put this nested match
in an opcode()
method on rex.opcodes
?
Also, it's not totally clear to me what "primary opcode" means in the doc-comment above and why we return the secondary for the "primary" if it exists -- I read the doc-comment for Opcodes
below too and I'm still a little fuzzy. Could we say what we mean here by "primary" and why the secondary is the primary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The manual uses "primary" and "secondary" to explain all this but, yes, it kind of feels like the "primary" opcode should always be the last one. I'll add Opcodes::opcode
and try to explain things there a bit better.
all | ||
} | ||
|
||
/// Assigns AVX alternates to SSE instructions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's neat that this was made to work automagically -- all the same, I worry a tiny bit that it may be brittle (unexpectedly determine a match when some subtle condition means they're not quite equivalent; or on the other side, miss a match because of something). The note below about rounding and the fact this limits things to just one format for now as a result seems to bear this out a little.
In the spirit of the explicit DSL, I wonder if it would be better to have a clause on the instruction builder that sets the alternate? Something like
inst("vpadd", ...).avx_for("padd"),
inst("padd", ...),
...
(avx_for
, upgrades
, equiv
, pick a verb phrase...)
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I did consider adding an explicit version: inst(...).alt("vpadd")
. I can't remember what dissuaded me, but probably laziness (there will be a lot of places to update!). But it is more clear so I'm in favor of that — less magic. I'll rework this to use a DSL function and use the current "check the SSE-to-AVX link" logic to verify that the user has specified .alt(...)
correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I'd agree with @cfallin as well in that I'm in favor of explicitly saying "this is the AVX thing for that other instruction" which then generates a panic/assertion/etc if something doesn't match. (I realize y'all've already reached this conclusion but wanted to echo my opinion here too to voice support)
This change uses `Inst::alt()` to explicitly mark which instructions have an AVX alternate. This retains the strict checks preventing us from manually making a mistake. It also passes along the `Feature` in an `Alternate` structure to make this easier to extend later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice 👍
The new assembler gives us the opportunity to generate some extra ISLE rules to eliminate boilerplate like the following, hand-written around 180 times throughout
inst.isle
:This PR only attempts to solve that exact pattern (i.e.,
Xmm XmmMem
) but the approach could scale to more patterns in the future.