Skip to content

Cranelift: Revive division-by-constant magic sequences #11129

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 25, 2025

Conversation

fitzgen
Copy link
Member

@fitzgen fitzgen commented Jun 24, 2025

  • Revive the old module for generating division-by-constant magic numbers

  • Add proptests for divide-by-constant magic number generators, just for a little extra correctness assurances

  • Apply division-by-constant magic sequences in the ISLE mid-end.

    The application of these sequences have been ported to ISLE now to better integrate with our existing mid-end optimizations and to allow for recursive rule application in the optimized right-hand sides.

Fixes #6049

fitzgen added 3 commits June 20, 2025 10:07
The application of these sequences have been ported to ISLE now to better
integrate with our existing mid-end optimizations and to allow for recursive
rule application in the optimized right-hand sides.

Fixes bytecodealliance#6049
@fitzgen fitzgen requested review from a team as code owners June 24, 2025 21:43
@fitzgen fitzgen requested review from alexcrichton and removed request for a team June 24, 2025 21:43
Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice 👍. Review-wise the main thing I did was double-check that the eval_* functions match the ISLE lowerings of the instructions to ensure that all the goodness of the proptest/test suite/etc mirrors into ISLE well. Very much appreciate the extensive test suite here!

A few thoughts:

  • Could the proptest bits be expanded to {u,s}rem as well as division? That way the final bits of ISLE can all additionally be mapped into Rust/proptest as well.
  • Is it true that in all cases this expansion is better than whatever the hardware does?
  • More of a curiosity, but how do egraphs know to favor these instruction sequences? This is expanding to quite a few more instrunctions than a div so do we have the cost of div/rem set really high?

@cfallin
Copy link
Member

cfallin commented Jun 24, 2025

Is it true that in all cases this expansion is better than whatever the hardware does?

I guess I'm becoming the resident instruction latency/throughput oracle today but yes definitely -- 64-bit integer divide has a latency of 35-88 cycles (!) on Skylake, with a throughput of 1/6 (so not fully pipelined), while 64-bit integer multiply is 3 cycles and is fully pipelined (1 per cycle). I vaguely recall some issue discussion from when we first switched over to the egraph mid-end where we noticed some benchmarks where this showed up as minor regressions, too. Very happy to see this finally ported over!

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:meta Everything related to the meta-language. isle Related to the ISLE domain-specific language labels Jun 25, 2025
Copy link

Subscribe to Label Action

cc @cfallin, @fitzgen

This issue or pull request has been labeled: "cranelift", "cranelift:meta", "isle"

Thus the following users have been cc'd because of the following labels:

  • cfallin: isle
  • fitzgen: isle

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

@fitzgen
Copy link
Member Author

fitzgen commented Jun 25, 2025

More of a curiosity, but how do egraphs know to favor these instruction sequences? This is expanding to quite a few more instrunctions than a div so do we have the cost of div/rem set really high?

We don't put the values defined by side-effectful instructions into the egraph, so we have a slightly different path. This is why we use the ISLE simplify_skeleton term instead of the usual simplify. This path assumes that all RHSes are an improvement over the LHS, and does a shallow/greedy cost comparison for choosing among multiple RHSes (rather than a dynamic programming thing that looks at deep structure).

Concretely, in this case there will be one RHS candidate (the new magic sequence), the one RHS is trivially the best RHS, and for simplify_skeleton we always take RHSes over LHSes when available. So basically, "we favor these instructions because we generated them".

@fitzgen fitzgen enabled auto-merge June 25, 2025 18:57
@fitzgen fitzgen added this pull request to the merge queue Jun 25, 2025
Merged via the queue into bytecodealliance:main with commit f678260 Jun 25, 2025
71 checks passed
@fitzgen fitzgen deleted the revive-div-rem-magic-consts branch June 25, 2025 20:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:meta Everything related to the meta-language. cranelift Issues related to the Cranelift code generator isle Related to the ISLE domain-specific language
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Port divide-by-constant magic number optimizations from simple_preopt to ISLE mid-end
4 participants