Skip to content

Souper-synthesized optimizations that we should investigate generalizing and adding to the mid-end #5783

@fitzgen

Description

@fitzgen

Here are some synthesized optimizations for CLIF harvested from
spidermonkey.wasm with explicit bounds checks enabled. I won't have time to
investigate, generalize, or implement them before I go on vacation, so I want to
note them down in an issue for posterity.


Replacing clz(x) == 0 with a comparison:

========= ./2559773154913247904.result =========
%0:i32 = var
%1:i32 = ctlz %0
%2:i1 = eq %1, 0:i32
infer %2

; RHS inferred successfully
%3:i1 = ult 2147483647:i32, %0
result %3

Similarly, we can do this for clz(x) == 1, which wasn't harvested from the
CLIF:

%0:i32 = var
%1:i32 = ctlz %0
%2:i1 = eq %1, 1:i32
infer %2

; RHS inferred successfully
%3:i1 = slt 1073741823:i32, %0
result %3

Note that there is no generalization for clz(x) == C available here (unless
C is larger than x's bit width, in which case it is always false). It only
works for zero and one because we don't need to check for a lower bound on x.

Probably a similar rewrite we could do with clz(x) == OP_BIT_WIDTH.


"Reverse" const propagation of a shl into a select:

========= ./11901186248914954319.result =========
%1:i1 = var
%2:i32 = select %1, 9:i32, 1:i32
%3:i32 = shl -1:i32, %2
infer %3

; RHS inferred successfully
%3:i32 = select %1, 4294966784:i32, 4294967294:i32
result %3

Maybe not profitable if %2 is used multiple times, in which case %1's live
range might be extended after this optimization. But I guess this is always true
with these sorts of peepholes...


Haven't dug into this one yet:

========= ./2969648510667058023.result =========
%0:i32 = var
%1:i32 = and %0, 2147483647:i32
%2:i32 = shl %1, 1:i32
infer %2

; RHS inferred successfully
%3:i32 = add %0, %0
result %3

A bunch of masking off bits that will be shifted out anyways:

========= ./4985840733093600201.result =========
%0:i32 = var
%1:i32 = and %0, 1073741823:i32
%2:i32 = shl %1, 2:i32
infer %2

; RHS inferred successfully
%3:i32 = shl %0, 2:i32
result %3

========= ./17860953418551930245.result =========
%0:i32 = var
%1:i32 = and %0, 268435455:i32
%2:i32 = shl %1, 4:i32
infer %2

; RHS inferred successfully
%3:i32 = shl %0, 4:i32
result %3

========= ./8617343957324108668.result =========
%0:i32 = var
%1:i32 = and %0, 536870911:i32
%2:i32 = shl %1, 3:i32
infer %2

; RHS inferred successfully
%3:i32 = shl %0, 3:i32
result %3

Replacing an and and an eq with an ult. Haven't thought about these yet.

========= ./9633751011751143656.result =========
%0:i32 = var
%1:i32 = and %0, -2147483648:i32
%2:i1 = eq %1, 0:i32
infer %2

; RHS inferred successfully
%3:i1 = ult %0, 2147483648:i32
result %3

========= ./3840756061434214754.result =========
%0:i32 = var
%1:i32 = and %0, 4294967280:i32
%2:i1 = eq %1, 0:i32
infer %2

; RHS inferred successfully
%3:i1 = ult %0, 16:i32
result %3

Unnecessary ors:

========= ./14551618322220925804.result =========
%0:i32 = var
%1:i32 = or %0, 1024:i32
%2:i32 = and %1, 2048:i32
infer %2

; RHS inferred successfully
%3:i32 = and 2048:i32, %0
result %3

========= ./13190508444419006141.result =========
%0:i32 = var
%1:i32 = or %0, 33032:i32
%2:i32 = and %1, 192:i32
infer %2

; RHS inferred successfully
%3:i32 = and 192:i32, %0
result %3

More masking off bits that will be shifted away:

========= ./3703682576609255056.result =========
%0:i32 = var
%1:i32 = and %0, 255:i32
%2:i32 = shl %1, 24:i32
infer %2

; RHS inferred successfully
%3:i32 = shl %0, 24:i32
result %3

========= ./5310030443405410098.result =========
%0:i32 = var
%1:i32 = and %0, 255:i32
%2:i32 = shl %1, 25:i32
infer %2

; RHS inferred successfully
%3:i32 = shl %0, 25:i32
result %3

A ton of "reverse" const prop with comparisons found, here are a few:

========= ./13877742632332331899.result =========
%0:i32 = var
%1:i32 = add %0, 1:i32
%2:i1 = eq %1, 1:i32
infer %2

; RHS inferred successfully
%3:i1 = eq 0:i32, %0
result %3

========= ./3568272213429168939.result =========
%0:i32 = var
%1:i32 = add %0, 44:i32
%2:i1 = eq %1, 3608:i32
infer %2

; RHS inferred successfully
%3:i1 = eq 3564:i32, %0
result %3

========= ./12119931432587256453.result =========
%0:i32 = var
%1:i32 = ashr %0, 2:i32
%2:i1 = slt %1, 24:i32
infer %2

; RHS inferred successfully
%3:i1 = slt %0, 96:i32
result %3

========= ./7033427926372082757.result =========
%0:i32 = var
%1:i32 = and %0, -8:i32
%2:i1 = sle %1, 31:i32
infer %2

; RHS inferred successfully
%3:i1 = slt %0, 32:i32
result %3

And also a bunch "reverse" const prop with other operators as well (not as many
as with comparisons though). Again, here are a few:

========= ./7402500702784523674.result =========
%0:i32 = var
%1:i32 = shl %0, 1:i32
%2:i32 = shl %1, 16:i32
infer %2

; RHS inferred successfully
%3:i32 = shl %0, 17:i32
result %3

========= ./10936462652029262118.result =========
%0:i32 = var
%1:i32 = shl %0, 1:i32
%2:i32 = mul %1, 12:i32
infer %2

; RHS inferred successfully
%3:i32 = mul 24:i32, %0
result %3

========= ./13431533325972989820.result =========
%0:i32 = var
%1:i32 = shl -1:i32, %0
%2:i32 = shl %1, 2:i32
infer %2

; RHS inferred successfully
%3:i32 = shl 4294967292:i32, %0
result %3

Metadata

Metadata

Assignees

No one assigned

    Labels

    cranelift:goal:optimize-speedFocus area: the speed of the code produced by Cranelift.souperIssues and PRs related to the Souper superoptimizer

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions