GuardWithIf is really slow on small-extent vectorized loops on architectures without predication #7947

@abadams

Two possible new tail strategies that could help:

RoundUpAndBlend would be like GuardWithIf, but it would use a select instead of an if. So if GuardWithIf does something equivalent to:

f(x) = if x < extent then g(x) else dontcare

RoundUpAndBlend would do:

f(x) = select(x < extent, g(x), f(x))

I.e. it loads the vector it would store to, modifies some of the lanes, and then stores the result. This would be a race condition if there's an outer parallel loop in that dimension, so we'd have to check for that.
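To make the load/blend/store pattern concrete, here is a minimal scalar emulation in plain C++. It is only a sketch, not Halide's lowering: `kLanes`, `g`, and `round_up_and_blend` are hypothetical names made up for illustration, and the inner loops stand in for single vector instructions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kLanes = 8;  // hypothetical vector width

// Hypothetical stand-in for g(x).
inline int g(int x) { return 10 * x; }

// Emulates f(x) = select(x < extent, g(x), f(x)) over whole vectors.
// Assumes f is allocated out to extent rounded up to a multiple of kLanes.
void round_up_and_blend(std::vector<int> &f, int extent) {
    assert(extent >= 0);
    const int rounded = (extent + kLanes - 1) / kLanes * kLanes;
    assert(f.size() >= static_cast<std::size_t>(rounded));

    for (int base = 0; base < rounded; base += kLanes) {
        std::array<int, kLanes> old_vals{}, new_vals{};
        // Dense load of the vector we are about to store to.
        for (int l = 0; l < kLanes; l++) old_vals[l] = f[base + l];
        // Blend: lanes inside the extent take g(x), the rest keep f(x).
        for (int l = 0; l < kLanes; l++) {
            const int x = base + l;
            new_vals[l] = (x < extent) ? g(x) : old_vals[l];
        }
        // Dense store of the full vector, including the untouched lanes.
        for (int l = 0; l < kLanes; l++) f[base + l] = new_vals[l];
    }
}
```

The store writes back lanes at or beyond `extent` with the values it just loaded, which is why a concurrent writer to those lanes (an outer parallel loop in the same dimension) would race with it.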

ShiftInwardsAndBlend would be similar, but shifting inwards instead of rounding up, so that the overall allocation bounds aren't expanded if the extent is at least one vector. It would be really useful for vectorizing pure vars in update definitions touching inputs and outputs when you expect the extent to be small.
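A rough scalar emulation of what shifting inwards with a blend could look like, in plain C++ rather than Halide. The names (`kLanes`, `shift_inwards_and_blend`) and the in-place update `f(x) = f(x) + 1` are hypothetical, and the masking rule (lanes already covered by an earlier vector keep their loaded value) is my assumption about how the blend would avoid applying an update twice.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kLanes = 8;  // hypothetical vector width

// Emulates a vectorized in-place update f(x) = f(x) + 1 over [0, extent).
// The final vector is shifted inwards so it ends exactly at extent, and
// lanes that an earlier vector already updated are blended back unchanged.
// Assumes extent >= kLanes, so no padding beyond extent is needed.
void shift_inwards_and_blend(std::vector<int> &f, int extent) {
    assert(extent >= kLanes);
    assert(f.size() >= static_cast<std::size_t>(extent));

    for (int start = 0; start < extent; start += kLanes) {
        // Shift the last vector inwards instead of running past extent.
        const int base = std::min(start, extent - kLanes);
        std::array<int, kLanes> old_vals{}, new_vals{};
        for (int l = 0; l < kLanes; l++) old_vals[l] = f[base + l];
        for (int l = 0; l < kLanes; l++) {
            const int x = base + l;
            // Only lanes not yet processed (x >= start) take the update.
            new_vals[l] = (x >= start) ? old_vals[l] + 1 : old_vals[l];
        }
        for (int l = 0; l < kLanes; l++) f[base + l] = new_vals[l];
    }
}
```

With an extent of 11 and 8 lanes, the second vector covers x = 3..10 but only updates x = 8..10, so every element is incremented exactly once and the buffer needs no extra room past the extent.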

Specifically, I want to use this schedule:

output.update().specialize(output.width() < vec);
output.update().vectorize(x, vec, TailStrategy::ShiftInwardsAndBlend);

Labels: enhancement (New user-visible features or improvements to existing features.)
