You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[LoopVectorize] Add support for vectorisation of simple early exit loops
This patch adds support for vectorisation of a simple class of loops
that typically involves searching for something, i.e.
for (int i = 0; i < n; i++) {
if (p[i] == val)
return i;
}
return n;
or
for (int i = 0; i < n; i++) {
if (p1[i] != p2[i])
return i;
}
return n;
In this initial commit we only vectorise loops with the following
criteria:
1. There are no stores in the loop.
2. The loop must have only one early exit like those shown in the
above example. I have referred to such exits as speculative early
exits, to distinguish from existing support for early exits where
the exit-not-taken count is known exactly at compile time.
2. The early exit block dominates the latch block.
3. There are no loads after the early exit block.
4. The loop must not contain reductions or recurrences. I don't
see anything fundamental blocking vectorisation of such loops, but
I just haven't done the work to support them yet.
5. We must be able to prove at compile-time that loops will not
contain faulting loads.
For point 5 once this patch lands I intend to follow up by supporting
some limited cases of faulting loops where we can version the loop
based on pointer alignment. For example, it turns out in the
SPEC2017 benchmark there is a std::find loop that we can vectorise
provided we add SCEV checks for the initial pointer being aligned
to a multiple of the VF. In practice, the pointer is regularly
aligned to at least 32/64 bytes and since the VF is a power of 2, any
vector loads <= 32/64 bytes in size will always fault on the first
lane, following the same behaviour as the scalar loop. Given we
already do such speculative versioning for loops with unknown strides,
alignment-based versioning doesn't seem to be any worse.
This patch makes use of the existing experimental_cttz_elems intrinsic
that's required in the vectorised early exit block to determine the
first lane that triggered the exit. This intrinsic has generic
lowering support so it's guaranteed to work for all targets.
Tests have been added here:
Transforms/LoopVectorize/AArch64/simple_early_exit.ll
0 commit comments