- 
                Notifications
    You must be signed in to change notification settings 
- Fork 13.9k
[WIP] New implementation of the slice iterators using NonZero #46223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| I haven't run so many benchmarks yet, but the ones I do are unchanged. These ones below are of very simple loops. from ndarray "cargo bench --bench higher-order":  name                   xx ns/iter  yy ns/iter  diff ns/iter  diff %
 fold_axis              681         676                   -5  -0.73%
 map_axis_0             451         452                    1   0.22%
 map_axis_1             452         453                    1   0.22%
 map_regular            172         169                   -3  -1.74%
 map_stride_double_f64  504         506                    2   0.40%
 map_stride_f64         652         666                   14   2.15%
 map_stride_u32         690         681                   -9  -1.30% | 
| These differences seem small enough to just be noise; is the generated assembly actually identical? | 
| cc @rust-lang/libs | 
| @gankro Sure, I intended to present them as exactly that, noise. However I'll have to find better benchmarks. I removed the zip benchmarks -- those are entirely in a specialization and don't go through Iterator::next(). The PR posted here so that the full test suite is ran and for comments. | 
| Having looked through some benchmarks, nothing stood out. (Some examples a few pct better and apparently some a few pct worse, but it needs to be reduced to very small examples to tell us something.) I found this code gen example that autovectorizes on nightly but not using a compiler with this PR: #[no_mangle]
pub fn zip_and(a: &[i32], b: &[i32]) -> i32 {
    let mut result = 0;
    let mut iter_a = a.iter();
    let mut iter_b = b.iter();
    loop {
        if let Some(&elt_a) = iter_a.next() {
            if let Some(&elt_b) = iter_b.next() {
                result |= elt_a | elt_b;
                continue;
            }
        }
        break;
    }
    result
}A "small" example of 450 lines of unoptimized IR without debug info. This PR removes the assume(!end.is_null()) or equivalent, and maybe that causes this regression. Otherwise I am open to suggestions. (Edit: I checked, it doesn't seem to help to use a  | 
| Yeah, LLVM is inconsistent with its metadata, keeping the  | 
Reimplement the slice iterators using `NonZero` where possible.
1. Iter, IterMut now have a non-nullable field
   a. The usual enum layout optimizations apply
   b. Automatic and more principled nonnull metadata for that pointer
2. Big change for zero sized elements. The end field is a counter.
   The produced pointer value is equivalent to `slice.as_ptr()` and
   thus `&slice[i]` for each index `i` in this case.
* Implementation Notes
** Regular case, T is not zero-sized
(Nothing changes in this case, but this is the documentation in the code.)
The iterator represents a contiguous range of elements.
`ptr` is a pointer to the start of the range.
`end` is a pointer to one past the end of the range.
Initial values are ptr = slice.as_ptr() and end = ptr.offset(slice.len())
The iterator is empty when ptr == end.
The iterator is stepped at the front by incrementing ptr, the
yielded element is the old value of ptr ("post increment" semantic).
The iterator is stepped at the back by decrementing end, the
yielded element is the new value of end ("pre decrement" semantic).
** When T is a Zero-sized type (ZST)
`ptr` is a pointer to the start of the range.
`end` is used as an integer: the count of elements in the range.
Initial values are ptr = slice.as_ptr() and end = self.len() as *const T
The iterator is empty when end == 0
The iterator is stepped at the front by decrementing end.
The iterator is stepped at the back by decrementing end.
For both front and back, the yielded element is `ptr`, which does not change
during the iteration.
** NonZero
`ptr` is derived from `slice.as_ptr()` and just like it is is non-null.
`end` is allowed to be null (used by the ZST implementation); this is
required since for iterating a [(); usize::MAX] all values of usize (and
equivalently the raw pointer) are used, including zero.
    7aa749b    to
    48c2497      
    Compare
  
    | (Rebased & updated with some cleanups, no major changes). I've officially given up trying to fix the regression in #46223 (comment) I prototyped this out of tree, which was a bit quicker, and that "regression" does not care about assumes or other changes I thought were natural. The change to  That's the macro that changes from: macro_rules! make_ref {
    ($ptr:expr) => {{
        let ptr = $ptr;
        if size_from_ptr(ptr) == 0 {
            // Use a non-null pointer value
            &*(1 as *mut _)
        } else {
            &*ptr
        }
    }};
}to this: macro_rules! make_ref {
    ($ptr:expr) => { &*$ptr };
}Even funnier, the macro can even be this (which is nonsense but correct in the new impl) and the vectorization does not regress: macro_rules! make_ref {
    ($ptr:expr) => {{
        let ptr = $ptr;
        if false && size_from_ptr(ptr) == 0 {
            // Use a non-null pointer value
            &*(1 as *mut _)
        } else {
            &*ptr
        }
    }};
}but we must refuse to insert nonsense of that degree; then I think there's no principle but some strange accident that is causing the original #46223 (comment) code to compile well. | 
| @bluss Oh, have you tried having a  | 
| @eddyb oh I think so, I tried a lot of things. I have resolved one bounds check regression in the nom testsuite (related to the unrolling / try_fold code ping @arthurprs ; unregressing it involved just tweaking how the ptrdistance was computed in that loop. This WIP PR is not entirely up to date since I'm still trying to fix regressions locally.) | 
| You mean this #45501? That's an awesome side effect 😃 | 
| Oh just maybe, it was actually just regress (due to this PR) and unregress (due to WIP on this PR). Just thinking of it since it's the same issue. | 
| Hi @bluss, is there anything supposed to happen for this PR to proceed? | 
| @kennytm I've been trying to work on a no-performance-regressions version with no success. I'll have to come back to this with a different goal, probably demonstrating some code savings in some places and code gen regressions in other places. I was using nom benchmarks for the no-performance-regressions goal. cc @Geal | 
| Visiting for triage: just to be clear, it sounds like you're going to continue work in this PR rather than create a new one when ready? | 
| Yes. I just haven't had time, sorry. Someone else can also pick it up. | 
Reimplement the slice iterators using
NonZerowhere possible.a. The usual enum layout optimizations apply
b. Automatic and more principled nonnull metadata for that pointer
The produced pointer value is equivalent to
slice.as_ptr()andthus
&slice[i]for each indexiin this case.Implementation Notes
Regular case, T is not zero-sized
(Nothing changes in this case, but this is the documentation in the code.)
The iterator represents a contiguous range of elements.
ptris a pointer to the start of the range.endis a pointer to one past the end of the range.Initial values are ptr = slice.as_ptr() and end = ptr.offset(slice.len())
The iterator is empty when ptr == end.
The iterator is stepped at the front by incrementing ptr, the
yielded element is the old value of ptr ("post increment" semantic).
The iterator is stepped at the back by decrementing end, the
yielded element is the new value of end ("pre decrement" semantic).
When T is a Zero-sized type (ZST)
ptris a pointer to the start of the range.endis used as an integer: the count of elements in the range.Initial values are ptr = slice.as_ptr() and end = self.len() as *const T
The iterator is empty when end == 0
The iterator is stepped at the front by decrementing end.
The iterator is stepped at the back by decrementing end.
For both front and back, the yielded element is
ptr, which does not changeduring the iteration.
NonZero
ptris derived fromslice.as_ptr()and just like it is is non-null.endis allowed to be null (used by the ZST implementation); this isrequired since for iterating a [(); usize::MAX] all values of usize (and
equivalently the raw pointer) are used, including zero.