Skip to content

Conversation

SimonSapin
Copy link
Contributor

No description provided.

@SimonSapin SimonSapin added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. A-iterators Area: Iterators labels Jun 8, 2018
@rust-highfive
Copy link
Contributor

r? @kennytm

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 8, 2018
@SimonSapin
Copy link
Contributor Author

@scottmcm This PR is from your repo. I’m not good at reading assembly but your comment at #27741 (comment) suggests that this is an improvement. Should we land it?

The diff looks good to me.

@kennytm
Copy link
Member

kennytm commented Jun 8, 2018

I've done a simple benchmark and the test code from the comment is improved from 800µs/iter to 600µs/iter.

running 2 tests
test pr_bench  ... bench:     592,516 ns/iter (+/- 42,563)
test std_bench ... bench:     809,164 ns/iter (+/- 54,219)

Source code:

#![feature(try_trait, test)]

extern crate test;

use std::ops::Try;

use test::{Bencher, black_box};

#[no_mangle]
pub fn std_compute(a: u64, b: u64) -> u64 {
    StepByWithoutTryFold(StepBy {
        iter: a..b,
        step: 6,
        first_take: true,
    }).map(|x| x ^ (x - 1)).sum()
}

#[no_mangle]
pub fn pr_compute(a: u64, b: u64) -> u64 {
    StepBy {
        iter: a..b,
        step: 6,
        first_take: true,
    }.map(|x| x ^ (x - 1)).sum()
}

#[bench]
fn std_bench(bencher: &mut Bencher) {
    let a = black_box(1);
    let b = black_box(5000000);
    bencher.iter(|| {
        black_box(std_compute(a, b));
    });
}

#[bench]
fn pr_bench(bencher: &mut Bencher) {
    let a = black_box(1);
    let b = black_box(5000000);
    bencher.iter(|| {
        black_box(pr_compute(a, b));
    });
}

struct StepBy<I> {
    iter: I,
    step: usize,
    first_take: bool,
}

struct StepByWithoutTryFold<I>(StepBy<I>);

impl<I: Iterator> Iterator for StepBy<I> {
    type Item = I::Item;

    #[inline]
    fn next(&mut self) -> Option<Self::Item> {
        if self.first_take {
            self.first_take = false;
            self.iter.next()
        } else {
            self.iter.nth(self.step)
        }
    }

    #[inline]
    fn try_fold<B, F, R>(&mut self, init: B, mut f: F) -> R where
        Self: Sized, F: FnMut(B, Self::Item) -> R, R: Try<Ok=B>
    {
        let mut accum = init;

        if self.first_take {
            self.first_take = false;
            if let Some(x) = self.iter.next() {
                accum = f(accum, x)?;
            } else {
                return Try::from_ok(accum);
            }
        }

        while let Some(x) = self.iter.nth(self.step) {
            accum = f(accum, x)?;
        }
        Try::from_ok(accum)
    }
}

impl<I: Iterator> Iterator for StepByWithoutTryFold<I> {
    type Item = I::Item;

    #[inline]
    fn next(&mut self) -> Option<Self::Item> {
        if self.0.first_take {
            self.0.first_take = false;
            self.0.iter.next()
        } else {
            self.0.iter.nth(self.0.step)
        }
    }
}

@scottmcm
Copy link
Member

scottmcm commented Jun 9, 2018

Hmm, I'd forgotten I'd started this 😅 I see two options:

  1. Just merge this, since it helps -- thanks for benching, @kennytm! (Well, with some tests, especially for the short-circuiting logic, since that's the thing I got wrong the most doing the other try_folds.)

  2. Reimplement StepBy so that the bool flag isn't needed, and thus the manual try_fold is likely also unneeded (since there's nothing to manually hoist). The flag is there so it can use nth, but if there was a method that called next n times, returning the first result, then the flag wouldn't be needed, and it would actually be easier for Range to implement. Sketch of what I mean: https://play.rust-lang.org/?gist=f2c028e6e42e9ce5f3a311b71dcfc3a9&version=nightly&mode=release

Thoughts?

@kennytm
Copy link
Member

kennytm commented Jun 12, 2018

I think we could do the refactoring later.

@bors r+

@bors
Copy link
Collaborator

bors commented Jun 12, 2018

📌 Commit 2d55d28 has been approved by kennytm

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 12, 2018
@bors
Copy link
Collaborator

bors commented Jun 13, 2018

🔒 Merge conflict

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jun 13, 2018
@Emerentius
Copy link
Contributor

I've rerun @kennytm's code but replaced the std code by the specialization from my PR referencing this issue just above this comment. With that, this PR would lower performance.

test pr_bench             ... bench:     647,483 ns/iter (+/- 12,094)
test specialized_pr_bench ... bench:     413,365 ns/iter (+/- 13,795)
#![feature(try_trait, test, step_trait)]
use std::iter::Step;

extern crate test;

use std::ops::Try;

use test::{Bencher, black_box};

#[no_mangle]
pub fn std_compute(a: u64, b: u64) -> u64 {
    StepByWithoutTryFold(StepBy {
        iter: a..b,
        step: 6,
        first_take: true,
    }).map(|x| x ^ (x - 1)).sum()
}

#[no_mangle]
pub fn pr_compute(a: u64, b: u64) -> u64 {
    StepBy {
        iter: a..b,
        step: 6,
        first_take: true,
    }.map(|x| x ^ (x - 1)).sum()
}

#[bench]
fn specialized_pr_bench(bencher: &mut Bencher) {
    let a = black_box(1);
    let b = black_box(5000000);
    bencher.iter(|| {
        black_box(std_compute(a, b));
    });
}

#[bench]
fn pr_bench(bencher: &mut Bencher) {
    let a = black_box(1);
    let b = black_box(5000000);
    bencher.iter(|| {
        black_box(pr_compute(a, b));
    });
}

struct StepBy<I> {
    iter: I,
    step: usize,
    first_take: bool,
}

struct StepByWithoutTryFold<I>(StepBy<I>);

impl<I: Iterator> Iterator for StepBy<I> {
    type Item = I::Item;

    #[inline]
    fn next(&mut self) -> Option<Self::Item> {
        if self.first_take {
            self.first_take = false;
            self.iter.next()
        } else {
            self.iter.nth(self.step)
        }
    }

    #[inline]
    fn try_fold<B, F, R>(&mut self, init: B, mut f: F) -> R where
        Self: Sized, F: FnMut(B, Self::Item) -> R, R: Try<Ok=B>
    {
        let mut accum = init;

        if self.first_take {
            self.first_take = false;
            if let Some(x) = self.iter.next() {
                accum = f(accum, x)?;
            } else {
                return Try::from_ok(accum);
            }
        }

        while let Some(x) = self.iter.nth(self.step) {
            accum = f(accum, x)?;
        }
        Try::from_ok(accum)
    }
}

impl<T> Iterator for StepByWithoutTryFold<std::ops::Range<T>>
where
    T: Step,
{
    type Item = T;

    #[inline]
    fn next(&mut self) -> Option<Self::Item> {
        self.0.first_take = false;
        if self.0.iter.start >= self.0.iter.end {
            return None;
        }
        // add 1 to self.step to get original step size back
        // it was decremented for the general case on construction
        if let Some(n) = self.0.iter.start.add_usize(self.0.step+1) {
            let next = std::mem::replace(&mut self.0.iter.start, n);
            Some(next)
        } else {
            let last = self.0.iter.start.clone();
            self.0.iter.start = self.0.iter.end.clone();
            Some(last)
        }
    }
}

@SimonSapin
Copy link
Contributor Author

Thanks for looking into this @Emerentius. Closing in favor of #51601.

@SimonSapin SimonSapin closed this Jun 18, 2018
bors added a commit that referenced this pull request Jun 21, 2018
Specialize StepBy<Range(Inclusive)>

Part of #51557, related to #43064, #31155

As discussed in the above issues, `step_by` optimizes very badly on ranges which is related to
1. the special casing of the first `StepBy::next()` call
2. the need to do 2 additions of `n - 1` and `1` inside the range's `next()`

This PR eliminates both by overriding `next()` to always produce the current element and also step ahead by `n` elements in one go. The generated code is much better, even identical in the case of a `Range` with constant `start` and `end` where `start+step` can't overflow. Without constant bounds it's a bit longer than the manual loop. `RangeInclusive` doesn't optimize as nicely but is still much better than the original asm.
Unsigned integers optimize better than signed ones for some reason.

See the following two links for a comparison.

[godbolt: specialization for ..](https://godbolt.org/g/haHLJr)
[godbolt: specialization for ..=](https://godbolt.org/g/ewyMu6)

`RangeFrom`, the only other range with an `Iterator` implementation can't be specialized like this without changing behaviour due to overflow. There is no way to save "finished-ness".

The approach can not be used in general, because it would produce side effects of the underlying iterator too early.

May obsolete #51435, haven't checked.
@timvermeulen
Copy link
Contributor

timvermeulen commented May 7, 2019

Since #51601 was reverted, should this be considered again?

Centril added a commit to Centril/rust that referenced this pull request Sep 9, 2019
… r=scottmcm

Override `StepBy::{try_fold, try_rfold}`

Previous PR: rust-lang#51435

The previous PR was closed in favor of rust-lang#51601, which was later reverted. I don't think these implementations will make it harder to specialize `StepBy<Range<_>>` later, so we should be able to land this without any consequences.

This should fix rust-lang#57517 – in my benchmarks `iter` and `iter.step_by(1)` now perform equally well, provided internal iteration is used.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-iterators Area: Iterators S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants