the8472
Member

@the8472 the8472 commented Jul 24, 2021

With #87168, flattening array::IntoIters is now TrustedLen. The FromIterator implementation for Vec has a specialization for TrustedLen iterators which uses internal iteration. This PR implements one of the main internal iteration methods on array::IntoIter to optimize the combination of those two features.

This should address the main issue in #87411

# old
test vec::bench_flat_map_collect                         ... bench:   2,244,024 ns/iter (+/- 18,903)

# new
test vec::bench_flat_map_collect                         ... bench:     172,863 ns/iter (+/- 2,141)
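
For readers who want to see the kind of code this affects, here is a minimal sketch of a `flat_map` over arrays followed by `collect` into a `Vec`. The thread does not show the benchmark body, so the function name `flat_map_collect` and the byte-expansion closure are assumptions for illustration only.

```rust
/// Hypothetical example: expands every `u32` into four bytes and collects them.
/// The closure returns a `[u8; 4]`, so each inner iterator is an `array::IntoIter`.
fn flat_map_collect(values: &[u32]) -> Vec<u8> {
    values
        .iter()
        .flat_map(|v| v.rotate_left(8).to_be_bytes())
        .collect()
}
```

Calling `flat_map_collect(&[777; 500_000])` exercises the path described above: `collect` drives the flattened iterator, which in turn drives one small array iterator per input element.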

@rust-highfive
Contributor

r? @kennytm

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 24, 2021
@the8472 the8472 added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Jul 24, 2021
@kennytm
Member

kennytm commented Jul 24, 2021

while this LGTM, shouldn't the original issue be addressed by implementing SpecExtend<T, std::array::IntoIter<T>> for Vec<T, A>?

@the8472
Member Author

the8472 commented Jul 24, 2021

The original issue involved Flatten which results in several adapters sitting between SpecExtend and the IntoIter.
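
To illustrate why internal iteration helps even with adapters in between, here is a small sketch. `Probe` and `probe_flatten` are hypothetical names invented for this example: `Probe` wraps an iterator and counts how often it is driven through `next` versus `fold`. The adapters `flat_map` and `map` hide the concrete inner type from any specialization on `Vec`, but a `fold` on the outer chain can still reach the inner iterator's own `fold`.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical wrapper that records how it is driven.
struct Probe<I> {
    inner: I,
    next_calls: Rc<Cell<usize>>,
    fold_calls: Rc<Cell<usize>>,
}

impl<I: Iterator> Iterator for Probe<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        self.next_calls.set(self.next_calls.get() + 1);
        self.inner.next()
    }

    // Overriding `fold` is what "implementing internal iteration" means here.
    fn fold<B, F>(self, init: B, f: F) -> B
    where
        F: FnMut(B, I::Item) -> B,
    {
        self.fold_calls.set(self.fold_calls.get() + 1);
        self.inner.fold(init, f)
    }
}

/// Folds over `[i, i + 1]` for each `i` in `0..n`, with `flat_map` and `map`
/// sitting between the caller and the array iterators.
/// Returns (sum, number of `next` calls, number of `fold` calls) seen by the probes.
fn probe_flatten(n: u32) -> (u32, usize, usize) {
    let next_calls = Rc::new(Cell::new(0));
    let fold_calls = Rc::new(Cell::new(0));
    let (nc, fc) = (next_calls.clone(), fold_calls.clone());
    let sum = (0..n)
        .flat_map(move |i| Probe {
            inner: IntoIterator::into_iter([i, i + 1]),
            next_calls: nc.clone(),
            fold_calls: fc.clone(),
        })
        .map(|x| x * 2)
        .fold(0, |acc, x| acc + x);
    (sum, next_calls.get(), fold_calls.get())
}
```

The counters only show which entry point the adapters used; they say nothing about performance, which depends on how well the resulting code optimizes.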

Comment on lines +132 to +139
    (&mut self.alive)
        .try_fold::<_, _, Result<_, !>>(init, |acc, idx| {
            // SAFETY: idx is obtained by folding over the `alive` range, which implies the
            // value is currently considered alive but as the range is being consumed each value
            // we read here will only be read once and then considered dead.
            Ok(fold(acc, unsafe { data.get_unchecked(idx).assume_init_read() }))
        })
        .unwrap()
Member

can we call fold here instead of try_fold?

Suggested change
-   (&mut self.alive)
-       .try_fold::<_, _, Result<_, !>>(init, |acc, idx| {
-           // SAFETY: idx is obtained by folding over the `alive` range, which implies the
-           // value is currently considered alive but as the range is being consumed each value
-           // we read here will only be read once and then considered dead.
-           Ok(fold(acc, unsafe { data.get_unchecked(idx).assume_init_read() }))
-       })
-       .unwrap()
+   self.alive.fold(init, |acc, idx| {
+       // SAFETY: idx is obtained by folding over the `alive` range, which implies the
+       // value is currently considered alive but as the range is being consumed each value
+       // we read here will only be read once and then considered dead.
+       fold(acc, unsafe { data.get_unchecked(idx).assume_init_read() })
+   })

Member Author

@the8472 the8472 Jul 24, 2021

I tried that. array::IntoIter has a Drop impl, so alive can't be moved out, but that would be required to call fold(self). That's why I used try_fold instead.
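
The constraint can be reproduced on stable Rust with a small stand-in. Everything in this sketch is an assumption made for illustration: `Slots` is a hypothetical type, `Option<String>` replaces the `MaybeUninit` storage of the real iterator, and `Infallible` replaces the unstable `!` type used in the PR.

```rust
use std::convert::Infallible;
use std::ops::Range;

/// Hypothetical stand-in for `array::IntoIter`: `alive` marks which slots still hold a value.
struct Slots {
    data: Vec<Option<String>>,
    alive: Range<usize>,
}

impl Drop for Slots {
    fn drop(&mut self) {
        // Only the slots still listed in `alive` need cleanup.
        for idx in self.alive.clone() {
            self.data[idx] = None;
        }
    }
}

impl Slots {
    fn fold_all<Acc, F>(mut self, init: Acc, mut f: F) -> Acc
    where
        F: FnMut(Acc, String) -> Acc,
    {
        // `self.alive.fold(..)` takes the range by value, which would move a field
        // out of a type that implements `Drop`; the compiler rejects that.
        // `try_fold` only needs `&mut self.alive`, so no move is required.
        let data = &mut self.data;
        let result: Result<Acc, Infallible> = self.alive.try_fold(init, |acc, idx| {
            let value = data[idx].take().expect("slot in `alive` must be occupied");
            Ok(f(acc, value))
        });
        match result {
            Ok(acc) => acc,
            Err(never) => match never {},
        }
    }
}
```

Because `try_fold` advances `alive` as it goes, the `Drop` impl finds an empty range afterwards and touches nothing a second time.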

Member

@the8472 oops right.

(&mut self.alive).fold(init, ...) should work though.

Member Author

Yeah but that would go through impl Iterator for &mut I which is less optimized.

Member

@kennytm kennytm Jul 24, 2021

self.alive is a std::ops::Range<usize> and AFAIK there is no special-cased implementation of fold or try_fold for Range<usize> nor &mut Range<usize>.

Member

even if we do the mem::take, it will just turn self.alive into 0..0 and then leak everything, which is safe 🙃 (compared with self.alive.clone().fold(...), which would cause a double-free).
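
A quick sketch of the `mem::take` behaviour mentioned here. The helper name `take_alive` is hypothetical; the only fact relied on is that `Range<usize>` implements `Default` as `0..0`.

```rust
use std::mem;
use std::ops::Range;

/// Hypothetical helper: takes the range out and reports what is left behind.
/// Returns (taken range, range remaining in place).
fn take_alive(alive: &mut Range<usize>) -> (Range<usize>, Range<usize>) {
    // `mem::take` swaps in `Range::default()`, which is the empty range `0..0`.
    let taken = mem::take(alive);
    (taken, alive.clone())
}
```

A destructor that walks the range left in place would then visit no elements, which is why a panic partway through the fold would leak the remaining values rather than drop them twice.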

Member Author

I have now tried (&mut self.alive).fold instead of try_fold, and it undoes all performance gains. I guess the indirection through &mut somehow inhibits optimizations.

Member Author

Which is due to this not being #[inline]

    impl<I: Iterator + ?Sized> Iterator for &mut I {
        type Item = I::Item;
        fn next(&mut self) -> Option<I::Item> {
            (**self).next()
        }
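
For comparison, the two spellings discussed in this thread can be written on stable Rust as follows. This is only a sketch: the function names are made up, `Infallible` replaces the unstable `!`, and nothing here measures performance. Both functions return the same value and leave the range exhausted. The difference is the code path: at the time of this discussion, the first went through `Iterator for &mut I` and the second called the range's own `try_fold`.

```rust
use std::convert::Infallible;
use std::ops::Range;

/// Sums the remaining indices by calling `fold` on a `&mut Range<usize>`.
fn sum_by_mut_ref_fold(range: &mut Range<usize>) -> usize {
    (&mut *range).fold(0usize, |acc, idx| acc + idx)
}

/// Sums the remaining indices with `try_fold`, which only needs `&mut self`.
fn sum_by_try_fold(range: &mut Range<usize>) -> usize {
    let result: Result<usize, Infallible> =
        range.try_fold(0usize, |acc, idx| Ok(acc + idx));
    match result {
        Ok(sum) => sum,
        Err(never) => match never {},
    }
}
```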

Member

heh.

wdyt should we just add the #[inline] or leave a FIXME comment explaining the performance regression if we use fold instead of try_fold? either way is fine for me.

Member Author

I'll add a FIXME. Changing inlining on such central methods can have a mixed impact on compile time even if runtime performance is better, so that should be done in a separate PR.

@kennytm
Member

kennytm commented Jul 27, 2021

@bors r+ rollup=iffy

@bors
Collaborator

bors commented Jul 27, 2021

📌 Commit 2276c5e has been approved by kennytm

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 27, 2021
@bors
Collaborator

bors commented Jul 27, 2021

⌛ Testing commit 2276c5e with merge 99d6692...

@bors
Collaborator

bors commented Jul 27, 2021

☀️ Test successful - checks-actions
Approved by: kennytm
Pushing 99d6692 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jul 27, 2021
@bors bors merged commit 99d6692 into rust-lang:master Jul 27, 2021
@rustbot rustbot added this to the 1.56.0 milestone Jul 27, 2021
bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 12, 2021
inline next() on &mut Iterator impl

In [rust-lang#87431](https://github.com/rust-lang/rust/pull/87431/files#diff-79a6b417b85ecf4f1a4ef2235135fedf540199caf6e9e1d154ac6a413b40a757R132-R136) I found that `(&mut range).fold` doesn't optimize well because the default impl for `fold` on `&mut Iterator` doesn't inline `next`. In that particular case it was worked around by using `try_fold`, which takes `&mut self` instead of `self`.

Let's see if this can be fixed more broadly.
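
To make the role of `next` concrete, here is a sketch of what a default `fold` amounts to: a plain loop over `next()`. The free function `fold_via_next` is a hypothetical name used for illustration. When `I` is `&mut Range<usize>`, every loop iteration goes through the forwarding `next` of `Iterator for &mut I`, so whether that call is inlined matters for the whole loop.

```rust
/// Hypothetical free-function version of a default `fold`: repeatedly call `next()`.
fn fold_via_next<I, B, F>(mut iter: I, init: B, mut f: F) -> B
where
    I: Iterator,
    F: FnMut(B, I::Item) -> B,
{
    let mut acc = init;
    while let Some(item) = iter.next() {
        acc = f(acc, item);
    }
    acc
}
```

Passing `&mut range` as the iterator, for example `fold_via_next(&mut range, 0, |acc, i| acc + i)`, mirrors the `(&mut range).fold` case from the commit message.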
@lcnr lcnr added the A-const-generics Area: const generics (parameters and arguments) label Dec 11, 2021