@CodeSandwich

This is a PR adding Iterator::extend and Iterator::extend_mut, as discussed in RFC issue 2339. It was advised to bypass the RFC process and just implement it.

@rust-highfive
Contributor

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @aidanhs (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 28, 2018
@scottmcm
Member

See also #45840, where I suggested collect_into. extend feels a bit odd since the rest of these helpers have different names. For example, I can say from_iterator and collect and you know what I'm talking about, but this would make extend more ambiguous.

@CodeSandwich
Author

extend was suggested by @Centril and I'm really OK with this name:

  • it keeps parity with Extend::extend
  • it's explicit that the passed collection is extended
  • it's easy to read: take the iterator and EXTEND some collection with it

@Centril Centril added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. C-feature-accepted Category: A feature request that has been accepted pending implementation. labels Feb 28, 2018
@Centril
Contributor

Centril commented Feb 28, 2018

I'm obviously in favor of extend + extend_mut since I suggested it... In addition to @CodeSandwich's excellent arguments, I'd also like to point out that collect_into is longer. This is not directly a problem, but collect_into_mut is really taking it too far in terms of verbosity. I believe extend and extend_mut are short, unambiguous and 🍬.

I'm inclined to change my mind given technical arguments for why Iterator::extend might collide with Extend::extend, but it seems to me highly unlikely that a collision should occur.

@Centril Centril added C-enhancement Category: An issue proposing an enhancement or a PR with one. and removed C-feature-accepted Category: A feature request that has been accepted pending implementation. labels Feb 28, 2018
Contributor

@Centril Centril left a comment

Agree with including these methods... here are some improvement ideas.

Contributor

There's quite a lot of indentation here - run rustfmt?

Author

I ran rustfmt 0.3.8-nightly (346238f 2018-02-04) on libcore, but the results were rather catastrophic. It threw a few dozen errors, even more warnings, and modified 950 files. I think I'll just format it manually to fit the convention of the other methods.

Member

@CodeSandwich You could copy only this function into the playground and format it, so that you don't need to care about the rest of libcore.

Contributor

Missing "the" between "passed" and "collection" here and on extend_mut as well.

Contributor

This phrasing seems to indicate that the extended collection will gain self.count() new elements, while it may gain zero new elements in the case of sets. I'd rephrase "add" to "include".

Contributor

It might also be prudent to mention that .by_ref() exists if people are using .take(..) and such things. This also applies to extend_mut.

Contributor

You should mention this in the documentation of Extend with a note that the methods Iterator::extend(_mut) are preferred.

Contributor

weird indentation... rustfmt =)

Contributor

Good example.. perhaps include a bit longer example as well which is more real-worldy?

Contributor

Reword as: The mutable reference to the collection is then returned, [..]

@kennytm
Member

kennytm commented Feb 28, 2018

I don't like the name extend here since it does the opposite of Vec::extend. a_vec.extend(b_iter) produces a_vec + b_iter, while a_iter.extend(b_vec) or a_iter.extend_mut(&mut b_vec) produces b_vec + a_iter.

@Centril
Contributor

Centril commented Feb 28, 2018

@kennytm Oh no... non-commutativity strikes back! Still, I think the ordering is legible from the fact that one is an iterator and one is a collection.. But - do you have any suggestion perhaps.. we might be able to find a name better than both extend and collect_into..?

@hanna-kruppe
Contributor

Still, I think the ordering is legible from the fact that one is an iterator and one is a collection

That's not true, Extend::extend accepts IntoIterator, so for example v1.extend(v2) and v2.extend(v1) are both possible (and useful) given v1, v2: Vec<_>.
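A minimal sketch of the point above, using only std: because Extend::extend takes any IntoIterator, a Vec can be passed directly, and the receiver always ends up in front. The helper name `extend_in_order` is made up for this illustration.

```rust
/// Appends `src` to `dst` and returns the combined vector.
/// (`extend_in_order` is a hypothetical helper, not a std function.)
pub fn extend_in_order(mut dst: Vec<i32>, src: Vec<i32>) -> Vec<i32> {
    // `Extend::extend` accepts any `IntoIterator`, so no `.into_iter()` is needed.
    dst.extend(src);
    dst
}
```

Calling `extend_in_order(v1, v2)` and `extend_in_order(v2, v1)` are both valid and give different orders, which is why the receiver alone does not tell the reader which side is "the iterator".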

@Centril
Contributor

Centril commented Feb 28, 2018

@rkruppe 😢

But once you start applying methods on the iterators you get from one of the vecs, then I think it becomes legible.

I admit extend ain't perfect, but I believe it is better than collect_into... Perhaps there's a better name?

@scottmcm
Member

Good point, @rkruppe! I wonder if this helper should thus be on IntoIterator instead, since the blanket impl would still have this work on all iterators and it's also in the prelude so would still be accessible without extra use...

For extend_mut, has anyone tried an impl<'a,E:Extend<A>,A> Extend<A> for &'a mut E? If crater said that's ok (if coherence doesn't mean it's definitely ok?), then there wouldn't need to be two methods; .collect_into(Vec::with_capacity(4)) and .collect_into(&mut v) would just work.
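To illustrate the idea, here is a rough sketch that uses a local stand-in trait, since the impl on `&mut E` for std's Extend cannot be written outside the standard library. The trait `Sink`, its method `put_all`, and the free function `collect_into` are hypothetical names chosen for this sketch and are not the code from this PR.

```rust
/// Local stand-in for `std::iter::Extend` (hypothetical), so that the
/// blanket impl below can be written outside the standard library.
pub trait Sink<A> {
    fn put_all<I: IntoIterator<Item = A>>(&mut self, iter: I);
}

impl<A> Sink<A> for Vec<A> {
    fn put_all<I: IntoIterator<Item = A>>(&mut self, iter: I) {
        self.extend(iter);
    }
}

/// The shape of the proposed impl: a mutable reference to a sink is itself a sink.
impl<'a, A, S: Sink<A>> Sink<A> for &'a mut S {
    fn put_all<I: IntoIterator<Item = A>>(&mut self, iter: I) {
        (**self).put_all(iter);
    }
}

/// One function that accepts the collection by value or by mutable reference.
pub fn collect_into<I, S>(iter: I, mut sink: S) -> S
where
    I: Iterator,
    S: Sink<I::Item>,
{
    sink.put_all(iter);
    sink
}

pub fn demo() -> (Vec<i32>, Vec<i32>) {
    // By value: the freshly configured collection is returned.
    let by_value = collect_into(1..4, Vec::<i32>::with_capacity(4));
    // By reference: an existing collection is extended in place.
    let mut existing: Vec<i32> = vec![0];
    collect_into(1..4, &mut existing);
    (by_value, existing)
}
```

With the reference impl in place, a single generic parameter covers both call styles, which is what would make a separate extend_mut unnecessary.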

@Centril
Contributor

Centril commented Feb 28, 2018

For extend_mut, has anyone tried an impl<'a,E:Extend<A>,A> Extend<A> for &'a mut E?

Interesting idea, hadn't thought of that =)

On naming.. how about iter.add_to(collection) ?

@CodeSandwich
Author

@scottmcm Awesome idea, I've added this impl and it works like a charm!

@aidanhs
Contributor

aidanhs commented Mar 3, 2018

Arbitrary libs team reassignment - r? @Kimundi

@rust-highfive rust-highfive assigned Kimundi and unassigned aidanhs Mar 3, 2018
@CodeSandwich
Author

CodeSandwich commented Mar 4, 2018

@Centril I've renamed extend to collect_into, added impl Extend for &mut Extend, fixed tests, added tests for the new impl, fixed formatting and fixed docs according to your comments. I've force-pushed the amended changes, because it was more of a rewrite than a fix.

The only thing I didn't do is mention by_ref in the docs. I don't understand what its connection with collect_into is and why it's important.

@Centril
Contributor

Centril commented Mar 4, 2018

The only thing I didn't do is mention by_ref in the docs. I don't understand what its connection with collect_into is and why it's important.

by_ref allows you to do things like:

iter.by_ref().take(5).collect_into(first_5);
iter.collect_into(the_rest);

It is important because it allows the user to consume the elements of an iterator partially while still retaining ownership of the iterator so that it can be reused. I think the idiom can be useful together with .collect_into(..) when dealing with collections that have a fixed size.
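The same idiom can be tried with today's std by using Extend::extend in place of the proposed collect_into. This is only a sketch; the function name `split_first_five` and the sample range are made up for illustration.

```rust
/// Splits a sample iterator into its first five items and the remainder.
pub fn split_first_five() -> (Vec<u32>, Vec<u32>) {
    let mut iter = 1..=8u32;
    let mut first_5 = Vec::with_capacity(5);
    let mut the_rest = Vec::new();

    // `by_ref` only borrows the iterator, so `take(5)` consumes five items
    // and leaves `iter` usable afterwards.
    first_5.extend(iter.by_ref().take(5));
    // The iterator is now consumed by value for whatever is left.
    the_rest.extend(iter);

    (first_5, the_rest)
}
```

Without `by_ref`, `take(5)` would take ownership of `iter` and the second call would not compile.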

@CodeSandwich
Author

I'm not trying to be pushy, we have plenty of time and I can't demand anything from volunteers in a FOSS project, but what are the next steps for this PR? I think we've considered all the comments, improved the design as much as possible, and the code is finished.

@scottmcm
Member

scottmcm commented Mar 5, 2018

There's a reviewer assigned; they'll look at it at some point (or triage will bug them about it). Note that it was a weekend since things stabilized, and many team members don't do rust every day.

Also, the impl addition is insta-stable, so this may need a full-libs-team FCP.

nit: Consider updating the title of the PR, since the approach has changed from extend_mut.

@alexcrichton alexcrichton added S-waiting-on-team DEPRECATED: Use the team-based variants `S-waiting-on-t-lang`, `S-waiting-on-t-compiler`, ... and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 20, 2018
@alexcrichton
Member

Ok thanks @aidanhs and for the analysis @scottmcm! I think that's a good sign in that we may be able to move forward with this, but crater is by no means exhaustive.

Due to the breaking nature of this PR I'm going to tag this as S-waiting-on-team and ask for input from @rust-lang/libs. Libs, how do y'all feel about the APIs proposed here and/or the possible breakage?

@SimonSapin
Contributor

SimonSapin commented Mar 20, 2018

For others like me just joining this thread now, the possible breakage being discussed is the new impl<E: Extend> Extend for &mut E colliding with a potential impl Extend for &mut SomeConcreteType outside of std.

Do I read correctly that crater found no such impl collision?

@alexcrichton
Member

@SimonSapin correct!

@SimonSapin
Contributor

On one hand, it seems unlikely that impl Extend for &mut SomeConcreteType would exist; it would only be useful for very specific generic code like collect_into. On the other hand, if some code somewhere does have such an impl, it’s unfortunate that there would be no way (or is there?) to rewrite that code such that it works both before and after the addition of impl<E: Extend> Extend for &mut E in std.

That’s one of the principles that makes breakage acceptable when it’s in type inference: it can be fixed by adding type annotations that are also valid (if unnecessary) before the change.

@Centril
Contributor

Centril commented Mar 20, 2018

I'd like to voice my support for impl<E: Extend> Extend for &mut E being added because of the very neat gains in ergonomics that it offers 👍

@Kimundi
Contributor

Kimundi commented Mar 21, 2018

I agree with @SimonSapin that we should be careful here, because it's definitely a breaking change in the actual sense. But provided we don't get any actual issues with it, it seems a clear improvement.

But one thing got me thinking. The pattern here is one used in other places in std, most notably the Iterator impl for &mut I where I: Iterator. Iterators expose that impl with a legacy by_ref() method, which used to return an extra adapter type before being changed to just return `&mut Self`.

If we provided such an adapter for Extend, we could replace foo.collect_into(&mut bar) with foo.collect_into(bar.by_ref()) without a breaking change. Of course, the fact that it is less ergonomic, and adds a new method to any collection that implements Extend, might still make this impracticable.
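A rough sketch of what such an adapter could look like follows. The names `ExtendByRef`, `ExtendRefExt` and `extend_by_ref` are hypothetical (the comment above calls the method by_ref), and `collect_into` is written as a free function so the example does not depend on any particular Iterator method.

```rust
/// Hypothetical adapter: wraps a mutable borrow of a collection and forwards `Extend`.
pub struct ExtendByRef<'a, E>(&'a mut E);

impl<'a, A, E: Extend<A>> Extend<A> for ExtendByRef<'a, E> {
    fn extend<I: IntoIterator<Item = A>>(&mut self, iter: I) {
        self.0.extend(iter);
    }
}

/// Hypothetical extension trait providing the adapter-producing method.
pub trait ExtendRefExt: Sized {
    fn extend_by_ref(&mut self) -> ExtendByRef<'_, Self> {
        ExtendByRef(self)
    }
}

impl<T> ExtendRefExt for T {}

/// By-value `collect_into`, written as a free function for this sketch.
pub fn collect_into<I, E>(iter: I, mut collection: E) -> E
where
    I: Iterator,
    E: Extend<I::Item>,
{
    collection.extend(iter);
    collection
}

pub fn demo() -> Vec<i32> {
    let mut bar: Vec<i32> = vec![1, 2];
    // The adapter is a temporary; its borrow of `bar` ends with this statement.
    collect_into(3..5, bar.extend_by_ref());
    bar
}
```

Because the adapter is a new local type, implementing Extend for it adds no impl on `&mut E` itself, which is why this route would not be a breaking change.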

@alexcrichton
Member

This was discussed during libs triage today and the conclusion was that we're not willing to add the Extend blanket impl here for mutable references. We concluded that by all measures of our breaking changes policies it's not allowed.

There are alternate possible strategies though with different traits, multiple methods, newtype wrappers, etc. @CodeSandwich would you like to pursue any of those instead?

@alexcrichton
Member

We also noted that we probably don't want to go too hog wild with new abstractions as they often incur significant cost and this is a relatively small convenience method which may not justify adding, for example, new traits to the standard library

@CodeSandwich
Author

Thank you for looking into this proposition so deeply 👍 The resolution is reasonable, I'll be happy to explore less intrusive solutions

@shepmaster
Member

As best I understand, we do not wish to follow this path. I'm going to close the PR. Please reopen if I've misunderstood.

@shepmaster shepmaster closed this Apr 7, 2018
@LukasKalbertodt
Contributor

(summary at the end)

Hi everyone!

I dug through several issues and landed here. I'd really like to see Iterator::collect_into land! (by the way, regarding the name discussion, collect_into() is the first name I've come up with when searching for this functionality.)

As far as I understand it, this PR has been closed because the blanket impl<E: Extend> Extend for &mut E is a breaking change. The blanket impl was added, because that way we avoided having two methods (extend and extend_mut). My question: why was the extend_mut method proposed anyway? Sadly, the first commits don't exist anymore, so I couldn't read through those.

As far as I see it, the following method should be suitable for pretty much all use cases:

fn collect_into<E: Extend<Self::Item>>(self, collection: &mut E)

The original RFC hints at why the complexity with BorrowMut/the two methods was introduced:

It also returns the extended collection making chaining elegant.

I don't find this very convincing. In all cases where I wanted something like collect_into, I already had a collection and just wanted to put all elements of the iterator into it. I never wanted to use chaining in that case. The two examples in the RFC that show this don't really convince me either. I'm aware that the examples are minimal, contrived examples, but I don't think there are many situations in real code where chaining is useful. And if it is useful, it's easy and non-verbose to write it without chaining. And for what it's worth, Extend::extend also doesn't return &mut Self.


In summary:

  • I want this method
  • As this is a convenience function, I don't think it has to be general enough to suit all use-cases (being useful for most use cases is enough)
  • Therefore I think fn collect_into<E: Extend<Self::Item>>(self, collection: &mut E) should be added. It's simple and useful in many cases.

@CodeSandwich Would you be willing to work on a second attempt? If you don't want to or don't have time, I can create another PR (only if people agree with me, of course). In that case: may I reuse parts of your code in this PR?
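As a sketch of the reference-only design proposed in this comment, the method can be prototyped on stable Rust as an extension trait. The names `CollectIntoRef`, `collect_into_ref` and `evens_into` are hypothetical; the method is deliberately not called collect_into here so it cannot clash with a std method of that name.

```rust
/// Hypothetical extension trait with the `&mut E`-only method.
pub trait CollectIntoRef: Iterator + Sized {
    /// Pushes every item of the iterator into an existing collection.
    fn collect_into_ref<E: Extend<Self::Item>>(self, collection: &mut E) {
        collection.extend(self);
    }
}

impl<I: Iterator> CollectIntoRef for I {}

/// Example use: append the even numbers in 1..=10 to a vector the caller owns.
pub fn evens_into(existing: &mut Vec<u32>) {
    (1..=10u32)
        .filter(|i| i % 2 == 0)
        .collect_into_ref(existing);
}
```

The whole iterator chain stays in one expression and nothing is returned, matching the observation that Extend::extend does not return `&mut Self` either.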

@CodeSandwich
Author

Hi @LukasKalbertodt

It's nice to see that you support the idea behind this RFC :)

The split into extend and extend_mut was invented, as you said, to support collections passed both by value and by reference without the blanket impl. I think both use cases are important, but you are right that the reference version covers all use cases, just not always conveniently. For example, filling a vector with a predefined size would not benefit from this solution at all. This RFC is no game changer, just a small papercut fix for convenience and convenience only. I've decided to not force a half-baked solution, but to wait for blanket impls to make it really convenient. The std lib and any bad APIs in it are more or less set in stone, and this feature is not that highly anticipated.

If you want to reuse my code, go ahead :) I'm not sure, but the license of the whole Rust forces me to make it fully reusable by anyone anyway.

@scottmcm
Member

@LukasKalbertodt I think the non-&mut version is particularly nice for things where size_hints are wrong -- like after filters -- and you want .collect_into(Vec::with_capacity(n)). But I agree that's a stretch case, where careful allocation tweaks are rare enough to be fine with a separate variable.
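To make the size_hint remark concrete, here is a small std-only sketch. The function names and the capacity of 50 are illustrative choices, not part of the proposal.

```rust
/// The size hint reported after filtering `1..100`.
pub fn filtered_hint() -> (usize, Option<usize>) {
    // `Filter` cannot know how many items will pass the predicate,
    // so its lower bound is 0 even though the range holds 99 items.
    (1..100_i32).filter(|i| i % 2 == 0).size_hint()
}

/// Collects the even numbers into a vector whose capacity the caller chose up front.
pub fn collect_evens_preallocated() -> Vec<i32> {
    let mut evens = Vec::with_capacity(50);
    evens.extend((1..100_i32).filter(|i| i % 2 == 0));
    evens
}
```

Since the lower bound is 0, anything that sizes its allocation from the hint has little to go on here, which is the case where handing over a pre-sized collection helps.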

@LukasKalbertodt
Contributor

Mhhh I see. So the "ideal" case would be this:

let vec = (1..100)
    .filter(|i| i % 2 == 0)
    .more_stuff(...)
    .collect_into(Vec::with_capacity(50));

With only the &mut E version I'm proposing it would look like this:

let mut vec = Vec::with_capacity(50);
(1..100)
    .filter(|i| i % 2 == 0)
    .more_stuff(...)
    .collect_into(&mut vec);

Without collect_into it looks like this (or store the iterator chain in a variable and extend(iter) then):

let mut vec = Vec::with_capacity(50);
vec.extend(
    (1..100)
        .filter(|i| i % 2 == 0)
        .more_stuff(...)
);

I agree that the first version looks better than the second one. But I still think the second looks better than the last (so I wouldn't say "doesn't benefit at all").

As @scottmcm already said, this might be rare. I think this really only applies to when we want to "configure" an empty collection. Putting a pre-filled collection by value into collect_into() is probably not useful (with the exception of vec![], filling a collection usually doesn't fit into half a line). So it's about empty collections that can be "configured" somehow. E.g. Vec::with_capacity and HashMap::with_hasher. It would indeed be nice in these situations...

Sadly, I really can't tell how many use cases for collect_into() would be in that category and how many would be in the category where &mut E would be sufficient.

I've decided to not force a half-baked solution, but to wait for blanket impls to make it really convenient.

Is there a clear path to these blanket impls? I don't think Rust will relax its coherence rules in the near future. And even when specialization finally lands, I don't think the blanket impl that was proposed here would be a lot better. It would still break code that has an impl Extend for &mut Foo where Foo doesn't implement Extend (specializing impls must be strictly more special).

So I don't think our little problem is solved by any new language feature that will land in the near future... (that's why I started the discussion here :P)


I'm not sure, but the license of the whole Rust forces me to make it fully reusable by anyone anyway.

I'm just not sure if that license is already valid when your code is not merged yet. 🤷‍♂️

@camsteffen
Contributor

camsteffen commented Jun 1, 2020

I would rather have separate methods for accepting a reference or a value. Having one method for both seems over-generalized and ambiguous.

Suggestion:

fn add_to<E: Extend<Self::Item>>(self, collection: &mut E);
fn collect_into<E: Extend<Self::Item>>(self, collection: E) -> E;

Even though these methods basically do the same thing, the semantics are quite distinct IMO.

  • add_to - "I already have a collection and it may already have some elements. Add to it."
  • collect_into - "Create it. Populate it. Return it."

Notice that the meaning of "collect" stays consistent with the existing collect method. Of course you could do something against this mental model like collect_into(vec![...]), but that should be uncommon if not bad practice. My understanding is that collect_into will usually be used to directly pass a new collection like collect_into(Vec::with_capacity(..)).

Notice the reference variant does not need a return type but the value variant could have #[must_use].

I think it is hard to overstate the readability gains afforded by this improvement. It allows more cases where you can create and consume an Iterator in the same call chain. It doesn't feel right to have to save an Iterator to a variable just to use it with extend on the next line.

@cormacrelf
Contributor

cormacrelf commented May 17, 2021

I agree with @camsteffen that multiple methods is the best solution. I would bikeshed the names to collect_into(&mut E)/collect_with(E)->E as argued below, but I generally agree with the need for both methods and the reasons given. I also take @LukasKalbertodt's point that there is no other improvement in ergonomics coming that we need to wait for, so I'm happy to PR this idea anew. I think there is interest enough here and elsewhere that it is still worth pursuing.

While I have much love for the turbofish, collect_with(Vec::new()) is dramatically easier to type than collect::<Vec<_>>(). A big reason is that text editors cannot auto-complete closing angle brackets in Rust because < is also an operator. If such a method existed I doubt I would use the turbofish version or the explicit let v: Vec<_> = annotation ever again. Many people have wanted a sugar for this (eg the chorus here), but the proposal so far of collect_vec() is unnecessarily limiting and doesn't fix it for any of the other collections which are just as painful. collect_vec has been in itertools for a long time now, but nobody should add that dep just to get a tiny bit of syntax sugar, nor really use it at all lest it be the only remaining use of Itertools in a crate. So it has not found a good home there/outside std. I think this is a good enough reason to include an owned collect_with API.

Aforementioned bikeshedding:

I propose, noting the existing collect API:

// fn collect<B>(self) -> B
// where
//     B: FromIterator<Self::Item>;

fn collect_into<E>(self, collection: &mut E)
where
    E: Extend<Self::Item>;

fn collect_with<E>(self, collection: E) -> E
where
    E: Extend<Self::Item>;

  • I agree with @LukasKalbertodt that collect_into is the natural and most google-friendly name for this feature, and it is how I remember how to find this discussion. I believe it should be one of the methods, just the mutable one instead of the owned one.
  • Most importantly, the two methods should have the same prefix for IDE discoverability and for colocation in documentation. They are 3 options for the same functionality. Both new methods are more flexible and general than the original, there is no need to name the mut version completely differently only because it offers slightly more flexibility again. Each can be considered and will be used as a replacement for collect(), especially when optimising code to reduce allocations in combination with Vec::clear(). So I believe both new methods should start with collect_.
  • IMO collect_into is better suited to meaning "into this existing collection" / the &mut E case if it has to be one of the two. If it weren't the mutable case, what could you name the mutable case? There is no good name for it other than collect_into.
  • So what do we name the owned E method? I think collect_with:
    • There is a convention for appending _with to make more general versions of a function. Usually the extra parameter is a closure to produce a value instead of passing a value: iter::once[_with], slice::fill[_with], Ordering::then[_with] etc.
    • What most of the closure-taking _with variants have in common is that they permit overriding Default and/or Clone to furnish your own value. That's exactly what this one is doing compared to plain collect, except nobody needs it to be a closure and it's better that it isn't one. (Think: collect_with(Vec::new) is not worth having to type collect_with(|| Vec::with_capacity(n))).
  • The distinction between collect_into and collect_with is about as intuitive/teachable/memorable as it can really be: into this bucket = mut, with this bucket = pass ownership.
  • This is actually better for discoverability than the solution where &mut E implements Extend. In that case, nobody can tell from the signature that the impl is available, so I'd imagine it takes whatever kind of argument I first saw it used with. Whereas with two methods, your IDE tells you your options.
    • I've personally experienced this learning speed bump with the io::Write trait impl on &mut W, not knowing I could pass a mutable reference basically anywhere a writer was accepted.
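The owned method argued for above can be prototyped on stable Rust as an extension trait. This is only a sketch under that assumption: the trait name `CollectWith` and the `demo` function are made up, and the method body is one plausible implementation rather than code from any PR.

```rust
use std::collections::HashSet;

/// Hypothetical extension trait carrying the proposed owned method.
pub trait CollectWith: Iterator + Sized {
    /// Extends the given collection with the iterator's items and returns it.
    fn collect_with<E: Extend<Self::Item>>(self, mut collection: E) -> E {
        collection.extend(self);
        collection
    }
}

impl<I: Iterator> CollectWith for I {}

pub fn demo() -> (Vec<i32>, HashSet<i32>) {
    // No turbofish or `let v: Vec<_>` annotation is needed to pick the collection.
    let v = (0..5).map(|x| x + 5).collect_with(Vec::new());
    // Any `Extend` implementor works, not just `Vec`.
    let s = vec![1, 1, 2].into_iter().collect_with(HashSet::new());
    (v, s)
}
```

The collection type is chosen by the value passed in, so the same call shape serves Vec::with_capacity(n), HashSet::new() and other pre-configured collections.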

Alternative design for comparison

As @alexcrichton mentioned, you could use a newtype wrapper. There is a kind of precedent in the io::Cursor<T> API. It implements Write for both T=Vec<u8> and T=&mut Vec<u8>. You can imagine a Cursor for Extend implementors.

Unfortunately it is bad at serving the original purpose of being syntax sugar. Wrapping in a cursor is about as annoying as let-binding the iterator. Imagine a wrapper type core::iter::Sink<T, E> with Extend impls for E: Extend<T> and &mut E where E: Extend<T>, with only one new method on Iterator.

// then (fine)
let mut vec = (0..100)
    .map(|x| x + 5)
    .collect_bikeshed(Vec::new());

// but then (ew)
use core::iter::Sink;
let sink = Sink::new(&mut vec);
other_numbers
    .into_iter()
    .collect_bikeshed(sink);

// uh oh (borrow checker complains)
vec.push(5);
more_numbers.collect_bikeshed(sink);

// slightly better but still annoying
more.into_iter()
    .collect_bikeshed(Sink::new(&mut vec));

Basically you can see how there would be a need for a method that does collect_bikeshed(Sink::new(buf)) specially for &mut E, because you can't feasibly keep sinks hanging around with their mutable borrows over your collection so you need to keep recreating the Sink, and because the Sink import and usage would otherwise be too annoying for a very common thing. So you're back at the two methods solution.

@inquisitivecrystal
Contributor

inquisitivecrystal commented Jul 20, 2021

I'm going to have a go at implementing collect_into and collect_with (and testing and documenting them, which is the substantive part).

@frengor
Contributor

frengor commented Jan 15, 2022

I'd really like having collect_into and collect_with methods. Any update on their implementation?

@inquisitivecrystal
Contributor

I'd really like having collect_into and collect_with methods. Any update on their implementation?

I gave up on it a while ago, as you've probably guessed from how long it's been since my latest reply. If someone else wants to give it a go, they're welcome to! Someone should probably open an issue to track it though.

@frengor
Contributor

frengor commented Jan 15, 2022

I can try to implement it. I'm new to contributing to Rust, should I open a PR and an issue (mentioning the PR in the issue maybe)?

Edit: I've just seen there's an issue template which gives enough information about what a tracking issue is and how it works. I think I'm just going to open a PR then.

@leonardo-m

leonardo-m commented Jan 19, 2022

Another one that I use often enough is "collect_into_slice" (modified from: https://github.com/kchmck/collect_into_slice ):

trait CollectSlice<'a, A>: Iterator<Item = A> {
    // Signatures only; bodies omitted.
    fn collect_into_slice(&mut self, slice: &'a mut [A]) -> &'a [A];
    fn collect_into_slice_mut(&mut self, slice: &'a mut [A]) -> &'a mut [A];
}
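One possible behaviour for such a method, sketched as a free function: fill the slice from the front until either the slice is full or the iterator runs out, then return the filled prefix. These semantics are an assumption made for this sketch; the linked crate may behave differently.

```rust
/// Writes items from `iter` into `slice` front to back and returns the filled prefix.
/// (Assumed semantics for illustration, not necessarily those of the linked crate.)
pub fn collect_into_slice<'a, I>(iter: &mut I, slice: &'a mut [I::Item]) -> &'a mut [I::Item]
where
    I: Iterator,
{
    let mut filled = 0;
    // `zip` polls the slice side first, so once the slice is full
    // no further item is pulled from `iter`.
    for (slot, item) in slice.iter_mut().zip(iter.by_ref()) {
        *slot = item;
        filled += 1;
    }
    &mut slice[..filled]
}
```

Because the iterator is taken by `&mut`, any items that did not fit remain available to the caller after the call.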

Dylan-DPC added a commit to Dylan-DPC/rust that referenced this pull request Mar 9, 2022
Add Iterator::collect_into

This PR adds `Iterator::collect_into` as proposed by `@cormacrelf` in rust-lang#48597 (see rust-lang#48597 (comment)).
Followup of rust-lang#92982.

This adds the following method to the Iterator trait:

```rust
fn collect_into<E: Extend<Self::Item>>(self, collection: &mut E) -> &mut E
```
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Mar 9, 2022

Labels

C-enhancement Category: An issue proposing an enhancement or a PR with one. S-waiting-on-team DEPRECATED: Use the team-based variants `S-waiting-on-t-lang`, `S-waiting-on-t-compiler`, ... T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
