From d1faa2f6546659b50eaf3c9efa4113b81a2825eb Mon Sep 17 00:00:00 2001
From: Nick Cameron
Date: Fri, 19 Sep 2014 15:35:34 +1200
Subject: [PATCH 1/4] Change `&` to be a borrow operator.

Change the address-of operator (`&`) to a borrow operator. This is an
alternative to #241 and #226 (cross-borrowing coercions). The borrow operator
would perform as many dereferences as possible and then take the address of the
result. The result of `&expr` would always have type `&T` where `T` does not
implement `Deref`.
---
 0000-borrow.md | 159 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 159 insertions(+)
 create mode 100644 0000-borrow.md

diff --git a/0000-borrow.md b/0000-borrow.md
new file mode 100644
index 00000000000..4bb761f698b
--- /dev/null
+++ b/0000-borrow.md
@@ -0,0 +1,159 @@
+- Start Date: (fill me in with today's date, YYYY-MM-DD)
+- RFC PR: (leave this empty)
+- Rust Issue: (leave this empty)
+
+
+# Summary
+
+Change the address-of operator (`&`) to a borrow operator. This is an
+alternative to #241 and #226 (cross-borrowing coercions). The borrow operator
+would perform as many dereferences as possible and then take the address of the
+result. The result of `&expr` would always have type `&T` where `T` does not
+implement `Deref`.
+
+
+# Motivation
+
+In Rust the concept of ownership is more important than the precise level of
+indirection. Whilst it is important to distinguish between values and references
+for performance reasons, Rust's ownership model means it is less important to
+know how many levels of indirection are involved in a reference.
+
+It is annoying to have to write out `&*`, `&**`, etc. to convert from one
+pointer kind to another. It is not really informative and just makes reading and
+writing Rust more painful.
+
+It would be nice to strongly enforce the principle that the first type a
+programmer should think of for a function signature is `&T` and to discourage
+use of types like `&Box<T>` or `Box<T>`, since these are less general. However,
+that generality is somewhat lost if the user of such functions has to consider
+how to convert to `&T`.
+
+
+# Detailed design
+
+Writing `&expr` has the effect of dereferencing `expr` as many times as possible
+(by calling `deref` from the `Deref` trait or by doing a compiler-built-in
+dereference) and taking the address of the result.
+
+Where `T` is some type that does not implement `Deref`, `&x` will have type `&T`
+if `x` has type `T`, `&T`, `Box<T>`, `Rc<T>`, `&Rc<T>`, `Box<&Rc<Box<&T>>>`, and
+so forth.
+
+`&mut expr` would behave the same way but take a mutable reference as the final
+step. The expression would have type `&mut T`. The usual rules for dereferencing
+and taking a mutable reference would apply, so the programmer cannot subvert
+Rust's mutability invariants.
+
+No coercions may be applied to `expr` in `&expr`, but they may be applied to
+`&expr` if it would otherwise be possible.
+
+Raw pointers would not be dereferenced by `&`. We expect raw pointer
+dereferences to be explicit and to be in an unsafe block. So if `x` has type
+`&Box<*Gc<T>>`, then `&x` would have type `&*Gc<T>`. Alternatively, we could
+make attempting to dereference a raw pointer using `&` a type error, so `&x`
+would give a type error and a note advising to use explicit dereferencing.
+
+Writing `&(expr)` (and similarly for `&mut(expr)`) will have the effect of
+taking the address of `expr` (the current semantics of `&expr`). If `expr` has
+type `U`, for any `U`, then `&(expr)` will have type `&U`. This syntax is not
+the greatest, and I'm very open to other suggestions. In particular writing
+`&(some_big_expression)` will give the address-of not borrow behaviour, which
+might be confusing.
In practice, I hope this works, since when doing explicit
+referencing/dereferencing, people often use brackets (e.g., `(&*x).foo()` would
+become `(&(*x)).foo()`). I hope these cases are very rare. It is only necessary
+when you need an expression to have type `&Rc<T>` or similar, and when that
+expression is not the receiver of a method call.
+
+
+# Drawbacks
+
+Arguably, we should be very explicit about indirection in a systems language,
+and this proposal blurs that distinction somewhat.
+
+When a function _does_ want to borrow an owning reference (e.g., takes a
+`&Box<T>` or `&mut Vec<T>`), it would be more painful to call that function. I
+believe this situation is rare, however.
+
+
+# Alternatives
+
+Take this proposal, but use a different operator. This new operator would have
+the semantics proposed here for `&`, and `&` would continue to be an address-of
+operator.
+
+There are two RFCs for different flavours of cross-borrowing: #226 and #241.
+
+#226 proposes sugaring `&*expr` as `expr` by doing a dereference and then an
+address-of. This converts any pointer-like type to a borrowed reference.
+
+#241 proposes sugaring `&*n expr` to `expr` where `*n` means any number of
+dereferences. This converts any borrowed pointer-like type to a borrowed
+reference, erasing multiple layers of indirection.
+
+At a high level, #226 privileges the level of indirection, and #241 privileges
+ownership. This RFC is closer to #241 in spirit, in that it erases multiple
+layers of indirection and privileges ownership over indirection.
+
+All three proposals mean less fiddling with `&` and `*` to get the type you want
+and none of them erase the difference between a value and a reference (as auto-
+borrowing would).
+
+In many cases this proposal and #241 give similar results. The difference is
+that this proposal is linked to an operator and is type independent, whereas
+#241 is implicit and depends on the required type.
An example which type checks
+under #241, but not this proposal is:
+
+```
+fn foo<T>(x: &Rc<T>) {
+    let y: &T = x;
+}
+```
+
+Under this proposal you would use `let y = &x;`.
+
+I believe the advantages of this approach vs an implicit coercion are:
+
+* better integration with type inference (note no explicit type in the above
+  example);
+* more easily predictable and explainable behaviour (because we always do
+  as many dereferences as possible, c.f. a coercion which does _some_ number of
+  dereferences, dependent on the expected type);
+* does not complicate the coercion system, which is already fairly complex and
+  obscure (RFC on this coming up soon, btw).
+
+The principal advantage of the coercion approach is flexibility, in particular
+in the case where we want to borrow a reference to a smart pointer, e.g.
+(aturon),
+
+```
+fn wants_vec_ref(v: &mut Vec<u8>) { ... }
+
+fn has_vec(v: Vec<u8>) {
+    wants_vec_ref(&mut v); // coercing Vec to &mut Vec
+}
+```
+
+Under this proposal `&mut v` would have type `&mut[u8]` so we would fail type
+checking (I actually think this is desirable because it is more predictable,
+although it is also a bit surprising). Instead you would write `&mut(v)`. (This
+example assumes `Deref` for `Vec`, but the point stands without it, in general).
+
+
+# Unresolved questions
+
+Can we do better than `&(expr)` syntax for address-of?
+
+
+## Slicing
+
+There is a separate question about how to handle the `Vec<T>` -> `&[T]` and
+`String` -> `&str` conversions. We currently support this conversion by calling
+the `as_slice` method or using the empty slicing syntax (`expr[]`). If we want,
+we could implement `Deref<[T]>` for `Vec<T>` and `Deref<str>` for `String`,
+which would allow us to convert using `&*expr`. With this RFC, we could convert
+using `&expr` (with RFC #226 the conversion would be implicit).
+
+The question is really about `Vec`, `String`, and `Deref`, and is mostly
+orthogonal to this RFC.
As long as we accept this or one of the cross-borrowing
+RFCs, then `Deref` could give us 'nice' conversions from `Vec` and `String`.

From fd70013408b3b80d3d7daa463007a76f5c75361f Mon Sep 17 00:00:00 2001
From: Nick Cameron
Date: Mon, 22 Sep 2014 10:06:02 +1200
Subject: [PATCH 2/4] Use addr() rather than &(), add point about receiver conversions, change the discussion to focus more on smart pointers than the Deref trait.

---
 0000-borrow.md | 85 ++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 65 insertions(+), 20 deletions(-)

diff --git a/0000-borrow.md b/0000-borrow.md
index 4bb761f698b..da0db725956 100644
--- a/0000-borrow.md
+++ b/0000-borrow.md
@@ -7,9 +7,20 @@
 
 Change the address-of operator (`&`) to a borrow operator. This is an
 alternative to #241 and #226 (cross-borrowing coercions). The borrow operator
-would perform as many dereferences as possible and then take the address of the
-result. The result of `&expr` would always have type `&T` where `T` does not
-implement `Deref`.
+would create a borrowed reference to data referenced by any number of smart
+pointers or borrowed references. It would be implemented by performing as many
+dereferences as possible and then taking the address of the result.
+
+E.g.,
+
+```
+fn foo(x: &Baz) { ... }
+
+fn bar(y: Rc<Baz>, z: &Rc<&Baz>) {
+    foo(&y); // currently: foo(&*y);
+    foo(&z); // currently: foo(&***z);
+}
+```
 
 
 # Motivation
@@ -21,7 +32,7 @@ know how many levels of indirection are involved in a reference.
 
 It is annoying to have to write out `&*`, `&**`, etc. to convert from one
 pointer kind to another. It is not really informative and just makes reading and
-writing Rust more painful.
+writing Rust more painful ("type Tetris").
 
 It would be nice to strongly enforce the principle that the first type a
 programmer should think of for a function signature is `&T` and to discourage
@@ -33,13 +44,19 @@ how to convert to `&T`.
 # Detailed design
 
 Writing `&expr` has the effect of dereferencing `expr` as many times as possible
-(by calling `deref` from the `Deref` trait or by doing a compiler-built-in
-dereference) and taking the address of the result.
+(whether smart pointers or borrowed references) and taking the address of the
+result. This is implemented in the same way as the `*` operator, by checking for
+borrowed references (or `Gc` or `Box` pointers while these are special-cased by
+the compiler) or the `Deref` trait.
 
 Where `T` is some type that does not implement `Deref`, `&x` will have type `&T`
 if `x` has type `T`, `&T`, `Box<T>`, `Rc<T>`, `&Rc<T>`, `Box<&Rc<Box<&T>>>`, and
 so forth.
 
+Note that this operation depends entirely on the static type of the expression
+being borrowed. An expression with generic type and which is not bounded by
+`Deref` will not be dereferenced, even if at runtime it is a smart pointer.
+
 `&mut expr` would behave the same way but take a mutable reference as the final
 step. The expression would have type `&mut T`. The usual rules for dereferencing
 and taking a mutable reference would apply, so the programmer cannot subvert
@@ -54,16 +71,26 @@ dereferences to be explicit and to be in an unsafe block. So if `x` has type
 make attempting to dereference a raw pointer using `&` a type error, so `&x`
 would give a type error and a note advising to use explicit dereferencing.
 
-Writing `&(expr)` (and similarly for `&mut(expr)`) will have the effect of
-taking the address of `expr` (the current semantics of `&expr`). If `expr` has
-type `U`, for any `U`, then `&(expr)` will have type `&U`. This syntax is not
-the greatest, and I'm very open to other suggestions. In particular writing
-`&(some_big_expression)` will give the address-of not borrow behaviour, which
-might be confusing. In practice, I hope this works, since when doing explicit
-referencing/dereferencing, people often use brackets (e.g., `(&*x).foo()` would
-become `(&(*x)).foo()`). I hope these cases are very rare.
It is only necessary
-when you need an expression to have type `&Rc<T>` or similar, and when that
-expression is not the receiver of a method call.
+We would add a function `addr` to the prelude that would fulfill the function of
+the current `&` operator, i.e., take a borrowed reference without dereferencing.
+It would be defined as:
+
+```
+#[inline]
+fn addr<T>(x: T) -> &T {
+    &x
+}
+```
+
+Similarly, we would add an `addr_mut` function. Note that we use the new borrow
+operator in the definition of `addr`, we get the desired effect because `T` is
+not bounded by `Deref`. This illustrates that the borrow operator depends on the
+static type of the operand and that the operation is in some ways as fundamental
+as the current address-of operator.
+
+I hope use of these functions is very rare. It is only necessary when you need
+an expression to have type `&Rc<T>` or similar, and when that expression is not
+the receiver of a method call.
 
 
 # Drawbacks
@@ -75,12 +102,17 @@ When a function _does_ want to borrow an owning reference (e.g., takes a
 `&Box<T>` or `&mut Vec<T>`), it would be more painful to call that function. I
 believe this situation is rare, however.
 
+Since the behaviour of the borrow operator depends on the static type of its
+operand, the behaviour might change if a borrow expression is inlined from a
+generic function. This is surprising when compared to the address-of operator;
+however, it is similar behaviour to that expected from function/method calls and
+the `*` operator (and other overloaded operators).
 
 # Alternatives
 
-Take this proposal, but use a different operator. This new operator would have
-the semantics proposed here for `&`, and `&` would continue to be an address-of
-operator.
+Take this proposal, but use a different operator (`~` has been suggested). This
+new operator would have the semantics proposed here for `&`, and `&` would
+continue to be an address-of operator.
 
 There are two RFCs for different flavours of cross-borrowing: #226 and #241.
 
@@ -142,8 +174,21 @@ example assumes `Deref` for `Vec`, but the point stands without it, in general).
 
 # Unresolved questions
 
-Can we do better than `&(expr)` syntax for address-of?
+## Receiver conversions
+
+We currently allow very flexible type conversions in method calls and field
+accesses (i.e., using the dot operator). These are fairly unpredictable and a
+little out of place in Rust since they auto-reference (blurring the line between
+value and reference). It strikes me that the most common case is for converting
+to `&self`; it might be possible to change the current receiver conversion to be
+an implicit version of the borrow operator. I believe that would be more
+predictable, more consistent, and easier to explain. However, it is clearly
+less flexible, so the question is 'how much code would break?'.
+
+## `ref`
 
+Using `ref` in a pattern has similar behaviour to using `&` in an expression.
+Should it have the borrow or address-of semantics?
 
 ## Slicing
 

From cbb376dc9f9b99a4b95c613d0f0e19ced3c01e61 Mon Sep 17 00:00:00 2001
From: Nick Cameron
Date: Wed, 15 Oct 2014 14:46:11 +1300
Subject: [PATCH 3/4] Change addr fn to AddressOf trait

---
 0000-borrow.md | 32 +++++++++++++++++++++-----------
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/0000-borrow.md b/0000-borrow.md
index da0db725956..88bd38676e1 100644
--- a/0000-borrow.md
+++ b/0000-borrow.md
@@ -71,22 +71,32 @@ dereferences to be explicit and to be in an unsafe block. So if `x` has type
 make attempting to dereference a raw pointer using `&` a type error, so `&x`
 would give a type error and a note advising to use explicit dereferencing.
 
-We would add a function `addr` to the prelude that would fulfill the function of
-the current `&` operator, i.e., take a borrowed reference without dereferencing.
-It would be defined as:
+We would add an `AddressOf` trait to the prelude that would fulfill the function
+of the current `&` operator, i.e., take a borrowed reference without
+dereferencing. It would be defined as:
 
 ```
-#[inline]
-fn addr<T>(x: T) -> &T {
-    &x
+trait AddressOf {
+    fn address_of(&self) -> &Self;
+    fn address_of_mut(&mut self) -> &mut Self;
+}
+
+impl<T> AddressOf for T {
+    #[inline]
+    fn address_of(&self) -> &T {
+        self
+    }
+
+    #[inline]
+    fn address_of_mut(&mut self) -> &mut T {
+        self
+    }
 }
 ```
 
-Similarly, we would add an `addr_mut` function. Note that we use the new borrow
-operator in the definition of `addr`, we get the desired effect because `T` is
-not bounded by `Deref`. This illustrates that the borrow operator depends on the
-static type of the operand and that the operation is in some ways as fundamental
-as the current address-of operator.
+To get the address of some value `foo`, you would write `foo.address_of()`.
+This trait relies on the auto-ref behaviour of methods on their receivers and
+the way that mechanism prefers to do as few references as possible.
 
 I hope use of these functions is very rare. It is only necessary when you need
 an expression to have type `&Rc<T>` or similar, and when that expression is not

From ab88e6c34dedb5e2aed2419a253d108d781a3eec Mon Sep 17 00:00:00 2001
From: Nick Cameron
Date: Thu, 23 Oct 2014 18:03:55 +1300
Subject: [PATCH 4/4] Add some discussion of `ref` in pattern matching and some other bits and pieces.

---
 0000-borrow.md | 42 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 37 insertions(+), 5 deletions(-)

diff --git a/0000-borrow.md b/0000-borrow.md
index 88bd38676e1..09d6dd77e79 100644
--- a/0000-borrow.md
+++ b/0000-borrow.md
@@ -67,7 +67,7 @@ No coercions may be applied to `expr` in `&expr`, but they may be applied to
 
 Raw pointers would not be dereferenced by `&`. We expect raw pointer
 dereferences to be explicit and to be in an unsafe block.
So if `x` has type
-`&Box<*Gc<T>>`, then `&x` would have type `&*Gc<T>`. Alternatively, we could
+`&Box<*Rc<T>>`, then `&x` would have type `&*Rc<T>`. Alternatively, we could
 make attempting to dereference a raw pointer using `&` a type error, so `&x`
 would give a type error and a note advising to use explicit dereferencing.
 
@@ -102,6 +102,34 @@ I hope use of these functions is very rare. It is only necessary when you need
 an expression to have type `&Rc<T>` or similar, and when that expression is not
 the receiver of a method call.
 
+There would be no change to reference types; `&T` would mean the same as it does
+today.
+
+There would also be no change to using `&` in pattern matching. That is, `&` in
+pattern matching would match the behaviour of the `&` type, not the `&`
+operator. This is logical since the `&` operator is no longer a type
+construction operation. It is slightly unfortunate that the type introduction
+and elimination syntax do not correspond exactly (but they would correspond
+better than `*` does).
+
+## ref
+
+I think `ref` should continue to have the same behaviour it does today. It is a
+bit of a shame that `&` and `ref` would have different effects, but given that
+they are completely different syntax-wise, I think this is OK.
In support of
+keeping `ref` as is:
+
+* if you want the borrow operator behaviour, you can always just use `&` on the
+  variable;
+* unlike in expression position, there is no way to call a function to get a
+  reference (if we took the borrow operator behaviour);
+* in my experience, when using `&` in expression position, I want to borrow the
+  contents, but when using `ref` I want the operand, but I just want it 'by reference';
+* pattern matching is very structural, and it just **feels** better that here we
+  are precise about a reference and not unwrapping too;
+* I think the above points hold especially true when considering `ref mut`, see
+  also the comments below about mutable references to collections in the
+  'unresolved questions' section.
 
 # Drawbacks
 
@@ -195,10 +223,6 @@ an implicit version of the borrow operator. I believe that would be more
 predictable, more consistent, and easier to explain. However, it is clearly
 less flexible, so the question is 'how much code would break?'.
 
-## `ref`
-
-Using `ref` in a pattern has similar behaviour to using `&` in an expression.
-Should it have the borrow or address-of semantics?
 
 ## Slicing
 
@@ -212,3 +236,11 @@ using `&expr` (with RFC #226 the conversion would be implicit).
 The question is really about `Vec`, `String`, and `Deref`, and is mostly
 orthogonal to this RFC. As long as we accept this or one of the cross-borrowing
 RFCs, then `Deref` could give us 'nice' conversions from `Vec` and `String`.
+
+However, it is worth considering the `&mut` situation in particular. One place
+it seems sensible to want a mutable reference is when mutating owning
+collections. If we want to add or remove an element from a `Vec` (for example)
+we do want a mutable reference to the `Vec` itself and not a mutable slice view
+of the data inside. Even though this situation is sensible, I believe it is rare
+enough that using a function will suffice.
I believe many such instances are in
+pattern matching, and `ref` is not affected by this proposal.
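
The `AddressOf` trait from PATCH 3/4 and the `&mut Vec` case discussed in the last hunk can be tried out in current Rust. The following is only a sketch under assumptions: the generic parameter on the blanket impl is written as `impl<T>`, the helpers `push_zero` and `total` are hypothetical names introduced for illustration, and the explicit `&*` reborrow stands in for what the RFC would spell as a plain `&` (the borrow operator itself is a proposal, not part of the language).

```rust
use std::rc::Rc;

// Sketch of the `AddressOf` trait from PATCH 3/4, written so that it
// compiles on current Rust. The `impl<T>` header is an assumption.
trait AddressOf {
    fn address_of(&self) -> &Self;
    fn address_of_mut(&mut self) -> &mut Self;
}

impl<T> AddressOf for T {
    #[inline]
    fn address_of(&self) -> &T {
        self
    }

    #[inline]
    fn address_of_mut(&mut self) -> &mut T {
        self
    }
}

// Hypothetical helper: needs the owning collection, not a slice view.
fn push_zero(v: &mut Vec<u8>) {
    v.push(0);
}

// Hypothetical helper: only needs a borrowed slice.
fn total(xs: &[u8]) -> u32 {
    xs.iter().map(|&x| u32::from(x)).sum()
}

fn main() {
    let mut v: Vec<u8> = vec![1, 2, 3];

    // Explicit reborrow in today's Rust; under the RFC this would be `&v`.
    assert_eq!(total(&*v), 6);

    // Address-of without dereferencing: a reference to the `Vec` itself.
    push_zero(v.address_of_mut());
    assert_eq!(v.len(), 4);

    // For a smart pointer, `address_of` yields `&Rc<u8>`, while `&*`
    // reaches the contents.
    let rc = Rc::new(7u8);
    let outer: &Rc<u8> = rc.address_of();
    let inner: &u8 = &*rc;
    assert_eq!(Rc::strong_count(outer), 1);
    assert_eq!(*inner, 7);
}
```

The method calls rely on auto-ref picking the smallest number of references for the receiver, which is the behaviour the patch text says the trait depends on.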