Skip to content

Conversation

@Diggsey
Copy link
Contributor

@Diggsey Diggsey commented May 22, 2020

There's UB in the current Arc implementation, see:

rust/src/liballoc/sync.rs

Lines 866 to 870 in c60b675

#[inline(never)]
unsafe fn drop_slow(&mut self) {
// Destroy the data at this time, even though we may not free the box
// allocation itself (there may still be weak pointers lying around).
ptr::drop_in_place(&mut self.ptr.as_mut().data);

This races with Weak::inner:

rust/src/liballoc/sync.rs

Lines 1680 to 1683 in c60b675

#[inline]
fn inner(&self) -> Option<&ArcInner<T>> {
if is_dangling(self.ptr) { None } else { Some(unsafe { self.ptr.as_ref() }) }
}

Calling drop_in_place mutates the data value, which AIUI is UB when elsewhere we have a live shared reference to the containing struct (unless we mark the field as internally mutable somehow).

The implementation for this was a little tricky: we can't just switch to using an UnsafeCell<ManuallyDrop<T>> as this changes the variance of Arc and Weak from covariant to invariant, so instead we make sure to never dereference the stored pointer: we always cast it to ArcInner before dereferencing.

Follow-up from #72443

cc @danielhenrymantilla

@rust-highfive
Copy link
Contributor

r? @dtolnay

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 22, 2020
Copy link
Contributor

@danielhenrymantilla danielhenrymantilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(First of all let's cc @RalfJung for their UB expertise)

Hmmm, I think we can make fewer changes: I don't think that the definition for Arc needs to be changed; AFAIK, the "only" / main offender here is a Weak calling .inner() while a unique Arc is around, since that Arc believes it is allowed to do whatever it wants with the data, when, because of that callable .inner() (transitively yielding a &'_ T shared reference to the data) on a simultaneous Weak instance, it cannot.

So I suggest keeping everything as it was, except for the signature of the returned value of .inner(), which should be changed to return an Option<_> of:

  • either &'_ ArcInner<UnsafeCell<MaybeUninit<T>> (the MaybeUninit part setting things up for further evolutions, such as the one in your other PR),

  • or, more simply, &'_ ArcInner<1-ZST> (which would make .inner() effectively equivalent to a .counters()
    getter returning a struct Counters<'inner> { strong: &'inner AtomicUsize, weak: &'inner AtomicUsize }, which is fine since that's exactly what .inner() should be doing before knowing that strong ≥ 1!).

In both cases the core idea is that a Weak gets to look at its counters, mainly the strong one, before ever trying to assert anything about the .data / to peek at it. This way, a unique Arc owner can effectively take ownership of the ArcInner's .data by clearing that .strong counter, so as to no longer worry about remaining Weaks.


Finally, remains the question of what the fields of Weak ought to be. Given that the pointer may already be dangling, and that therefore the blessed way to dereference that pointer is through the guarded .inner() getter, I'd say that we could let the field(s) remain as is. If, on the other hand, we really want to enforce / express at the type-level that using that getter is mandatory, we could change Weak to become:

+ struct ErasedPayload;
+ type Counters = ArcInner<ErasedPayload>; // thanks to `#[repr(C)]`-ness of `ArcInner`

struct Weak<T : ?Sized> {
-   ptr: NonNull<ArcInner<T>>,
+   ptr: NonNull<Counters>,
+   _phantom: PhantomData<ArcInner<T>>,
}

@RalfJung
Copy link
Member

Calling drop_in_place mutates the data value, which AIUI is UB when elsewhere we have a live shared reference to the containing struct (unless we mark the field as internally mutable somehow).

So Weak::inner can be called even when we do not have a guarantee that the value will actually stay around? If so, yes that is definitely wrong. I suppose the reference remains dereferencable because we do not deallocate the ArcInner until all Weak are gone, but it does not point to an actually initialized element of ArcInner<T> any more (though ManuallyDrop would be enough -- ManuallyDrop witnesses that dropping cannot break the validity invariant of a type).

@RalfJung
Copy link
Member

Okay so I agree there is a race, e.g. when someone calls strong_count while at the same time the corresponding Arc::drop_slow is called. Good catch!

But IMO the "most obvious" solution to this is to make Weak::inner return a raw pointer, not a shared reference. Would that not work?

@Diggsey
Copy link
Contributor Author

Diggsey commented May 23, 2020

I think both of you are not wrong - there are other ways that may require fewer changes to make this sound. However, I chose this way because I think it's easier to reason about.

Without the changes to Arc there are some extra subtleties: for example, as long as Arc::drop_slow obtains a mutable reference to inner:

ptr::drop_in_place(&mut self.ptr.as_mut().data);

(specifically, as soon as this executes: self.ptr.as_mut()) then we're in a dangerous spot: this mutable reference implies that we can mutate any part of "inner" without synchronisation, including the reference counts (eg. the safe method AtomicUsize::get_mut). If we change Weak to use raw pointers, it will still need to create a shared reference to the first reference count, and that will alias with the mutable reference from Arc::drop_slow.

From a soundness perspective, I think this is OK (although I'm not exactly confident) because we're only accessing the data field through this mutable reference? I just take issue with how subtle these interactions are, and it would be easy for someone to come along in the future and not realise how fragile this is.

On the other hand, using an UnsafeCell around the data field means we can freely use shared references to the interior without worrying about aliasing until we actually need to access the data field. Furthermore, because we can never obtain a mutable reference, we know that we must always use atomic accesses to get at the reference counts.

The only thing I don't like about this approach is the need to cast "ptr" back and forth to solve the variance issue: ideally we could just always use the UnsafeCell<ManuallyDrop<T>> variant, but there's no syntax for doing this yet.

@Diggsey
Copy link
Contributor Author

Diggsey commented May 23, 2020

@danielhenrymantilla
Re: your second point, I tried something like that initially, but any approach which tries to "erase" the pointer type runs into problems when we try and implement CoerceUnsized: the compiler doesn't like it unless we have a non-phantom-data field which uses T.

@RalfJung
Copy link
Member

RalfJung commented May 23, 2020

However, I chose this way because I think it's easier to reason about.

There are very few situations when references are easier to reason about than raw pointers. You are doing some rather extreme hacks here with InternalArcInner, nothing about that is easy to reason about in my book.

Raw pointers make this a lot simpler, and avoid all these hacks you need to "tame" shared references and avoid all the problems. Working with aliasing data is exactly what raw pointers are for. Sure, we could replace almost every *const T by &UnsafeCell<T>, but... ugh, no, let's not do that but let's instead use the tool that was designed for this job.

If we change Weak to use raw pointers, it will still need to create a shared reference to the first reference count, and that will alias with the mutable reference from Arc::drop_slow.

If Weak actually reads or writes the weak counters, whether it uses shared references or raw pointers does not matter -- it is incorrect either way as uniqueness of the mutable reference is violated. If Weak does not access the counters, then a shared reference is fine because there's an UnsafeCell around each count. (But then why would it create a shared reference in the first place?)

EDIT: Oh wait, the mutable reference is only used for data, right? The right fix is to change &mut self.ptr.as_mut().data to not use as_mut() (which creates a mutable reference) but as_ptr() (which creates a raw pointer).
If everyone involved uses raw pointers, all reference-related problems disappear. That's why this is the much cleaner solution than transmuting around with InternalArcInner and UnsafeCell.

@Diggsey
Copy link
Contributor Author

Diggsey commented May 23, 2020

EDIT: Oh wait, the mutable reference is only used for data, right? The right fix is to change &mut self.ptr.as_mut().data to not use as_mut() (which creates a mutable reference) but as_ptr() (which creates a raw pointer).

Yeah this is what I was trying to get at.

If everyone involved uses raw pointers, all reference-related problems disappear. That's why this is the much cleaner solution than transmuting around with InternalArcInner and UnsafeCell.

Consistently using raw pointers everywhere where "inner" is accessed would be a much larger change than this PR, and that style of programming is really not very ergonomic in Rust. Using it in some places but not others means you still have to worry about reference-related problems, so why not embrace shared references everywhere?

I don't really see this as any different from an RwLock - we have an atomic counter which keeps track of the number of accessors, and we get mutable access to the interior when we can guarantee having exclusive access. But you wouldn't argue that we should be using raw pointers to build an RwLock? An UnsafeCell is the obvious container for marking part of a struct as being internally mutable, and ManuallyDrop is the obvious container to allow us to drop the inner value before the ArcInner itself.

The only hacky part here as far as I'm concerned is the variance issue, which requires us to cast the pointer type back and forth, but this is encapsulated in the "inner()" method, so most of the code doesn't have to concern itself with that.

@RalfJung
Copy link
Member

RalfJung commented May 24, 2020

Consistently using raw pointers everywhere where "inner" is accessed would be a much larger change than this PR, and that style of programming is really not very ergonomic in Rust. Using it in some places but not others means you still have to worry about reference-related problems, so why not embrace shared references everywhere?

I don't think changes everywhere would be needed. And I think selectively using raw pointers in what you identified to be the "critical" parts (dropping, where we need a mutable reference only to the data, and Weak, which could be used post-drop) is a lot easier to review, reason about, and understand than having two differently typed views of the same memory. I consider the latter programming technique to be a last resort, only to be used when no other solution is feasible.

It is true that raw pointers are less ergonomic. To make them more ergonomic, we need to get a better understanding of the kind of patterns one would use them for. We'll never get that understanding if we turn ourselves upside down to avoid raw pointers wherever possible.

I don't really see this as any different from an RwLock - we have an atomic counter which keeps track of the number of accessors, and we get mutable access to the interior when we can guarantee having exclusive access.

If that were true, we should make Arc invariant in T like RwLock is. My main issue with your approach is not that you use UnsafeCell, it is that you have two different types and transmute the same memory around to switch between which type is being used for that memory. That is a big "code smell" IMO and should be avoided if at all possible.

I maintain that using raw pointers would be a lot cleaner. If you find someone else to r+ this as-is, I won't block it, but I will not approve this change until raw pointers have at least been tried.

@Diggsey
Copy link
Contributor Author

Diggsey commented May 24, 2020

I don't think changes everywhere would be needed. And I think selectively using raw pointers in what you identified to be the "critical" parts (dropping, where we need a mutable reference only to the data, and Weak, which could be used post-drop) is a lot easier to review, reason about, and understand than having two differently typed views of the same memory. I consider the latter programming technique to be a last resort, only to be used when no other solution is feasible.

These "critical parts" include "Arc::drop_slow", and every single usage of Weak::inner() - that's still quite a lot. Furthermore we'd be using the ability to get a raw pointer to a field, and the explicit syntax for that hasn't even been implemented yet: #64490.

If that were true, we should make Arc invariant in T like RwLock

I view this as a case where we are enforcing a property (covariance) through our safe interface, and the compiler simply has no way of knowing that. In most cases, we have special types and traits to express these properties (Send, Sync, etc.) but in this case we don't. Ideally we would have a syntax like this:

unsafe impl<T> Covariant<T> for Arc<T> {}
unsafe impl<T> Covariant<T> for Weak<T> {}

To allow variance to be overridden. If we had that syntax then we wouldn't need to do any casting back and forth at all.

My main issue with your approach is not that you use UnsafeCell, it is that you have two different types and transmute the same memory around to switch between which type is being used for that memory. That is a big "code smell" IMO and should be avoided if at all possible.

I agree that this is not ideal, I just see it as a temporary evil that is fairly well encapsulated, and can be got rid of once we have some way to override variance.

@Diggsey
Copy link
Contributor Author

Diggsey commented May 24, 2020

@RalfJung I opened #72533 which has a version using raw pointers.

@RalfJung
Copy link
Member

Furthermore we'd be using the ability to get a raw pointer to a field, and the explicit syntax for that hasn't even been implemented yet: #64490.

Correction: it hasn't been stabilized yet. &raw (const|mut) foo is implemented.

I think libstd is just the right place to start using it.

I agree that this is not ideal, I just see it as a temporary evil that is fairly well encapsulated, and can be got rid of once we have some way to override variance.

I find it unlikely that we will have such a way in the near or medium-term future.

Thanks a lot for preparing that other PR! I much prefer that approach.

@Diggsey Diggsey closed this May 26, 2020
@Diggsey Diggsey deleted the fix-arc-ub branch May 26, 2020 00:03
@dtolnay dtolnay added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label May 26, 2020
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request May 26, 2020
Resolve UB in Arc/Weak interaction (2)

Use raw pointers to avoid making any assertions about the data field.

Follow up from rust-lang#72479, see that PR for more detail on the motivation.

@RalfJung I was able to avoid a lot of the changes to `Weak`, by making a helper type (`WeakInner`) - because of auto-deref and because the fields have the same name, the rest of the code continues to compile.
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request May 27, 2020
Resolve UB in Arc/Weak interaction (2)

Use raw pointers to avoid making any assertions about the data field.

Follow up from rust-lang#72479, see that PR for more detail on the motivation.

@RalfJung I was able to avoid a lot of the changes to `Weak`, by making a helper type (`WeakInner`) - because of auto-deref and because the fields have the same name, the rest of the code continues to compile.
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request May 27, 2020
Resolve UB in Arc/Weak interaction (2)

Use raw pointers to avoid making any assertions about the data field.

Follow up from rust-lang#72479, see that PR for more detail on the motivation.

@RalfJung I was able to avoid a lot of the changes to `Weak`, by making a helper type (`WeakInner`) - because of auto-deref and because the fields have the same name, the rest of the code continues to compile.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants