@andersio commented Jun 15, 2017

Preceded by #453.

What is the race in particular?

Let's look at how the compiled assembly does the work:

Thread A                           | Thread B
------------------------------------------------------------------
Lock `updateLock`.                 |
                                   |
Pack the new enum.                 |
(with retain)                      |
                                   |
Load the current `self.state`.     | Load `self.state`.
                                   |
Store to `self.state`.             | Retain the enum.
                                   |
Release the old enum.              | Call the observers.
                                   |
Unlock `updateLock`.               |

It seems safe, especially since writes are serialised.

But what if we shift the timeline of thread B a little bit further?

Thread A                           | Thread B
------------------------------------------------------------------
Lock `updateLock`.                 |
                                   |
Pack the new enum.                 |
(with retain)                      |
                                   |
Store to `self.state`.         <<!!!!!>> Load `self.state`.
                                   |
Release the old enum.          <<!!!!!>> Retain the enum.
                                   |
Unlock `updateLock`.               | Call the observers.

A data race now exists because ARC is not part of the atomic read/write: our RCU model considers only the atomicity of `self.state` itself, ignoring the hidden retain/release side effects of ARC.

While the contention window is extremely small (a few instruction cycles) and therefore extremely hard to expose, it is possible on paper for the `AliveState` to be released before a concurrent relaxed reader retains it.
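The pattern can be reduced to a minimal Swift sketch. Only `AliveState`, `state`, and `updateLock` are names from the PR; everything else is illustrative, not the actual ReactiveSwift internals:

```swift
import Foundation

final class AliveState {}

enum State {
    case alive(AliveState)
    case terminated
}

final class Core {
    private let updateLock = NSLock()
    private var state: State = .alive(AliveState())

    // Writer: serialised by `updateLock`, matching the timeline above.
    func update(to newState: State) {
        updateLock.lock()
        var old: State? = state   // load the current enum
        state = newState          // store the new enum (already retained)
        old = nil                 // release the old payload, still inside the lock
        _ = old
        updateLock.unlock()
    }

    // Relaxed reader: no lock. The compiled code loads `state` and only
    // *then* retains its payload; the writer's release of the old payload
    // can land between those two steps, handing the reader a deallocated
    // object.
    func currentState() -> State {
        return state
    }
}
```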

What is the solution proposed by this PR?

The Signal core now tracks the number of pending state updates.

If a sender detects any pending state update, it also acquires `updateLock`, exploiting the serialisation to eliminate any release-retain race with the state updates.

Signal.Core.send locks updateLock whenever it accesses state.

What is the cost of the solution?

It requires an Int32 read for every value event delivered, which is offset by another change in the patch. It also requires atomic increments & decrements before and after state updates.
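Sketched in Swift, the sender's two paths might look like this (the `updaterCount` field and the simplified observer storage are assumptions for illustration, not the actual patch):

```swift
import Foundation

final class Core<Value> {
    private let updateLock = NSLock()
    // Bumped atomically around every state update in the real patch;
    // read with a plain load on the fast path below.
    private var updaterCount: Int32 = 0
    private var observers: [(Value) -> Void] = []

    func send(_ value: Value) {
        let snapshot: [(Value) -> Void]
        if updaterCount != 0 {
            // A state update is in flight: take `updateLock` so our retain
            // of the state is serialised against the writer's release of
            // the old state.
            updateLock.lock()
            snapshot = observers
            updateLock.unlock()
        } else {
            // Fast path: the only overhead is the Int32 probe above.
            snapshot = observers
        }
        for observer in snapshot { observer(value) }
    }
}
```

The fast path pays only the `Int32` read per event; the locked path is taken only while an update is pending.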

What else is changed in this PR?

  1. Since Swift retains the enum payload as it loads it, even when only the tag is probed, the relaxed terminating check is prone to the same race.

    The terminating check now relies instead on a special negative value in the new atomic counter, which is set as the state is bumped to terminating.

  2. All the state management is encapsulated in CoreBlackBox, which Signal.Core now inherits from.

  3. Recursive terminations are no longer prioritized, and are now prone to competition with concurrent senders (if any). This is only a change in implementation detail, as the Signal contract does not guarantee any deterministic order between concurrent senders.

    Note that the removal offsets the event delivery overhead brought by the atomic counter.

Checklist

  • Updated CHANGELOG.md.

@andersio andersio added the bug label Jun 15, 2017
@andersio andersio force-pushed the signal-thread-safety branch 2 times, most recently from 696c4fa to 6c70122 Compare June 15, 2017 09:32
assert(count == -1)
break
}
} while !updaterCount.swap(from: updaterCount.value, to: updaterCount.value + 1)
Contributor
This seems like a poor imitation of a spin/unfair lock. I'd prefer to see us use Atomic like in #453—at least to reestablish correctness. Afterwards we could consider an optimization like this.

Member Author

@andersio Jun 15, 2017

This is not a lock, but a custom atomic increment/decrement that is aware of the special -1 case. To some extent it is just a simple ARC clone.

Unlike a spin lock, forward progress is guaranteed even under priority inversion (though that may mean cache-line ping-pong).
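For illustration, a -1-aware increment along these lines could read as follows. The `AtomicInt32` interface mirrors the snippet under review but is an assumed stand-in:

```swift
import Foundation

// Minimal stand-in for the patch's atomic counter (assumed interface).
final class AtomicInt32 {
    private var raw: Int32 = 0
    var value: Int32 { return OSAtomicAdd32(0, &raw) }
    func swap(from old: Int32, to new: Int32) -> Bool {
        return OSAtomicCompareAndSwap32Barrier(old, new, &raw)
    }
}

// Increment the updater count unless it has hit the -1 terminal state.
func tryBeginUpdate(_ counter: AtomicInt32) -> Bool {
    while true {
        let current = counter.value          // read once per attempt
        if current == -1 {
            return false                     // terminated: never resurrect
        }
        if counter.swap(from: current, to: current + 1) {
            return true                      // update slot acquired
        }
        // CAS failed: another thread incremented or terminated first.
        // Every failed attempt implies another thread made progress,
        // which is why this is not a spin lock.
    }
}
```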

Member Author

@andersio Jun 15, 2017

> I'd prefer to see us use Atomic like in #453—at least to reestablish correctness. Afterwards we could consider an optimization like this.

To be fair, the Signal core itself is already an Atomic and acts like one for all writes, just with the storage and the lock inlined.

updateState(_:) is just a synonym of modify(_:).

@andersio
Member Author

andersio commented Jun 15, 2017

Speed stat
Source (the Signal version), x86-64, i5 6250U
Note: ARMv8 might have slightly different characteristics.

// this branch
@test(): avg 1725 ns; min 1449 ns

// this branch without the atomic counter tracking pending state updates
// (always lock on observer read)
@test(): avg 1900 ns; min 1613 ns
// (locking also on terminating check)
@test(): avg 3054 ns; min 2514 ns

// master
@test(): avg 1260 ns; min 1039 ns

// #453
@test(): avg 2376 ns; min 2045 ns

Hmm, investigating the mysterious bump.

@andersio andersio force-pushed the signal-thread-safety branch from 6c70122 to e9d3f6c Compare June 15, 2017 14:48
@andersio andersio force-pushed the signal-thread-safety branch from e9d3f6c to 1ad2524 Compare June 15, 2017 14:51
@andersio
Member Author

andersio commented Jun 15, 2017

It turns out that locking on the hot path with 1ad2524 is less harmful than undoing the inlining of updateLock and the state storage, or the solution I initially pushed.
🤦‍♂️

+281 −192 vs +9 −6.

@test(): avg 1350 ns; min 1104 ns

Kudos to os_unfair_lock.

@andersio andersio modified the milestone: 2.0 Jun 15, 2017
@andersio
Member Author

andersio commented Jun 15, 2017

Since both reads and writes are now locked in all circumstances, the two state box types can be removed. This pushes the overhead down to the current level.

🤦‍♂️

@test(): avg 1276 ns; min 1045 ns
@test(): avg 1249 ns; min 1043 ns
@test(): avg 1250 ns; min 1043 ns
@test(): avg 1229 ns; min 1043 ns
@test(): avg 1269 ns; min 1039 ns
@test(): avg 1244 ns; min 1051 ns

Contributor

@mdiep left a comment

Looks great! Just needs an update to the documentation.

/// Used to indicate if the `Signal` has deinitialized.
private var hasDeinitialized: Bool

fileprivate init(_ generator: (Observer) -> Disposable?) {
Contributor

The above documentation needs to be updated to reflect that the lock must be held to read the state.

@andersio
Member Author

andersio commented Jun 16, 2017

(Fun fact: Swift 4 runs everything slower by ~100ns.)

//
// Related PR:
// https://github.com/ReactiveCocoa/ReactiveSwift/pull/112
if .shouldDispose == self.tryToCommitTermination() {
Member Author

Both the terminal path and the value path run this at the end.

self.stateLock.lock()
if result == .none, case .terminating = self.state {
self.stateLock.unlock()
result = self.tryToCommitTermination()
Member Author

tryToCommitTermination already locks the state to check whether it is terminating, so the locking here is redundant.

@andersio andersio requested a review from mdiep June 16, 2017 16:50
@andersio andersio force-pushed the signal-thread-safety branch from df9066b to d734a11 Compare June 16, 2017 18:55
@andersio andersio force-pushed the signal-thread-safety branch from d734a11 to ab84a10 Compare June 16, 2017 19:03
// Note that this cannot be `try` since any concurrent observer bag
// manipulation might then cause the terminating state being missed.
stateLock.lock()
if case .terminating = state {
Member Author

@andersio Jun 16, 2017

This is resurrected because acquiring the spinlock is consistently & considerably faster than calling tryToCommitTermination() on my machine (>5% per observer callout). 🤦‍♂️

@mdiep mdiep merged commit b4f73c1 into master Jun 17, 2017
@mdiep mdiep deleted the signal-thread-safety branch June 17, 2017 00:08
andersio added a commit that referenced this pull request Jun 17, 2017