@andersio commented Jun 15, 2017

Preceded by #453.

What is the race in particular?

Let's look at how the compiled assembly does the work:

Thread A                           | Thread B
------------------------------------------------------------------
Lock `updateLock`.                 |
                                   |
Pack the new enum.                 |
(with retain)                      |
                                   |
Load the current `self.state`.     | Load `self.state`.
                                   |
Store to `self.state`.             | Retain the enum.
                                   |
Release the old enum.              | Call the observers.
                                   |
Unlock `updateLock`.               |

It seems safe, especially since writes are serialised.

But what if we shift the timeline of thread B a little bit further?

Thread A                           | Thread B
------------------------------------------------------------------
Lock `updateLock`.                 |
                                   |
Pack the new enum.                 |
(with retain)                      |
                                   |
Store to `self.state`.         <<!!!!!>> Load `self.state`.
                                   |
Release the old enum.          <<!!!!!>> Retain the enum.
                                   |
Unlock `updateLock`.               | Call the observers.

A data race now exists because ARC is not part of the atomic read/write: our RCU model considers only the atomicity of `self.state` itself, ignoring the hidden retain/release side effects of ARC.

While the contention window is extremely small (a few instruction cycles) and therefore extremely hard to expose, it is possible on paper for the `AliveState` to be released before a concurrent relaxed reader retains it.
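The pattern can be reduced to a minimal Swift sketch. Only `AliveState`, `state`, and `updateLock` are names from the PR; everything else is illustrative, not the actual ReactiveSwift internals:

```swift
import Foundation

final class AliveState {}

enum State {
    case alive(AliveState)
    case terminated
}

final class Core {
    private let updateLock = NSLock()
    private var state: State = .alive(AliveState())

    // Writer: serialised by `updateLock`, matching the timeline above.
    func update(to newState: State) {
        updateLock.lock()
        var old: State? = state   // load the current enum
        state = newState          // store the new enum (already retained)
        old = nil                 // release the old payload, still inside the lock
        _ = old
        updateLock.unlock()
    }

    // Relaxed reader: no lock. The compiled code loads `state` and only
    // *then* retains its payload; the writer's release of the old payload
    // can land between those two steps, handing the reader a deallocated
    // object.
    func currentState() -> State {
        return state
    }
}
```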

What is the solution proposed by this PR?

The Signal core now tracks the number of pending state updates.

If a sender detects any pending state update, it also acquires `updateLock`, exploiting the serialisation to eliminate any release-retain race with the state updates.

Signal.Core.send locks updateLock whenever it accesses state.

What is the cost of the solution?

It requires an Int32 read for every value event delivered, which is offset by another change in the patch. It also requires atomic increments & decrements before and after state updates.
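Sketched in Swift, the sender's two paths might look like this (the `updaterCount` field and the simplified observer storage are assumptions for illustration, not the actual patch):

```swift
import Foundation

final class Core<Value> {
    private let updateLock = NSLock()
    // Bumped atomically around every state update in the real patch;
    // read with a plain load on the fast path below.
    private var updaterCount: Int32 = 0
    private var observers: [(Value) -> Void] = []

    func send(_ value: Value) {
        let snapshot: [(Value) -> Void]
        if updaterCount != 0 {
            // A state update is in flight: take `updateLock` so our retain
            // of the state is serialised against the writer's release of
            // the old state.
            updateLock.lock()
            snapshot = observers
            updateLock.unlock()
        } else {
            // Fast path: the only overhead is the Int32 probe above.
            snapshot = observers
        }
        for observer in snapshot { observer(value) }
    }
}
```

The fast path pays only the `Int32` read per event; the locked path is taken only while an update is pending.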

What else is changed in this PR?

  1. Since Swift retains the enum payload as it loads it, even when only the tag is probed, the relaxed terminating check is prone to the same race.

    The terminating check now relies instead on a special negative value in the new atomic counter, which is set as the state is bumped to terminating.

  2. All the state management is encapsulated in CoreBlackBox, which Signal.Core now inherits from.

  3. Recursive terminations are no longer prioritized, and are now prone to competition with concurrent senders (if any). This is only a change in implementation detail, as the Signal contract does not guarantee any deterministic order between concurrent senders.

    Note that the removal offsets the event delivery overhead brought by the atomic counter.

Checklist

  • Updated CHANGELOG.md.

@andersio andersio added the bug label Jun 15, 2017
@andersio andersio force-pushed the signal-thread-safety branch 2 times, most recently from 696c4fa to 6c70122 Compare June 15, 2017 09:32
assert(count == -1)
break
}
} while !updaterCount.swap(from: updaterCount.value, to: updaterCount.value + 1)
Contributor
This seems like a poor imitation of a spin/unfair lock. I'd prefer to see us use Atomic like in #453—at least to reestablish correctness. Afterwards we could consider an optimization like this.

Member Author

@andersio Jun 15, 2017

This is not a lock, but a custom atomic increment/decrement that is aware of the special -1 case. To some extent it is just a simple ARC clone.

Unlike a spin lock, forward progress is guaranteed even under priority inversion (though that may mean cache-line ping-pong).
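For illustration, a -1-aware increment along these lines could read as follows. The `AtomicInt32` interface mirrors the snippet under review but is an assumed stand-in:

```swift
import Foundation

// Minimal stand-in for the patch's atomic counter (assumed interface).
final class AtomicInt32 {
    private var raw: Int32 = 0
    var value: Int32 { return OSAtomicAdd32(0, &raw) }
    func swap(from old: Int32, to new: Int32) -> Bool {
        return OSAtomicCompareAndSwap32Barrier(old, new, &raw)
    }
}

// Increment the updater count unless it has hit the -1 terminal state.
func tryBeginUpdate(_ counter: AtomicInt32) -> Bool {
    while true {
        let current = counter.value          // read once per attempt
        if current == -1 {
            return false                     // terminated: never resurrect
        }
        if counter.swap(from: current, to: current + 1) {
            return true                      // update slot acquired
        }
        // CAS failed: another thread incremented or terminated first.
        // Every failed attempt implies another thread made progress,
        // which is why this is not a spin lock.
    }
}
```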

Member Author

@andersio Jun 15, 2017

> I'd prefer to see us use Atomic like in #453—at least to reestablish correctness. Afterwards we could consider an optimization like this.

To be fair, the Signal core itself is already an Atomic and acts like one for all writes, just with the storage and the lock inlined.

updateState(_:) is just a synonym of modify(_:).

@andersio
Member Author

andersio commented Jun 15, 2017

Speed stat
Source (the Signal version), x86-64, i5 6250U
Note: ARMv8 might have slightly different characteristics.

// this branch
@test(): avg 1725 ns; min 1449 ns

// this branch without the atomic counter tracking pending state updates
// (always lock on observer read)
@test(): avg 1900 ns; min 1613 ns
// (locking also on terminating check)
@test(): avg 3054 ns; min 2514 ns

// master
@test(): avg 1260 ns; min 1039 ns

// #453
@test(): avg 2376 ns; min 2045 ns

Hmm, investigating the mysterious bump.

@andersio andersio force-pushed the signal-thread-safety branch from 6c70122 to e9d3f6c Compare June 15, 2017 14:48
@andersio andersio force-pushed the signal-thread-safety branch from e9d3f6c to 1ad2524 Compare June 15, 2017 14:51
@andersio
Member Author

andersio commented Jun 15, 2017

It turns out that locking on the hot path with 1ad2524 is less harmful than undoing the inlining of updateLock and the state storage, or the solution I initially pushed.
🤦‍♂️

+281 −192 vs +9 −6.

@test(): avg 1350 ns; min 1104 ns

Kudos to os_unfair_lock.

@andersio andersio modified the milestone: 2.0 Jun 15, 2017
@andersio
Member Author

andersio commented Jun 15, 2017

Since both reads and writes are now locked in all circumstances, the two state box types can be removed. This pushes the overhead down to the current level.

🤦‍♂️

@test(): avg 1276 ns; min 1045 ns
@test(): avg 1249 ns; min 1043 ns
@test(): avg 1250 ns; min 1043 ns
@test(): avg 1229 ns; min 1043 ns
@test(): avg 1269 ns; min 1039 ns
@test(): avg 1244 ns; min 1051 ns

Contributor

@mdiep left a comment

Looks great! Just needs an update to the documentation.

/// Used to indicate if the `Signal` has deinitialized.
private var hasDeinitialized: Bool

fileprivate init(_ generator: (Observer) -> Disposable?) {
Contributor

The above documentation needs to be updated to reflect that the lock must be held to read the state.

@andersio
Member Author

andersio commented Jun 16, 2017

(Fun fact: Swift 4 runs everything slower by ~100ns.)

//
// Related PR:
// https://github.com/ReactiveCocoa/ReactiveSwift/pull/112
if .shouldDispose == self.tryToCommitTermination() {
Member Author

Both the terminal path and the value path run this at the end.

self.stateLock.lock()
if result == .none, case .terminating = self.state {
self.stateLock.unlock()
result = self.tryToCommitTermination()
Member Author

tryToCommitTermination already locks the state to check whether it is terminating, so the locking here is redundant.

@andersio andersio requested a review from mdiep June 16, 2017 16:50
@andersio andersio force-pushed the signal-thread-safety branch from df9066b to d734a11 Compare June 16, 2017 18:55
@andersio andersio force-pushed the signal-thread-safety branch from d734a11 to ab84a10 Compare June 16, 2017 19:03
// Note that this cannot be `try` since any concurrent observer bag
// manipulation might then cause the terminating state being missed.
stateLock.lock()
if case .terminating = state {
Member Author

@andersio Jun 16, 2017

This is resurrected because acquiring the spinlock is consistently & considerably faster than calling tryToCommitTermination() on my machine (>5% per observer callout). 🤦‍♂️

@mdiep mdiep merged commit b4f73c1 into master Jun 17, 2017
@mdiep mdiep deleted the signal-thread-safety branch June 17, 2017 00:08
andersio added a commit that referenced this pull request Jun 17, 2017