-
-
Notifications
You must be signed in to change notification settings - Fork 225
Implement process and instance isolation #4498
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: version6
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## version6 #4498 +/- ##
===========================================
Coverage ? 76.36%
===========================================
Files ? 392
Lines ? 14741
Branches ? 2969
===========================================
Hits ? 11257
Misses ? 2772
Partials ? 712 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@sentry review |
| { | ||
| throw new InvalidOperationException("Cache directory already locked."); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cache coordinator acquisition failure throws an InvalidOperationException, but this could make it impossible to create multiple SDK instances in some scenarios. Consider making this non-fatal and falling back to a non-cached transport or using a different isolation strategy.
Did we get this right? 👍 / 👎 to inform future reviews.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Could this be a severe issue?
But was should happen if this fails?
I believe the code path comes from creating a SentryClient via the Hub.
But we don't have too much control in this CachingTransport-.ctor here.
Hmmm ...
I wonder if we should avoid trying to Acquire the Lock-File in the .ctor,
but instead in the Initialize ... or the Worker-Task ... but then the error may happen quite late.
🤔 something is troubling me but I can't quite put my finger on it just yet 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how the SDK currently works... so I don't believe it's an issue. If it is, I think we could address in a separate PR.
sentry-dotnet/src/Sentry/Internal/Http/CachingTransport.cs
Lines 81 to 83 in ee30f23
| _isolatedCacheDirectoryPath = | |
| options.TryGetIsolatedCacheDirectoryPath() ?? | |
| throw new InvalidOperationException("Cache directory or DSN is not set."); |
| /// <remarks> | ||
| /// This method can throw all of the same exceptions that the <see cref="FileStream"/> constructor can throw. The | ||
| /// caller is responsible for handling those exceptions. | ||
| /// </remarks> | ||
| public override bool TryCreateLockFile(string path, out Stream fileStream) | ||
| { | ||
| // Note that FileShare.None is implemented via advisory locks only on macOS/Linux... so it will stop | ||
| // other .NET processes from accessing the file but not other non-.NET processes. This should be fine | ||
| // in our case - we just want to avoid multiple instances of the SDK concurrently accessing the cache | ||
| fileStream = new FileStream( | ||
| path, | ||
| FileMode.OpenOrCreate, | ||
| FileAccess.ReadWrite, | ||
| FileShare.None); | ||
| return true; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TryCreateLockFile method can throw all FileStream constructor exceptions but always returns true. This is inconsistent with the interface design pattern where 'Try' methods typically return false on failure rather than throwing. Consider catching exceptions and returning false, or rename the method to indicate it can throw.
| /// <remarks> | |
| /// This method can throw all of the same exceptions that the <see cref="FileStream"/> constructor can throw. The | |
| /// caller is responsible for handling those exceptions. | |
| /// </remarks> | |
| public override bool TryCreateLockFile(string path, out Stream fileStream) | |
| { | |
| // Note that FileShare.None is implemented via advisory locks only on macOS/Linux... so it will stop | |
| // other .NET processes from accessing the file but not other non-.NET processes. This should be fine | |
| // in our case - we just want to avoid multiple instances of the SDK concurrently accessing the cache | |
| fileStream = new FileStream( | |
| path, | |
| FileMode.OpenOrCreate, | |
| FileAccess.ReadWrite, | |
| FileShare.None); | |
| return true; | |
| } | |
| public override bool TryCreateLockFile(string path, out Stream fileStream) | |
| { | |
| try | |
| { | |
| fileStream = new FileStream( | |
| path, | |
| FileMode.OpenOrCreate, | |
| FileAccess.ReadWrite, | |
| FileShare.None); | |
| return true; | |
| } | |
| catch | |
| { | |
| fileStream = Stream.Null; | |
| return false; | |
| } | |
| } |
Did we get this right? 👍 / 👎 to inform future reviews.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is consistent with the behaviour for CreateFileStream and makes it easier to surface errors to callers... so I'd rather stick with the API we have.
test/Sentry.Tests/HubTests.cs
Outdated
| if (options.Transport is CachingTransport cachingTransport) | ||
| { | ||
| cachingTransport.Dispose(); // Release cache lock so that the cacheDirectory can be removed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Don't we need to do this for every test?
The method CapturesEventWithContextKey_Implementation does it too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we only need to do it when options.CacheDirectoryPath is set... it's just to ensure the cache dir gets cleaned up. Most tests aren't using a cache dir.
| var cachingTransport = Assert.IsType<CachingTransport>(_fixture.SentryOptions.Transport); | ||
| _ = Assert.IsType<FakeTransport>(cachingTransport.InnerTransport); | ||
|
|
||
| cachingTransport.Dispose(); // Release cache lock so that the cacheDirectory can be removed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: don't we need to do this on a Test-Class level instead, so that it applies to every Test-Method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only necessary because we set CacheDirectoryPath on line 1628.
It's a bit messy but only really a problem in tests, I think... SDK users won't need to worry about this.
| { | ||
| throw new InvalidOperationException("Cache directory already locked."); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Could this be a severe issue?
But was should happen if this fails?
I believe the code path comes from creating a SentryClient via the Hub.
But we don't have too much control in this CachingTransport-.ctor here.
Hmmm ...
I wonder if we should avoid trying to Acquire the Lock-File in the .ctor,
but instead in the Initialize ... or the Worker-Task ... but then the error may happen quite late.
🤔 something is troubling me but I can't quite put my finger on it just yet 🤔
|
|
||
| _options.LogDebug("found abandoned cache: {0}", dir); | ||
| using var coordinator = new CacheDirectoryCoordinator(dir, _options.DiagnosticLogger, _options.FileSystem); | ||
| if (!coordinator.TryAcquire()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Could we run into a concurrency issue?
E.g.
- this recycler has started and tries to acquire a lock-file
- in the
.ctorwe have also acquired a lock-file already, in a directory below - now we try to move files that contain a lock-file that we still have a reference via
Streamon
Essentially now retrying 3 times without any effect?
But I may misunderstand something there 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lock files sit in the parent/root cache directory. The isolated cache folders are all children of that root folder, so none of the operations on the isolated cache folders should affect the locks. I've added a code comment to make this clearer:
sentry-dotnet/src/Sentry/Internal/CacheDirectoryCoordinator.cs
Lines 22 to 23 in ada094b
| // Note this creates a lock file in the cache directory's parent directory... not in the cache directory itself | |
| _lockFilePath = $"{cacheDir}.lock"; |
|
|
When I changed the merge target, it didn't automatically trigger a build, so I triggered this manually: All green 🟢 |
|
@Flash0ver I think the only remaining piece of feedback to address (not sure what happened to it) is how we handle upgrading from previous SDK versions, if people have files in their cache when they do so (since we're changing the cache directory). Options:
I think since this could potentially result in lost client data, it's worth opting for door number 2 here... but only if we're broadly happy with the PR aside from that. I addressed all the review feedback and changed the merge target to version 6. |
Resolves #2033, Resolves #1067
Note
I'm not 100% happy with the structure of this code - it impacted quite a few mostly unrelated tests so it's kind of "leaky" at the moment. @Flash0ver if you've got ideas on alternative ways this could be done, I'm all ears.
Interprocess Locking
This turns out to be surprisingly difficult.
Something like a named Mutex could be used for this. However Mutexes are thread affine so they need to be released on the same thread that acquired them. Other than setting aside a dedicated thread for the lifetime of the Hub, there's no easy way to make that work.
A named Semaphore would be another option, but these aren't supported on macOS.
In the end I went with the "lock file" solution that @lucas-zimerman suggested. Keeping a lock file specific to each cache directory and opening this with FileShare.None gives us a kind of file based named mutex that we can use to avoid having multiple instances of the Hub trying to process files from the same cache directory.