Skip to content

Conversation

@jamescrosswell
Copy link
Collaborator

@jamescrosswell jamescrosswell commented Sep 5, 2025

Resolves #2033, Resolves #1067

Note

I'm not 100% happy with the structure of this code - it impacted quite a few mostly unrelated tests so it's kind of "leaky" at the moment. @Flash0ver if you've got ideas on alternative ways this could be done, I'm all ears.

Interprocess Locking

This turns out to be surprisingly difficult.

Something like a named Mutex could be used for this. However Mutexes are thread affine so they need to be released on the same thread that acquired them. Other than setting aside a dedicated thread for the lifetime of the Hub, there's no easy way to make that work.

A named Semaphore would be another option, but these aren't supported on macOS.

In the end I went with the "lock file" solution that @lucas-zimerman suggested. Keeping a lock file specific to each cache directory and opening this with FileShare.None gives us a kind of file based named mutex that we can use to avoid having multiple instances of the Hub trying to process files from the same cache directory.

@jamescrosswell jamescrosswell marked this pull request as ready for review September 11, 2025 22:25
cursor[bot]

This comment was marked as outdated.

@codecov
Copy link

codecov bot commented Sep 11, 2025

Codecov Report

❌ Patch coverage is 74.62687% with 34 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (version6@130650d). Learn more about missing BASE report.

Files with missing lines Patch % Lines
src/Sentry/Internal/Http/CachingTransport.cs 61.11% 17 Missing and 4 partials ⚠️
src/Sentry/Internal/CacheDirectoryCoordinator.cs 81.96% 9 Missing and 2 partials ⚠️
src/Sentry/Internal/ReadOnlyFilesystem.cs 0.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             version6    #4498   +/-   ##
===========================================
  Coverage            ?   76.36%           
===========================================
  Files               ?      392           
  Lines               ?    14741           
  Branches            ?     2969           
===========================================
  Hits                ?    11257           
  Misses              ?     2772           
  Partials            ?      712           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@bruno-garcia
Copy link
Member

@sentry review

Comment on lines +86 to +88
{
throw new InvalidOperationException("Cache directory already locked.");
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cache coordinator acquisition failure throws an InvalidOperationException, but this could make it impossible to create multiple SDK instances in some scenarios. Consider making this non-fatal and falling back to a non-cached transport or using a different isolation strategy.

Did we get this right? 👍 / 👎 to inform future reviews.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Could this be a severe issue?

But was should happen if this fails?
I believe the code path comes from creating a SentryClient via the Hub.

But we don't have too much control in this CachingTransport-.ctor here.

Hmmm ...
I wonder if we should avoid trying to Acquire the Lock-File in the .ctor,
but instead in the Initialize ... or the Worker-Task ... but then the error may happen quite late.

🤔 something is troubling me but I can't quite put my finger on it just yet 🤔

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how the SDK currently works... so I don't believe it's an issue. If it is, I think we could address in a separate PR.

_isolatedCacheDirectoryPath =
options.TryGetIsolatedCacheDirectoryPath() ??
throw new InvalidOperationException("Cache directory or DSN is not set.");

Comment on lines +31 to +46
/// <remarks>
/// This method can throw all of the same exceptions that the <see cref="FileStream"/> constructor can throw. The
/// caller is responsible for handling those exceptions.
/// </remarks>
public override bool TryCreateLockFile(string path, out Stream fileStream)
{
// Note that FileShare.None is implemented via advisory locks only on macOS/Linux... so it will stop
// other .NET processes from accessing the file but not other non-.NET processes. This should be fine
// in our case - we just want to avoid multiple instances of the SDK concurrently accessing the cache
fileStream = new FileStream(
path,
FileMode.OpenOrCreate,
FileAccess.ReadWrite,
FileShare.None);
return true;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TryCreateLockFile method can throw all FileStream constructor exceptions but always returns true. This is inconsistent with the interface design pattern where 'Try' methods typically return false on failure rather than throwing. Consider catching exceptions and returning false, or rename the method to indicate it can throw.

Suggested change
/// <remarks>
/// This method can throw all of the same exceptions that the <see cref="FileStream"/> constructor can throw. The
/// caller is responsible for handling those exceptions.
/// </remarks>
public override bool TryCreateLockFile(string path, out Stream fileStream)
{
// Note that FileShare.None is implemented via advisory locks only on macOS/Linux... so it will stop
// other .NET processes from accessing the file but not other non-.NET processes. This should be fine
// in our case - we just want to avoid multiple instances of the SDK concurrently accessing the cache
fileStream = new FileStream(
path,
FileMode.OpenOrCreate,
FileAccess.ReadWrite,
FileShare.None);
return true;
}
public override bool TryCreateLockFile(string path, out Stream fileStream)
{
try
{
fileStream = new FileStream(
path,
FileMode.OpenOrCreate,
FileAccess.ReadWrite,
FileShare.None);
return true;
}
catch
{
fileStream = Stream.Null;
return false;
}
}

Did we get this right? 👍 / 👎 to inform future reviews.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is consistent with the behaviour for CreateFileStream and makes it easier to surface errors to callers... so I'd rather stick with the API we have.

Comment on lines 2204 to 2206
if (options.Transport is CachingTransport cachingTransport)
{
cachingTransport.Dispose(); // Release cache lock so that the cacheDirectory can be removed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Don't we need to do this for every test?

The method CapturesEventWithContextKey_Implementation does it too?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we only need to do it when options.CacheDirectoryPath is set... it's just to ensure the cache dir gets cleaned up. Most tests aren't using a cache dir.

var cachingTransport = Assert.IsType<CachingTransport>(_fixture.SentryOptions.Transport);
_ = Assert.IsType<FakeTransport>(cachingTransport.InnerTransport);

cachingTransport.Dispose(); // Release cache lock so that the cacheDirectory can be removed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: don't we need to do this on a Test-Class level instead, so that it applies to every Test-Method?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only necessary because we set CacheDirectoryPath on line 1628.

It's a bit messy but only really a problem in tests, I think... SDK users won't need to worry about this.

Comment on lines +86 to +88
{
throw new InvalidOperationException("Cache directory already locked.");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Could this be a severe issue?

But was should happen if this fails?
I believe the code path comes from creating a SentryClient via the Hub.

But we don't have too much control in this CachingTransport-.ctor here.

Hmmm ...
I wonder if we should avoid trying to Acquire the Lock-File in the .ctor,
but instead in the Initialize ... or the Worker-Task ... but then the error may happen quite late.

🤔 something is troubling me but I can't quite put my finger on it just yet 🤔


_options.LogDebug("found abandoned cache: {0}", dir);
using var coordinator = new CacheDirectoryCoordinator(dir, _options.DiagnosticLogger, _options.FileSystem);
if (!coordinator.TryAcquire())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Could we run into a concurrency issue?

E.g.

  • this recycler has started and tries to acquire a lock-file
  • in the .ctor we have also acquired a lock-file already, in a directory below
  • now we try to move files that contain a lock-file that we still have a reference via Stream on

Essentially now retrying 3 times without any effect?
But I may misunderstand something there 🤔

Copy link
Collaborator Author

@jamescrosswell jamescrosswell Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lock files sit in the parent/root cache directory. The isolated cache folders are all children of that root folder, so none of the operations on the isolated cache folders should affect the locks. I've added a code comment to make this clearer:

// Note this creates a lock file in the cache directory's parent directory... not in the cache directory itself
_lockFilePath = $"{cacheDir}.lock";

@jamescrosswell jamescrosswell changed the base branch from main to version6 November 5, 2025 01:04
@github-actions
Copy link
Contributor

github-actions bot commented Nov 5, 2025

Messages
📖 Do not forget to update Sentry-docs with your feature once the pull request gets approved.

Generated by 🚫 dangerJS against efaa83e

@jamescrosswell
Copy link
Collaborator Author

When I changed the merge target, it didn't automatically trigger a build, so I triggered this manually:
https://github.com/getsentry/sentry-dotnet/actions/runs/19088039748

All green 🟢

@jamescrosswell
Copy link
Collaborator Author

@Flash0ver I think the only remaining piece of feedback to address (not sure what happened to it) is how we handle upgrading from previous SDK versions, if people have files in their cache when they do so (since we're changing the cache directory).

Options:

  1. Deal with it via upgrade docs, in the Breaking Changes section of the changelog
  2. Modify the code to automatically import envelopes from the cache root directory (we could maybe remove that code in a future v7 release).

I think since this could potentially result in lost client data, it's worth opting for door number 2 here... but only if we're broadly happy with the PR aside from that.

I addressed all the review feedback and changed the merge target to version 6.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Errors when initializing Sentry twice with caching enabled Offline caching should support concurrent process instances

5 participants