Closed
Description
This issue looks similar to #3870, but it occurs with the latest version of Akka.NET.
OS: Windows Server 2016
Platform: .NET Core 3.1
Akka.NET packages: 1.4.0-beta14 (used in a cluster)
Scenario:
- Akka.Persistence.SqlServer.Journal.BatchingSqlServerJournal raises an exception with the message "Circuit Breaker is open; calls are failing fast", most likely due to a temporary DB outage.
- Attempts to recover the state of some persistent actors fail with RecoveryTimedOutException. Here's a typical sequence of events, taken from our log:
Started (Akka.Pattern.BackoffOnRestartSupervisor)
now supervising akka://Oddjob/system/sharding/upload/0/ps~msui30002111/msc:ps~msui30002111
now watched by [akka://Oddjob/system/sharding/upload/0/ps~msui30002111#1585193596]
now watched by [akka://Oddjob/system/recoveryPermitter#1099929798]
Spawned MediaSetController actor
now watched by [akka://Oddjob/system/sharding/upload/0#1224240942]
Started (Akkling.Persistence.FunPersistentActor`1[System.Object])
Restoring state from snapshot
(after 1 minute)
["", null, "Akka.Persistence.RecoveryTimedOutException: Recovery timed out, didn't get event within 60s, highest sequence number seen 312."] {AckInfo} {Exception}
Passivating started on entity "ps~msui30002111"
received AutoReceiveMessage <Terminated>: [akka://Oddjob/system/sharding/upload/0/ps~msui30002111#1585193596] - ExistenceConfirmed=True
Entity stopped after passivation ["ps~msui30002111"]
- Once a persistent actor fails with this exception, it stays stuck until the system is restarted. Other actors may still recover successfully.
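
For context on the first message in the scenario, here is a minimal sketch of the general circuit breaker pattern, showing why calls "fail fast" after a run of failures. It is an illustration only, not Akka.NET's implementation: the `CircuitBreaker` class, its parameters (`max_failures`, `reset_timeout`, `clock`), and `CircuitBreakerOpenError` are all hypothetical names chosen for this example.

```python
import time


class CircuitBreakerOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    # Hypothetical illustration of the pattern; not the Akka.NET API.
    def __init__(self, max_failures=3, reset_timeout=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: reject immediately without touching the database.
                raise CircuitBreakerOpenError(
                    "Circuit Breaker is open; calls are failing fast"
                )
            # Reset timeout elapsed: half-open, let this one call through as a probe.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            raise
        # A successful call closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

In this sketch, every call made while the breaker is open is rejected without reaching the database, and a single successful probe after the reset timeout closes it again. Whether the stuck recovery described above is related to how the journal's breaker interacts with replay requests is not established here.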