Some persistent actors are stuck with RecoveryTimedOutException after circuit breaker opens #4265

@object

Description

This issue looks similar to #3870; however, it occurs with the latest version of Akka.NET.

OS: Windows Server 2016
Platform: .NET Core 3.1
Akka.NET packages: 1.4.0-beta14 (used in a cluster)

Scenario:

  1. Akka.Persistence.SqlServer.Journal.BatchingSqlServerJournal raises an exception with the message "Circuit Breaker is open; calls are failing fast", most likely due to a temporary database outage.

  2. Attempts to recover the state of some persistent actors fail with RecoveryTimedOutException. Here is a typical sequence of events, taken from our log:

Started (Akka.Pattern.BackoffOnRestartSupervisor)
now supervising akka://Oddjob/system/sharding/upload/0/ps~msui30002111/msc:ps~msui30002111
now watched by [akka://Oddjob/system/sharding/upload/0/ps~msui30002111#1585193596]
now watched by [akka://Oddjob/system/recoveryPermitter#1099929798
Spawned MediaSetController actor
now watched by [akka://Oddjob/system/sharding/upload/0#1224240942]
Started (Akkling.Persistence.FunPersistentActor`1[System.Object])
Restoring state from snapshot
(after 1 minute)
["", null, "Akka.Persistence.RecoveryTimedOutException: Recovery timed out, didn't get event within 60s, highest sequence number seen 312."] {AckInfo} {Exception}
Passivating started on entity "ps~msui30002111"
received AutoReceiveMessage <Terminated>: [akka://Oddjob/system/sharding/upload/0/ps~msui30002111#1585193596] - ExistenceConfirmed=True
Entity stopped after passivation ["ps~msui30002111"]

  3. Once a persistent actor fails with this exception, it stays stuck until the system is restarted. Other actors may still recover successfully.
