
@Aaronontheweb

Backport to v1.5

This is a backport of #7847 to the v1.5 maintenance branch.

Problem

Akka.Remote server starts successfully even when the application lacks permissions to access the SSL certificate's private key. The server appears healthy but fails when clients attempt to connect, causing:

  • Hard-to-diagnose TLS handshake failures during runtime
  • Silent failures that only appear when connections arrive
  • Poor operational experience for administrators

Solution

Certificate Validation at Startup

Added a ValidateCertificate() method to SslSettings that:

  • Checks Certificate.HasPrivateKey
  • Tests both RSA and ECDSA private key access (using GetRSAPrivateKey() and GetECDsaPrivateKey())
  • Throws a ConfigurationException with a clear error message on failure
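
The three checks above can be sketched roughly as follows. This is an illustration, not the code from the PR: the class and method names are hypothetical, and InvalidOperationException stands in for Akka's ConfigurationException so the snippet has no Akka dependency.

```csharp
using System;
using System.Security.Cryptography;
using System.Security.Cryptography.X509Certificates;

// Hypothetical helper, not part of Akka.NET.
public static class CertificateChecks
{
    public static void EnsurePrivateKeyAccessible(X509Certificate2 certificate)
    {
        if (certificate == null)
            throw new ArgumentNullException(nameof(certificate));

        // Step 1: the certificate must claim to have a private key at all.
        if (!certificate.HasPrivateKey)
            throw new InvalidOperationException(
                "SSL certificate does not have a private key.");

        try
        {
            // Step 2: actually open the key. Each call returns null when the
            // certificate's key is of the other algorithm; `using` tolerates null.
            using (RSA rsaKey = certificate.GetRSAPrivateKey())
            using (ECDsa ecdsaKey = certificate.GetECDsaPrivateKey())
            {
                if (rsaKey == null && ecdsaKey == null)
                    throw new InvalidOperationException(
                        "SSL certificate private key exists but cannot be accessed.");
            }
        }
        catch (CryptographicException ex)
        {
            // Step 3: surface permission problems as one clear startup error.
            throw new InvalidOperationException(
                "SSL certificate private key exists but cannot be accessed.", ex);
        }
    }
}
```

Calling such a helper once at startup is enough, since the goal is only to turn a latent handshake failure into an immediate, descriptive exception.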

Fail-Fast in Listen()

Listen() now calls ValidateCertificate() before the server socket binds, so a certificate problem stops startup instead of surfacing when the first client connects.
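
To illustrate the ordering only, here is a small sketch that validates before binding. It uses a plain TcpListener rather than DotNetty, and FailFastListener and its validate parameter are hypothetical names invented for this example.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical illustration of "validate first, bind second".
public static class FailFastListener
{
    public static TcpListener Listen(IPEndPoint endpoint, Action validate)
    {
        // Any exception thrown here propagates before a socket is opened,
        // so the process never ends up listening with an unusable certificate.
        validate?.Invoke();

        var listener = new TcpListener(endpoint);
        listener.Start();
        return listener;
    }
}
```

If the validation delegate throws, the caller sees the exception from Listen and no port is ever bound.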

Comprehensive Tests

  • Server fails at startup with inaccessible private key ✅
  • Server starts successfully with valid certificate ✅
  • Server starts successfully without SSL ✅
  • Updated existing tests to validate fail-fast behavior ✅

Changes

Files Modified

  1. Akka.Remote/Transport/DotNetty/DotNettyTransportSettings.cs - Added ValidateCertificate() method
  2. Akka.Remote/Transport/DotNetty/DotNettyTransport.cs - Call validation before server bind
  3. Akka.Remote.Tests/Transport/DotNettyCertificateValidationSpec.cs - New test suite
  4. Akka.Remote.Tests/Transport/DotNettyTlsHandshakeFailureSpec.cs - Updated for fail-fast

Impact

Breaking Change (Expected)

Existing misconfigured deployments will now fail at startup instead of silently starting with broken TLS. This is correct behavior - fail-fast is better than silent failure.

Migration

If the ActorSystem fails at startup with:

ConfigurationException: SSL certificate private key exists but cannot be accessed.

Fix: grant the application user read permission on the certificate's private key:

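# Look up the certificate by thumbprint in the LocalMachine\My store
$cert = Get-ChildItem Cert:\LocalMachine\My\<thumbprint>
# Resolve the key container name (assumes a CSP-backed RSA key, e.g. under Windows PowerShell 5.1)
$keyPath = $cert.PrivateKey.CspKeyContainerInfo.UniqueKeyContainerName
$keyFile = Get-ChildItem "$env:ProgramData\Microsoft\Crypto\RSA\MachineKeys\$keyPath"
# Grant read access to the account that runs the application
icacls $keyFile.FullName /grant "DOMAIN\AppUser:R"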

Related

Checklist

  • Cherry-picked from dev
  • All changes included
  • Tests included
  • Backward compatible (fail-fast only affects misconfigured systems)

…kkadotnet#7847)

* Fix: Validate SSL certificate private key access at server startup

**Problem**: Akka.Remote server starts successfully even when the application
lacks permissions to access the SSL certificate's private key. The server appears
healthy but fails when clients attempt to connect, making issues hard to diagnose.

**Root Cause**: Certificate loading in DotNettyTransportSettings only validates
that the certificate EXISTS in the Windows certificate store, not whether the
application can ACCESS the private key. Private key access is checked separately
by Windows ACL, which can fail even when Certificate.HasPrivateKey returns true.

**Solution**:
1. Add ValidateCertificate() method to SslSettings class that:
   - Checks Certificate.HasPrivateKey
   - Actually tests private key access with GetRSAPrivateKey() (not just presence)
   - Throws ConfigurationException with clear error message on failure

2. Call validation in Listen() method before server socket binds:
   - Ensures fail-fast behavior at startup
   - Prevents server from running in broken state
   - Provides clear error message for administrators

3. Add comprehensive tests:
   - Server should fail at startup with inaccessible private key
   - Server should start successfully with valid certificate
   - Server should start successfully without SSL

**Impact**:
- Existing misconfigured deployments will now fail at startup (correct behavior)
- Clear error messages guide administrators to fix permissions
- No breaking changes for correctly configured systems
- Related to Freshdesk akkadotnet#538 (BNSF Railway)

Fixes akkadotnet#538

* Update DotNettyTlsHandshakeFailureSpec to validate fail-fast behavior

**Changes**:
1. Renamed first test to `Server_should_fail_at_startup_with_certificate_without_private_key`
   - Now validates that server FAILS AT STARTUP with bad certificate
   - Tests fail-fast behavior instead of runtime TLS handshake failure

2. Removed redundant `Server_side_tls_handshake_failure_should_shutdown_server` test
   - This test validated the OLD (incorrect) behavior where server starts successfully
   - Now impossible with fail-fast validation in place
   - Scenario already covered by the updated first test

3. Kept `Client_side_tls_handshake_failure_should_shutdown_client` unchanged
   - Still valid - tests client-side validation failure
   - Not affected by server startup validation

**Result**: Tests now validate correct fail-fast behavior at server startup

* Add ECDSA private key validation and improve disposal pattern

Addresses review feedback from @Arkatufus:

**Changes**:
1. Check both RSA and ECDSA private keys
   - SslStream supports both RSA and ECDSA certificates
   - GetRSAPrivateKey() returns null for ECDSA certs (and vice versa)
   - Validation now checks both key types to match TLS handler behavior

2. Use `using` statements for proper disposal
   - Prevents resource leaks if exception is thrown
   - Both rsaKey and ecdsaKey are properly disposed
   - Exception-safe resource management

**TLS Handler Relationship**:
The TLS handler uses `TlsHandler.Server(Settings.Ssl.Certificate)` which
internally extracts either RSA or ECDSA private keys via SslStream. Our
validation now matches this behavior by checking both key types.

**Behavior**:
- RSA certificate: GetRSAPrivateKey() succeeds, GetECDsaPrivateKey() returns null ✅
- ECDSA certificate: GetECDsaPrivateKey() succeeds, GetRSAPrivateKey() returns null ✅
- Neither accessible: Both return null, validation fails with clear error ✅
- Permission denied: CryptographicException caught, clear error message ✅
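
The first two rows of the behavior list can be reproduced locally with throwaway self-signed certificates. This is a sketch assuming .NET Core 2.0+ or .NET Framework 4.7.2+ (for CertificateRequest); KeyTypeProbe and the subject names are made up for this example.

```csharp
using System;
using System.Security.Cryptography;
using System.Security.Cryptography.X509Certificates;

// Hypothetical probe showing which accessor returns a key for which certificate type.
public static class KeyTypeProbe
{
    public static string Describe(X509Certificate2 cert)
    {
        // Dispose both handles; a null result is fine inside `using`.
        using (RSA rsa = cert.GetRSAPrivateKey())
        using (ECDsa ecdsa = cert.GetECDsaPrivateKey())
        {
            return $"RSA key: {rsa != null}, ECDSA key: {ecdsa != null}";
        }
    }

    public static void Main()
    {
        var notBefore = DateTimeOffset.UtcNow.AddMinutes(-5);
        var notAfter = notBefore.AddDays(1);

        // Self-signed RSA certificate.
        using (RSA rsa = RSA.Create(2048))
        using (X509Certificate2 rsaCert = new CertificateRequest(
                   "CN=rsa-probe", rsa, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1)
               .CreateSelfSigned(notBefore, notAfter))
        {
            Console.WriteLine("RSA cert   -> " + Describe(rsaCert));
        }

        // Self-signed ECDSA certificate on the NIST P-256 curve.
        using (ECDsa ecdsa = ECDsa.Create(ECCurve.NamedCurves.nistP256))
        using (X509Certificate2 ecCert = new CertificateRequest(
                   "CN=ecdsa-probe", ecdsa, HashAlgorithmName.SHA256)
               .CreateSelfSigned(notBefore, notAfter))
        {
            Console.WriteLine("ECDSA cert -> " + Describe(ecCert));
        }
    }
}
```

The permission-denied row is harder to reproduce in a sketch, since it depends on the ACL of a key stored in the Windows certificate store.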
@Aaronontheweb added this to the 1.5.52 milestone Oct 2, 2025
@Aaronontheweb merged commit 5994efc into akkadotnet:v1.5 Oct 2, 2025
6 of 11 checks passed
@Aaronontheweb deleted the backport/tls-certificate-validation-v1.5 branch October 2, 2025 20:49