subscribe: true flag in PUT operations doesn't work correctly for initiating peers #1765

@sanity

Problem Description

When a client performs a PUT operation with subscribe: true, the operation times out because the subscription logic blocks the PUT response from being sent. This prevents applications like River from successfully creating contracts and causes the entire operation to fail.

Root Cause: Blocking Subscription Logic

The fundamental issue: in /crates/core/src/operations/put.rs, when handling subscribe: true, the code awaits start_subscription_request(), which can initiate a GET operation. While that GET is pending, the PUT operation cannot complete or send its response to the client.

Detailed Analysis

  1. Blocking Call Chain:

    // In put.rs line ~490
    if subscribe {
        super::start_subscription_request(op_manager, key, false, HashSet::new()).await;
    }
    new_state = Some(PutState::Finished { key });  // This happens AFTER subscription
  2. The subscription request can trigger a GET:

    • start_subscription_request() may need to fetch the contract if not locally available
    • This GET operation can time out (especially in distributed scenarios)
    • While waiting for the GET, the PUT operation cannot transition to Finished state
    • Without the Finished state, no response is sent to the client
    • Client times out waiting for PUT response
  3. Architectural Constraint:

    • OpManager has lifetime constraints that prevent spawning the subscription as a truly independent async task
    • tokio::spawn cannot be used because it requires a 'static future
    • The subscription logic is inherently tied to the PUT operation's lifetime

Current Behavior (Causes Timeout)

When a peer initiates a PUT with subscribe: true:

  1. Client sends PUT request with subscribe: true to local peer
  2. Local peer processes PUT and stores contract
  3. BLOCKS: Local peer attempts subscription which may trigger GET operation
  4. GET operation times out or takes too long
  5. PUT never transitions to Finished state
  6. Client never receives PUT response
  7. Client times out (e.g., "Timeout waiting for PUT response after 10 seconds")
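
The ordering problem in the steps above can be reproduced in a small standalone model. This is only a sketch under loud assumptions: it uses std threads and channels instead of the real async code, and `Order`, `client_put`, and the "PutResponse" string are hypothetical names that do not exist in Freenet. A sleep stands in for the blocked GET.

```rust
use std::sync::mpsc::{channel, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical: which step the PUT handler performs first.
#[derive(Clone, Copy)]
enum Order {
    /// Current behavior: subscription runs before the response is sent.
    SubscribeFirst,
    /// Reordered behavior: the response is sent before the subscription runs.
    RespondFirst,
}

/// Models a client waiting for a PUT response with a timeout.
/// `subscribe_delay` is a stand-in for a slow or stalled GET.
fn client_put(
    order: Order,
    subscribe_delay: Duration,
    client_timeout: Duration,
) -> Result<&'static str, RecvTimeoutError> {
    let (tx, rx) = channel();
    thread::spawn(move || match order {
        Order::SubscribeFirst => {
            thread::sleep(subscribe_delay); // subscription blocks here
            let _ = tx.send("PutResponse"); // client may already be gone
        }
        Order::RespondFirst => {
            let _ = tx.send("PutResponse"); // response goes out immediately
            thread::sleep(subscribe_delay); // subscription continues afterwards
        }
    });
    rx.recv_timeout(client_timeout)
}
```

With a subscription delay longer than the client timeout, `SubscribeFirst` yields a timeout on the client side while `RespondFirst` delivers the response. The second case matches the "Reorder Operations" fix described below.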

Evidence from Testing

Gateway Test Framework Results

Starting gateway test framework with local build
Testing River multi-user chat:
- Room creation (PUT with subscribe:true): TIMEOUT
- After changing to subscribe:false: Still TIMEOUT (reveals deeper issue)

Direct Testing

Modified /crates/core/src/operations/put.rs to move state transition before subscription:

// Mark operation as finished BEFORE subscription
new_state = Some(PutState::Finished { key });

// Start subscription if requested - do this AFTER marking as finished
if subscribe {
    // Subscription logic here (still partially blocking but PUT completes)
}

Result: PUT response is sent, but subscription may not complete properly.

Why This Wasn't Caught Earlier

  1. Test Mode Differences:

    • Unit tests may not experience the same network delays
    • freenet local mode has different code paths than freenet network
    • Integration tests might use different timeout values
  2. Subscription Complexity:

    • The subscription logic involves multiple async operations
    • Race conditions and timing issues are environment-dependent
    • Gateway setups have additional network latency

Attempted Fixes

Fix 1: Reorder Operations (Partial Success)

Move state transition before subscription:

  • ✅ PUT response is sent immediately
  • ⚠️ Subscription may not complete properly
  • ⚠️ Still architecturally problematic

Fix 2: Async Subscription (Failed)

Attempt to spawn subscription as independent task:

  • ❌ Cannot use tokio::spawn due to OpManager lifetime
  • ❌ OpManager not 'static, contains non-Send types
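
For reference, the usual way to satisfy a 'static bound on a spawned task is to move an owned, reference-counted handle into it. The sketch below shows that pattern with std::thread::spawn, which also requires its closure to be Send + 'static. `Manager` is a hypothetical stand-in, not the real OpManager. The pattern only addresses the lifetime half of the problem: Arc<T> is Send only when T is Send + Sync, so the non-Send types mentioned above would still need to be dealt with separately.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for OpManager; the real type is not modelled here.
struct Manager {
    subscribed: Mutex<Vec<String>>,
}

impl Manager {
    /// Placeholder for the real subscription logic.
    fn subscribe(&self, key: &str) {
        self.subscribed.lock().unwrap().push(key.to_string());
    }
}

/// Spawns the subscription on its own thread. The closure owns a clone of
/// the Arc, so it borrows nothing from the caller and satisfies 'static.
fn spawn_subscription(manager: &Arc<Manager>, key: String) -> thread::JoinHandle<()> {
    let manager = Arc::clone(manager);
    thread::spawn(move || manager.subscribe(&key))
}
```

Whether OpManager can be shared like this is exactly the open question raised for the core team further down.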

Fix 3: Local-Only Subscription (Insufficient)

Only register local subscription without network request:

  • ✅ No blocking
  • ❌ Peer not registered in remote subscription tree
  • ❌ Won't receive updates from other peers

Recommended Solution

Short-term Workaround

Applications should avoid subscribe: true and instead:

// 1. PUT without subscribe
let put_request = ContractRequest::Put {
    contract: contract_container,
    state: wrapped_state,
    subscribe: false,  // Avoid blocking issue
};

// 2. After successful PUT, explicitly SUBSCRIBE
let subscribe_request = ContractRequest::Subscribe {
    key: contract_key,
    summary: None,
};
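
The important part of the workaround is the sequencing: the SUBSCRIBE is sent only after the PUT has been acknowledged. A minimal sketch of that control flow follows. The `Request`, `Reply`, `Node`, and `RecordingNode` types are hypothetical and exist only to make the example self-contained; they are not the freenet-stdlib client API.

```rust
/// Hypothetical request type, loosely mirroring the two requests above.
#[derive(Debug, Clone, PartialEq)]
enum Request {
    Put { key: String, subscribe: bool },
    Subscribe { key: String },
}

/// Hypothetical reply type.
#[derive(Debug, Clone, PartialEq)]
enum Reply {
    PutOk { key: String },
    SubscribeOk { key: String },
}

/// Hypothetical synchronous connection to the local peer.
trait Node {
    fn send(&mut self, request: Request) -> Reply;
}

/// PUT with subscribe: false, then SUBSCRIBE once the PUT is acknowledged.
fn put_then_subscribe(node: &mut dyn Node, key: &str) -> Result<(), String> {
    match node.send(Request::Put { key: key.to_string(), subscribe: false }) {
        Reply::PutOk { .. } => {}
        other => return Err(format!("PUT failed: {other:?}")),
    }
    match node.send(Request::Subscribe { key: key.to_string() }) {
        Reply::SubscribeOk { .. } => Ok(()),
        other => Err(format!("SUBSCRIBE failed: {other:?}")),
    }
}

/// Test double that records requests and always answers successfully.
struct RecordingNode {
    log: Vec<Request>,
}

impl Node for RecordingNode {
    fn send(&mut self, request: Request) -> Reply {
        self.log.push(request.clone());
        match request {
            Request::Put { key, .. } => Reply::PutOk { key },
            Request::Subscribe { key } => Reply::SubscribeOk { key },
        }
    }
}
```

Because the two requests are independent, a failed SUBSCRIBE surfaces as its own error and can be retried without repeating the PUT.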

Long-term Fix Options

  1. Redesign Subscription Architecture:

    • Decouple subscription from PUT operation lifecycle
    • Use message passing to trigger subscription after PUT completes
    • Requires significant architectural changes
  2. Queue-Based Approach:

    • Queue subscription requests to be processed after PUT completes
    • Add a subscription queue to OpManager
    • Process queue periodically or after operation completion
  3. Two-Phase PUT:

    • Phase 1: Store contract and send response
    • Phase 2: Background subscription (fire-and-forget)
    • Accept that subscription might fail silently
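
Options 1 and 2 share one idea: the PUT handler only records that a subscription is wanted, and something else performs it later. The sketch below illustrates that with a std mpsc channel acting as the queue and a worker thread draining it. All names here (`Key`, `PutResponse`, `handle_put`, `spawn_subscription_worker`) are hypothetical, and a real implementation would presumably use the project's async runtime rather than threads.

```rust
use std::sync::mpsc::{Receiver, Sender};
use std::thread;

/// Hypothetical stand-in for a contract key.
type Key = String;

/// Hypothetical response returned to the client.
#[derive(Debug, PartialEq)]
struct PutResponse {
    key: Key,
}

/// Stores the contract (elided), queues the subscription, and returns at once.
/// Sending on an unbounded std channel never waits for the receiver.
fn handle_put(key: Key, subscribe: bool, subscriptions: &Sender<Key>) -> PutResponse {
    if subscribe {
        // A send error only means the worker is gone; the PUT still succeeds.
        let _ = subscriptions.send(key.clone());
    }
    PutResponse { key }
}

/// Drains the queue until every sender is dropped, then returns the keys
/// it processed. The slow network subscription would happen in this loop.
fn spawn_subscription_worker(queue: Receiver<Key>) -> thread::JoinHandle<Vec<Key>> {
    thread::spawn(move || {
        let mut subscribed = Vec::new();
        for key in queue {
            subscribed.push(key);
        }
        subscribed
    })
}
```

The trade-off is the one noted under option 3: the client has its PUT response before the subscription outcome is known, so subscription failures need a separate reporting path.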

Impact

  • River chat: Room creation times out, messages cannot be sent
  • Any app using subscribe: true: PUT operations time out
  • Performance: 10+ second timeouts degrade user experience
  • Reliability: Subscription state inconsistent between peers

Related Code Locations

  • /crates/core/src/operations/put.rs - Lines 490-551 (blocking subscription)
  • /crates/core/src/operations/subscribe.rs - Line 70 (requires local contract)
  • /crates/core/src/op_storage/mod.rs - OpManager lifetime constraints
  • /crates/core/src/client_events/mod.rs - Client response handling

Test Case

Added test in /crates/core/tests/operations.rs:

#[tokio::test(flavor = "multi_thread", worker_threads = 4)]
async fn test_put_subscribe_enables_update() -> TestResult {
    // Test verifies PUT with subscribe:true enables UPDATE
    // Currently passes in network mode but River still fails
}

Questions for Core Team

  1. Is the blocking behavior of subscribe: true intentional or a bug?
  2. Can OpManager be refactored to support spawning detached tasks?
  3. Should subscribe: true be deprecated in favor of explicit SUBSCRIBE?
  4. Is there a way to make subscription truly non-blocking without architectural changes?

Current Status

  • Immediate issue identified: Subscription blocks PUT response
  • Workaround implemented in River (uses subscribe: false followed by an explicit SUBSCRIBE)
  • Core issue requires architectural decision from team
  • Tests added but full River functionality still failing
