Skip to content

Conversation

torcolvin
Copy link
Collaborator

@torcolvin torcolvin commented Oct 14, 2025

  • do not use testing.Context() for topologytests, since it gets cancelled before t.Cleanup() runs
  • broadcast btcc._seqCond.Broadcast() to avoid a lock and a potential hang in between if ctx.Err() != nil and _seqCond.Wait()
  • add cancel cause to see who was cancelling the contexts

98% of the test failures were from testing.Context() in couchbase_lite_mock_peer_test.go but running the tests a large number of times showed that there was a case where there was a hang in this (old) code:

for btcc._seqLast <= since {
			if ctx.Err() != nil {
				btcc.seqLock.Unlock()
				return
			}
			// block until new seq
			base.DebugfCtx(ctx, base.KeySGTest, "OneShotChangesSince: since=%d, _seqLast=%d - waiting for new sequence", since, btcc._seqLast)
             // ERROR: ctx is canceled and _seqCond.Broadcast() is called before the _seqCond.Wait(), so the following line blocks.
			btcc._seqCond.Wait()
			// Check to see if we were woken because of Close()
			if ctx.Err() != nil {
				btcc.seqLock.Unlock()
				return
			}

Pre-review checklist

  • Removed debug logging (fmt.Print, log.Print, ...)
  • Logging sensitive data? Make sure it's tagged (e.g. base.UD(docID), base.MD(dbName))
  • Updated relevant information in the API specifications (such as endpoint descriptions, schemas, ...) in docs/api

Integration Tests

- do not use testing.Context() for topologytests, since it gets cancelled before t.Cleanup() runs
- switch BlipTesterCollectionClient._seqCond to be channels to avoid a lock and a potential hang in between if ctx.Err()
  != nil and _seqCond.Wait()
@Copilot Copilot AI review requested due to automatic review settings October 14, 2025 16:55
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR addresses test flakiness in topology tests by fixing context cancellation issues and replacing sync.Cond with a channel-based notification system to prevent race conditions in push replication. The primary issue was that testing.Context() was being cancelled before t.Cleanup() runs, causing test failures, and a potential hang between context cancellation checks and condition variable waits.

Key changes:

  • Replaced testing.Context() with context.WithoutCancel() to prevent premature cancellation
  • Refactored BlipTesterCollectionClient to use channels instead of sync.Cond for sequence notifications
  • Added cancel causes for better debugging of context cancellations

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
topologytest/couchbase_lite_mock_peer_test.go Uses context.WithoutCancel() to prevent test context cancellation issues
rest/utilities_testing_blip_client.go Major refactor replacing sync.Cond with channel-based notifications and adding cancel causes

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment on lines +1526 to 1530
// reawaken any waiting push loops to check for cancellation,
// This is a workaround for a race between ctx.Err() != nil check and btcc._seqCond.Wait()
btcc._seqCond.Broadcast()
}, 20*time.Second, 10*time.Millisecond)

Copy link

Copilot AI Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Broadcast() call is executed on every polling iteration (every 10ms for up to 20 seconds). This could lead to unnecessary overhead. Consider moving the Broadcast() call outside the polling loop or adding a mechanism to only call it when needed.

Suggested change
// reawaken any waiting push loops to check for cancellation,
// This is a workaround for a race between ctx.Err() != nil check and btcc._seqCond.Wait()
btcc._seqCond.Broadcast()
}, 20*time.Second, 10*time.Millisecond)
// No need to broadcast on every iteration; a single broadcast before and after the loop suffices.
}, 20*time.Second, 10*time.Millisecond)
// Final broadcast to ensure all waiters are woken up after the loop
btcc._seqCond.Broadcast()

Copilot uses AI. Check for mistakes.

@torcolvin torcolvin merged commit d22c958 into main Oct 15, 2025
42 checks passed
@torcolvin torcolvin deleted the CBG-4926 branch October 15, 2025 13:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants