
Conversation


@ravi100k ravi100k commented Sep 3, 2025

  1. Fixed the k8s version.
  2. Fixed acquire lock and release lock for the CSI driver.
  3. If there is an error while acquiring the lock, restart the pod.

1. Fixed the k8s version.
2. Fixed acquire lock and release lock for the CSI driver.
3. Added a 10-minute periodic call to release any lock that has been stuck for more than 30 minutes.
@ravi100k ravi100k requested a review from trondmy-hs September 3, 2025 12:59
@ravi100k ravi100k self-assigned this Sep 3, 2025

lpabon commented Sep 3, 2025

Please add unit tests

@ravi100k ravi100k requested a review from lpabon September 4, 2025 05:03
Ravi Kumar added 3 commits September 4, 2025 18:28
1. Fixed the old test case.
2. Added a new test case for acquiring the lock and its cleanup.
1. Added a 30-second timeout for acquiring the lock; if it cannot be acquired, crash and start fresh.
1. Removed atomic and unused code and simplified the code.
lctx, cancel := context.WithTimeout(ctx, 30*time.Second)
defer cancel()

if err := lk.lock(lctx); err != nil {

@lpabon lpabon Sep 16, 2025


This algorithm is good, but you do not need to implement your own locks with semaphores, or use ctx, just to be able to crash and free the lock. You can use:

func lock(mu *sync.Mutex, timeout time.Duration) bool {
	// Create a channel to signal that the lock has been acquired.
	// A buffered channel of size 1 is used to prevent the goroutine
	// from blocking if the lock is acquired immediately.
	locked := make(chan struct{}, 1)

	// Start a new goroutine to acquire the lock.
	go func() {
		mu.Lock()
		locked <- struct{}{} // Signal that the lock is held.
	}()

	// Use a select statement to wait for either the lock signal or the timeout.
	select {
	case <-locked:
		// We received the signal, meaning the lock was acquired.
		return true
	case <-time.After(timeout):
		// The timeout duration passed without acquiring the lock.
		// Return false, or crash right here.
		return false
	}
}

@ravi100k ravi100k (Collaborator, Author) replied:

I explored the sync.Mutex + goroutine + channel + timeout approach. While it looks simpler, there are a couple of pitfalls in Go:

Ownership mismatch: the goroutine calling mu.Lock() is the one that acquires the lock, but the caller is the one calling mu.Unlock(). sync.Mutex does not track ownership, so splitting lock and unlock across goroutines makes it easy to end up unlocking a mutex that is not actually held, which panics at runtime.

Goroutine leaks: if the timeout path fires, the spawned goroutine may still eventually acquire the lock. Since no one will Unlock() it, the lock remains held forever, leading to deadlock.

Go semantics: this is one reason Go's sync.Mutex does not offer a timeout API (and documents TryLock as rarely appropriate): these patterns are easy to misuse.

With the current semaphore.Weighted + ctx.WithTimeout approach, the caller that acquires the lock is always the one that releases it, and timeouts are handled cleanly without leaks. This keeps the ownership semantics correct and avoids the goroutine problem.
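
For reference, a minimal sketch of that pattern, assuming golang.org/x/sync/semaphore; the wrapper and its names (weightedLock, tryLock) are illustrative and not taken from this PR's actual CSI driver code:

package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/sync/semaphore"
)

// weightedLock is an illustrative wrapper (not the PR's real type) around a
// weighted semaphore of capacity 1, used as a mutex with timeout support.
type weightedLock struct {
	sem *semaphore.Weighted
}

func newWeightedLock() *weightedLock {
	return &weightedLock{sem: semaphore.NewWeighted(1)}
}

// tryLock blocks until the lock is acquired or the context expires.
// The caller that succeeds is the same caller that must call unlock,
// so ownership never crosses goroutines.
func (l *weightedLock) tryLock(ctx context.Context) error {
	return l.sem.Acquire(ctx, 1)
}

func (l *weightedLock) unlock() {
	l.sem.Release(1)
}

func main() {
	lk := newWeightedLock()

	// Mirror the 30-second timeout from the PR's diff hunk above.
	lctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	if err := lk.tryLock(lctx); err != nil {
		// Acquire did not succeed within 30s; the PR's approach is to
		// crash the pod here so it restarts with fresh lock state.
		fmt.Println("could not acquire lock:", err)
		return
	}
	defer lk.unlock()

	fmt.Println("lock acquired; doing work")
}

Acquire returns ctx.Err() once the deadline passes, which is exactly the point where the PR chooses to crash and restart the pod rather than wait indefinitely.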


In Go, the goroutine that calls mu.Lock() is conventionally the one that later calls mu.Unlock(). sync.Mutex does not track ownership, so nothing checks that the lock/unlock pairing is correct when it is split across goroutines.

But in the suggested helper:

func lock(mu *sync.Mutex, timeout time.Duration) bool {
    locked := make(chan struct{}, 1)

    go func() {
        mu.Lock()           // <--- Goroutine acquires the lock
        locked <- struct{}{}
    }()

    select {
    case <-locked:
        return true
    case <-time.After(timeout):
        return false
    }
}

The goroutine acquires the lock (mu.Lock()).

The caller will later try to unlock:

if lock(&mu, 1*time.Second) {
    defer mu.Unlock()  // <-- different goroutine calls Unlock
}

This split is error-prone: if the timeout fires, the spawned goroutine can still acquire the mutex later and nothing will ever Unlock() it, so the lock is leaked permanently; and any code path that calls Unlock() without the lock actually being held panics at runtime.
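
To make the timeout-path leak concrete, here is a small self-contained sketch (the simulated holder and the timings are made up for illustration) that reuses the lock() helper from the suggestion above:

package main

import (
	"fmt"
	"sync"
	"time"
)

// lock is the helper from the review suggestion above.
func lock(mu *sync.Mutex, timeout time.Duration) bool {
	locked := make(chan struct{}, 1)

	go func() {
		mu.Lock()
		locked <- struct{}{}
	}()

	select {
	case <-locked:
		return true
	case <-time.After(timeout):
		return false
	}
}

func main() {
	var mu sync.Mutex

	// Simulate another holder so the timeout path fires.
	mu.Lock()
	go func() {
		time.Sleep(2 * time.Second)
		mu.Unlock()
	}()

	if !lock(&mu, 100*time.Millisecond) {
		fmt.Println("timed out; caller gives up")
	}

	// The goroutine spawned inside lock() is still blocked in mu.Lock().
	// Once the holder releases, that goroutine acquires the mutex, sends on
	// the now-abandoned buffered channel, and exits. Nothing ever unlocks
	// the mutex again, so it stays held forever.
	time.Sleep(3 * time.Second)
	if !mu.TryLock() {
		fmt.Println("mutex is still held by the leaked acquisition")
	}
}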

@ravi100k ravi100k requested a review from lpabon September 19, 2025 07:40