
bug: Severe Lock Convoy in DecayMap and bbolt backend on High Core CPU under heavy Load #1103

@eternal-flame-AD

Description

This is not really an unknown "security" issue, as any off-the-shelf solver on GitHub (especially GPU-powered ones, which this is not) can achieve this, and thus would all technically be "PoC"s.

Thus I believe it is more of a performance bug.

Describe the bug

Currently what happens:

  • The last exclusive writer calls .Unlock(),
  • up to GOMAXPROCS pending reader acquisitions wake up at once,
  • some goroutines piled up on the runtime immediately ask for .Lock(),
  • and all GOMAXPROCS threads serialize themselves in CAS spin loops (or park, but by the time the contention resolves, the CPU has spent more cycles in total than it would take to compute the proof). The pattern is sketched below.
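
For illustration, here is a minimal sketch of the convoy-prone pattern under my assumptions about the code: a map behind a single sync.RWMutex where every reader deletes expired entries inline. The names and layout are hypothetical, not the actual DecayMap code.

import (
	"sync"
	"time"
)

type decayMap struct {
	mu      sync.RWMutex
	entries map[string]entry
}

type entry struct {
	val     string
	expires time.Time
}

func (m *decayMap) Get(key string) (string, bool) {
	m.mu.RLock()
	e, ok := m.entries[key]
	m.mu.RUnlock()
	if ok && time.Now().After(e.expires) {
		// The convoy: every reader that sees an expired entry queues
		// for the write lock, so a single expiry wakes and then
		// re-serializes up to GOMAXPROCS goroutines.
		m.mu.Lock()
		delete(m.entries, key)
		m.mu.Unlock()
		return "", false
	}
	return e.val, ok
}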

Solutions:

  1. Quick but mid: do not call .RUnlock() and then immediately .Lock(); that creates a synchronization point and causes lock convoys. Instead, funnel all decay-deletion requests through a channel into one dedicated cleanup goroutine that batch-processes them (see the sketch after this list).
  2. More reliable: consider sharded or lock-free data structures (e.g. challenge states packed entirely into atomic 64-bit words) in the long run, for better scaling.
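
A minimal sketch of option 1, reusing the entry type and imports from the sketch above (again, hypothetical names): readers only report expired keys on a buffered channel, and a single cleanup goroutine takes the write lock once per batch instead of once per reader.

type batchedDecayMap struct {
	mu      sync.RWMutex
	entries map[string]entry
	expired chan string
}

func newBatchedDecayMap() *batchedDecayMap {
	m := &batchedDecayMap{
		entries: make(map[string]entry),
		expired: make(chan string, 1024),
	}
	go m.cleanupLoop()
	return m
}

func (m *batchedDecayMap) Get(key string) (string, bool) {
	m.mu.RLock()
	e, ok := m.entries[key]
	m.mu.RUnlock()
	if ok && time.Now().After(e.expires) {
		select {
		case m.expired <- key: // hand off; never block the request path
		default: // queue full: drop; the next read re-reports the key
		}
		return "", false
	}
	return e.val, ok
}

func (m *batchedDecayMap) cleanupLoop() {
	const batchSize = 64
	keys := make([]string, 0, batchSize)
	for key := range m.expired {
		keys = append(keys[:0], key)
		// Drain whatever else is already queued, up to the batch size.
	drain:
		for len(keys) < batchSize {
			select {
			case k := <-m.expired:
				keys = append(keys, k)
			default:
				break drain
			}
		}
		m.mu.Lock() // one writer wakeup per batch, not per reader
		for _, k := range keys {
			if e, ok := m.entries[k]; ok && time.Now().After(e.expires) {
				delete(m.entries, k)
			}
		}
		m.mu.Unlock()
	}
}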

To Reproduce

You need a high-core-count processor to reproduce this, but since you have a 7950X3D and I have a 7950X, I am fairly certain the results will be similar.

Hit a local instance with a native solver and notice that the solver emits proofs faster than Anubis can respond to requests, even under "extreme-suspicion" (difficulty 4).

> target/release/simd-mcaptcha live --api-type anubis --host http://localhost:8923/ \
      --n-workers 64 &

You are hitting host http://localhost:8923/, n_workers: 64
[0.0s] proofs accepted: 0, failed: 0, 5s: 0.0pps, 5s_failed: 0.0rps, 0.00% iowait
[5.0s] proofs accepted: 53805, failed: 0, 5s: 10761.0pps, 5s_failed: 0.0rps, 74.91% http_wait
[10.0s] proofs accepted: 108805, failed: 0, 5s: 11000.0pps, 5s_failed: 0.0rps, 74.34% http_wait
[15.0s] proofs accepted: 164656, failed: 0, 5s: 11170.2pps, 5s_failed: 0.0rps, 73.92% http_wait
[20.0s] proofs accepted: 220786, failed: 0, 5s: 11226.0pps, 5s_failed: 0.0rps, 73.65% http_wait
[25.0s] proofs accepted: 277543, failed: 0, 5s: 11351.4pps, 5s_failed: 0.0rps, 73.43% http_wait
[30.0s] proofs accepted: 335189, failed: 0, 5s: 11529.2pps, 5s_failed: 0.0rps, 73.10% http_wait
[35.0s] proofs accepted: 392865, failed: 0, 5s: 11535.2pps, 5s_failed: 0.0rps, 72.80% http_wait
[40.0s] proofs accepted: 450552, failed: 0, 5s: 11537.4pps, 5s_failed: 0.0rps, 72.56% http_wait
[45.0s] proofs accepted: 508698, failed: 0, 5s: 11629.2pps, 5s_failed: 0.0rps, 72.35% http_wait
[50.0s] proofs accepted: 566663, failed: 0, 5s: 11593.0pps, 5s_failed: 0.0rps, 72.20% http_wait
[55.0s] proofs accepted: 624373, failed: 0, 5s: 11542.0pps, 5s_failed: 0.0rps, 72.08% http_wait
[60.0s] proofs accepted: 681909, failed: 0, 5s: 11507.2pps, 5s_failed: 0.0rps, 71.99% http_wait


Instrument your server to verify the lock convoy:

import (
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof handlers
	"runtime"
)

func init() {
	runtime.SetMutexProfileFraction(1000) // sample ~1 in 1000 mutex contention events
	go http.ListenAndServe(":9000", nil)
}

Collect a mutex profile with go pprof for a minute while the load is running.
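
For example, assuming the default net/http/pprof endpoints from the snippet above, this pulls a 60-second delta mutex profile and opens the pprof web UI:

> go tool pprof -http=:9001 'http://localhost:9000/debug/pprof/mutex?seconds=60'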

Expected behavior

Regardless of whether native solvers are considered "fair exchange", "bypass", "ethical", an "attack", or something else (and I have no interest in arguing about that): as a "security solution" against mass scrapers, it should maintain a load asymmetry (i.e. it should cost less to book-keep the challenges than it costs the client to solve them) under any circumstance.
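
For a rough sense of the asymmetry (assuming the fast algorithm counts leading zero hex digits, so difficulty 4 means about 16^4 ≈ 65,000 hash attempts per proof on average): the run above sustains ~11,500 proofs/s, i.e. roughly 750M client-side hashes per second, while the server should only need one hash plus O(1) bookkeeping to verify each proof. The lock convoy is what erases that advantage.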

Screenshots

This is a classic wakeup "thundering herd", where every write->read transition locks up the runtime:

[screenshot]

Server version: v1.22.0-12-g9430d0e

Additional context

Policy deployed:

thresholds:
  # For clients that are browser-like and have gained many points from custom rules
  - name: extreme-suspicion
    expression: weight >= 0
    action: CHALLENGE
    challenge:
      # https://anubis.techaro.lol/docs/admin/configuration/challenges/proof-of-work
      algorithm: fast
      difficulty: 4
      report_as: 4
