
bug: Severe Lock Convoy in DecayMap and bbolt backend on High Core CPU under heavy Load #1103

@eternal-flame-AD

Description

This is not really an unknown "security" issue, as any off-the-shelf solver on GitHub (especially GPU-powered ones, which this is not) can achieve this, and thus would all technically be "PoC"s.

Thus I believe it is more of a performance bug.

Describe the bug

Currently what happens:

  • The last exclusive writer calls .Unlock(),
  • up to GOMAXPROCS pending reader acquisitions wake up at once,
  • some goroutines piled up on the runtime immediately ask for .Lock(),
  • and all GOMAXPROCS threads serialize themselves in CAS spin loops (or park, but by the time the contention resolves, the CPU has spent more cycles in total than it would take to compute the proof). The pattern is sketched below.
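
For illustration, here is a minimal sketch of the convoy-prone pattern under my assumptions about the code: a map behind a single sync.RWMutex where every reader deletes expired entries inline. The names and layout are hypothetical, not the actual DecayMap code.

import (
	"sync"
	"time"
)

type decayMap struct {
	mu      sync.RWMutex
	entries map[string]entry
}

type entry struct {
	val     string
	expires time.Time
}

func (m *decayMap) Get(key string) (string, bool) {
	m.mu.RLock()
	e, ok := m.entries[key]
	m.mu.RUnlock()
	if ok && time.Now().After(e.expires) {
		// The convoy: every reader that sees an expired entry queues
		// for the write lock, so a single expiry wakes and then
		// re-serializes up to GOMAXPROCS goroutines.
		m.mu.Lock()
		delete(m.entries, key)
		m.mu.Unlock()
		return "", false
	}
	return e.val, ok
}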

Solutions:

  1. Quick but mid: do not call .RUnlock() and then immediately .Lock(); that creates a synchronization point and causes lock convoys. Instead, funnel all decay-deletion requests through a channel into one dedicated cleanup goroutine that batch-processes them (see the sketch after this list).
  2. More reliable: consider sharded or lock-free data structures (e.g. challenge states packed entirely into atomic 64-bit words) in the long run, for better scaling.
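
A minimal sketch of option 1, reusing the entry type and imports from the sketch above (again, hypothetical names): readers only report expired keys on a buffered channel, and a single cleanup goroutine takes the write lock once per batch instead of once per reader.

type batchedDecayMap struct {
	mu      sync.RWMutex
	entries map[string]entry
	expired chan string
}

func newBatchedDecayMap() *batchedDecayMap {
	m := &batchedDecayMap{
		entries: make(map[string]entry),
		expired: make(chan string, 1024),
	}
	go m.cleanupLoop()
	return m
}

func (m *batchedDecayMap) Get(key string) (string, bool) {
	m.mu.RLock()
	e, ok := m.entries[key]
	m.mu.RUnlock()
	if ok && time.Now().After(e.expires) {
		select {
		case m.expired <- key: // hand off; never block the request path
		default: // queue full: drop; the next read re-reports the key
		}
		return "", false
	}
	return e.val, ok
}

func (m *batchedDecayMap) cleanupLoop() {
	const batchSize = 64
	keys := make([]string, 0, batchSize)
	for key := range m.expired {
		keys = append(keys[:0], key)
		// Drain whatever else is already queued, up to the batch size.
	drain:
		for len(keys) < batchSize {
			select {
			case k := <-m.expired:
				keys = append(keys, k)
			default:
				break drain
			}
		}
		m.mu.Lock() // one writer wakeup per batch, not per reader
		for _, k := range keys {
			if e, ok := m.entries[k]; ok && time.Now().After(e.expires) {
				delete(m.entries, k)
			}
		}
		m.mu.Unlock()
	}
}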

To Reproduce

You need a high-core-count processor to reproduce this, but since you have a 7950X3D and I have a 7950X, I am fairly certain the results will be similar.

Hit a local instance with a native solver and notice that the solver emits proofs faster than Anubis can respond to requests, even under "extreme-suspicion" (difficulty 4).

> target/release/simd-mcaptcha live --api-type anubis --host http://localhost:8923/ \
      --n-workers 64 &

You are hitting host http://localhost:8923/, n_workers: 64
[0.0s] proofs accepted: 0, failed: 0, 5s: 0.0pps, 5s_failed: 0.0rps, 0.00% iowait
[5.0s] proofs accepted: 53805, failed: 0, 5s: 10761.0pps, 5s_failed: 0.0rps, 74.91% http_wait
[10.0s] proofs accepted: 108805, failed: 0, 5s: 11000.0pps, 5s_failed: 0.0rps, 74.34% http_wait
[15.0s] proofs accepted: 164656, failed: 0, 5s: 11170.2pps, 5s_failed: 0.0rps, 73.92% http_wait
[20.0s] proofs accepted: 220786, failed: 0, 5s: 11226.0pps, 5s_failed: 0.0rps, 73.65% http_wait
[25.0s] proofs accepted: 277543, failed: 0, 5s: 11351.4pps, 5s_failed: 0.0rps, 73.43% http_wait
[30.0s] proofs accepted: 335189, failed: 0, 5s: 11529.2pps, 5s_failed: 0.0rps, 73.10% http_wait
[35.0s] proofs accepted: 392865, failed: 0, 5s: 11535.2pps, 5s_failed: 0.0rps, 72.80% http_wait
[40.0s] proofs accepted: 450552, failed: 0, 5s: 11537.4pps, 5s_failed: 0.0rps, 72.56% http_wait
[45.0s] proofs accepted: 508698, failed: 0, 5s: 11629.2pps, 5s_failed: 0.0rps, 72.35% http_wait
[50.0s] proofs accepted: 566663, failed: 0, 5s: 11593.0pps, 5s_failed: 0.0rps, 72.20% http_wait
[55.0s] proofs accepted: 624373, failed: 0, 5s: 11542.0pps, 5s_failed: 0.0rps, 72.08% http_wait
[60.0s] proofs accepted: 681909, failed: 0, 5s: 11507.2pps, 5s_failed: 0.0rps, 71.99% http_wait


Instrument your server to verify the lock convoy:

import (
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof handlers
	"runtime"
)

func init() {
	runtime.SetMutexProfileFraction(1000) // sample ~1 in 1000 mutex contention events
	go http.ListenAndServe(":9000", nil)
}

Collect a mutex profile with go pprof for a minute while the load is running.
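
For example, assuming the default net/http/pprof endpoints from the snippet above, this pulls a 60-second delta mutex profile and opens the pprof web UI:

> go tool pprof -http=:9001 'http://localhost:9000/debug/pprof/mutex?seconds=60'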

Expected behavior

Regardless of whether native solvers are considered "fair exchange", "bypass", "ethical", an "attack", or something else (and I have no interest in arguing about that): as a "security solution" against mass scrapers, it should maintain a load asymmetry (i.e. it should cost less to book-keep the challenges than it costs the client to solve them) under any circumstance.
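
For a rough sense of the asymmetry (assuming the fast algorithm counts leading zero hex digits, so difficulty 4 means about 16^4 ≈ 65,000 hash attempts per proof on average): the run above sustains ~11,500 proofs/s, i.e. roughly 750M client-side hashes per second, while the server should only need one hash plus O(1) bookkeeping to verify each proof. The lock convoy is what erases that advantage.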

Screenshots

This is a classic wakeup "thundering herd", where every write->read transition locks up the runtime:

[screenshot]

Server version: v1.22.0-12-g9430d0e

Additional context

Policy deployed:

thresholds:
  # For clients that are browser-like and have gained many points from custom rules
  - name: extreme-suspicion
    expression: weight >= 0
    action: CHALLENGE
    challenge:
      # https://anubis.techaro.lol/docs/admin/configuration/challenges/proof-of-work
      algorithm: fast
      difficulty: 4
      report_as: 4
