Description
This is not really an unknown "security" issue, since any off-the-shelf solver on GitHub (especially GPU-powered ones, which this one is not) can achieve the same result, and thus would all technically count as "PoC"s.
I therefore believe it is more of a performance bug.
Describe the bug
Currently what happens:
- The last exclusive writer calls .Unlock(),
- up to GOMAXPROCS pending readers wake up at once,
- some of the goroutines that piled up on the runtime immediately ask for .Lock() again,
- all GOMAXPROCS threads serialize themselves in CAS spin loops (or park, but by the time the contention resolves the CPU has spent more cycles in total than it would take to compute the proof).
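For illustration, a minimal sketch of the kind of pattern that produces this convoy; the names and structure here are hypothetical (not Anubis's actual code) and assume a challenge map guarded by a single sync.RWMutex:

package store

import (
    "sync"
    "time"
)

// challengeStore is a hypothetical stand-in for whatever guards the
// challenge/decay bookkeeping behind one RWMutex.
type challengeStore struct {
    mu         sync.RWMutex
    challenges map[string]time.Time // challenge id -> expiry
}

func (s *challengeStore) check(id string) bool {
    s.mu.RLock()
    expiry, ok := s.challenges[id]
    s.mu.RUnlock()

    if ok && time.Now().After(expiry) {
        // RUnlock() immediately followed by Lock(): every request that sees an
        // expired entry forces a write->read transition, and each Unlock()
        // wakes all pending readers at once.
        s.mu.Lock()
        delete(s.challenges, id)
        s.mu.Unlock()
        return false
    }
    return ok
}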
Solutions:
- Quick but mid: do not .RUnlock() and then immediately .Lock(); that creates a synchronization point and causes lock convoys. Instead, defer all decay deletion requests through a channel into one dedicated cleanup goroutine that batch-processes them (see the sketch below).
- More reliable: consider sharded or lock-free data structures (e.g. challenge state packed entirely into atomic 64-bit numbers) in the long run, for better scaling.
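A minimal sketch of the channel-based variant (same package and imports as the sketch above; names are hypothetical): request handlers never take the write lock themselves, they only enqueue expired IDs for a single cleanup goroutine that deletes in batches.

// batchedStore defers deletions to one goroutine instead of letting every
// request upgrade from a read lock to the write lock.
type batchedStore struct {
    mu         sync.RWMutex
    challenges map[string]time.Time
    expired    chan string // buffered; request handlers enqueue, cleaner drains
}

func newBatchedStore() *batchedStore {
    s := &batchedStore{
        challenges: make(map[string]time.Time),
        expired:    make(chan string, 4096),
    }
    go s.cleaner()
    return s
}

func (s *batchedStore) check(id string) bool {
    s.mu.RLock()
    expiry, ok := s.challenges[id]
    s.mu.RUnlock()

    if ok && time.Now().After(expiry) {
        select {
        case s.expired <- id: // hand off the deletion
        default: // queue full: drop the hint rather than block the request path
        }
        return false
    }
    return ok
}

func (s *batchedStore) cleaner() {
    ticker := time.NewTicker(time.Second)
    defer ticker.Stop()
    batch := make([]string, 0, 1024)
    for {
        select {
        case id := <-s.expired:
            batch = append(batch, id)
        case <-ticker.C:
            if len(batch) == 0 {
                continue
            }
            s.mu.Lock() // one writer, once per tick, instead of once per request
            for _, id := range batch {
                delete(s.challenges, id)
            }
            s.mu.Unlock()
            batch = batch[:0]
        }
    }
}

And a rough sketch of the direction the second option points in, assuming a challenge's remaining state can be packed into one 64-bit word (here just an expiry timestamp, using sync/atomic in addition to the imports above):

// atomicChallenge packs the per-challenge state into a single atomic int64
// (unix-seconds expiry; 0 means already spent), so redeeming never touches a mutex.
type atomicChallenge struct {
    expiry atomic.Int64
}

func (c *atomicChallenge) redeem() bool {
    for {
        exp := c.expiry.Load()
        if exp == 0 || time.Now().Unix() > exp {
            return false // spent or expired
        }
        if c.expiry.CompareAndSwap(exp, 0) {
            return true // claimed exactly once, lock-free
        }
    }
}

Either way, the hot path stops producing write->read transitions, so a burst of accepted proofs no longer funnels every worker thread through the same mutex.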
To Reproduce
You need a high-core-count processor to reproduce this, but since you have a 7950X3D and I have a 7950X, I am fairly certain the results will be similar.
Hit a local instance with a native solver and notice that the solver emits proofs faster than Anubis can respond to requests, even under "extreme suspicion" (difficulty 4).
> target/release/simd-mcaptcha live --api-type anubis --host http://localhost:8923/ \
--n-workers 64 &
You are hitting host http://localhost:8923/, n_workers: 64
[0.0s] proofs accepted: 0, failed: 0, 5s: 0.0pps, 5s_failed: 0.0rps, 0.00% iowait
[5.0s] proofs accepted: 53805, failed: 0, 5s: 10761.0pps, 5s_failed: 0.0rps, 74.91% http_wait
[10.0s] proofs accepted: 108805, failed: 0, 5s: 11000.0pps, 5s_failed: 0.0rps, 74.34% http_wait
[15.0s] proofs accepted: 164656, failed: 0, 5s: 11170.2pps, 5s_failed: 0.0rps, 73.92% http_wait
[20.0s] proofs accepted: 220786, failed: 0, 5s: 11226.0pps, 5s_failed: 0.0rps, 73.65% http_wait
[25.0s] proofs accepted: 277543, failed: 0, 5s: 11351.4pps, 5s_failed: 0.0rps, 73.43% http_wait
[30.0s] proofs accepted: 335189, failed: 0, 5s: 11529.2pps, 5s_failed: 0.0rps, 73.10% http_wait
[35.0s] proofs accepted: 392865, failed: 0, 5s: 11535.2pps, 5s_failed: 0.0rps, 72.80% http_wait
[40.0s] proofs accepted: 450552, failed: 0, 5s: 11537.4pps, 5s_failed: 0.0rps, 72.56% http_wait
[45.0s] proofs accepted: 508698, failed: 0, 5s: 11629.2pps, 5s_failed: 0.0rps, 72.35% http_wait
[50.0s] proofs accepted: 566663, failed: 0, 5s: 11593.0pps, 5s_failed: 0.0rps, 72.20% http_wait
[55.0s] proofs accepted: 624373, failed: 0, 5s: 11542.0pps, 5s_failed: 0.0rps, 72.08% http_wait
[60.0s] proofs accepted: 681909, failed: 0, 5s: 11507.2pps, 5s_failed: 0.0rps, 71.99% http_wait
Instrument your server to verify the lock convoy:

// needs: import ( "net/http"; _ "net/http/pprof"; "runtime" )
func init() {
    runtime.SetMutexProfileFraction(1000) // report roughly 1 out of every 1000 mutex contention events
    go http.ListenAndServe(":9000", nil)  // expose /debug/pprof on :9000
}

Collect a go pprof mutex profile for a minute while loading it.
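For example (matching the :9000 listener above), the mutex profile can be fetched and explored with:

go tool pprof http://localhost:9000/debug/pprof/mutex

pprof's top and web views should then point at the contended RWMutex.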
Expected behavior
Regardless of whether native solvers are considered "fair exchange", "bypass", "ethical", or an "attack" (and I have no interest in arguing about that), as a "security solution" against mass scrapers Anubis should maintain a load asymmetry (i.e. it should cost less to book-keep the challenges than it costs the client to solve them) under any circumstance.
Screenshots
This is a classic wakeup "thundering herd", where every write->read transition locks up the runtime.
Server version: v1.22.0-12-g9430d0e
Additional context
Policy deployed:
thresholds:
  # For clients that are browser like and have gained many points from custom rules
  - name: extreme-suspicion
    expression: weight >= 0
    action: CHALLENGE
    challenge:
      # https://anubis.techaro.lol/docs/admin/configuration/challenges/proof-of-work
      algorithm: fast
      difficulty: 4
      report_as: 4