@mratsim mratsim commented Feb 8, 2022

This implements fast constant-time modular inversion.

Preliminary benchmarks, without Assembly

[six benchmark screenshots]

On BLS12-381, this is almost 8x faster than both Niels Möller's algorithm (the constant-time inversion used in GMP) and Fermat's Little Theorem inversion with addition chains.
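For reference, the Fermat baseline computes the inverse as a^(p-2) mod p. Below is a minimal sketch of that idea only. It assumes the Curve25519 prime 2^255 - 19 purely for brevity, relies on Python's built-in `pow` rather than a hand-tuned addition chain, and is not constant-time. The name `inv_fermat` is hypothetical and is not part of this PR.

```python
# Illustrative prime (assumption): 2^255 - 19, not the BLS12-381 modulus.
P = 2**255 - 19

def inv_fermat(a, p=P):
    """Inverse of a modulo a prime p via Fermat's Little Theorem: a^(p-2) mod p."""
    a %= p
    if a == 0:
        raise ZeroDivisionError("0 has no modular inverse")
    # Built-in square-and-multiply; a production version would use a fixed
    # addition chain so the operation sequence does not depend on the input.
    return pow(a, p - 2, p)

x = 123456789
assert (x * inv_fermat(x)) % P == 1
```

The cost is roughly one modular squaring per bit of p, which is why a divstep-based approach can be much cheaper on large fields.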

mratsim commented Feb 8, 2022

Discussion of chosen algorithm

There have been 3 papers on fast inversion in the past 3 years:

Bernstein-Yang inversion:

Pornin's inversion:

Discussion

This PR implements Bernstein-Yang inversion. There is a sketch of Pornin's inversion at:

Correctly and efficiently implementing Pornin's for generic primes is actually tricky:

  • L22: (u, v) ← (uf₀ + vg₀ mod m, uf₁ + vg₁ mod m)
    This requires efficient modular reduction, which is available for Generalized Mersenne Primes
    like secp256k1 or ED25519 but not for BLS12-381.
    Given that Pornin's approach uses batches of 31 divsteps instead of Bernstein-Yang's 62 (on 64-bit),
    the reduction runs twice as often, so a slow reduction will have twice the impact.
  • BLST's authors delayed the modular reduction but this triggered
    an edge case in fuzzing: supranational/blst@fd45352#commitcomment-66068518
    In the past there was another edge case raised:
  • An efficient implementation requires:
    1. Assembly for cmov in the inner loop and for leading zero count
    2. fast or delayed/batched modular reduction
    3. an extra bit in the high word for negative integers, making it unsuitable for secp256k1 or P256
      when using a saturated representation.
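To make the batching trade-off concrete, here is a variable-time Python sketch of Bernstein-Yang style inversion with batches of 62 divsteps. It is a reference illustration only, not the code in this PR: it uses Python big integers, branches on secret data, and halves one bit at a time where a real implementation would use a precomputed constant. The names `divsteps_n`, `div2n`, and `modinv` are hypothetical. The matrix entries are called `(u, v, q, r)` here and are unrelated to Pornin's `(u, v)` above; the line updating `d, e` plays the role of Pornin's line L22.

```python
N = 62  # divsteps per batch, the 64-bit figure quoted above

def divsteps_n(delta, f, g):
    """Run N divsteps on (f, g). Return the new delta and the 2x2 transition
    matrix (u, v, q, r), scaled by 2^N."""
    u, v, q, r = 1, 0, 0, 1  # identity matrix
    for _ in range(N):
        if delta > 0 and g & 1:
            delta, f, g = 1 - delta, g, (g - f) // 2
            u, v, q, r = 2 * q, 2 * r, q - u, r - v
        elif g & 1:
            delta, f, g = 1 + delta, f, (g + f) // 2
            u, v, q, r = 2 * u, 2 * v, q + u, r + v
        else:
            delta, f, g = 1 + delta, f, g // 2
            u, v, q, r = 2 * u, 2 * v, q, r
    return delta, (u, v, q, r)

def div2n(x, m):
    """Return x / 2^N mod m for odd m, halving one bit at a time."""
    for _ in range(N):
        if x & 1:
            x += m  # make x even without changing it modulo m
        x >>= 1
    return x % m

def modinv(x, m):
    """Inverse of x modulo an odd m, assuming gcd(x, m) == 1. Variable-time."""
    delta, f, g, d, e = 1, m, x % m, 0, 1
    while g != 0:
        delta, (u, v, q, r) = divsteps_n(delta, f, g)
        # Apply the matrix to (f, g): an exact division by 2^N.
        f, g = (u * f + v * g) >> N, (q * f + r * g) >> N
        # Apply it to (d, e) modulo m: one modular reduction per batch.
        d, e = div2n(u * d + v * e, m), div2n(q * d + r * e, m)
    # Invariant d = f / x (mod m); at the end f is +1 or -1.
    return (d * f) % m

P = 2**255 - 19
assert (modinv(12345, P) * 12345) % P == 1
```

The modular reduction happens once per batch, in the `d, e` update, so halving the batch size doubles the number of reductions for the same total number of divsteps.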

In particular, the inner loop needs to be as streamlined as possible. Pure Nim/C cannot guarantee that a cmov is emitted, and leading zero count is platform-dependent, which makes the inner loop slow without assembly.
Regarding point 2, delayed/batched modular reduction alone can be done. However, Pornin's method relies on an approximation of the inputs that needs to be corrected at regular intervals and at the end of the computation. Given the edge cases that popped up in BLST, delaying modular reduction AND correcting the approximation AND doing all of that in constant time seems fraught with peril.
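As a rough illustration of what the portable fallbacks look like, the sketch below emulates a conditional move and a leading zero count on 64-bit words using only masks and shifts. It is an assumption-laden illustration, not code from this PR: Python integers are not constant-time, so this only shows the arithmetic a Nim/C version would perform, and `ct_select` and `ct_clz64` are hypothetical names.

```python
MASK64 = (1 << 64) - 1

def ct_select(ctl, a, b):
    """Return a if ctl == 1 else b, with no branch on ctl (cmov emulation)."""
    mask = (-ctl) & MASK64          # ctl=1 -> all ones, ctl=0 -> all zeros
    return b ^ (mask & (a ^ b))

def ct_clz64(x):
    """Leading zero count of a 64-bit word via a branch-free binary search."""
    n = 0
    for shift in (32, 16, 8, 4, 2, 1):
        top = x >> (64 - shift)                         # upper `shift` bits
        nonzero = ((top | (-top & MASK64)) >> 63) & 1   # 1 iff top != 0
        is_zero = nonzero ^ 1
        n += shift * is_zero
        # If the upper bits are all zero, shift them out and keep searching.
        x = ct_select(is_zero, (x << shift) & MASK64, x)
    return n + (1 - (x >> 63))      # accounts for x == 0 (64 leading zeros)

assert ct_select(1, 7, 9) == 7 and ct_select(0, 7, 9) == 9
assert ct_clz64(1) == 63 and ct_clz64(0) == 64
```

A single `lzcnt` or `cmov` instruction replaces each of these multi-operation sequences, which is where the assembly advantage in the inner loop comes from.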

@mratsim mratsim linked an issue Feb 10, 2022 that may be closed by this pull request
@mratsim mratsim merged commit 53c4db7 into master Feb 10, 2022
@mratsim mratsim deleted the fast-inv branch February 11, 2022 10:39
This was referenced Feb 21, 2022
Development

Successfully merging this pull request may close these issues.

Implement fast inversion for public data
