Avoid division when decomposing scalars #21
Conversation
Nice, this removes the need for secp256k1_num_div entirely.

Can you:
@sipa Apologies for the delay; I'm a bit pushed for time at the moment. I'll return to these pull requests, hopefully later this week.
I've added some comments and removed _num_div, but this isn't ready for merge yet, as it's not giving the expected performance improvement. Bear with me while I investigate!
The split itself is a lot faster, but there's some kind of off-by-one error versus the original "bnc1/bnc2" values on a small proportion of inputs, so I'm looking into that now.
This one's got me a bit confused. Not the off-by-one error, which turned out to be just the rough edges of what is inherently an approximation (I've increased the precision, so the split is now virtually always exactly the same as the old code). Rather, I can't see that performance (of bench) is actually improved.

Naively, an improvement is expected, since two divisions are replaced by simple truncations (right shifts). If the split code is wrapped in a (pointless) loop of, say, 10 iterations to give it a heavier "weight", the difference is pretty clear: the new split is something like 1.8x as fast, and it shows in the bench results. However, for the unmodified bench, I think I am measuring the new code as very slightly slower (though it's hard to pick the signal out from the noise here).

So, I'm thinking it has something to do with needing to load g1/g2 (vs 'order') at each call, i.e. some sort of memory/caching problem. This seems plausible, as the split is only needed once per scalar multiplication, so the constants could well be getting evicted between split calls. (Note that 'order', which is not used in the updated split, is used in the ECDSA code, so it is perhaps being kept cached for the existing split code.) On that theory I tried a few changes, like copying things into local variables, using gcc prefetch intrinsics, and changing the memory layout to match the order of usage (g1, g2, a1b2, a2, b1), but without much luck, and usually just making things worse!

Anyway, I'd appreciate it if others could report their timings with/without this patch (of course it's only relevant when the endomorphism is used), and if you have any thoughts on what's going on here performance-wise, I'm all ears.
    mpn_copyi(r->data, a->data + limbs, r->limbs);
} else {
    mpn_rshift(r->data, a->data + limbs, r->limbs, left);
    while (r->limbs > 1 && r->data[r->limbs-1] == 0) r->limbs--;
Why not >0?
I'm guessing you're referring to the "r->limbs>1" test? _num_trunc is just _num_split modified to ignore the low bits (perhaps I could have just extended _num_shift to support large shifts). I gather that a zero valued num still has limbs==1, but in any case, I've just copied and modified... Please clarify if I've misunderstood the question.
Force-pushed from fe0e4c4 to 84bcb08.
Indeed, I notice a slight slowdown with this (around 0.4%). I wonder whether the advantage of not needing division in the bignum API isn't already worth it...
@gmaxwell review requested |
@sipa I'm still very curious about the performance gap, but I think it's reasonable to merge this, since the native num stuff appears to be progressing rapidly. I still expect there will be a net performance gain in batch operations, and we may yet find a way around the single-op deficit.
Force-pushed from 59691ab to 47131be.

Force-pushed from 47131be to a403074.
Re-measuring with -O3 -march=native, it looks like this PR actually does improve performance (I measure very roughly 0.5% faster on bench_verify).
Force-pushed from 5278c94 to 7db5892.
- In secp256k1_gej_split_exp, two divisions are used. Since each denominator is a constant known at compile time, each division can be replaced by a multiplication followed by a right shift (and rounding).
- Add the constants g1, g2 for this purpose and rewrite secp256k1_gej_split_exp accordingly.
- Remove secp256k1_num_div, since it is no longer used.
Force-pushed from 7db5892 to 960f9a2.
The technique is discussed, among other places, in section 4.3 of the paper "Efficient Software Implementation of Public-Key Cryptography on Sensor Networks Using the MSP430X Microcontroller" (Gouvea, Oliveira, Lopez).
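For reference, the decomposition in question follows the standard GLV form (the sign conventions and symbol names below are the usual textbook ones and may differ in detail from this code's). Given the group order $n$, the endomorphism eigenvalue $\lambda$, and a short lattice basis $(a_1, b_1)$, $(a_2, b_2)$ with $a_i + b_i\lambda \equiv 0 \pmod{n}$:

```latex
c_1 = \left\lfloor \frac{b_2\,k}{n} \right\rceil, \qquad
c_2 = \left\lfloor \frac{-b_1\,k}{n} \right\rceil,
\qquad
k_1 = k - c_1 a_1 - c_2 a_2, \qquad
k_2 = -c_1 b_1 - c_2 b_2,
```

so that $k \equiv k_1 + k_2\lambda \pmod{n}$ with $k_1, k_2$ roughly half the bit length of $k$. The two rounded divisions by $n$ are exactly what this PR replaces with multiplications by precomputed constants (g1, g2, presumably scaled reciprocals of the form $\lfloor 2^m b_i / n \rceil$) followed by right shifts.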