
Conversation

@tomasvdw tomasvdw commented Sep 19, 2017

This adds support for Elliptic Curve Multiset operations

Currently benches at about 14us/hash
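For readers new to the construction: an elliptic-curve multiset hash maps each set element to a curve point and sums the points, so the digest depends only on the multiset, not on insertion order. A minimal sketch on a toy curve of the same shape as secp256k1 (the small prime, helper names, and trial-and-rehash loop are illustrative stand-ins, not this PR's actual code):

```python
import hashlib

P = 10007          # small prime, P % 4 == 3 so sqrt(a) = a^((P+1)/4) mod P
INF = None         # point at infinity doubles as the empty multiset

def sqrt_mod(a):
    """Square root mod P, or None if a is not a quadratic residue."""
    r = pow(a, (P + 1) // 4, P)
    return r if r * r % P == a % P else None

def add(p, q):
    """Affine point addition on y^2 = x^3 + 7 mod P."""
    if p is None: return q
    if q is None: return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                         # p + (-p) = infinity
    if p == q:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def hash_to_point(data):
    """Trial-and-rehash: rehash until the candidate x lands on the curve."""
    h = hashlib.sha256(data).digest()
    while True:
        x = int.from_bytes(h, "big") % P
        y = sqrt_mod((x * x * x + 7) % P)
        if y is not None:
            return (x, y)
        h = hashlib.sha256(h).digest()

def multiset_hash(elements):
    acc = INF
    for e in elements:
        acc = add(acc, hash_to_point(e))
    return acc

# Order independence is the whole point of the construction.
assert multiset_hash([b"a", b"b", b"c"]) == multiset_hash([b"c", b"a", b"b"])
```

Removing an element is adding the negated point (x, P-y), which is what lets the hash track a changing UTXO set incrementally.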

apoelstra commented Sep 19, 2017

  1. Can you change the copyright headers on all the new files to use your name(s) rather than those on the files you templated from?
  2. Can you write a markdown file like this one or this one that explains what you're doing and a bit of motivation?
  3. I have no idea what our policy on accepting upstreamed modules is, other than that I'm pretty sure we have none because this hasn't happened before.

@tomasvdw
Author

1 & 2 will do.

Note that this implements what Pieter Wuille proposed on the Bitcoin mailing list. It isn't really upstream at the moment.

apoelstra commented Sep 19, 2017

Ah, that's right, I knew this looked familiar. Even a .md in modules/multiset that was nothing but a link to the mailing list post would be great.

Since this is intended eventually for Bitcoin, my guess is that this will be treated something like the aggsig module, where we'll review the code but nothing can actually be merged until an actual BIP is produced and finalized. We don't want to give the impression that the secp maintainers have any undue control over what proposals eventually are accepted.

@tomasvdw
Author

I've changed the attribution and added a readme.

I understand if this isn't merged before a BIP is embraced, although you could argue that optional modules can be experimental.

Notes for review:

  • I believe I may be too eagerly normalizing field elements in secp256k1_multiset_create(). Someone with more knowledge of the normalization behaviour of the field operations may want to optimize some away.
  • I am not using context, but added it to the API calls for consistency. I am not sure whether this is preferred.

@gmaxwell
Contributor

  • We use trial-and-increment which is fast but non-constant time.

I believe that this is slower than Shallue-van de Woestijne, at least in batches when a fast Legendre symbol is available. It also results in non-uniform points, which would probably kill any hope of creating a strong security proof (though without creating a practical weakness).

Currently benches at about 800k/sec

In Bitcoin I am a little doubtful this approach is acceptable, because it would add CPU hours to synchronization; the multiplicative approach (bitcoin/bitcoin#10434) has much better performance and a more robust security argument. But it is less cacheable, so it has worse performance at the tip. It's a difficult trade-off.

Inits the multiset with the constant for empty data

This initialization is non-portable; data encoded in a ge_storage_t (or gej) is potentially platform- and/or compile-option-specific. What is that value in any case? As is, it looks like a backdoor: if there is any computable collection of set inputs that sums to that value, then the initial state is prefilled with that entry as a member, so the initial state should be a value which is unreachable by any known input. The norm in the codebase is to include Sage code to generate and verify all non-trivial constants. Please explain precisely where this constant originates.

I'm somewhat dubious about the point at infinity handling. I guess the idea behind the unexplained 'random' constant is to make it unreachable?

  • secp256k1_ge_set_gej(&ge_result, &gej_result);

Reprojecting to affine is devastating for performance and is entirely unneeded with the API setup here.

sipa commented Sep 24, 2017

We use trial-and-increment which is fast but non-constant time.

I believe that this is slower than shallue van de woestijne at least in batches when a fast legendre symbol is available.

I think that's only true for characteristic-2 curves. Constant-time Shallue-Van de Woestijne here needs 1 inverse and 3 square roots. Even variable-time SVdW would need 1 inverse, 1.5 Jacobi symbols, and 1 square root.

The increment-and-test approach is indeed biased, and I'd prefer to avoid it, but using a hash operation rather than an increment would be fine (variable time doesn't matter here), and should be pretty fast when a fast Jacobi symbol is available (almost 2x the speed of the current PR, which relies on sqrt to test quadratic residues).
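The cost argument here can be made concrete: whether a candidate x lies on the curve is a quadratic-residue question, and a Jacobi symbol answers it without ever computing the root. A hedged sketch with toy parameters (the small prime and curve shape stand in for secp256k1's field; none of this is the PR's code):

```python
P = 10007  # small prime with P % 4 == 3, standing in for secp256k1's field

def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0, by binary quadratic reciprocity."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def is_on_curve_sqrt(x):
    """Residue test by actually computing a square root: the slower way."""
    rhs = (x * x * x + 7) % P
    return pow(rhs, (P + 1) // 4, P) ** 2 % P == rhs

def is_on_curve_jacobi(x):
    """Residue test via the Jacobi symbol: no root needed."""
    rhs = (x * x * x + 7) % P
    return rhs == 0 or jacobi(rhs, P) == 1

# The two tests agree for every x in the field.
assert all(is_on_curve_sqrt(x) == is_on_curve_jacobi(x) for x in range(P))
```

On the real 256-bit field the sqrt costs a full exponentiation, while a binary Jacobi computation is closer to a gcd, which is where the claimed near-2x speedup comes from.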

* Maintain multiset as Jacobian to prevent unneeded Jacobian->affine
* Change API to directly add/remove data
* Use infinity as empty element
* Use trial-and-rehash instead of trial-and-increment
@tomasvdw
Author

Thanks for the input! I've made some improvements.

In Bitcoin I am a little doubtful this approach is acceptable, because it would add CPU hours to synchronization; the multiplicative approach (bitcoin/bitcoin#10434) has much better performance and a more robust security argument. But it is less cacheable, so it has worse performance at the tip. It's a difficult trade-off.

It is currently running at ~14us/hash, which won't add CPU hours, but only a few minutes. Also, I am writing this as I am experimenting with hybrid prefix tree/rolling hash approaches for which 768 bytes/rolling hash isn't really an option.

This initialization is non-portable; data encoded in a ge_storage_t (or gej) is potentially platform- and/or compile-option-specific. What is that value in any case? As is, it looks like a backdoor: if there is any computable collection of set inputs that sums to that value, then the initial state is prefilled with that entry as a member, so the initial state should be a value which is unreachable by any known input. The norm in the codebase is to include Sage code to generate and verify all non-trivial constants. Please explain precisely where this constant originates.

Good point. Using ge_storage_t as multiset was a bit hacky.

I was indeed using the constant to avoid infinity, but this was a rather silly idea: it breaks empty+empty=empty and, more importantly, derived properties. I have now switched to a portable multiset type which handles infinity, properly using infinity as the empty set.

Reprojecting to affine is devastating for performance and is entirely unneeded with the API setup here.

Good point. My reasoning was to use a 64-byte multiset, but the performance effect is indeed devastating (~25%). I have now switched to a 96-byte (Jacobian) multiset to avoid this. We could always add a 64-byte multiset_compact format if needed.

using a hash operation rather than an increment would be fine

Using a hash seems better indeed. I've switched to trial-and-hash instead of trial-and-increment.

sipa commented Sep 25, 2017

because it would add CPU hours to synchronization

It is currently running at ~14us/hash, which won't add CPU hours, but only a few minutes.

At 14us per operation, synchronization from scratch would cost 3.5 hours of CPU time (there are about 900M inputs+outputs in the chain).

You could argue that in a post-UTXO-sync world that isn't relevant, but we're not there, and I don't think we should disregard the cost for doing a full validation.

I think less than 14us/op is doable though; I'll have a look at the code soon.

@tomasvdw
Author

You could argue that in a post-UTXO-sync world that isn't relevant, but we're not there, and I don't think we should disregard the cost for doing a full validation.

But even for full validation, why would you hash spend TXOs? Surely, you only need to hash the current UTXO set, at ~50 million or ~12 min (with a single thread). And only once.

I think less than 14us/op is doable though; I'll have a look at the code soon.

I would very much appreciate that.

sipa commented Sep 25, 2017

But even for full validation, why would you hash spend TXOs? Surely, you only need to hash the current UTXO set, at ~50 million or ~12 min (with a single thread).

Perhaps you're thinking of a different use case than I am. My hope is that every full node can at every point in time know what the hash of their UTXO set is. It could be used to compare the state of nodes, do sanity checking, or as part of a future UTXO commitment in blocks.

All of those things are not possible if you don't know the UTXO set hash during (even initial) synchronization.

@tomasvdw
Author

I have similar use cases in mind: in the future, adding UTXO commitments to blocks, fast syncing, and possibly (with a hybrid prefix tree/rolling hash approach) UTXO proofs.

But I am afraid I do not understand the benefit of processing each historic UTXO change that way. What is the difference between the initial sync implementation:

A. Download all blocks, and for each block update the UTXO set and the commitment.
B. Download all blocks, and for each block update the UTXO set. Then update the commitment from the UTXO set.

I thought the primary advantage of a commutative hash is that A and B yield the same result. So what would be the drawback of B? Regardless of further optimizations, A seems wasteful.
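The A/B equivalence rests on the hash being a commutative, invertible accumulator. A toy sketch using addition of element hashes modulo 2^256 in place of EC point addition; note that a plain additive hash like this is breakable (Wagner's generalized birthday attack), which is exactly why the module sums curve points instead, but the algebra is the same:

```python
import hashlib

M = 2 ** 256

def h(e):
    """Hash a serialized UTXO to an integer 'element'."""
    return int.from_bytes(hashlib.sha256(e).digest(), "big")

# Two hypothetical blocks of UTXO-set changes (names are illustrative).
blocks = [
    {"create": [b"tx0:0", b"tx0:1"], "spend": []},
    {"create": [b"tx1:0"],           "spend": [b"tx0:1"]},
]

# A: update the commitment with every historical change.
acc = 0
for blk in blocks:
    for out in blk["create"]:
        acc = (acc + h(out)) % M       # insertion: add the element
    for out in blk["spend"]:
        acc = (acc - h(out)) % M       # removal: subtract the element

# B: compute the commitment once from the final UTXO set.
utxos = [b"tx0:0", b"tx1:0"]
final = sum(h(u) for u in utxos) % M

assert acc == final  # history and order don't matter, only the final set
```

sipa's point stands regardless: if historical blocks carry commitments as a consensus rule, route A is the only way to validate them.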

sipa commented Sep 25, 2017

If we'd have UTXO commitments (as a consensus rule) in blocks, with (B) you wouldn't be able to validate the correctness of those commitments in the historical chain.

My view is that with commutative hashing the advantage is that you can just start off with a UTXO set and compare it to chain commitments, if that's the security tradeoff you want to make. But I think it's wrong to think that everyone will make it, and disregard the validation cost it entails for those who want full historical validation.

To be clear, I'm not using this as an argument that using secp256k1 multiset hashing is the wrong approach. I'm just pointing out that the performance does matter for some use cases, but perhaps the tradeoffs are still worth it.

@tomasvdw
Author

Agreed. I don't think the 900M existing outputs are relevant in this respect (as they don't have a commitment), but certainly the per block update time does for future full validation.


ctx = secp256k1_context_create(SECP256K1_CONTEXT_VERIFY);

run_benchmark("ecdsa_multiset", bench_multiset, bench_multiset_setup, NULL, NULL, 10, 20);

Contributor

This will report wrong results. The last parameter is supposed to account for the number of iterations in the loop above, so it should be 10,000; but even then it won't properly account for multiset_finalize.

My suggestion is remove multiset_finalize above and change the line to

run_benchmark("ecdsa_multiset_add", bench_multiset, bench_multiset_setup, NULL, NULL, 10, 100000);

which is 12.5us on my machine, whereas the minimum should be about 6us: two hashes (0.46us), three mul and sqrt (5.2us), and gej_add_ge_var (0.228us).
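The accounting problem in miniature (run_benchmark's real signature lives in the library's bench harness; this sketch only reproduces the division it performs):

```python
# A benchmark harness that reports total_time / (count * iters) is only
# correct when `iters` matches the number of operations the benchmark body
# actually performs per call. Numbers below are made up for illustration.

def per_op_us(total_us, count, iters):
    """Per-operation time as the harness computes it."""
    return total_us / (count * iters)

# Body performs 10,000 multiset additions per call, harness calls it 10
# times, total elapsed 1.25 seconds:
assert per_op_us(1_250_000, 10, 10_000) == 12.5    # correct: 12.5us/op
assert per_op_us(1_250_000, 10, 20) == 6250.0      # iters=20: off by 500x
```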

elichai commented Dec 10, 2019

For future reference, I also think this lacks multiset serialization/parse methods for persistent storage (unless the author expected people to write the 96-byte struct to persistent storage, which would be really inconsistent with this library's API).
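For context, the downstream work quoted below added exactly this as a 33-byte compressed point. A sketch of compressed-point round-tripping on a toy curve of secp256k1's shape (the small prime and helper names are illustrative, not the library's API):

```python
P = 10007  # toy prime with P % 4 == 3, standing in for secp256k1's field

def serialize(point):
    """(x, y) -> 33 bytes: 0x02/0x03 parity prefix plus big-endian x."""
    x, y = point
    return bytes([2 + (y & 1)]) + x.to_bytes(32, "big")

def parse(data):
    """33 bytes -> (x, y), recovering y from x via a square root."""
    x = int.from_bytes(data[1:], "big")
    rhs = (x * x * x + 7) % P
    y = pow(rhs, (P + 1) // 4, P)
    if y * y % P != rhs:
        raise ValueError("x is not on the curve")
    if (y & 1) != (data[0] & 1):
        y = P - y                  # choose the root with the stored parity
    return (x, y)

# Pick some point on y^2 = x^3 + 7 mod P and round-trip it.
x = next(x for x in range(1, P) if pow((x**3 + 7) % P, (P - 1) // 2, P) == 1)
pt = (x, pow((x**3 + 7) % P, (P + 1) // 4, P))
assert parse(serialize(pt)) == pt
```

Storing only x plus a parity bit is what brings the 96-byte Jacobian state down to 33 bytes at rest.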

ftrader pushed a commit to bitcoin-cash-node/bitcoin-cash-node that referenced this pull request Jan 8, 2023
…ltiset

Summary
-------

This is part of initial work for the UTXO Fast Sync project.

As per the spec, we will be using the multiset pubkey (rather than the
pubkey hash) to identify a UTXO set (or UTXO commitment).
Using the pubkey has the advantage that we can always reconstruct
the multiset object from a serialized pubkey.

The code for ec-multiset we have in secp256k1 could only produce a hash
of the ec-multiset, and not the pubkey itself.  However, the pubkey is kept
in memory internally.  This MR adds new functions to the secp256k1 lib to
retrieve the pubkey:

- Serialize an empty or non-empty multiset to a 33 byte compressed pubkey.
- Unserialize a 33-byte compressed pubkey to an ec multiset.
- Test if an ec-multiset is empty.

This MR also adds a bunch of tests for the above.

This MR also enables the EC multiset module in secp256k1 because it will
be needed for subsequent MRs that will depend on this one.
In case you are wondering, it's safe to enable it, since if you don't use
it, it does no harm and is inert in the codebase.

I checked with Core and ABC and no subsequent work or commits exist for
the ec-multiset module, which was originally contributed to ABC here:

https://reviews.bitcoinabc.org/D1072

Core has an extant PR for similar work here which was never merged:

bitcoin-core/secp256k1#477.

With this MR we will have the most feature-rich implementation of the module.

Test Plan
---------

- `ninja check-secp256k1`

(you may have to clear your cmake cache and/or start from a clean build/
directory because of bugs in our cmake building for secp256k1)
@real-or-random
Contributor

Our parent project Bitcoin Core has MuHash3072 now (https://doxygen.bitcoincore.org/class_mu_hash3072.html). I feel that obsoletes this PR. (But please don't hesitate to disagree.)
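For comparison, MuHash3072 replaces EC point addition with multiplication modulo a 3072-bit prime, so removing an element means multiplying by a modular inverse. A simplified sketch (Bitcoin Core expands elements with ChaCha20; the SHA-512 expansion and the class below are illustrative stand-ins, not Core's API):

```python
import hashlib

MODULUS = 2 ** 3072 - 1103717  # the 3072-bit prime used by MuHash3072

def element(data):
    """Expand data to a 3072-bit group element (stand-in expansion scheme)."""
    out = b""
    for i in range(6):  # 6 * 64 bytes = 3072 bits
        out += hashlib.sha512(bytes([i]) + data).digest()
    return int.from_bytes(out, "big") % MODULUS

class MuHash:
    def __init__(self):
        self.acc = 1                      # multiplicative identity = empty set
    def insert(self, data):
        self.acc = self.acc * element(data) % MODULUS
    def remove(self, data):
        self.acc = self.acc * pow(element(data), -1, MODULUS) % MODULUS

m = MuHash()
m.insert(b"utxo-a"); m.insert(b"utxo-b"); m.remove(b"utxo-a")
n = MuHash()
n.insert(b"utxo-b")
assert m.acc == n.acc  # only the surviving elements determine the digest
```

The design trade-off gmaxwell raised earlier in this thread is visible here: the multiplicative state is large and inversions are costly, but batch performance and the security argument are stronger.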


7 participants