Stateless execution is one of the defining features of Ethereum 2.0. It will alleviate the I/O bottleneck faced by Ethereum 1.0 clients and minimize the amount of data validators need before they can begin validating a new shard. Although it solves some issues, it raises just as many. Here are some characteristics that are important for a robust stateless system:
- proof size
- proof verification time
- value lookup time from proof
- proof merging time
- proof generation
- caching schemes for proof data
- maintaining full state for state providers
- virtual machine design (wasm spawning wasm)
Since May, we've been exploring many different facets of stateless systems including compile-time reflection of Rust structs to sparse merkle trees, efficient in-place lookup strategies for sparse merkle trees, single-pass authentication of merkle proofs, atomic modification of proofs in nested execution, a practical example of a stateless token execution environment (and required tooling), and general designs for a stateless execution environment for user contracts.
As we look towards prototyping an umbrella execution environment, we think it is time to try formalizing some of this toolchain so that more efforts can begin in parallel and efforts are not duplicated across projects and organizations. The main pieces of this toolchain that we see are:
- Runtime
  a. Stateless executor for consensus nodes
  b. Stateful executor for testing + state providers
- Authenticated multi-proof backend
  a. Proof generator
  b. Multi-proof merging
  c. EE & contract level library
- Orchestration tools
  a. Deployer
  b. Testing framework
  c. Long running simulator
- EE & smart contract languages
- Client libraries (e.g. web3.js)
  a. Retrieve proofs from state providers
  b. Manage credentials for popular execution environments
Runtime
The runtime environment is the base of the whole toolchain. Its semantics define how the multi-proof backends operate, which in turn affects each subsequent aspect of the toolchain. The Ewasm team has done a lot of work defining this engine within the bounds of both Ethereum 1.0 and Ethereum 2.0. We have also begun to experiment with certain runtime heuristics in the ewasm-rt, such as wasm-spawning-wasm and atomic proof modification. These areas require more research, but are advancing. The biggest pieces still missing from the runtime are secure metering solutions, efficient bignum host functions, and multi-proof-dependent middleware for stateful execution. Although the middleware won't be used by consensus nodes, it will be important for state providers and for proof generation in general.
Authenticated multi-proof backend
The multi-proof system is essentially the backbone of all stateless computation. There are numerous flavors of proof systems (sparse, ssz sparse, patricia, knuth binary, etc.) and even more ways of serializing them. The actual time and size complexity of each proof system varies and should be subject to more rigorous research and improvement. From the perspective of the toolchain as a whole, all of these systems can be abstracted as authenticated key-value stores.
+-------------+ +---------------+ +------+
| data values | < ------ > | authenticator | < ------ > | code |
+-------------+ +---------------+ +------+
They act as an integrity shim on top of the actual data values. The runtime should generally be agnostic of the semantics of the proof backend, but it must make an exception in order to provide a stateful executor. That exception in particular is that it must a) maintain a list of get and set operations against the stateful data store, which can later be used to construct the necessary multi-proof and b) support middleware from the proof library which allows for runtime decisions (e.g. what should a get return for key K that hasn't been "officially" initialized yet?).
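To make the two requirements concrete, here is a minimal sketch of what the stateful executor's store could look like. Everything in it is an assumption for illustration: `RecordingStore`, `Access`, and the `default_for` hook are hypothetical names rather than anything from ewasm-rt or sheth, keys are plain `u64`s, and values are `u64`s for brevity where a real backend would use 32-byte chunks.

```rust
use std::collections::HashMap;

/// One entry in the access log kept by the stateful executor (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Access {
    Get(u64),
    Set(u64),
}

/// Hypothetical stateful store: a plain key-value map plus an access log.
/// `default_for` stands in for the middleware supplied by the proof library.
pub struct RecordingStore {
    values: HashMap<u64, u64>,
    log: Vec<Access>,
    default_for: fn(u64) -> u64,
}

impl RecordingStore {
    pub fn new(default_for: fn(u64) -> u64) -> Self {
        RecordingStore {
            values: HashMap::new(),
            log: Vec::new(),
            default_for,
        }
    }

    /// Reads `key` and logs the access. Keys that were never set are
    /// resolved by the middleware hook instead of failing.
    pub fn get(&mut self, key: u64) -> u64 {
        self.log.push(Access::Get(key));
        match self.values.get(&key) {
            Some(value) => *value,
            None => (self.default_for)(key),
        }
    }

    /// Writes `value` under `key` and logs the access.
    pub fn set(&mut self, key: u64, value: u64) {
        self.log.push(Access::Set(key));
        self.values.insert(key, value);
    }

    /// The full access log, in execution order.
    pub fn log(&self) -> &[Access] {
        &self.log
    }

    /// Sorted, deduplicated keys touched during execution. A proof
    /// generator would need the multi-proof to cover at least these.
    pub fn touched_keys(&self) -> Vec<u64> {
        let mut keys: Vec<u64> = self
            .log
            .iter()
            .map(|access| match access {
                Access::Get(k) | Access::Set(k) => *k,
            })
            .collect();
        keys.sort_unstable();
        keys.dedup();
        keys
    }
}
```

After a transaction runs against such a store, `touched_keys()` is the input a proof generator would need, and the middleware decides what an untouched key looks like (here, whatever `default_for` returns).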
Developing libraries that provide all the necessary tools is a large effort. I discovered this firsthand while developing a tool for composing proofs for sheth. Most of my work has revolved around implementations of sparse trees, so there may be additional things to consider for other types of trees, but the two areas of work here are a) compile-time reflection of data structures to general indexes and b) efficient multi-proof libraries for general index based proofs.
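As a rough illustration of both areas, the sketch below uses the usual general index numbering for binary trees (root is 1, the children of node i are 2i and 2i+1) and recomputes a root from a single branch. The assumptions are mine: `hash_pair` uses the standard library's `DefaultHasher` as a stand-in for a real 32-byte hash and is not cryptographically secure, nodes are `u64`s, and the `NONCE_INDEX` / `BALANCE_INDEX` constants are hand-written examples of what compile-time reflection might emit for a hypothetical two-field account struct.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Stand-in for a real hash such as SHA-256. NOT cryptographically secure.
pub fn hash_pair(left: u64, right: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write_u64(left);
    hasher.write_u64(right);
    hasher.finish()
}

/// General index of the leaf at `position` in a tree of `depth` (root is 1).
pub const fn leaf_index(depth: u32, position: u64) -> u64 {
    (1u64 << depth) + position
}

/// General index of a node's parent.
pub const fn parent(index: u64) -> u64 {
    index / 2
}

/// General index of a node's sibling.
pub const fn sibling(index: u64) -> u64 {
    index ^ 1
}

// Hypothetical output of compile-time reflection for a struct with two
// fields, `nonce` and `balance`, laid out as the leaves of a depth-1 tree.
pub const NONCE_INDEX: u64 = leaf_index(1, 0);
pub const BALANCE_INDEX: u64 = leaf_index(1, 1);

/// Recomputes the root from a leaf, its general index, and the sibling
/// nodes along the path, ordered from the leaf up to the root.
pub fn root_from_branch(leaf: u64, mut index: u64, branch: &[u64]) -> u64 {
    let mut node = leaf;
    for sib in branch {
        // Even indexes are left children, odd indexes are right children.
        node = if index % 2 == 0 {
            hash_pair(node, *sib)
        } else {
            hash_pair(*sib, node)
        };
        index = parent(index);
    }
    node
}
```

A verifier accepts the leaf if `root_from_branch` matches the known state root. A multi-proof library generalizes this to many leaves at once and shares the intermediate nodes between branches, which is where most of the efficiency work lies.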
Orchestration tools
Until now, we've been rolling our own orchestration tools. As we start looking towards building smart contracts on top of an execution environment that needs to be deployed to a runtime, we see the need to consolidate these tools more clearly. In fact, even the existing tools combined are not adequate to provide the developer experience we are striving for. I believe that we are in desperate need of both a testing framework for stateless contracts and a simulator for long running processes. In both instances, it's important that the tools are highly interoperable with the proof libraries so that users can mostly ignore their existence.
I think Truffle is a great baseline for the testing DevEx that should be provided for stateless contracts:
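import "truffle/Assert.sol";
import "truffle/DeployedAddresses.sol";
import "../contracts/MetaCoin.sol";

contract TestMetaCoin {
  // Checks the balance on the instance Truffle deployed during migrations.
  function testInitialBalanceUsingDeployedContract() public {
    MetaCoin meta = MetaCoin(DeployedAddresses.MetaCoin());
    Assert.equal(meta.getBalance(tx.origin), 10000, "Owner should have 10000 MetaCoin initially");
  }

  // Checks the balance on a freshly constructed contract.
  function testInitialBalanceWithNewMetaCoin() public {
    MetaCoin meta = new MetaCoin();
    Assert.equal(meta.getBalance(tx.origin), 10000, "Owner should have 10000 MetaCoin initially");
  }
}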
In order to achieve that level of abstraction from the proof backend, it is critical to thoroughly define a standard interface for proof libraries.
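As a starting point for discussion, such an interface might look something like the sketch below. None of it is an existing API: `ProofBackend`, `ProofError`, `MockBackend`, and `transfer` are hypothetical names, keys are general indexes stored as `u64`, and values and roots are `u64`s for brevity. The point is that contract logic and test tooling are written against the trait, so the concrete proof system can be swapped without touching either.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum ProofError {
    /// The key is not covered by the supplied proof.
    MissingKey(u64),
}

/// Hypothetical standard interface: an authenticated key-value store.
pub trait ProofBackend {
    fn get(&self, key: u64) -> Result<u64, ProofError>;
    fn set(&mut self, key: u64, value: u64) -> Result<(), ProofError>;
    /// Commitment to the current contents of the store.
    fn root(&self) -> u64;
}

/// Test double with no real authentication, for exercising contract logic.
pub struct MockBackend {
    values: HashMap<u64, u64>,
}

impl MockBackend {
    pub fn with_values(pairs: &[(u64, u64)]) -> Self {
        MockBackend {
            values: pairs.iter().cloned().collect(),
        }
    }
}

impl ProofBackend for MockBackend {
    fn get(&self, key: u64) -> Result<u64, ProofError> {
        self.values
            .get(&key)
            .copied()
            .ok_or(ProofError::MissingKey(key))
    }

    fn set(&mut self, key: u64, value: u64) -> Result<(), ProofError> {
        match self.values.get_mut(&key) {
            Some(slot) => {
                *slot = value;
                Ok(())
            }
            None => Err(ProofError::MissingKey(key)),
        }
    }

    fn root(&self) -> u64 {
        // Placeholder fold over the entries, not a real commitment.
        self.values
            .iter()
            .fold(0u64, |acc, (k, v)| acc ^ (*k).wrapping_mul(31).wrapping_add(*v))
    }
}

/// Contract logic written only against the trait. Returns Ok(false) if the
/// transfer is rejected (insufficient balance or overflow).
pub fn transfer<B: ProofBackend>(
    db: &mut B,
    from: u64,
    to: u64,
    amount: u64,
) -> Result<bool, ProofError> {
    let from_balance = db.get(from)?;
    let to_balance = db.get(to)?;
    let new_to = match to_balance.checked_add(amount) {
        Some(sum) if from_balance >= amount => sum,
        _ => return Ok(false),
    };
    db.set(from, from_balance - amount)?;
    db.set(to, new_to)?;
    Ok(true)
}
```

A testing framework could run `transfer` against `MockBackend` for fast unit tests and against a real sparse tree backend for integration tests, with the test author never handling a proof directly.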
EE & smart contract languages
Just as most developers no longer program in x86 or EVM assembly, we need to investigate high-level solutions for writing EEs and smart contracts. Fortunately, WebAssembly is an LLVM target, so many languages can already compile to it. That is only about a quarter of the story, though. Many languages have their own runtimes, garbage collectors, and debugging infrastructure that get included in binaries, whereas writing EE / smart contract code is analogous to embedded systems engineering, where every clock cycle and byte matters. We need to develop libraries for these languages, and tools such as Chisel, to support efficient execution and small bundle sizes. Additionally, multi-proof libraries must be implemented natively in other languages.
Client libraries
The final piece of the toolchain puzzle is the client libraries used to link application developers with users. Fortunately, there is already a lot of existing work that can be leveraged. For instance, if the other pieces are done correctly, it should be fairly straightforward to extend web3.js to first send a transaction to a state provider to calculate the proof before submitting it to the eth 2 mempool. It's likely that most of the work will be a) supporting the different flavors of execution environments and b) ironing out the incentive schemes for state providers.
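The two-step flow described above can be sketched as follows. This is only an outline under my own assumptions: the `StateProvider` and `Mempool` traits, the `Transaction`, `Proof`, and `ProvenTransaction` types, and the in-memory test doubles are all hypothetical, and a real client library would perform both steps over the network and also deal with signing and provider incentives.

```rust
/// Hypothetical transaction as built by the application.
#[derive(Debug, Clone, PartialEq)]
pub struct Transaction {
    pub payload: Vec<u8>,
}

/// Hypothetical multi-proof covering the state the transaction touches.
#[derive(Debug, Clone, PartialEq)]
pub struct Proof {
    pub nodes: Vec<u64>,
}

/// A transaction bundled with its proof, ready for stateless execution.
#[derive(Debug, Clone, PartialEq)]
pub struct ProvenTransaction {
    pub tx: Transaction,
    pub proof: Proof,
}

/// Anything that holds full state and can compute a proof on request.
pub trait StateProvider {
    fn prove(&self, tx: &Transaction) -> Result<Proof, String>;
}

/// Anything that accepts proven transactions for inclusion.
pub trait Mempool {
    fn submit(&mut self, ptx: ProvenTransaction) -> Result<(), String>;
}

/// Client-side flow: fetch the proof first, then submit both together.
pub fn send_transaction<S: StateProvider, M: Mempool>(
    provider: &S,
    mempool: &mut M,
    tx: Transaction,
) -> Result<(), String> {
    let proof = provider.prove(&tx)?;
    mempool.submit(ProvenTransaction { tx, proof })
}

/// Test double: returns the same proof for every transaction.
pub struct FixedProvider(pub Proof);

impl StateProvider for FixedProvider {
    fn prove(&self, _tx: &Transaction) -> Result<Proof, String> {
        Ok(self.0.clone())
    }
}

/// Test double: collects submissions in memory.
#[derive(Default)]
pub struct VecMempool {
    pub pending: Vec<ProvenTransaction>,
}

impl Mempool for VecMempool {
    fn submit(&mut self, ptx: ProvenTransaction) -> Result<(), String> {
        self.pending.push(ptx);
        Ok(())
    }
}
```

Keeping the provider behind a trait leaves room for the different flavors of execution environments mentioned above: each one can ship its own `StateProvider` implementation while the client flow stays the same.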