Skip to content
This repository was archived by the owner on Nov 15, 2023. It is now read-only.
This repository was archived by the owner on Nov 15, 2023. It is now read-only.

Update statement-distribution for asynchronous backing #5055

@rphmeier

Description

@rphmeier

If the runtime API version indicates that asynchronous backing is enabled, we'll need new logic in statement-distribution.

Goals

The statement-distribution subsystem has these high-level goals:

  1. Distribute locally-generated statements about candidates to the network
  2. Participate in the distribution of statements by other validators
  3. Receive relevant statements about candidates from peers
  4. Filter out spam: nonsensical statements or statements by spamming validators

Recap

By statement, we mean something roughly like

enum Statement {
    Seconded(RelayParentHash, CandidateHash),
    Valid(RelayParentHash, CandidateHash),
}

Validators sign these statements about candidates, which in turn are gossiped over the network and can be included alongside the candidate on-chain.

Validators should only issue Valid statements for candidates that they've seen Seconded, and there are restrictions on how many Seconded statements can be produced by a validator, making Seconded statements the core means of preventing spam. In fact, the changes to semantics here as a result of #4963 are the main reason the logic of this subsystem needs to be changed.

Spam Protection

There are two kinds of spam that we are worried about.

  1. Nonsensical or unanchored messages, which can be produced by any rogue validator
  2. Infinite numbers of valid, backed candidates, which any rogue validator group can produce.

Status Quo

Without asynchronous backing, parachain candidates were required to have the most recent relay-chain block as their relay-parent, which in turn implied that no two sequential parachain blocks could have the same relay-parent. Spam prevention was relatively simple: we required that each validator could second no more than 1 candidate per relay-chain block, and that Valid statements could only be sent to peers we were sure had the corresponding Seconded statement - either because they sent it to us, or we sent it to them. Valid statements not corresponding to a known Seconded block could be ignored, and the amount of Seconded statements to consider was bounded by the number of validators.

We exchange Views with our peers, which indicate our current active leaves. Our peers send us only candidates that have anything to do with these active leaves. Each active leaf is a blank slate.

Changes

With asynchronous backing, there's no longer any requirement for sequential parachain blocks to have different relay-parents. However, we have to rein in the number of possible candidates somehow, otherwise malicious validators or validator groups could produce an infinite number of valid parachain blocks and furthermore, an infinite number of valid prospective parachains of infinite length. These are the problems we need to avoid.

The model we introduce is based off of "prospective parachains", where backing subsystems buffer a few parachain blocks off-chain. This is coordinated by the Prospective Parachains subsystem #4963 . The buffers are actually trees with an enforced max depth. The prospective parachains subsystem is designed to work hand-in-hand with higher level spam prevention to avoid the trees growing too large - and statement-distribution fulfills that role.

Each new active leaf is no longer guaranteed to be a blank slate, but instead will initially be populated by a FragmentTree from the set of already known candidates. Peers keep track of which candidates they have sent and received messages for.

Candidates don't have a globally unique depth, and they don't even have a unique depth per active-leaf. It is legal, although unexpected, for head-data to cycle and relay-parents can now stay the same. That is, each fragment tree gives a set of depths for any candidate, which is usually either empty or has length 1, except for pathological parachains. This set of valid depths is defined as the depths in the tree at which the candidate appears.

Outdated first attempt (see #5055 (comment))

This is probably the trickiest conceptual bit, but here goes: we keep candidates and messages referencing them around until the union of the valid depths is empty at all active leaves. The valid depths for a candidate trend towards the empty set because the candidate is eventually either included or orphaned.

So here's how we use depths to prevent spam:

  1. We keep the rule that we only accept Valid messages after becoming aware of Seconded messages for a candidate
  2. We accept only one Seconded message per validator per depth per active-leaf. Seconded messages which refer to candidates that would have an empty set of valid depths at all of our active leaves are considered spam.

We already know which active-leaves our peers are aware of through their views, and we're aware of which candidates we've received from/sent to our peers, which means that it's not too involved for us to build up a view of their own FragmentTrees which we can use to send them things.

So, in practice, this 'depth' approach also means that validators have to choose a fork of the parachain to stick to: it sets an upper bound of n_validators*max_depth nodes in the tree, not n_validators^max_depth.

Practicalities

Most of the issues in implementation will revolve around race conditions:

  1. Prospective fragment trees are actually handled within another subsystem. But we need to maintain some kind of mutable fragment trees within the statement-distribution subsystem itself in order to handle incoming streams of messages
  2. So far, we've been imagining that the Prospective Parachains subsystem will inform backing subsystems about changes to fragment trees via signals from the overseer (Overseer: allow subsystems to instruct the overseer to send signals #4961), but such signals will race against active leaf updates and view updates to our peers. Instead, we might have backing subsystems pull the data from the Prospective Parachains subsystem.

Metadata

Metadata

Assignees

Labels

I8-refactorCode needs refactoring.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions