Update statement-distribution for asynchronous backing #5055
Description
If the runtime API version indicates that asynchronous backing is enabled, we'll need new logic in statement-distribution.
Goals
The statement-distribution subsystem has these high-level goals:
- Distribute locally-generated statements about candidates to the network
- Participate in the distribution of statements by other validators
- Receive relevant statements about candidates from peers
- Filter out spam: nonsensical statements or statements by spamming validators
Recap
By statement, we mean something roughly like:

```rust
enum Statement {
    Seconded(RelayParentHash, CandidateHash),
    Valid(RelayParentHash, CandidateHash),
}
```
Validators sign these statements about candidates, which in turn are gossiped over the network and can be included alongside the candidate on-chain.
Validators should only issue `Valid` statements for candidates that they've seen `Seconded`, and there are restrictions on how many `Seconded` statements can be produced by a validator, making `Seconded` statements the core means of preventing spam. In fact, the changes to these semantics as a result of #4963 are the main reason the logic of this subsystem needs to be changed.
Spam Protection
There are two kinds of spam that we are worried about.
- Nonsensical or unanchored messages, which can be produced by any rogue validator
- Infinite numbers of valid, backed candidates, which any rogue validator group can produce.
Status Quo
Without asynchronous backing, parachain candidates were required to have the most recent relay-chain block as their relay-parent, which in turn implied that no two sequential parachain blocks could have the same relay-parent. Spam prevention was relatively simple: we required that each validator could second no more than 1 candidate per relay-chain block, and that `Valid` statements could only be sent to peers we were sure had the corresponding `Seconded` statement - either because they sent it to us, or we sent it to them. `Valid` statements not corresponding to a known `Seconded` statement could be ignored, and the number of `Seconded` statements to consider was bounded by the number of validators.
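The pre-asynchronous-backing rules can be sketched roughly as follows. The types and names here are illustrative only, not the actual statement-distribution internals: per relay-parent, each validator gets one `Seconded` slot, and `Valid` is only meaningful for a candidate whose `Seconded` we already know.

```rust
use std::collections::HashSet;

// Illustrative sketch of the status-quo spam rules; `u32` stands in for a
// validator index and `u64` for a candidate hash.
#[derive(Default)]
struct RelayParentState {
    seconded_by: HashSet<u32>,      // validators that have used their one slot
    known_candidates: HashSet<u64>, // candidates with a known `Seconded`
}

impl RelayParentState {
    fn handle_seconded(&mut self, validator: u32, candidate: u64) -> bool {
        // Each validator may second at most one candidate per relay-parent.
        if !self.seconded_by.insert(validator) {
            return false; // spam: a second `Seconded` from this validator
        }
        self.known_candidates.insert(candidate);
        true
    }

    fn handle_valid(&self, candidate: u64) -> bool {
        // `Valid` only counts for candidates we saw `Seconded`.
        self.known_candidates.contains(&candidate)
    }
}
```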
We exchange `View`s with our peers, which indicate our current active leaves. Our peers send us only candidates that have anything to do with these active leaves. Each active leaf is a blank slate.
Changes
With asynchronous backing, there's no longer any requirement for sequential parachain blocks to have different relay-parents. However, we have to rein in the number of possible candidates somehow, otherwise malicious validators or validator groups could produce an infinite number of valid parachain blocks and furthermore, an infinite number of valid prospective parachains of infinite length. These are the problems we need to avoid.
The model we introduce is based on "prospective parachains", where backing subsystems buffer a few parachain blocks off-chain. This is coordinated by the Prospective Parachains subsystem #4963 . The buffers are actually trees with an enforced max depth. The prospective parachains subsystem is designed to work hand-in-hand with higher-level spam prevention to avoid the trees growing too large - and statement-distribution fulfills that role.
Each new active leaf is no longer guaranteed to be a blank slate, but instead will initially be populated by a `FragmentTree` built from the set of already-known candidates. Peers keep track of which candidates they have sent and received messages for.
Candidates don't have a globally unique depth, and they don't even have a unique depth per active-leaf. It is legal, although unexpected, for head-data to cycle, and relay-parents can now stay the same. That is, each fragment tree gives a set of depths for any candidate, which is usually either empty or of length 1, except for pathological parachains. This set of valid depths is defined as the depths in the tree at which the candidate appears.
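To make the "set of valid depths" concrete, here is a hedged sketch, assuming a fragment tree flattened into (candidate, depth) nodes rather than the real `FragmentTree` API: a candidate's valid depths are simply every depth at which it occurs, so a cycling parachain can yield a set of length greater than 1.

```rust
use std::collections::BTreeSet;

// Illustrative only: `u64` stands in for a candidate hash, and the tree is
// represented as a flat list of (candidate, depth) occurrences.
fn valid_depths(nodes: &[(u64, usize)], candidate: u64) -> BTreeSet<usize> {
    nodes
        .iter()
        .filter(|(c, _)| *c == candidate)
        .map(|(_, depth)| *depth)
        .collect()
}
```

For most parachains the returned set has length 0 or 1; a pathological chain whose head-data cycles could legitimately place the same candidate at several depths.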
Outdated first attempt (see #5055 (comment))
This is probably the trickiest conceptual bit, but here goes: we keep candidates and messages referencing them around until the union of the valid depths is empty at all active leaves. The valid depths for a candidate trend towards the empty set because the candidate is eventually either included or orphaned.
So here's how we use depths to prevent spam:
- We keep the rule that we only accept `Valid` messages after becoming aware of `Seconded` messages for a candidate.
- We accept only one `Seconded` message per validator per depth per active-leaf. `Seconded` messages which refer to candidates that would have an empty set of valid depths at all of our active leaves are considered spam.
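The two rules above might be sketched like this. Again, the types are purely illustrative (the real subsystem tracks far more state): a `Seconded` message is spam if the candidate's valid-depth set is empty at every active leaf, and each validator gets only one `Seconded` slot per (leaf, depth).

```rust
use std::collections::{HashMap, HashSet};

// Illustrative sketch: `u64` = active-leaf hash, `u32` = validator index.
#[derive(Default)]
struct SpamFilter {
    // (active_leaf, depth) -> validators that already used a `Seconded` slot.
    used: HashMap<(u64, usize), HashSet<u32>>,
}

impl SpamFilter {
    fn accept_seconded(
        &mut self,
        validator: u32,
        // The candidate's valid depths at each of our active leaves.
        depths_per_leaf: &[(u64, Vec<usize>)],
    ) -> bool {
        // Rule: empty valid-depth set at *all* active leaves means spam.
        if depths_per_leaf.iter().all(|(_, depths)| depths.is_empty()) {
            return false;
        }
        // Rule: only one `Seconded` per validator per depth per active-leaf.
        let slot_taken = depths_per_leaf.iter().any(|(leaf, depths)| {
            depths.iter().any(|d| {
                self.used
                    .get(&(*leaf, *d))
                    .map_or(false, |v| v.contains(&validator))
            })
        });
        if slot_taken {
            return false;
        }
        // Accept, and mark every (leaf, depth) slot as used by this validator.
        for (leaf, depths) in depths_per_leaf {
            for d in depths {
                self.used.entry((*leaf, *d)).or_default().insert(validator);
            }
        }
        true
    }
}
```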
We already know which active-leaves our peers are aware of through their views, and we're aware of which candidates we've received from/sent to our peers, which means that it's not too involved for us to build up a view of their own `FragmentTree`s which we can use to send them things.
So, in practice, this 'depth' approach also means that validators have to choose a fork of the parachain to stick to: it sets an upper bound of `n_validators * max_depth` nodes in the tree, not `n_validators ^ max_depth`.
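For a rough sense of scale (the numbers here are made up for illustration, not protocol parameters): the per-(validator, depth) rule keeps the bound linear, where seconding on every fork would be exponential.

```rust
// Illustrative arithmetic only: with 300 validators and a max depth of 4,
// the linear bound is 300 * 4 = 1_200 tree nodes, while the exponential
// bound would be 300^4 = 8_100_000_000.
fn linear_bound(n_validators: u64, max_depth: u64) -> u64 {
    n_validators * max_depth
}

fn exponential_bound(n_validators: u64, max_depth: u64) -> u64 {
    n_validators.pow(max_depth as u32)
}
```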
Practicalities
Most of the issues in implementation will revolve around race conditions:
- Prospective fragment trees are actually handled within another subsystem. But we need to maintain some kind of mutable fragment trees within the statement-distribution subsystem itself in order to handle incoming streams of messages
- So far, we've been imagining that the Prospective Parachains subsystem will inform backing subsystems about changes to fragment trees via signals from the overseer (Overseer: allow subsystems to instruct the overseer to send signals #4961), but such signals will race against active leaf updates and view updates to our peers. Instead, we might have backing subsystems pull the data from the Prospective Parachains subsystem.