storage+indexer: Indexer is not pruned #169

@jmalicevic

Summary

Storage usage of CometBFT nodes is becoming a problem for many operators, and reducing it is one of the major goals for 2023 (#44). To that end, CometBFT allows operators to prune blocks below a certain height. However, this pruning does not cover the data stored by the indexers.

Problem definition

As part of this effort, some information on the storage usage of CometBFT instances has already been gathered (informalsystems/interchain#1). Based on the reports mentioned in that issue, the kvindexer accounts for almost 50% of total storage usage. Enabling pruning does not affect the indexer, because indexer pruning is not implemented at the moment (Jan 2023).

Note that the kvindexer, as currently implemented, stores all transaction results and block events (begin block and end block, which will become finalize block in the future). It therefore holds a summary of the entire blockchain that is never pruned.

Proposal

  • We need to understand whether pruning the indexer impacts its users. My assumption is that users who prune the blockstore (blockchain) would be ok with pruning the indexer as well. But this is hard to know: there can be scenarios where, for example, relayers depend on having all transaction events even though the actual transactions no longer exist in the blockstore.

  • In any case, an option to prune the indexer would be important to some users. Implementing it can be troublesome, however:

    • We have to find a way to identify an index entry, that is, to retrieve its key, in order to delete it. For the transaction indexer this is possible because the transaction results are kept in other stores, and the key can be derived from them (for each transaction it consists of the event attributes and their values, which are stored). Block events are not kept anywhere else, so their event attributes are not retrievable and the keys cannot be reconstructed. We need to investigate whether we can use timestamps or heights to prune the store instead.
    • Pruning the transaction indexer can be slow (based on a prototype implementation), so it could run as a maintenance task once a day. We would need to measure the impact indexer pruning has on node performance and how long a pruning run takes.
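
To make the height-based option more concrete, here is a rough sketch of pruning by height without reconstructing keys from event attributes. It assumes the height is embedded in every index key. The key layout and the names `indexKey`, `heightOf` and `pruneBelow` are made up for illustration and are not the actual kvindexer format or API; a plain map stands in for the underlying database.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// indexKey builds a hypothetical index key of the form
// "<event attribute>/<value>/<height>/<tx index>".
// This layout is an assumption for illustration only.
func indexKey(attr, value string, height int64, txIndex uint32) string {
	return fmt.Sprintf("%s/%s/%d/%d", attr, value, height, txIndex)
}

// heightOf extracts the height from a key built by indexKey.
// The height is always the second-to-last component, so values that
// themselves contain "/" do not break the lookup.
func heightOf(key string) (int64, error) {
	parts := strings.Split(key, "/")
	if len(parts) < 4 {
		return 0, fmt.Errorf("malformed key %q", key)
	}
	return strconv.ParseInt(parts[len(parts)-2], 10, 64)
}

// pruneBelow deletes every entry whose height is lower than retainHeight
// and returns the number of deleted entries. It scans the whole store,
// which is one reason a pruning run could be slow on a large index.
func pruneBelow(store map[string][]byte, retainHeight int64) int {
	pruned := 0
	for key := range store {
		h, err := heightOf(key)
		if err != nil {
			continue // leave entries we cannot parse untouched
		}
		if h < retainHeight {
			delete(store, key)
			pruned++
		}
	}
	return pruned
}
```

With this shape, the event attributes never need to be known at pruning time, which is the property we would need for block events. The cost is a full scan per run, so the performance question above still applies.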

Even taking the above into consideration, along with the ongoing discussion on whether we should completely redesign the indexing capabilities of CometBFT (#73 and #82), I think it is worth investigating how to reduce the indexer's storage usage, even if we cannot eliminate it entirely, given how large it is and the fact that v0.34.x is still widely used.

As we cannot be 100% sure that users are ok with pruning the indexer, we should make this a config flag that is false by default, at least for the initial release.
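
As a sketch of what an opt-in flag could look like: the type `PruningConfig`, its fields and its methods below are hypothetical and do not exist in CometBFT. The only points taken from this issue are that pruning is off by default and that a run roughly once a day is assumed.

```go
package main

import (
	"errors"
	"time"
)

// PruningConfig is a hypothetical config section for indexer pruning.
// Neither the type nor its field names exist in CometBFT today.
type PruningConfig struct {
	// Enabled turns indexer pruning on. It is opt-in.
	Enabled bool
	// Interval is how often the maintenance task runs.
	Interval time.Duration
}

// DefaultPruningConfig returns the defaults: pruning disabled, with a
// daily interval that only takes effect once pruning is enabled.
func DefaultPruningConfig() PruningConfig {
	return PruningConfig{Enabled: false, Interval: 24 * time.Hour}
}

// Validate rejects a non-positive interval when pruning is enabled.
func (c PruningConfig) Validate() error {
	if c.Enabled && c.Interval <= 0 {
		return errors.New("interval must be positive when pruning is enabled")
	}
	return nil
}

// ShouldPrune reports whether a pruning run is due at time now,
// given the time of the last completed run.
func (c PruningConfig) ShouldPrune(lastRun, now time.Time) bool {
	return c.Enabled && now.Sub(lastRun) >= c.Interval
}
```

Keeping the default at `false` means existing nodes behave exactly as before after an upgrade, and operators have to enable pruning explicitly.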

DoD

As a result of this work we will either decide to implement some kind of indexer pruning, or have clear arguments as to why we will not do it and how users' problems with its storage use will be addressed instead (new design or alternative indexer implementation).

  • Understand whether users need all the event data to be present in the indexer
  • Will ABCI 2.0 lead to no events being stored anywhere (thus making it impossible to generate event keys for the indexer)?
    • If yes - determine whether the indexer can be pruned based on timestamp or height
    • If no - implement indexer pruning.
  • If pruning is implemented, run experiments to evaluate the impact on performance when indexer pruning is on: large testnet as well as e2e tests.

Metadata

Labels

P:storage-optimization (Priority: Give operators greater control over storage and storage optimization), indexer, storage
