Write committees as blobs to storage. #3453

Open
afck opened this issue Mar 3, 2025 · 0 comments

afck commented Mar 3, 2025

Problem

The admin chain subscriptions for committee changes don't scale well: Every new chain sends a cross-chain message to the admin chain to subscribe to notifications about new committees, and whenever there is a new epoch (introducing a new committee), or an old epoch is removed, a cross-chain message is sent to every subscriber (i.e. almost every chain in the system).

Currently these cross-chain messages, as well as every chain state, contain a copy of the committees.

Requirements

All chains still need to have a well-defined last block in each epoch, since at each block height, we need to know who the validators are before agreeing on the next block.

They don't need to explicitly acknowledge old epochs expiring, however. (In fact, if an epoch is not trusted anymore, that is immediately the case for all chains; there cannot be an option to delay that.)

Proposal

We are going to replace the old mechanism and get rid of the system channel bottleneck, so we:

  • Remove the CreateCommittee and RemoveCommittee system message variants, and the corresponding system channel subscriptions.
  • Remove the committees from all chain states.

Instead, we will use an event stream and blobs:

  • Whenever a committee is created on the admin chain, it is stored in a new blob, and an event is emitted with:
    • a stream name NEW_EPOCH_STREAM_NAME = &[0],
    • the new epoch number as the event ID, and
    • the blob ID as the payload.
  • Whenever a committee is removed on the admin chain, a new event is emitted with:
    • a stream name REMOVED_EPOCH_STREAM_NAME = &[1],
    • the old epoch number as the event ID, and
    • an empty payload.
  • These events can be handled on other chains with a new ProcessNewEpoch(Epoch) or ProcessRemovedEpoch(Epoch) system operation. Execution of that operation:
    • reads the event from storage and creates an oracle response with its event ID (key and stream name) and content (the blob ID or empty), and
    • updates the chain's set of epochs and committees.
  • Every client will now follow/sync the admin chain:
    • When processing any chain's inbox on the CLI, the admin chain is synced as well, and if there are new epoch events, they are handled, too, in addition to the incoming messages.
    • When running the node service or faucet service, the client makes a gRPC subscription to the admin chain, so it is notified about new blocks and therefore new events. Whenever there is a new epoch event, it is handled on all its owned chains.
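
The flow above can be sketched roughly as follows. This is a minimal in-memory model, not the actual implementation: only the two stream name constants and the operation names `ProcessNewEpoch` and `ProcessRemovedEpoch` come from the proposal. `Storage`, `ChainState`, `OracleResponse`, the `Epoch` and `BlobId` aliases, and the choice to keep only blob IDs (not committee copies) in the chain state are assumptions made for illustration.

```rust
use std::collections::BTreeMap;

// Stream names as given in the proposal.
const NEW_EPOCH_STREAM_NAME: &[u8] = &[0];
const REMOVED_EPOCH_STREAM_NAME: &[u8] = &[1];

// Hypothetical stand-ins for the real types.
type Epoch = u32;
type BlobId = u64;

#[derive(Clone, Debug, PartialEq)]
struct Committee {
    validators: Vec<String>,
}

/// Hypothetical shared storage: blobs by ID, events by (stream name, epoch).
#[derive(Default)]
struct Storage {
    blobs: BTreeMap<BlobId, Committee>,
    events: BTreeMap<(Vec<u8>, Epoch), Option<BlobId>>,
}

impl Storage {
    /// Admin chain: store the committee in a blob and emit a new-epoch event
    /// whose payload is the blob ID.
    fn create_committee(&mut self, epoch: Epoch, blob_id: BlobId, committee: Committee) {
        self.blobs.insert(blob_id, committee);
        self.events
            .insert((NEW_EPOCH_STREAM_NAME.to_vec(), epoch), Some(blob_id));
    }

    /// Admin chain: emit a removed-epoch event with an empty payload.
    fn remove_committee(&mut self, epoch: Epoch) {
        self.events
            .insert((REMOVED_EPOCH_STREAM_NAME.to_vec(), epoch), None);
    }
}

enum SystemOperation {
    ProcessNewEpoch(Epoch),
    ProcessRemovedEpoch(Epoch),
}

/// What the operation read from storage (hypothetical shape).
#[derive(Debug, PartialEq)]
struct OracleResponse {
    stream_name: Vec<u8>,
    epoch: Epoch,
    payload: Option<BlobId>,
}

/// Assumption: a chain keeps only a reference to each committee blob.
#[derive(Default)]
struct ChainState {
    epochs: BTreeMap<Epoch, BlobId>,
}

impl ChainState {
    /// Reads the event from storage, records it as an oracle response,
    /// and updates this chain's set of epochs.
    fn execute(&mut self, storage: &Storage, op: SystemOperation) -> Result<OracleResponse, String> {
        match op {
            SystemOperation::ProcessNewEpoch(epoch) => {
                let stream_name = NEW_EPOCH_STREAM_NAME.to_vec();
                let payload = storage
                    .events
                    .get(&(stream_name.clone(), epoch))
                    .copied()
                    .ok_or("new-epoch event not found")?;
                let blob_id = payload.ok_or("new-epoch event has no blob ID")?;
                if !storage.blobs.contains_key(&blob_id) {
                    return Err("committee blob not found".to_string());
                }
                self.epochs.insert(epoch, blob_id);
                Ok(OracleResponse { stream_name, epoch, payload })
            }
            SystemOperation::ProcessRemovedEpoch(epoch) => {
                let stream_name = REMOVED_EPOCH_STREAM_NAME.to_vec();
                let payload = storage
                    .events
                    .get(&(stream_name.clone(), epoch))
                    .copied()
                    .ok_or("removed-epoch event not found")?;
                self.epochs.remove(&epoch);
                Ok(OracleResponse { stream_name, epoch, payload })
            }
        }
    }
}
```

In this sketch, a client that sees a new-epoch event on the admin chain would call `execute` with `SystemOperation::ProcessNewEpoch(epoch)` on each chain it owns. The operation fails if the event has not been emitted yet, so a chain can only move to an epoch that the admin chain has actually published.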

This still means that every currently running node service (and faucet) will subscribe to the admin chain, so it still puts some load on the validators, but it removes the on-chain subscriptions and all the cross-chain messages.

@afck afck self-assigned this Mar 3, 2025
@afck afck changed the title Write committees separately in storage. Write committees as blobs to storage. Mar 3, 2025
@ma2bd ma2bd mentioned this issue Mar 8, 2025
ma2bd added a commit that referenced this issue Mar 8, 2025
## Motivation

* As noted by @MathieuDutSik in #3496, LRU-caching is incorrect outside
views.
* However, journaling outside of views is also incorrect.

## Proposal

On top of #3501, we simply deactivate the features that require
exclusive access to an object in storage.
* Journaling is not authorized.
* LRU cache should never insert `None` but instead forget about the
deleted key. (Not that we delete blobs but who knows in the future.)

We may want to rename `connect` and `clone_with_root_keys` later in
another PR.

## Test Plan

* CI
* Verified that this solves the issue with #3453

## Release Plan

In principle, this should be backported to the latest `testnet` branch.

---------

Signed-off-by: Mathieu Baudet <[email protected]>
Co-authored-by: Andreas Fackler <[email protected]>