Re-design the Schema serialization and code organization

There are several critical issues in the current `Schema` design:

* `Schema`, as it is, duplicates a lot of metadata because it is serialized as-is, including the indexes, instead of distinguishing between the in-memory materialized view and the serialized shape. This makes the metadata bigger for no good reason, and we have hard constraints on the schema registry size due to the network message size limit and the metadata size limit.
* Storing both the metadata and its indexes means that every time I add a new field, I need to think about backward/forward compatibility for all the duplicated metadata. This makes the schema registry hard to maintain, makes it hard to apply in-flight changes to env variables, and is potentially brittle.
* Because we store more metadata for the "latest" service than for older service revisions, some metadata is missing for the older revisions. This prevents us from implementing new features such as "rollback what is the _latest_ service revision" or blue/green deployments.
* On top of this, `Schema` leaks the internal types it uses for storage into the Admin API. This is another extremely brittle situation: since the same types back both, changing a field in the Admin REST API can break backward/forward compatibility (I think this has already happened a few times).
* Last but not least, crucial business logic of the schema registry is currently split between `restate_types` and the `updater` in the `restate_admin` module. This makes it hard to follow what's going on, and is another potential source of bugs.
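
To illustrate the Admin API leak, here is a minimal sketch of keeping the storage type private to the registry module and exposing only a dedicated response type through an explicit conversion. All names (`ServiceMetadataResponse`, `StoredService`, `register`, `service_response`) are hypothetical and do not reflect the actual Restate types; the point is only that a field added to the storage model cannot silently change the REST shape, and vice versa.

```rust
/// Hypothetical public shape returned by the Admin REST API.
/// It is defined independently of the storage model.
#[derive(Debug, Clone, PartialEq)]
pub struct ServiceMetadataResponse {
    pub name: String,
    pub revision: u32,
    pub handlers: Vec<String>,
}

mod registry {
    use super::ServiceMetadataResponse;
    use std::collections::HashMap;

    // Storage-only type: private to this module, so it cannot leak into the API.
    struct StoredService {
        revision: u32,
        handlers: Vec<String>,
    }

    pub struct Schema {
        services: HashMap<String, StoredService>,
    }

    impl Schema {
        pub fn new() -> Self {
            Schema { services: HashMap::new() }
        }

        /// Registering a service again bumps its revision.
        pub fn register(&mut self, name: &str, handlers: &[&str]) {
            let revision = self.services.get(name).map_or(1, |s| s.revision + 1);
            let handlers = handlers.iter().map(|h| h.to_string()).collect();
            self.services
                .insert(name.to_string(), StoredService { revision, handlers });
        }

        /// Explicit conversion: the only path from storage to the Admin API.
        pub fn service_response(&self, name: &str) -> Option<ServiceMetadataResponse> {
            self.services.get(name).map(|s| ServiceMetadataResponse {
                name: name.to_string(),
                revision: s.revision,
                handlers: s.handlers.clone(),
            })
        }
    }
}

pub use registry::Schema;
```

With this split, the Admin API only ever sees `ServiceMetadataResponse`, so evolving `StoredService` is a purely internal change.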

I would like to go ahead with the following plan:

* [x] ~~Define a new data model for `Schema`, storing the tree of deployments -> services -> handlers. Indexes are built when deserializing the data structure (this happens once in a while anyway).~~ https://github.com/restatedev/restate/commit/77939c5b96a054c9e73b760cfe197d04a2f7c804 -> done in 1.4
* [ ] Reorganize the code, moving the updater inside `restate_types`. This allows the internal representation of `Schema` to be private, so it can't leak anywhere, and makes the code more straightforward to read.
* [ ] Have `Schema` use the new data structure, with new indexes, but still store the previous one
* [ ] Before releasing 1.5, swap the default to store the new data structure. Once users are on 1.5 and perform the first schema update/propagation, the new data structure will be used.
* [ ] Big cleanup
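
The transition steps above (read both formats, keep writing the old one, then swap the default) can be sketched as follows. This is a simplified illustration under assumptions: `StoredSchema`, `WriteFormat`, and the plain `Vec<String>` payload are hypothetical stand-ins, not the real serialized types.

```rust
/// Hypothetical on-disk representations. `Legacy` stores a precomputed
/// index next to the data; `V2` stores only the data.
#[derive(Debug, Clone, PartialEq)]
pub enum StoredSchema {
    Legacy { services: Vec<String>, index: Vec<(String, usize)> },
    V2 { services: Vec<String> },
}

/// Hypothetical switch whose default is flipped to `V2` before the 1.5 release.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum WriteFormat {
    Legacy,
    V2,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    services: Vec<String>,
}

impl Schema {
    /// New readers accept both formats. Old readers only understand `Legacy`,
    /// which is why writers keep emitting it until the default is swapped.
    pub fn from_stored(stored: StoredSchema) -> Self {
        match stored {
            // The stored legacy index is ignored: indexes are rebuilt in memory.
            StoredSchema::Legacy { services, .. } => Schema { services },
            StoredSchema::V2 { services } => Schema { services },
        }
    }

    /// Serializes in the requested format.
    pub fn to_stored(&self, format: WriteFormat) -> StoredSchema {
        match format {
            WriteFormat::Legacy => StoredSchema::Legacy {
                services: self.services.clone(),
                index: self
                    .services
                    .iter()
                    .enumerate()
                    .map(|(i, name)| (name.clone(), i))
                    .collect(),
            },
            WriteFormat::V2 => StoredSchema::V2 { services: self.services.clone() },
        }
    }
}
```

Because every schema update rewrites the whole structure, the first update performed with the `V2` default is enough to migrate the stored data.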

All in all, what's important here is the following:

* The core change is that we no longer store the index, which means a slimmer schema, with the trade-off that we pay a small cost on deserialization to rebuild the index.
* No **semantic changes** to the **Schema** APIs.
* Code reorganization, hiding things that shouldn't be public and making schema updates less error-prone.
* A good cleanup of duplicated types.
* All of this affects only the schema registry.
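
A minimal sketch of the core change, assuming a simplified deployments -> services -> handlers tree: only the tree is serialized, and the "latest revision" index is derived from it once after deserialization. The type and method names (`from_deployments`, `latest_service`, `service_revisions`) are hypothetical, not the actual `restate_types` API.

```rust
use std::collections::HashMap;

// Hypothetical serialized shape: the tree only, no index.
#[derive(Debug, Clone)]
pub struct Handler {
    pub name: String,
}

#[derive(Debug, Clone)]
pub struct Service {
    pub name: String,
    pub revision: u32,
    pub handlers: Vec<Handler>,
}

#[derive(Debug, Clone)]
pub struct Deployment {
    pub id: String,
    pub services: Vec<Service>,
}

/// In-memory view: the tree plus indexes derived from it.
pub struct Schema {
    deployments: Vec<Deployment>,
    // service name -> (deployment position, service position) of its latest revision
    latest_service: HashMap<String, (usize, usize)>,
}

impl Schema {
    /// Runs once after deserializing the tree: the cost paid for a smaller payload.
    pub fn from_deployments(deployments: Vec<Deployment>) -> Self {
        let mut latest_service: HashMap<String, (usize, usize)> = HashMap::new();
        for (d, deployment) in deployments.iter().enumerate() {
            for (s, service) in deployment.services.iter().enumerate() {
                let is_newer = match latest_service.get(&service.name) {
                    Some(&(ld, ls)) => service.revision > deployments[ld].services[ls].revision,
                    None => true,
                };
                if is_newer {
                    latest_service.insert(service.name.clone(), (d, s));
                }
            }
        }
        Schema { deployments, latest_service }
    }

    pub fn latest_service(&self, name: &str) -> Option<&Service> {
        self.latest_service
            .get(name)
            .map(|&(d, s)| &self.deployments[d].services[s])
    }

    /// Every revision keeps its full metadata in the tree, which is what
    /// features like rolling back the latest revision would need.
    pub fn service_revisions(&self, name: &str) -> Vec<u32> {
        let mut revisions: Vec<u32> = self
            .deployments
            .iter()
            .flat_map(|d| d.services.iter())
            .filter(|s| s.name == name)
            .map(|s| s.revision)
            .collect();
        revisions.sort_unstable();
        revisions
    }
}
```

Adding a field to `Service` in this model only touches the tree; the index is recomputed, so there is no duplicated copy to keep compatible.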

More context on where this came from: https://github.com/restatedev/restate/pull/3295#issue-3083705543

Re-design the Schema serialization and code organization #3303