Closed
Description
There are many critical points in the `Schema` design:
- `Schema` as it is duplicates a lot of metadata, because it is serialized as is, including the indexes, rather than distinguishing between the in-memory materialized view and the serialized shape. This makes the metadata bigger for no good reason, and we have hard constraints on the schema registry size due to the network message size limit and the metadata size limit.
- Storing both the metadata and their indexes has the downside that every time I need to add a new field, I need to think about backward/forward compatibility constraints for all the duplicated metadata. This makes the schema registry hard to maintain, makes it hard to apply in-flight changes to env variables, and leaves it potentially fragile and brittle.
- Because we store more metadata for the "latest" service than for the old service revisions, a few pieces of metadata are missing, which prevents us from implementing new features like "rollback what is the latest service revision" or blue/green deployments.
- On top of this, `Schema` leaks the internal types it uses for storage to the Admin API. This is another extremely brittle situation, because changing a field in the Admin REST API potentially breaks backward/forward compatibility (I think it has already happened a few times).
- Last but not least, crucial business logic of the schema registry is now split between what's inside `restate_types` and the `updater` in the `restate_admin` module. This makes it hard to follow what's going on, and is another potential source of bugs.
I would like to go ahead with the following plan:
- Define a new data model for `Schema`, storing the tree of deployments -> services -> handlers. Indexes are built when deserializing the data structure (this happens once in a while anyway). 77939c5 -> done in 1.4
- Reorganize the code, moving the updater inside `restate_types`. Doing so allows the internal representation of `Schema` to be private, so it can't be leaked anywhere, and makes the code more straightforward to read.
- Have `Schema` use the new data structure, with new indexes, but still store the previous one.
- Before releasing 1.5, swap the default to store the new data structure. As soon as users are on 1.5 and perform the first schema update/propagation, the new data structure will be used.
- Big cleanup
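
To make the first step of the plan concrete, here is a minimal sketch of the idea, not the actual `restate_types` code. All type, field, and method names below are hypothetical. It assumes the stored shape is a plain tree of deployments -> services -> handlers, and that the in-memory view rebuilds a "latest revision" index whenever it is constructed from stored data:

```rust
use std::collections::HashMap;

// NOTE: every name here is hypothetical and only illustrates the idea.

// Stored (serialized) shape: a plain tree, with no indexes in it.
#[derive(Debug, Clone)]
struct StoredHandler {
    name: String,
}

#[derive(Debug, Clone)]
struct StoredService {
    name: String,
    revision: u32,
    handlers: Vec<StoredHandler>,
}

#[derive(Debug, Clone)]
struct StoredDeployment {
    id: String,
    services: Vec<StoredService>,
}

#[derive(Debug, Clone)]
struct StoredSchema {
    deployments: Vec<StoredDeployment>,
}

// In-memory view: owns the tree plus an index derived from it.
struct Schema {
    stored: StoredSchema,
    // service name -> (deployment position, service position) of the latest revision
    latest_service: HashMap<String, (usize, usize)>,
}

impl Schema {
    // Stand-in for the deserialization step: the index is rebuilt here
    // instead of being read back from storage.
    fn from_stored(stored: StoredSchema) -> Self {
        let mut latest_service: HashMap<String, (usize, usize)> = HashMap::new();
        for (d_idx, deployment) in stored.deployments.iter().enumerate() {
            for (s_idx, service) in deployment.services.iter().enumerate() {
                let is_newer = match latest_service.get(&service.name) {
                    Some(&(d, s)) => stored.deployments[d].services[s].revision < service.revision,
                    None => true,
                };
                if is_newer {
                    latest_service.insert(service.name.clone(), (d_idx, s_idx));
                }
            }
        }
        Schema { stored, latest_service }
    }

    // Look up the latest revision of a service through the index.
    fn resolve_latest_service(&self, name: &str) -> Option<&StoredService> {
        let &(d, s) = self.latest_service.get(name)?;
        Some(&self.stored.deployments[d].services[s])
    }

    // Id of the deployment that hosts the latest revision of a service.
    fn latest_deployment_id(&self, name: &str) -> Option<&str> {
        let &(d, _) = self.latest_service.get(name)?;
        Some(self.stored.deployments[d].id.as_str())
    }

    // Handler names of the latest revision; empty if the service is unknown.
    fn handler_names(&self, service: &str) -> Vec<&str> {
        self.resolve_latest_service(service)
            .map(|s| s.handlers.iter().map(|h| h.name.as_str()).collect())
            .unwrap_or_default()
    }
}
```

Since old revisions stay in the tree with the same fields as the latest one, a sketch like this also keeps the metadata that a "rollback the latest revision" feature would need.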
All in all, what's important here is the following:
- The core change is that we don't store the index anymore, meaning a slimmer schema, with the trade-off that we pay a small cost on deserialization for building the index.
- No semantic changes to the Schema APIs.
- Code reorganization, hiding things that shouldn't be public and making schema updates less error-prone.
- A good cleanup of duplicated types.
- All of this affects only the schema registry.
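
The point about hiding storage types can be sketched as follows. This is an illustration under assumptions, not the real Admin API: the module, the `ServiceResponse` type, and the `register`/`list_services` methods are all hypothetical. The storage type stays private to the module, the update logic lives next to the data, and callers only ever see a separate response type:

```rust
mod schema {
    // Private storage type: nothing outside this module can name it,
    // so it cannot end up in an API response by accident.
    struct StoredService {
        name: String,
        revision: u32,
        // Hypothetical storage-only detail that must not be exposed.
        #[allow(dead_code)]
        storage_version: u8,
    }

    // Public response type with its own, independently evolvable shape.
    pub struct ServiceResponse {
        pub name: String,
        pub revision: u32,
    }

    pub struct Schema {
        services: Vec<StoredService>,
    }

    impl Schema {
        pub fn new() -> Self {
            Schema { services: Vec::new() }
        }

        // Update logic kept beside the data: registers a new revision
        // of the service and returns the revision number assigned.
        pub fn register(&mut self, name: &str) -> u32 {
            let next = self
                .services
                .iter()
                .filter(|s| s.name == name)
                .map(|s| s.revision)
                .max()
                .unwrap_or(0)
                + 1;
            self.services.push(StoredService {
                name: name.to_string(),
                revision: next,
                storage_version: 1,
            });
            next
        }

        // Explicit conversion from the storage type to the response type.
        pub fn list_services(&self) -> Vec<ServiceResponse> {
            self.services
                .iter()
                .map(|s| ServiceResponse {
                    name: s.name.clone(),
                    revision: s.revision,
                })
                .collect()
        }
    }
}
```

With a split like this, adding or renaming a storage field only affects the conversion inside `list_services`, so the response shape changes only when someone edits `ServiceResponse` on purpose.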
Some more context on where this came from: #3295 (comment)