WAL: Custom Encoding for `Envelope` #3261

muhamadazmy · 2025-05-12T17:52:46Z

WAL: Custom Encoding for Envelope

This PR enables Envelope to apply custom encoding strategies for individual Command variants, selecting the most efficient encoding available for each case.

The design is forward-compatible, allowing future changes to the encoding of specific variants without breaking backward compatibility.

Benches

Benchmarks shows a big improvement for variants that uses protobuf (e.g invoke-cmd in this example).
While also unfortunately shows a small regression in performance for variants that still uses flexbuffers internally.

Those variants will eventually switch to bilrost overtime, so I think even if we have a slight decrease in performance right now
this changeset will allow us to improve this later on.

NOTE: To run the benchmarks locally make sure to modify the envelope StorageEncode implementation
to actually use Custom encoding

Before

Benchmarks results before switching to custom encoding

     Running benches/replicated_loglet_serde.rs (target/release/deps/replicated_loglet_serde-5dda2eb5d1331e02)
Gnuplot not found, using plotters backend
replicated-loglet-serde.invoke-cmd/serialize-append
                        time:   [58.196 µs 58.248 µs 58.305 µs]
                        change: [+64.298% +64.636% +64.929%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 50 measurements (12.00%)
  1 (2.00%) low severe
  1 (2.00%) low mild
  4 (8.00%) high mild
replicated-loglet-serde.invoke-cmd/deserialize-append
                        time:   [1.9243 µs 1.9267 µs 1.9295 µs]
                        change: [-4.3010% -4.1884% -4.0626%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 50 measurements (8.00%)
  2 (4.00%) high mild
  2 (4.00%) high severe
replicated-loglet-serde.invoke-cmd/serialize-store
                        time:   [58.438 µs 58.475 µs 58.517 µs]
                        change: [+62.380% +62.647% +62.876%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 50 measurements (4.00%)
  1 (2.00%) low mild
  1 (2.00%) high mild

replicated-loglet-serde.invoker-effect/serialize-append
                        time:   [36.209 µs 36.256 µs 36.319 µs]
                        change: [-8.3523% -8.1153% -7.9057%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 50 measurements (8.00%)
  1 (2.00%) high mild
  3 (6.00%) high severe
replicated-loglet-serde.invoker-effect/deserialize-append
                        time:   [1.9168 µs 1.9208 µs 1.9262 µs]
                        change: [+1.3047% +1.6301% +1.8847%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 50 measurements (6.00%)
  1 (2.00%) high mild
  2 (4.00%) high severe
replicated-loglet-serde.invoker-effect/serialize-store
                        time:   [36.556 µs 36.603 µs 36.663 µs]
                        change: [-8.4178% -8.2808% -8.1429%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 50 measurements (10.00%)
  2 (4.00%) high mild
  3 (6.00%) high severe

After

Benchmarks results after switching to custom encoding

     Running benches/replicated_loglet_serde.rs (target/release/deps/replicated_loglet_serde-fdc6dca006aa05a2)
Gnuplot not found, using plotters backend
replicated-loglet-serde.invoke-cmd/serialize-append
                        time:   [35.439 µs 35.468 µs 35.500 µs]
                        change: [-39.114% -39.039% -38.965%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 50 measurements (6.00%)
  2 (4.00%) high mild
  1 (2.00%) high severe
replicated-loglet-serde.invoke-cmd/deserialize-append
                        time:   [1.8101 µs 1.8129 µs 1.8161 µs]
                        change: [-6.0752% -5.9156% -5.7532%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 50 measurements (16.00%)
  1 (2.00%) low mild
  3 (6.00%) high mild
  4 (8.00%) high severe
replicated-loglet-serde.invoke-cmd/serialize-store
                        time:   [35.910 µs 35.944 µs 35.980 µs]
                        change: [-38.591% -38.463% -38.312%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 50 measurements (6.00%)
  2 (4.00%) high mild
  1 (2.00%) high severe

replicated-loglet-serde.invoker-effect/serialize-append
                        time:   [39.387 µs 39.453 µs 39.520 µs]
                        change: [+8.3225% +8.5442% +8.7673%] (p = 0.00 < 0.05)
                        Performance has regressed.
replicated-loglet-serde.invoker-effect/deserialize-append
                        time:   [1.8490 µs 1.8517 µs 1.8547 µs]
                        change: [-3.8572% -3.6454% -3.4561%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 50 measurements (8.00%)
  4 (8.00%) high mild
replicated-loglet-serde.invoker-effect/serialize-store
                        time:   [39.832 µs 39.868 µs 39.905 µs]
                        change: [+8.6906% +8.8668% +9.0299%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 50 measurements (10.00%)
  2 (4.00%) low mild
  3 (6.00%) high severe

muhamadazmy · 2025-05-12T18:04:49Z

@AhmedSoliman I still need to run some benches to make sure this does not affect performance (in flexbuffers variants cases).

AhmedSoliman · 2025-05-13T09:40:08Z

crates/partition-store/src/lib.rs

@@ -21,7 +21,7 @@ mod owned_iter;
 mod partition_store;
 mod partition_store_manager;
 pub mod promise_table;
-mod protobuf_types;
+pub mod protobuf_types;


This can be done in a follow-up PR:
One idea is to move this along with the storage proto files to restate-storage-api instead and avoid creating a dependency between wal-protocol and partition-store. The aim is to avoid pulling rocksdb as a dependency to wal-protocol.

I was trying to check how hard that will be and I found that protobuf_types also uses some types from the partition-store crate that are not easily movable. So I will have to come back for this later

That would be okay, if we move those types to restate-storage-api, then partition-store crate can depend on it.

To not risk forgetting about this and losing the context, can you file an issue and reference the issue number as a comment in the relevant places so our future selves can remember it?

crates/types/src/errors.rs

AhmedSoliman · 2025-05-13T09:49:17Z

crates/types/src/storage.rs

+    /// Custom encoding
+    Custom = 0xff,


1- Why 0xff? Blocking the max value makes it impossible to do future checks like if kind >= 6 { // not supported }.
2- I'd appreciate if you add a comment explaining this codec type a little more.

I wanted Custom to be distinguished from other types. So I was thinking either 0 or 0xff since custom encoding has no macros support and need to be implemented manually. But it doesn't really matter what value it's set to.

I will add more docs

AhmedSoliman · 2025-05-13T09:49:45Z

crates/wal-protocol/Cargo.toml

@@ -15,13 +15,15 @@ serde = ["dep:serde", "enum-map/serde", "bytestring/serde", "restate-invoker-api
 workspace-hack = { version = "0.1", path = "../../workspace-hack" }
 restate-core = { workspace = true }
 restate-invoker-api = { workspace = true }
+restate-partition-store = { workspace = true }


Here is the culprit mentioned above.

Same as what mentioned above re. referencing the github issue as a comment.

AhmedSoliman · 2025-05-13T09:54:11Z

crates/wal-protocol/src/lib.rs

+    fn default_codec(&self) -> StorageCodecKind {
+        StorageCodecKind::Custom
+    }
+
+    fn encode(&self, buf: &mut ::bytes::BytesMut) -> Result<(), StorageEncodeError> {
+        buf.put_slice(&envelope::encode(self)?);
+        Ok(())
+    }


Would using this encoding by default breaks backward compatibility with v1.2.x and v1.3.2?

The right approach is to only decode the custom encoding in v1.3 and only start using it to encode in v1.4

Oh i see. Thanks for pointing this out. Good point!

AhmedSoliman · 2025-05-13T10:00:05Z

crates/wal-protocol/src/lib.rs

+        #[error("Missing field codec")]
+        MissingFieldCodec,
+        #[error("Unknown command kind")]


our convention for error strings to start with small missing field ...

AhmedSoliman · 2025-05-13T10:02:49Z

crates/wal-protocol/src/lib.rs

+            )
+        }
+
+        fn decode_bilrosst<T: bilrost::OwnedMessage>(mut self) -> Result<T, StorageDecodeError> {


extra s

Suggested change

fn decode_bilrosst<T: bilrost::OwnedMessage>(mut self) -> Result<T, StorageDecodeError> {

fn decode_bilrost<T: bilrost::OwnedMessage>(mut self) -> Result<T, StorageDecodeError> {

AhmedSoliman · 2025-05-13T10:06:03Z

crates/wal-protocol/src/lib.rs

+    pub fn encode(envelope: &super::Envelope) -> Result<Bytes, StorageEncodeError> {
+        // todo(azmy): avoid clone? this will require change to `From` implementation


I'm a little concerned about those extra clones.

Same here, this is why exactly I want to make sure there is no performance regression. I will report the numbers after running the bench

So we actually only copy (right now) with protobuf types so we can convert to the type dto, according to the benchmark this is still gains a lot of performance even with the copy. Hopefully once all variants switches to bilrost this won't be an issue anymore.

Variants that still uses flexbuffer also does not copy

AhmedSoliman · 2025-05-14T07:41:23Z

crates/partition-store/src/lib.rs

@@ -21,7 +21,7 @@ mod owned_iter;
 mod partition_store;
 mod partition_store_manager;
 pub mod promise_table;
-mod protobuf_types;
+pub mod protobuf_types;


That would be okay, if we move those types to restate-storage-api, then partition-store crate can depend on it.

AhmedSoliman · 2025-05-14T08:24:59Z

crates/types/src/storage/encode.rs

+
+/// Utility method to encode a [`bilrost::Message`] type
+pub fn encode_bilrost<T: bilrost::Message>(value: &T) -> Bytes {
+    value.encode_fast().into_vec().into()


Would it still make sense to use encode_fast? Would something like encode_contiguous be a better choice?

I'm also wonder if we should be using the length-delimited variants of these functions, would that help or hurt us?

encode_contiguous is an interesting option. But I am wondering why do you think the length-delimited variant would be useful ?

length-delimited was prior to realizing that you the outer envelope is another bifrost message. Still curious to hear your thoughts on contiguous.

I think contiguous is a good option. Since we need to convert the output to vec anyway after encoding. according to the docs ReverseBuffer::into_vec If this buffer is contiguous and has no spare capacity, the conversion will be performed without copying data. So this might be a nice improvement (the question is why it's not the default for fast_encode()).
Anyway, i changed the default net bilrost encoding now to use encode_contiguous it's part of this PR.

AhmedSoliman · 2025-05-14T08:25:08Z

crates/wal-protocol/src/lib.rs

@@ -8,20 +8,28 @@
 // the Business Source License, use of this software will be governed
 // by the Apache License, Version 2.0.

+use bytes::BufMut;
+use restate_types::storage::encode::encode_serde;


nit: import groups

AhmedSoliman · 2025-05-14T08:26:40Z

crates/wal-protocol/src/lib.rs

+            command: command?,
+        };
+
+        Ok(dto.encode_fast().into_vec().into())


same questions

AhmedSoliman · 2025-05-14T08:29:35Z

crates/wal-protocol/src/lib.rs

+                Command::Invoke(value.try_into()?)
+            }
+            CommandKind::TruncateOutbox => {
+                Command::TruncateOutbox(envelope.command.decode_serde()?)


How do you envision evolving the codec of the variants over time if we are not including a "codec tag" along with it. For instance, If we started serializing truncate as bilrost, or would we change the top-level codec kind on every variant change?

We do include the codec tag along with it. I am only not checking it during decode at the moment because I don't need it. It will only make sense to check this tag when we are actually going to change it.

But if we are decoding messages that was written by a future version, wouldn't we want to tell the user that we can't decode for this reason rather than blindly attempting to deserialize and hoping for the best?

Ah okay, I see what you mean

AhmedSoliman

Nice work. left you a small reminder to add a comment re. the mixups of dependencies.

AhmedSoliman · 2025-05-16T11:07:30Z

crates/encoding/src/lib.rs

@@ -62,7 +62,8 @@ impl_net_serde!(
    String,
    bytes::Bytes,
    bytestring::ByteString,
-    std::time::Duration
+    std::time::Duration,
+    std::path::PathBuf


Why do we need to send PathBuf over the wire?

I think this was left-over from another change that got forgotten during a histedit will remove it

AhmedSoliman · 2025-05-16T11:09:26Z

crates/partition-store/src/lib.rs

@@ -21,7 +21,7 @@ mod owned_iter;
 mod partition_store;
 mod partition_store_manager;
 pub mod promise_table;
-mod protobuf_types;
+pub mod protobuf_types;


To not risk forgetting about this and losing the context, can you file an issue and reference the issue number as a comment in the relevant places so our future selves can remember it?

AhmedSoliman · 2025-05-16T11:10:10Z

crates/wal-protocol/Cargo.toml

@@ -15,13 +15,15 @@ serde = ["dep:serde", "enum-map/serde", "bytestring/serde", "restate-invoker-api
 workspace-hack = { version = "0.1", path = "../../workspace-hack" }
 restate-core = { workspace = true }
 restate-invoker-api = { workspace = true }
+restate-partition-store = { workspace = true }


Same as what mentioned above re. referencing the github issue as a comment.

This PR enables `Envelope` to apply custom encoding strategies for individual Command variants, selecting the most efficient encoding available for each case. The design is forward-compatible, allowing future changes to the encoding of specific variants without breaking backward compatibility. ## Benches Benchmarks shows a big improvement for variants that uses protobuf (e.g invoke-cmd in this example). While also unfortunately shows a small regression in performance for variants that still uses flexbuffers internally. Those variants will eventually switch to bilrost overtime, so I think even if we have a slight decrease in performance right now this changeset will allow us to improve this later on. *NOTE:* To run the benchmarks locally make sure to modify the envelope `StorageEncode` implementation to actually use `Custom` encoding ## Before <details> <summary>Benchmarks results before switching to custom encoding</summary> ``` Running benches/replicated_loglet_serde.rs (target/release/deps/replicated_loglet_serde-5dda2eb5d1331e02) Gnuplot not found, using plotters backend replicated-loglet-serde.invoke-cmd/serialize-append time: [58.196 µs 58.248 µs 58.305 µs] change: [+64.298% +64.636% +64.929%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 50 measurements (12.00%) 1 (2.00%) low severe 1 (2.00%) low mild 4 (8.00%) high mild replicated-loglet-serde.invoke-cmd/deserialize-append time: [1.9243 µs 1.9267 µs 1.9295 µs] change: [-4.3010% -4.1884% -4.0626%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 50 measurements (8.00%) 2 (4.00%) high mild 2 (4.00%) high severe replicated-loglet-serde.invoke-cmd/serialize-store time: [58.438 µs 58.475 µs 58.517 µs] change: [+62.380% +62.647% +62.876%] (p = 0.00 < 0.05) Performance has regressed. Found 2 outliers among 50 measurements (4.00%) 1 (2.00%) low mild 1 (2.00%) high mild replicated-loglet-serde.invoker-effect/serialize-append time: [36.209 µs 36.256 µs 36.319 µs] change: [-8.3523% -8.1153% -7.9057%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 50 measurements (8.00%) 1 (2.00%) high mild 3 (6.00%) high severe replicated-loglet-serde.invoker-effect/deserialize-append time: [1.9168 µs 1.9208 µs 1.9262 µs] change: [+1.3047% +1.6301% +1.8847%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 50 measurements (6.00%) 1 (2.00%) high mild 2 (4.00%) high severe replicated-loglet-serde.invoker-effect/serialize-store time: [36.556 µs 36.603 µs 36.663 µs] change: [-8.4178% -8.2808% -8.1429%] (p = 0.00 < 0.05) Performance has improved. Found 5 outliers among 50 measurements (10.00%) 2 (4.00%) high mild 3 (6.00%) high severe ``` </details> ### After <details> <summary>Benchmarks results after switching to custom encoding</summary> ``` Running benches/replicated_loglet_serde.rs (target/release/deps/replicated_loglet_serde-fdc6dca006aa05a2) Gnuplot not found, using plotters backend replicated-loglet-serde.invoke-cmd/serialize-append time: [35.439 µs 35.468 µs 35.500 µs] change: [-39.114% -39.039% -38.965%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 50 measurements (6.00%) 2 (4.00%) high mild 1 (2.00%) high severe replicated-loglet-serde.invoke-cmd/deserialize-append time: [1.8101 µs 1.8129 µs 1.8161 µs] change: [-6.0752% -5.9156% -5.7532%] (p = 0.00 < 0.05) Performance has improved. Found 8 outliers among 50 measurements (16.00%) 1 (2.00%) low mild 3 (6.00%) high mild 4 (8.00%) high severe replicated-loglet-serde.invoke-cmd/serialize-store time: [35.910 µs 35.944 µs 35.980 µs] change: [-38.591% -38.463% -38.312%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 50 measurements (6.00%) 2 (4.00%) high mild 1 (2.00%) high severe replicated-loglet-serde.invoker-effect/serialize-append time: [39.387 µs 39.453 µs 39.520 µs] change: [+8.3225% +8.5442% +8.7673%] (p = 0.00 < 0.05) Performance has regressed. replicated-loglet-serde.invoker-effect/deserialize-append time: [1.8490 µs 1.8517 µs 1.8547 µs] change: [-3.8572% -3.6454% -3.4561%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 50 measurements (8.00%) 4 (8.00%) high mild replicated-loglet-serde.invoker-effect/serialize-store time: [39.832 µs 39.868 µs 39.905 µs] change: [+8.6906% +8.8668% +9.0299%] (p = 0.00 < 0.05) Performance has regressed. Found 5 outliers among 50 measurements (10.00%) 2 (4.00%) low mild 3 (6.00%) high severe ``` </details>

muhamadazmy mentioned this pull request May 12, 2025

Support #[net_serde(skip)] attribute #3242

Merged

muhamadazmy force-pushed the pr3261 branch 2 times, most recently from b61b20b to 8255b3c Compare May 12, 2025 18:02

muhamadazmy requested a review from AhmedSoliman May 12, 2025 18:04

muhamadazmy marked this pull request as ready for review May 12, 2025 18:04

muhamadazmy requested a review from tillrohrmann May 13, 2025 07:53

muhamadazmy force-pushed the pr3261 branch from 8255b3c to b09ab7a Compare May 13, 2025 09:35

muhamadazmy mentioned this pull request May 13, 2025

Bilrost Metadata messages #3212

Closed

AhmedSoliman reviewed May 13, 2025

View reviewed changes

muhamadazmy force-pushed the pr3261 branch 3 times, most recently from 69ec517 to e0b2f7b Compare May 13, 2025 13:22

muhamadazmy requested a review from AhmedSoliman May 13, 2025 13:26

AhmedSoliman reviewed May 14, 2025

View reviewed changes

muhamadazmy force-pushed the pr3261 branch 2 times, most recently from 7d49a15 to 049e800 Compare May 14, 2025 09:28

muhamadazmy requested a review from AhmedSoliman May 14, 2025 10:10

AhmedSoliman approved these changes May 16, 2025

View reviewed changes

muhamadazmy mentioned this pull request May 16, 2025

[follow up] remove dependency on restate-partition-store from wal-protocol #3284

Closed

muhamadazmy force-pushed the pr3261 branch 2 times, most recently from a1073ea to c228d7c Compare May 16, 2025 14:37

muhamadazmy force-pushed the pr3261 branch from c228d7c to 794d6c2 Compare May 16, 2025 14:39

muhamadazmy merged commit 5904f63 into restatedev:main May 16, 2025
54 checks passed

muhamadazmy deleted the pr3261 branch May 16, 2025 15:14

github-actions bot locked and limited conversation to collaborators May 16, 2025

	fn decode_bilrosst<T: bilrost::OwnedMessage>(mut self) -> Result<T, StorageDecodeError> {
	fn decode_bilrost<T: bilrost::OwnedMessage>(mut self) -> Result<T, StorageDecodeError> {

		pub fn encode(envelope: &super::Envelope) -> Result<Bytes, StorageEncodeError> {
		// todo(azmy): avoid clone? this will require change to `From` implementation

WAL: Custom Encoding for Envelope #3261

WAL: Custom Encoding for Envelope #3261

Uh oh!

Conversation

muhamadazmy commented May 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benches

Before

After

Uh oh!

muhamadazmy commented May 12, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

AhmedSoliman left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

WAL: Custom Encoding for `Envelope` #3261

WAL: Custom Encoding for `Envelope` #3261

muhamadazmy commented May 12, 2025 •

edited

Loading