Skip to content

Conversation

muhamadazmy
Copy link
Contributor

@muhamadazmy muhamadazmy commented May 12, 2025

WAL: Custom Encoding for Envelope

This PR enables Envelope to apply custom encoding strategies for individual Command variants, selecting the most efficient encoding available for each case.

The design is forward-compatible, allowing future changes to the encoding of specific variants without breaking backward compatibility.

Benches

Benchmarks shows a big improvement for variants that uses protobuf (e.g invoke-cmd in this example).
While also unfortunately shows a small regression in performance for variants that still uses flexbuffers internally.

Those variants will eventually switch to bilrost overtime, so I think even if we have a slight decrease in performance right now
this changeset will allow us to improve this later on.

NOTE: To run the benchmarks locally make sure to modify the envelope StorageEncode implementation
to actually use Custom encoding

Before

Benchmarks results before switching to custom encoding
     Running benches/replicated_loglet_serde.rs (target/release/deps/replicated_loglet_serde-5dda2eb5d1331e02)
Gnuplot not found, using plotters backend
replicated-loglet-serde.invoke-cmd/serialize-append
                        time:   [58.196 µs 58.248 µs 58.305 µs]
                        change: [+64.298% +64.636% +64.929%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 50 measurements (12.00%)
  1 (2.00%) low severe
  1 (2.00%) low mild
  4 (8.00%) high mild
replicated-loglet-serde.invoke-cmd/deserialize-append
                        time:   [1.9243 µs 1.9267 µs 1.9295 µs]
                        change: [-4.3010% -4.1884% -4.0626%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 50 measurements (8.00%)
  2 (4.00%) high mild
  2 (4.00%) high severe
replicated-loglet-serde.invoke-cmd/serialize-store
                        time:   [58.438 µs 58.475 µs 58.517 µs]
                        change: [+62.380% +62.647% +62.876%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 50 measurements (4.00%)
  1 (2.00%) low mild
  1 (2.00%) high mild

replicated-loglet-serde.invoker-effect/serialize-append
                        time:   [36.209 µs 36.256 µs 36.319 µs]
                        change: [-8.3523% -8.1153% -7.9057%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 50 measurements (8.00%)
  1 (2.00%) high mild
  3 (6.00%) high severe
replicated-loglet-serde.invoker-effect/deserialize-append
                        time:   [1.9168 µs 1.9208 µs 1.9262 µs]
                        change: [+1.3047% +1.6301% +1.8847%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 50 measurements (6.00%)
  1 (2.00%) high mild
  2 (4.00%) high severe
replicated-loglet-serde.invoker-effect/serialize-store
                        time:   [36.556 µs 36.603 µs 36.663 µs]
                        change: [-8.4178% -8.2808% -8.1429%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 50 measurements (10.00%)
  2 (4.00%) high mild
  3 (6.00%) high severe

After

Benchmarks results after switching to custom encoding
     Running benches/replicated_loglet_serde.rs (target/release/deps/replicated_loglet_serde-fdc6dca006aa05a2)
Gnuplot not found, using plotters backend
replicated-loglet-serde.invoke-cmd/serialize-append
                        time:   [35.439 µs 35.468 µs 35.500 µs]
                        change: [-39.114% -39.039% -38.965%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 50 measurements (6.00%)
  2 (4.00%) high mild
  1 (2.00%) high severe
replicated-loglet-serde.invoke-cmd/deserialize-append
                        time:   [1.8101 µs 1.8129 µs 1.8161 µs]
                        change: [-6.0752% -5.9156% -5.7532%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 50 measurements (16.00%)
  1 (2.00%) low mild
  3 (6.00%) high mild
  4 (8.00%) high severe
replicated-loglet-serde.invoke-cmd/serialize-store
                        time:   [35.910 µs 35.944 µs 35.980 µs]
                        change: [-38.591% -38.463% -38.312%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 50 measurements (6.00%)
  2 (4.00%) high mild
  1 (2.00%) high severe

replicated-loglet-serde.invoker-effect/serialize-append
                        time:   [39.387 µs 39.453 µs 39.520 µs]
                        change: [+8.3225% +8.5442% +8.7673%] (p = 0.00 < 0.05)
                        Performance has regressed.
replicated-loglet-serde.invoker-effect/deserialize-append
                        time:   [1.8490 µs 1.8517 µs 1.8547 µs]
                        change: [-3.8572% -3.6454% -3.4561%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 50 measurements (8.00%)
  4 (8.00%) high mild
replicated-loglet-serde.invoker-effect/serialize-store
                        time:   [39.832 µs 39.868 µs 39.905 µs]
                        change: [+8.6906% +8.8668% +9.0299%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 50 measurements (10.00%)
  2 (4.00%) low mild
  3 (6.00%) high severe

@muhamadazmy
Copy link
Contributor Author

@AhmedSoliman I still need to run some benches to make sure this does not affect performance (in flexbuffers variants cases).

@muhamadazmy muhamadazmy requested a review from AhmedSoliman May 12, 2025 18:04
@muhamadazmy muhamadazmy marked this pull request as ready for review May 12, 2025 18:04
@muhamadazmy muhamadazmy requested a review from tillrohrmann May 13, 2025 07:53
@@ -21,7 +21,7 @@ mod owned_iter;
mod partition_store;
mod partition_store_manager;
pub mod promise_table;
mod protobuf_types;
pub mod protobuf_types;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be done in a follow-up PR:
One idea is to move this along with the storage proto files to restate-storage-api instead and avoid creating a dependency between wal-protocol and partition-store. The aim is to avoid pulling rocksdb as a dependency to wal-protocol.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to check how hard that will be and I found that protobuf_types also uses some types from the partition-store crate that are not easily movable. So I will have to come back for this later

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be okay, if we move those types to restate-storage-api, then partition-store crate can depend on it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To not risk forgetting about this and losing the context, can you file an issue and reference the issue number as a comment in the relevant places so our future selves can remember it?

Comment on lines 61 to 62
/// Custom encoding
Custom = 0xff,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1- Why 0xff? Blocking the max value makes it impossible to do future checks like if kind >= 6 { // not supported }.
2- I'd appreciate if you add a comment explaining this codec type a little more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted Custom to be distinguished from other types. So I was thinking either 0 or 0xff since custom encoding has no macros support and need to be implemented manually. But it doesn't really matter what value it's set to.

I will add more docs

@@ -15,13 +15,15 @@ serde = ["dep:serde", "enum-map/serde", "bytestring/serde", "restate-invoker-api
workspace-hack = { version = "0.1", path = "../../workspace-hack" }
restate-core = { workspace = true }
restate-invoker-api = { workspace = true }
restate-partition-store = { workspace = true }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the culprit mentioned above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as what mentioned above re. referencing the github issue as a comment.

Comment on lines 68 to 83
fn default_codec(&self) -> StorageCodecKind {
StorageCodecKind::Custom
}

fn encode(&self, buf: &mut ::bytes::BytesMut) -> Result<(), StorageEncodeError> {
buf.put_slice(&envelope::encode(self)?);
Ok(())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would using this encoding by default breaks backward compatibility with v1.2.x and v1.3.2?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The right approach is to only decode the custom encoding in v1.3 and only start using it to encode in v1.4

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh i see. Thanks for pointing this out. Good point!

Comment on lines 281 to 283
#[error("Missing field codec")]
MissingFieldCodec,
#[error("Unknown command kind")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

our convention for error strings to start with small missing field ...

)
}

fn decode_bilrosst<T: bilrost::OwnedMessage>(mut self) -> Result<T, StorageDecodeError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra s

Suggested change
fn decode_bilrosst<T: bilrost::OwnedMessage>(mut self) -> Result<T, StorageDecodeError> {
fn decode_bilrost<T: bilrost::OwnedMessage>(mut self) -> Result<T, StorageDecodeError> {

Comment on lines +403 to +423
pub fn encode(envelope: &super::Envelope) -> Result<Bytes, StorageEncodeError> {
// todo(azmy): avoid clone? this will require change to `From` implementation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little concerned about those extra clones.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, this is why exactly I want to make sure there is no performance regression. I will report the numbers after running the bench

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we actually only copy (right now) with protobuf types so we can convert to the type dto, according to the benchmark this is still gains a lot of performance even with the copy. Hopefully once all variants switches to bilrost this won't be an issue anymore.

Variants that still uses flexbuffer also does not copy

@muhamadazmy muhamadazmy force-pushed the pr3261 branch 3 times, most recently from 69ec517 to e0b2f7b Compare May 13, 2025 13:22
@muhamadazmy muhamadazmy requested a review from AhmedSoliman May 13, 2025 13:26
@@ -21,7 +21,7 @@ mod owned_iter;
mod partition_store;
mod partition_store_manager;
pub mod promise_table;
mod protobuf_types;
pub mod protobuf_types;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be okay, if we move those types to restate-storage-api, then partition-store crate can depend on it.


/// Utility method to encode a [`bilrost::Message`] type
pub fn encode_bilrost<T: bilrost::Message>(value: &T) -> Bytes {
value.encode_fast().into_vec().into()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it still make sense to use encode_fast? Would something like encode_contiguous be a better choice?

I'm also wonder if we should be using the length-delimited variants of these functions, would that help or hurt us?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

encode_contiguous is an interesting option. But I am wondering why do you think the length-delimited variant would be useful ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

length-delimited was prior to realizing that you the outer envelope is another bifrost message. Still curious to hear your thoughts on contiguous.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think contiguous is a good option. Since we need to convert the output to vec anyway after encoding. according to the docs ReverseBuffer::into_vec If this buffer is contiguous and has no spare capacity, the conversion will be performed without copying data. So this might be a nice improvement (the question is why it's not the default for fast_encode()).
Anyway, i changed the default net bilrost encoding now to use encode_contiguous it's part of this PR.

@@ -8,20 +8,28 @@
// the Business Source License, use of this software will be governed
// by the Apache License, Version 2.0.

use bytes::BufMut;
use restate_types::storage::encode::encode_serde;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: import groups

command: command?,
};

Ok(dto.encode_fast().into_vec().into())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same questions

Command::Invoke(value.try_into()?)
}
CommandKind::TruncateOutbox => {
Command::TruncateOutbox(envelope.command.decode_serde()?)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you envision evolving the codec of the variants over time if we are not including a "codec tag" along with it. For instance, If we started serializing truncate as bilrost, or would we change the top-level codec kind on every variant change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do include the codec tag along with it. I am only not checking it during decode at the moment because I don't need it. It will only make sense to check this tag when we are actually going to change it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if we are decoding messages that was written by a future version, wouldn't we want to tell the user that we can't decode for this reason rather than blindly attempting to deserialize and hoping for the best?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay, I see what you mean

@muhamadazmy muhamadazmy force-pushed the pr3261 branch 2 times, most recently from 7d49a15 to 049e800 Compare May 14, 2025 09:28
@muhamadazmy muhamadazmy requested a review from AhmedSoliman May 14, 2025 10:10
Copy link
Contributor

@AhmedSoliman AhmedSoliman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. left you a small reminder to add a comment re. the mixups of dependencies.

@@ -62,7 +62,8 @@ impl_net_serde!(
String,
bytes::Bytes,
bytestring::ByteString,
std::time::Duration
std::time::Duration,
std::path::PathBuf
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to send PathBuf over the wire?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was left-over from another change that got forgotten during a histedit will remove it

@@ -21,7 +21,7 @@ mod owned_iter;
mod partition_store;
mod partition_store_manager;
pub mod promise_table;
mod protobuf_types;
pub mod protobuf_types;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To not risk forgetting about this and losing the context, can you file an issue and reference the issue number as a comment in the relevant places so our future selves can remember it?

@@ -15,13 +15,15 @@ serde = ["dep:serde", "enum-map/serde", "bytestring/serde", "restate-invoker-api
workspace-hack = { version = "0.1", path = "../../workspace-hack" }
restate-core = { workspace = true }
restate-invoker-api = { workspace = true }
restate-partition-store = { workspace = true }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as what mentioned above re. referencing the github issue as a comment.

This PR enables `Envelope` to apply custom encoding strategies for individual Command variants, selecting the most efficient encoding available for each case.

The design is forward-compatible, allowing future changes to the encoding of specific variants without breaking backward compatibility.

## Benches
Benchmarks shows a big improvement for variants that uses protobuf (e.g invoke-cmd in this example).
While also unfortunately shows a small regression in performance for variants that still uses flexbuffers internally.

Those variants will eventually switch to bilrost overtime, so I think even if we have a slight decrease in performance right now
this changeset will allow us to improve this later on.

*NOTE:* To run the benchmarks locally make sure to modify the envelope `StorageEncode` implementation
to actually use `Custom` encoding

## Before
<details>

<summary>Benchmarks results before switching to custom encoding</summary>

```
     Running benches/replicated_loglet_serde.rs (target/release/deps/replicated_loglet_serde-5dda2eb5d1331e02)
Gnuplot not found, using plotters backend
replicated-loglet-serde.invoke-cmd/serialize-append
                        time:   [58.196 µs 58.248 µs 58.305 µs]
                        change: [+64.298% +64.636% +64.929%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 50 measurements (12.00%)
  1 (2.00%) low severe
  1 (2.00%) low mild
  4 (8.00%) high mild
replicated-loglet-serde.invoke-cmd/deserialize-append
                        time:   [1.9243 µs 1.9267 µs 1.9295 µs]
                        change: [-4.3010% -4.1884% -4.0626%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 50 measurements (8.00%)
  2 (4.00%) high mild
  2 (4.00%) high severe
replicated-loglet-serde.invoke-cmd/serialize-store
                        time:   [58.438 µs 58.475 µs 58.517 µs]
                        change: [+62.380% +62.647% +62.876%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 50 measurements (4.00%)
  1 (2.00%) low mild
  1 (2.00%) high mild

replicated-loglet-serde.invoker-effect/serialize-append
                        time:   [36.209 µs 36.256 µs 36.319 µs]
                        change: [-8.3523% -8.1153% -7.9057%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 50 measurements (8.00%)
  1 (2.00%) high mild
  3 (6.00%) high severe
replicated-loglet-serde.invoker-effect/deserialize-append
                        time:   [1.9168 µs 1.9208 µs 1.9262 µs]
                        change: [+1.3047% +1.6301% +1.8847%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 50 measurements (6.00%)
  1 (2.00%) high mild
  2 (4.00%) high severe
replicated-loglet-serde.invoker-effect/serialize-store
                        time:   [36.556 µs 36.603 µs 36.663 µs]
                        change: [-8.4178% -8.2808% -8.1429%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 50 measurements (10.00%)
  2 (4.00%) high mild
  3 (6.00%) high severe
```

</details>

### After
<details>

<summary>Benchmarks results after switching to custom encoding</summary>

```
     Running benches/replicated_loglet_serde.rs (target/release/deps/replicated_loglet_serde-fdc6dca006aa05a2)
Gnuplot not found, using plotters backend
replicated-loglet-serde.invoke-cmd/serialize-append
                        time:   [35.439 µs 35.468 µs 35.500 µs]
                        change: [-39.114% -39.039% -38.965%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 50 measurements (6.00%)
  2 (4.00%) high mild
  1 (2.00%) high severe
replicated-loglet-serde.invoke-cmd/deserialize-append
                        time:   [1.8101 µs 1.8129 µs 1.8161 µs]
                        change: [-6.0752% -5.9156% -5.7532%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 50 measurements (16.00%)
  1 (2.00%) low mild
  3 (6.00%) high mild
  4 (8.00%) high severe
replicated-loglet-serde.invoke-cmd/serialize-store
                        time:   [35.910 µs 35.944 µs 35.980 µs]
                        change: [-38.591% -38.463% -38.312%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 50 measurements (6.00%)
  2 (4.00%) high mild
  1 (2.00%) high severe

replicated-loglet-serde.invoker-effect/serialize-append
                        time:   [39.387 µs 39.453 µs 39.520 µs]
                        change: [+8.3225% +8.5442% +8.7673%] (p = 0.00 < 0.05)
                        Performance has regressed.
replicated-loglet-serde.invoker-effect/deserialize-append
                        time:   [1.8490 µs 1.8517 µs 1.8547 µs]
                        change: [-3.8572% -3.6454% -3.4561%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 50 measurements (8.00%)
  4 (8.00%) high mild
replicated-loglet-serde.invoker-effect/serialize-store
                        time:   [39.832 µs 39.868 µs 39.905 µs]
                        change: [+8.6906% +8.8668% +9.0299%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 50 measurements (10.00%)
  2 (4.00%) low mild
  3 (6.00%) high severe
```
</details>
@muhamadazmy muhamadazmy merged commit 5904f63 into restatedev:main May 16, 2025
54 checks passed
@muhamadazmy muhamadazmy deleted the pr3261 branch May 16, 2025 15:14
@github-actions github-actions bot locked and limited conversation to collaborators May 16, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants