Skip to content

Conversation

AhmedSoliman
Copy link
Contributor

@AhmedSoliman AhmedSoliman commented Apr 24, 2025

Network swimlanes categorize network connections into different groups based on their characteristics. This PR is a continuation of the work done in the previous PR, which introduced the concept of swimlanes. The goal is to enhance the network management capabilities of the system by allowing for more granular control and monitoring of network connections.

Most notable changes:

  • We keep a single connection per swimlane instead of a pool of connections. This removes the configuration option num-concurrent-connections.
  • Connections/stream's window sizes are now configurable and are set to reasonable defaults that suit our workloads better than tonic/hyper's defaults.
  • Removal of outbound-queue-length in networking in favor of relying on memory-level settings (stream window sizes)
  • Introducing swimlane in Hello message to communicate the swimlane to the peer during handshake
  • Gossip connections are automatically bidirectional

Stack created with Sapling. Best reviewed with ReviewStack.

Copy link

github-actions bot commented Apr 24, 2025

Test Results

  7 files  +  3    7 suites  +3   4m 18s ⏱️ + 3m 16s
 54 tests + 38   52 ✅ + 36  1 💤 +1  0 ❌ ±0  1 🔥 +1 
223 runs  +191  219 ✅ +187  3 💤 +3  0 ❌ ±0  1 🔥 +1 

For more details on these errors, see this check.

Results for commit a63afa4. ± Comparison against base commit d100b46.

This pull request removes 16 and adds 54 tests. Note that renamed tests count towards both.
dev.restate.sdktesting.tests.AwakeableIngressEndpoint ‑ completeWithFailure(Client)
dev.restate.sdktesting.tests.AwakeableIngressEndpoint ‑ completeWithSuccess(Client)
dev.restate.sdktesting.tests.FrontCompatibilityTest$NewVersion ‑ completeAwakeable(Client)
dev.restate.sdktesting.tests.FrontCompatibilityTest$NewVersion ‑ completeRetryableOperation(Client)
dev.restate.sdktesting.tests.FrontCompatibilityTest$NewVersion ‑ proxyCallShouldBeDone(Client)
dev.restate.sdktesting.tests.FrontCompatibilityTest$NewVersion ‑ proxyOneWayCallShouldBeDone(Client)
dev.restate.sdktesting.tests.FrontCompatibilityTest$OldVersion ‑ createAwakeable(Client)
dev.restate.sdktesting.tests.FrontCompatibilityTest$OldVersion ‑ startOneWayProxyCall(Client)
dev.restate.sdktesting.tests.FrontCompatibilityTest$OldVersion ‑ startProxyCall(Client)
dev.restate.sdktesting.tests.FrontCompatibilityTest$OldVersion ‑ startRetryableOperation(Client)
…
dev.restate.sdktesting.tests.CallOrdering ‑ ordering(boolean[], Client)[1]
dev.restate.sdktesting.tests.CallOrdering ‑ ordering(boolean[], Client)[2]
dev.restate.sdktesting.tests.CallOrdering ‑ ordering(boolean[], Client)[3]
dev.restate.sdktesting.tests.Cancellation ‑ cancelFromAdminAPI(BlockingOperation, Client, URI)[1]
dev.restate.sdktesting.tests.Cancellation ‑ cancelFromAdminAPI(BlockingOperation, Client, URI)[2]
dev.restate.sdktesting.tests.Cancellation ‑ cancelFromAdminAPI(BlockingOperation, Client, URI)[3]
dev.restate.sdktesting.tests.Cancellation ‑ cancelFromContext(BlockingOperation, Client)[1]
dev.restate.sdktesting.tests.Cancellation ‑ cancelFromContext(BlockingOperation, Client)[2]
dev.restate.sdktesting.tests.Cancellation ‑ cancelFromContext(BlockingOperation, Client)[3]
dev.restate.sdktesting.tests.Combinators ‑ awakeableOrTimeoutUsingAwaitAny(Client)
…

♻️ This comment has been updated with latest results.

Copy link
Contributor

@tillrohrmann tillrohrmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The different pieces are falling one after another into places. That's very nice to see. Really nice continuation of the started work @AhmedSoliman 👏 LGTM. +1 for merging :-)

let router = guard.router.clone();

let existing_connection = if should_register {
guard.register(connection.clone())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't we registering connection twice? Once here and once through the ConnectionTracking?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Intentionally though. In this case, the other one can't put back the original connection. This only impacts incoming connections via accept() though.


let reactor = ConnectionReactor::new(connection.clone(), shared, None, self.router.clone());

let previous = self.register(connection.clone());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the explicit register call required here? It looks that it will also happen through the ConnectionTracking trait which is called in the ConnectionReactor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we didn't do that, there could be more loopback connections than we need because we let go of the lock between these two steps.

Comment on lines +536 to +540
let direction = if swimlane == Swimlane::Gossip {
ConnectionDirection::Bidirectional
} else {
ConnectionDirection::Forward
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method name might be a bit misleading with this change since it suggests that we are creating a forward connection.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, will rename.

This commit introduces a new network protocol version with automatica backward compatibility layer and an accompanying modified networking API to provider support for the following:
1. Native RPC in protocol level with fabric-level service and service-shard routing. Routing to service shards is offered via an opaque `SortCode`, a u64 value that can be used to route messages to a specific shard of a service (e.g. `PartitionId`, or `LogletId`, and etc.).
2. Support for cancellation of enqueued egress RPC requests if the caller is not interested in the result anymore.
3. Message payload deserialization offloaded from the network reactor to the call-site's thread to reduce the network reactor's CPU usage and reduce the effect of head-of-line blocking caused by expensive messages.
5. Adds the concept of `Service` that can handle `rpc`, `unary`, and `watch` messages.
6. Improved ergonomics for using the message fabric for sending rpc requests, no need to get access to `MessageRouterBuilder` anymore, this unlocks the ability to create arbitrary connections that are self-managed (not tied to connection manager).
7. WIP Introduces `Swimlane`s concept to classify streams/connections into different swim lanes. This will provide isolation between fat data streams and metadata low-latency streams in the future.
8. WIP support for "remote watches". Not fully implemented, but will be available in the future.
9. A variety of fixes and improvements to the existing code.
Network swimlanes categorize network connections into different groups based on their characteristics. This PR is a continuation of the work done in the previous PR, which introduced the concept of swimlanes. The goal is to enhance the network management capabilities of the system by allowing for more granular control and monitoring of network connections.

Most notable changes:
- We keep a single connection per swimlane instead of a pool of connections. This removes the configuration option `num-concurrent-connections`.
- Connections/stream's window sizes are now configurable and are set to reasonable defaults that suit our workloads better than tonic/hyper's defaults.
- Removal of `outbound-queue-length` in networking in favor of relying on memory-level settings (stream window sizes)
- Introducing `swimlane` in `Hello` message to communicate the swimlane to the peer during handshake
- Gossip connections are automatically bidirectional
@AhmedSoliman AhmedSoliman merged commit a63afa4 into main Apr 25, 2025
53 of 55 checks passed
@AhmedSoliman AhmedSoliman deleted the pr3193 branch April 25, 2025 14:34
@github-actions github-actions bot locked and limited conversation to collaborators Apr 25, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants