Contributor

@pcholakov pcholakov commented Aug 6, 2025

Adds a new mode which bypasses cluster node enumeration, and instead talks only to a single specified address. This is helpful when the target nodes are behind a load balancer. In the fullness of time we'll move more of the intra-cluster comms behind the cluster control service, so direct connectivity won't be required, but this is a useful workaround to have in the meantime.

~/restate/restatectl % cargo run --bin restatectl -- -S restate-XXXXXXXX.elb.eu-central-1.amazonaws.com status
...
Node Configuration (v13)
 NODE-ID  NAME                                                                                               UPTIME   METADATA  LEADER  FOLLOWER  NODESET-MEMBER  SEQUENCER  ROLES
 N1:2     arn:aws:ecs:eu-central-1:663487780041:task/restate-cluster-pavel/4b7b306b6c4e491ab9f15f397a135d0c  19m 56s            0       0         0               0          admin | http-ingress
 N2:1     arn:aws:ecs:eu-central-1:663487780041:task/restate-cluster-pavel/d8f8ef7f2a64407bad32c68068e20564  20m 32s            43      40        128             43         log-server | worker
 N3:1     arn:aws:ecs:eu-central-1:663487780041:task/restate-cluster-pavel/b4c50ad38f554fca98c7e6b5839d44e1  20m 35s            46      44        128             46         log-server | worker
 N4:1     arn:aws:ecs:eu-central-1:663487780041:task/restate-cluster-pavel/64ea81ceaad741b79e0b3d79bed5bc41  20m 34s            39      44        128             39         log-server | worker
 N5:1     arn:aws:ecs:eu-central-1:663487780041:task/restate-cluster-pavel/3ea5df4ea8f44b5e8dd792dc542c4c6a  19m 29s            0       0         0               0          admin | http-ingress
 N6:1     arn:aws:ecs:eu-central-1:663487780041:task/restate-cluster-pavel/8e9e0c94eca54a5d9631be445533045f  19m 28s            0       0         0               0          admin | http-ingress
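The difference between the default mode and the new single-address mode can be sketched roughly as follows. This is only an illustration under stated assumptions: `Targeting`, `targets`, and the `advertised` parameter are hypothetical names and are not taken from the restatectl source.

```rust
/// Hypothetical description of how the CLI decides which nodes to contact.
/// None of these names come from the restatectl code base.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Targeting {
    /// Default behaviour: start from seed addresses, then enumerate the
    /// cluster and contact the addresses the nodes advertise.
    Discover { seeds: Vec<String> },
    /// Single-address mode: skip enumeration and only ever use this address
    /// (for example, a load balancer in front of the nodes).
    SingleAddress(String),
}

/// Returns the addresses the CLI would contact.
/// `advertised` stands in for addresses learned from the node configuration.
pub fn targets(mode: &Targeting, advertised: &[String]) -> Vec<String> {
    match mode {
        Targeting::Discover { seeds } => {
            let mut out = seeds.clone();
            for address in advertised {
                // Avoid contacting the same address twice.
                if !out.contains(address) {
                    out.push(address.clone());
                }
            }
            out
        }
        // Advertised addresses are ignored: they may not be reachable
        // from outside the load balancer.
        Targeting::SingleAddress(address) => vec![address.clone()],
    }
}
```

In this sketch the single-address branch deliberately discards whatever the cluster advertises, which mirrors the intent of the `-S` flag shown above.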

Landing on the wrong node (one that lacks the role required by the subcommand) is reported as an error. When targeting a load balancer, the nodes behind it are expected to be homogeneous. If they are not, calls will fail intermittently, depending on which node the request lands on:

# this port routes to an admin node
% restatectl -S localhost:25122 status
Node Configuration (v12)
 NODE-ID  NAME   UPTIME  METADATA  LEADER  FOLLOWER  NODESET-MEMBER  SEQUENCER  ROLES
 N1:2     node1  37s     Member    4       3         12              4          http-ingress | log-server | metadata-server | worker
 N2:1     node2  37s     Member    3       5         12              3          admin | http-ingress | log-server | metadata-server | worker
 N3:1     node3  37s     Member    5       4         12              5          admin | http-ingress | log-server | metadata-server | worker

# this port does not
% restatectl -S localhost:5122 status
Error: Single address mode: node http://localhost:5122/ does not have the required role 'admin'. Node has roles: [worker, metadata-server, log-server, http-ingress]
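The role check behind that error could look roughly like the sketch below. The `Role` enum and `check_required_role` function are hypothetical and only mirror the role names and the message format shown in the output above; the actual implementation in this PR may differ.

```rust
use std::fmt;

/// Node roles as they appear in the ROLES column above.
/// The enum itself is a hypothetical stand-in, not restatectl's real type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Admin,
    Worker,
    LogServer,
    MetadataServer,
    HttpIngress,
}

impl fmt::Display for Role {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            Role::Admin => "admin",
            Role::Worker => "worker",
            Role::LogServer => "log-server",
            Role::MetadataServer => "metadata-server",
            Role::HttpIngress => "http-ingress",
        })
    }
}

/// Hypothetical check: fail with a descriptive message when the node
/// reached at `address` does not have the `required` role.
pub fn check_required_role(
    address: &str,
    node_roles: &[Role],
    required: Role,
) -> Result<(), String> {
    if node_roles.contains(&required) {
        return Ok(());
    }
    let listed: Vec<String> = node_roles.iter().map(|r| r.to_string()).collect();
    Err(format!(
        "Single address mode: node {address} does not have the required role '{required}'. Node has roles: [{}]",
        listed.join(", ")
    ))
}
```

Checking the role up front means the user sees which roles the node actually has, rather than a less specific failure later in the subcommand.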

/cc: @jackkleeman

@pcholakov pcholakov requested a review from muhamadazmy August 6, 2025 08:25
@jackkleeman
Contributor

Are there still any restatectl operations that rely on finding and talking to a node with a particular role, like the cluster controller?

github-actions bot commented Aug 6, 2025

Test Results

  7 files  ± 0    7 suites  ±0   3m 58s ⏱️ + 1m 1s
 54 tests + 2   53 ✅ + 2  1 💤 ±0  0 ❌ ±0 
223 runs  +10  220 ✅ +10  3 💤 ±0  0 ❌ ±0 

Results for commit 5579641. ± Comparison against base commit fb00afb.

♻️ This comment has been updated with latest results.

Contributor

@AhmedSoliman AhmedSoliman left a comment

I hope this is considered as a short-term solution and wouldn't play as an excuse to defer removing the actual requirement altogether.

@@ -9,6 +9,7 @@
// by the Apache License, Version 2.0.

use std::collections::{HashMap, HashSet};
use std::future::Future;

Contributor

Is this actually needed? Future should be in the prelude in the 2024 edition.

Contributor Author

No, my mistake! I didn't realise this is now in the prelude.

@pcholakov pcholakov force-pushed the restatectl-single-address branch from dddd817 to 9e3c519 Compare August 6, 2025 19:58
@pcholakov
Contributor Author

Are there still any restatectl operations that rely on finding and talking to a node with a particular role, like the cluster controller?

There are! I made it nicer now - if we land on the wrong role, it will at least error out with something meaningful.

I hope this is considered as a short-term solution and wouldn't play as an excuse to defer removing the actual requirement altogether.

Definitely! I should have made that clearer. I wanted to get this merged as a short-term workaround for a customer who is operating in a restricted environment.

@pcholakov pcholakov force-pushed the restatectl-single-address branch from 9e3c519 to 5579641 Compare August 6, 2025 20:04
Contributor

@muhamadazmy muhamadazmy left a comment

The changes look good to me. There might be some operations that require landing on a node with a certain role. But maybe metadata operations can always use a single address (no majority consensus). So a nice follow-up could be to use the address CLI option by default, and only connect to a different node when a specific role is needed?
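The follow-up suggested here could be sketched as a small selection function. This is a hypothetical illustration: `Node`, `choose_node`, and the idea of passing the other known nodes as a slice are assumptions made for the example, not part of this PR.

```rust
/// Hypothetical node description: an address plus the role names it runs.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Node {
    pub address: String,
    pub roles: Vec<String>,
}

impl Node {
    fn has_role(&self, role: &str) -> bool {
        self.roles.iter().any(|r| r == role)
    }
}

/// Prefer the node behind the user-supplied address. Fall back to another
/// known node only when a role is required and the preferred node lacks it.
pub fn choose_node<'a>(
    preferred: &'a Node,
    others: &'a [Node],
    required: Option<&str>,
) -> Option<&'a Node> {
    match required {
        // No role requirement (e.g. a metadata read): use the given address.
        None => Some(preferred),
        // The preferred node already has the role: no extra connection.
        Some(role) if preferred.has_role(role) => Some(preferred),
        // Otherwise look for any other node that has the role.
        Some(role) => others.iter().find(|n| n.has_role(role)),
    }
}
```

Returning `None` when no node has the role leaves it to the caller to report an error, in the same spirit as the role error message added in this PR.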

@muhamadazmy muhamadazmy merged commit 4969b95 into main Aug 7, 2025
27 checks passed
@muhamadazmy muhamadazmy deleted the restatectl-single-address branch August 7, 2025 10:17
@github-actions github-actions bot locked and limited conversation to collaborators Aug 7, 2025