-
Notifications
You must be signed in to change notification settings - Fork 4.5k
Description
When filing a bug, please include the following headings if possible. Any example text in this template can be deleted.
Overview of the Issue
When I do the consul auto-election testing, I found that when 2/3 node down, the cluster is stopped to work, I know this is right, when I start a node before stopped, now 2/3 node is started, but the log always output election timeout, and cluster still cannot provides services, with version 1.6.10, everything is ok, start with 1.7.0 is not ok, I tested version 1.6.10、1.7.0、1.9.6、1.10.0, only 1.6.10 work normally
Reproduction Steps
Steps to reproduce this issue, eg:
- Create a cluster with 3 server nodes
2.stop 1/3 node
3.stop 2/3 node
4.start 1/2 node before stopped
5.now 2/3 node started
6.cluster is not work, there is no leader
Consul info for both Client and Server
Server info
agent:
check_monitors = 0
check_ttls = 0
checks = 0
services = 0
build:
prerelease =
revision = 95fb95bf
version = 1.7.0
consul:
acl = disabled
bootstrap = false
known_datacenters = 1
leader = false
leader_addr =
server = true
raft:
applied_index = 0
commit_index = 0
fsm_pending = 0
last_contact = never
last_log_index = 68
last_log_term = 6
last_snapshot_index = 0
last_snapshot_term = 0
latest_configuration = [{Suffrage:Voter ID:bfb6b7bc-3cbd-6c1a-b3b2-f22e0c705afc Address:10.6.0.21:8300} {Suffrage:Voter ID:d244e694-c619-cf0a-e3d6-701bd510b70d Address:10.6.0.22:8300} {Suffrage:Voter ID:5fc5e757-a1c5-e6f0-ed28-3149d68e44bf Address:10.6.0.23:8300}]
latest_configuration_index = 0
num_peers = 2
protocol_version = 3
protocol_version_max = 3
protocol_version_min = 0
snapshot_version_max = 1
snapshot_version_min = 0
state = Candidate
term = 108
runtime:
arch = amd64
cpu_count = 2
goroutines = 73
max_procs = 2
os = linux
version = go1.12.16
serf_lan:
coordinate_resets = 0
encrypted = false
event_queue = 0
event_time = 5
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 14
members = 2
query_queue = 0
query_time = 1
serf_wan:
coordinate_resets = 0
encrypted = false
event_queue = 0
event_time = 1
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 14
members = 2
query_queue = 0
query_time = 1
Operating system and Environment details
cat /etc/centos-release
CentOS Linux release 7.9.2009 (Core)
uname -a
Linux consul-02 3.10.0-1127.el7.x86_64 #1 SMP Tue Mar 31 23:36:51 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Virtual Machine
Log Fragments
consul members
Node Address Status Type Build Protocol DC Segment
consul-02 10.6.0.22:8301 alive server 1.7.0 2 my-dc-1
consul-03 10.6.0.23:8301 alive server 1.7.0 2 my-dc-1
consul operator raft list-peers
Error getting peers: Failed to retrieve raft configuration: Unexpected response code: 500 (No cluster leader)
consul-01
now is stopped
consul-02
Jun 25 14:40:27 consul-02 consul: 2021-06-25T14:40:27.168Z [WARN] agent.server.raft: Election timeout reached, restarting election
Jun 25 14:40:27 consul-02 consul: 2021-06-25T14:40:27.168Z [INFO] agent.server.raft: entering candidate state: node="Node at 10.6.0.22:8300 [Candidate]" term=122
Jun 25 14:40:27 consul-02 consul: 2021-06-25T14:40:27.169Z [WARN] agent.server.raft: unable to get address for sever, using fallback address: id=bfb6b7bc-3cbd-6c1a-b3b2-f22e0c705afc fallback=10.6.0.21:8300 error="Could not find address for server id bfb6b7bc-3cbd-6c1a-b3b2-f22e0c705afc"
Jun 25 14:40:27 consul-02 consul: 2021-06-25T14:40:27.169Z [ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter bfb6b7bc-3cbd-6c1a-b3b2-f22e0c705afc 10.6.0.21:8300}" error="dial tcp ->10.6.0.21:8300: connect: connection refused"
consul-03
Jun 25 14:40:56 consul-03 consul: 2021-06-25T14:40:56.707Z [INFO] agent.server.raft: entering follower state: follower="Node at 10.6.0.23:8300 [Follower]" leader=
Jun 25 14:41:00 consul-03 consul: 2021-06-25T14:41:00.100Z [ERROR] agent: Coordinate update error: error="No cluster leader"
Jun 25 14:41:01 consul-03 consul: 2021-06-25T14:41:01.921Z [WARN] agent.server.raft: heartbeat timeout reached, starting election: last-leader=
Jun 25 14:41:01 consul-03 consul: 2021-06-25T14:41:01.921Z [INFO] agent.server.raft: entering candidate state: node="Node at 10.6.0.23:8300 [Candidate]" term=128
Jun 25 14:41:01 consul-03 consul: 2021-06-25T14:41:01.922Z [WARN] agent.server.raft: unable to get address for sever, using fallback address: id=bfb6b7bc-3cbd-6c1a-b3b2-f22e0c705afc fallback=10.6.0.21:8300 error="Could not find address for server id bfb6b7bc-3cbd-6c1a-b3b2-f22e0c705afc"
Jun 25 14:41:01 consul-03 consul: 2021-06-25T14:41:01.923Z [ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter bfb6b7bc-3cbd-6c1a-b3b2-f22e0c705afc 10.6.0.21:8300}" error="dial tcp ->10.6.0.21:8300: connect: connection refused"
Jun 25 14:41:06 consul-03 consul: 2021-06-25T14:41:06.683Z [INFO] agent.server.raft: duplicate requestVote for same term: term=128
Jun 25 14:41:07 consul-03 consul: 2021-06-25T14:41:07.135Z [WARN] agent.server.raft: Election timeout reached, restarting election
Jun 25 14:41:07 consul-03 consul: 2021-06-25T14:41:07.135Z [INFO] agent.server.raft: entering candidate state: node="Node at 10.6.0.23:8300 [Candidate]" term=129
Jun 25 14:41:07 consul-03 consul: 2021-06-25T14:41:07.136Z [WARN] agent.server.raft: unable to get address for sever, using fallback address: id=bfb6b7bc-3cbd-6c1a-b3b2-f22e0c705afc fallback=10.6.0.21:8300 error="Could not find address for server id bfb6b7bc-3cbd-6c1a-b3b2-f22e0c705afc"
Jun 25 14:41:07 consul-03 consul: 2021-06-25T14:41:07.137Z [ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter bfb6b7bc-3cbd-6c1a-b3b2-f22e0c705afc 10.6.0.21:8300}" error="dial tcp ->10.6.0.21:8300: connect: connection refused"