Cannot elect leader when cluster nodes up from 1 to 2 #10516

@MasonXon

Overview of the Issue

While testing Consul's automatic leader election, I found the following. When 2 of the 3 server nodes are down, the cluster stops working; I understand this is expected, because quorum is lost. When I then restart one of the stopped nodes, so that 2 of the 3 nodes are running again, the logs keep reporting election timeouts and the cluster still cannot serve requests. I tested versions 1.6.10, 1.7.0, 1.9.6, and 1.10.0: only 1.6.10 recovers normally, and the problem appears from 1.7.0 onward.

Reproduction Steps

Steps to reproduce this issue:

  1. Create a cluster with 3 server nodes.
  2. Stop one of the 3 nodes.
  3. Stop a second node, so that 2 of the 3 nodes are down.
  4. Start one of the 2 stopped nodes again.
  5. 2 of the 3 nodes are now running.
  6. The cluster does not work; no leader is elected.
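
The expectation behind these steps is that 2 running voters out of 3 still form a Raft majority, so a leader should be electable once the second node is back. A minimal sketch of that arithmetic follows; the function names `quorum` and `can_elect` are illustrative and are not part of Consul.

```python
def quorum(voters: int) -> int:
    """Smallest strict majority of a Raft voter set."""
    return voters // 2 + 1


def can_elect(voters: int, reachable: int) -> bool:
    """True if the reachable voters (counting the candidate itself) form a majority."""
    return reachable >= quorum(voters)


if __name__ == "__main__":
    # Walk through the states of a 3-voter cluster described in the steps.
    for reachable in (3, 2, 1):
        print(f"voters=3 reachable={reachable} can_elect={can_elect(3, reachable)}")
```

With 3 voters the majority is 2, so the state after step 4 (2 of 3 running) satisfies the majority condition while the state after step 3 (1 of 3 running) does not.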

Consul info for both Client and Server

Server info
agent:
	check_monitors = 0
	check_ttls = 0
	checks = 0
	services = 0
build:
	prerelease =
	revision = 95fb95bf
	version = 1.7.0
consul:
	acl = disabled
	bootstrap = false
	known_datacenters = 1
	leader = false
	leader_addr =
	server = true
raft:
	applied_index = 0
	commit_index = 0
	fsm_pending = 0
	last_contact = never
	last_log_index = 68
	last_log_term = 6
	last_snapshot_index = 0
	last_snapshot_term = 0
	latest_configuration = [{Suffrage:Voter ID:bfb6b7bc-3cbd-6c1a-b3b2-f22e0c705afc Address:10.6.0.21:8300} {Suffrage:Voter ID:d244e694-c619-cf0a-e3d6-701bd510b70d Address:10.6.0.22:8300} {Suffrage:Voter ID:5fc5e757-a1c5-e6f0-ed28-3149d68e44bf Address:10.6.0.23:8300}]
	latest_configuration_index = 0
	num_peers = 2
	protocol_version = 3
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Candidate
	term = 108
runtime:
	arch = amd64
	cpu_count = 2
	goroutines = 73
	max_procs = 2
	os = linux
	version = go1.12.16
serf_lan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 0
	event_time = 5
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 14
	members = 2
	query_queue = 0
	query_time = 1
serf_wan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 14
	members = 2
	query_queue = 0
	query_time = 1

Operating system and Environment details

cat /etc/centos-release
CentOS Linux release 7.9.2009 (Core)
uname -a
Linux consul-02 3.10.0-1127.el7.x86_64 #1 SMP Tue Mar 31 23:36:51 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Virtual Machine

Log Fragments

consul members
Node Address Status Type Build Protocol DC Segment
consul-02 10.6.0.22:8301 alive server 1.7.0 2 my-dc-1
consul-03 10.6.0.23:8301 alive server 1.7.0 2 my-dc-1
consul operator raft list-peers
Error getting peers: Failed to retrieve raft configuration: Unexpected response code: 500 (No cluster leader)

consul-01
(stopped at this point, so no logs)

consul-02
Jun 25 14:40:27 consul-02 consul: 2021-06-25T14:40:27.168Z [WARN] agent.server.raft: Election timeout reached, restarting election
Jun 25 14:40:27 consul-02 consul: 2021-06-25T14:40:27.168Z [INFO] agent.server.raft: entering candidate state: node="Node at 10.6.0.22:8300 [Candidate]" term=122
Jun 25 14:40:27 consul-02 consul: 2021-06-25T14:40:27.169Z [WARN] agent.server.raft: unable to get address for sever, using fallback address: id=bfb6b7bc-3cbd-6c1a-b3b2-f22e0c705afc fallback=10.6.0.21:8300 error="Could not find address for server id bfb6b7bc-3cbd-6c1a-b3b2-f22e0c705afc"
Jun 25 14:40:27 consul-02 consul: 2021-06-25T14:40:27.169Z [ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter bfb6b7bc-3cbd-6c1a-b3b2-f22e0c705afc 10.6.0.21:8300}" error="dial tcp ->10.6.0.21:8300: connect: connection refused"

consul-03
Jun 25 14:40:56 consul-03 consul: 2021-06-25T14:40:56.707Z [INFO] agent.server.raft: entering follower state: follower="Node at 10.6.0.23:8300 [Follower]" leader=
Jun 25 14:41:00 consul-03 consul: 2021-06-25T14:41:00.100Z [ERROR] agent: Coordinate update error: error="No cluster leader"
Jun 25 14:41:01 consul-03 consul: 2021-06-25T14:41:01.921Z [WARN] agent.server.raft: heartbeat timeout reached, starting election: last-leader=
Jun 25 14:41:01 consul-03 consul: 2021-06-25T14:41:01.921Z [INFO] agent.server.raft: entering candidate state: node="Node at 10.6.0.23:8300 [Candidate]" term=128
Jun 25 14:41:01 consul-03 consul: 2021-06-25T14:41:01.922Z [WARN] agent.server.raft: unable to get address for sever, using fallback address: id=bfb6b7bc-3cbd-6c1a-b3b2-f22e0c705afc fallback=10.6.0.21:8300 error="Could not find address for server id bfb6b7bc-3cbd-6c1a-b3b2-f22e0c705afc"
Jun 25 14:41:01 consul-03 consul: 2021-06-25T14:41:01.923Z [ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter bfb6b7bc-3cbd-6c1a-b3b2-f22e0c705afc 10.6.0.21:8300}" error="dial tcp ->10.6.0.21:8300: connect: connection refused"
Jun 25 14:41:06 consul-03 consul: 2021-06-25T14:41:06.683Z [INFO] agent.server.raft: duplicate requestVote for same term: term=128
Jun 25 14:41:07 consul-03 consul: 2021-06-25T14:41:07.135Z [WARN] agent.server.raft: Election timeout reached, restarting election
Jun 25 14:41:07 consul-03 consul: 2021-06-25T14:41:07.135Z [INFO] agent.server.raft: entering candidate state: node="Node at 10.6.0.23:8300 [Candidate]" term=129
Jun 25 14:41:07 consul-03 consul: 2021-06-25T14:41:07.136Z [WARN] agent.server.raft: unable to get address for sever, using fallback address: id=bfb6b7bc-3cbd-6c1a-b3b2-f22e0c705afc fallback=10.6.0.21:8300 error="Could not find address for server id bfb6b7bc-3cbd-6c1a-b3b2-f22e0c705afc"
Jun 25 14:41:07 consul-03 consul: 2021-06-25T14:41:07.137Z [ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter bfb6b7bc-3cbd-6c1a-b3b2-f22e0c705afc 10.6.0.21:8300}" error="dial tcp ->10.6.0.21:8300: connect: connection refused"
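
To see how the election terms advance on each server, the "entering candidate state" lines can be pulled out of fragments like the ones above. This is a rough sketch that assumes the syslog-prefixed line layout shown in this report; the regular expression and the helper name `candidate_terms` are hypothetical and may need adjusting for other log formats.

```python
import re

# Assumed layout: "<Mon> <day> <time> <host> consul: <ts> [INFO] agent.server.raft: ..."
CANDIDATE_RE = re.compile(
    r"^\w+ +\d+ [\d:]+ (?P<host>\S+) consul: (?P<ts>\S+) \[INFO\] "
    r"agent\.server\.raft: entering candidate state: .* term=(?P<term>\d+)$"
)


def candidate_terms(lines):
    """Return (host, timestamp, term) for every candidate-state transition."""
    found = []
    for line in lines:
        match = CANDIDATE_RE.match(line.strip())
        if match:
            found.append((match["host"], match["ts"], int(match["term"])))
    return found


# Sample lines copied from the fragments in this report.
SAMPLE = [
    'Jun 25 14:40:27 consul-02 consul: 2021-06-25T14:40:27.168Z [INFO] agent.server.raft: entering candidate state: node="Node at 10.6.0.22:8300 [Candidate]" term=122',
    'Jun 25 14:41:01 consul-03 consul: 2021-06-25T14:41:01.921Z [INFO] agent.server.raft: entering candidate state: node="Node at 10.6.0.23:8300 [Candidate]" term=128',
    'Jun 25 14:41:07 consul-03 consul: 2021-06-25T14:41:07.135Z [WARN] agent.server.raft: Election timeout reached, restarting election',
    'Jun 25 14:41:07 consul-03 consul: 2021-06-25T14:41:07.135Z [INFO] agent.server.raft: entering candidate state: node="Node at 10.6.0.23:8300 [Candidate]" term=129',
]

if __name__ == "__main__":
    for host, ts, term in candidate_terms(SAMPLE):
        print(host, ts, term)
```

For the sample lines this extracts terms 122 (consul-02), then 128 and 129 (consul-03); the WARN line is skipped because it is not a candidate-state transition.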
