Description
Describe the bug
I followed this tutorial to create a Vault cluster with Raft as storage. After simulating an outage in which all 3 nodes were lost and recovering a single node using peers.json, I tried joining new nodes to the recovered node. After running the join command (vault operator raft join http:...), the recovered node's console periodically logs these errors:
[ERROR] storage.raft: failed to send snapshot to: peer="{Non_voter vault_3 127.0.0.1:8401}" error="sync vault\raft\snapshots: Handle non valido."
[ERROR] storage.raft: failed to get log: index=1 error="log not found"
[ERROR] storage.raft: failed to install snapshot: id=bolt-snapshot error="sync vault\raft\snapshots: Handle non valido."
(The last part of the message is in Italian, the language of the OS; "Handle non valido" means "Invalid handle".)
The joining node's console logs these errors:
[INFO] storage.raft.snapshot: creating new snapshot: path=vault\raft\snapshots\......
[ERROR] storage.raft.snapshot: failed to move snapshot into place: error="sync vault\raft\snapshots: Handle non valido"
[ERROR] storage.raft.snapshot: failed to finalize snapshot: error="sync vault\raft\snapshots: Handle non valido"
[INFO] storage.raft.snapshot: reaping snapshot: path=vault\raft\snapshots\.......
This behaviour only happens on Windows; it does not occur in the online environment offered in the tutorial.
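The error text "sync vault\raft\snapshots" names the snapshots directory rather than a file. This suggests (an assumption based on the message alone, not confirmed against Vault's source) that the failing step is a sync call on a directory handle, a POSIX idiom for making a rename durable that Windows generally does not support. The Python sketch below is not Vault's code; `try_dir_sync` is a hypothetical helper that only illustrates the kind of operation the message points to.

```python
import os
import tempfile
from typing import Optional


def try_dir_sync(path: str) -> Optional[str]:
    """Open a directory and fsync it.

    Returns None on success, otherwise a short error string.
    Hypothetical helper for illustration; not part of Vault.
    """
    try:
        # On POSIX systems a directory can be opened read-only.
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        return f"open {path}: {exc}"
    try:
        # Flush directory metadata (for example, a completed rename) to disk.
        os.fsync(fd)
    except OSError as exc:
        return f"sync {path}: {exc}"
    finally:
        os.close(fd)
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as snapshots_dir:
        result = try_dir_sync(snapshots_dir)
        print("directory sync ok" if result is None else result)
```

On a typical Linux filesystem the directory sync is expected to succeed, while on Windows the helper is expected to return an error, which would be consistent with the problem appearing only on Windows.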
To Reproduce
Steps to reproduce the behavior:
Follow the steps in the tutorial up to "Retry Join" to create a cluster of 3 nodes (plus a server for auto-unseal using the Transit secrets engine).
Stop all nodes in the cluster.
Recover vault_2 using the peers.json method.
Try joining a new node to vault_2.
This is a bit tedious to reproduce on Windows because the automated script offered in the tutorial only works on Linux, so I made a semi-automated equivalent using bat files:
https://drive.google.com/file/d/1GHbNmBG0niRkIYB6Qc4KVHdGjPrqPPIi/view?usp=sharing
Follow the README to simulate the error.
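For readers who want to reproduce the recovery step by hand, here is a minimal Python sketch of the peers.json file for a single-node recovery of vault_2. The field names (`id`, `address`, `non_voter`) follow Vault's documented peers.json format, and the node ID, cluster address, and storage path are taken from the configuration in this report. The helper names `build_peers` and `write_peers_json` are hypothetical, and the file location (`raft/peers.json` under the storage path) is assumed from the Vault recovery documentation.

```python
import json
from pathlib import Path


def build_peers(node_id: str, cluster_address: str) -> list:
    """Return a single-voter peer list for the node being recovered."""
    return [{"id": node_id, "address": cluster_address, "non_voter": False}]


def write_peers_json(storage_path: str, peers: list) -> Path:
    """Write peers.json into the raft directory under the storage path.

    Hypothetical helper; run it only while the Vault node is stopped.
    """
    raft_dir = Path(storage_path) / "raft"
    raft_dir.mkdir(parents=True, exist_ok=True)
    target = raft_dir / "peers.json"
    target.write_text(json.dumps(peers, indent=2) + "\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    # vault_2 and 127.0.0.1:8301 come from the cluster node config in this report.
    peers = build_peers("vault_2", "127.0.0.1:8301")
    print(json.dumps(peers, indent=2))
```

Calling `write_peers_json("./vault", peers)` from the node's working directory would place the file next to the existing Raft data, after which the node can be started again.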
Expected behavior
The new nodes successfully join the cluster.
Environment:
- Vault Server Version: 1.7.3
- Vault CLI Version: Vault v1.7.3 (5d517c8)
- Server Operating System/Architecture: Windows 10 x64 20H2
Vault server configuration file(s):
Autounseal vault:
storage "raft" {
  path    = "./vault"
  node_id = "vault_1"
}
listener "tcp" {
  address         = "127.0.0.1:8200"
  cluster_address = "127.0.0.1:8201"
  tls_disable     = true
}
disable_mlock = true
cluster_addr  = "http://127.0.0.1:8201"
api_addr      = "http://127.0.0.1:8200"
Cluster nodes (with different api/cluster ports):
storage "raft" {
  path    = "./vault"
  node_id = "vault_2"
}
listener "tcp" {
  address         = "127.0.0.1:8300"
  cluster_address = "127.0.0.1:8301"
  tls_disable     = true
}
seal "transit" {
  address = "http://127.0.0.1:8200"
  # token is read from VAULT_TOKEN env
  # token = ""
  disable_renewal = "false"
  key_name        = "autounseal"
  mount_path      = "transit/"
  tls_skip_verify = "true"
}
disable_mlock = true
cluster_addr  = "http://127.0.0.1:8301"
api_addr      = "http://127.0.0.1:8300"
ui            = true