Description
(cf. kubernetes/kubernetes#83028)
For an etcd cluster using hostnames in TLS cert SANs, the etcd client only connects to the first etcd member in the endpoints list. Attempts to connect to the other members fail with:
W0924 14:42:29.368784 248333 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://member3.etcd.local:32379 0 }. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for member3.etcd.local, not member1.etcd.local"
As a result, an etcd client loses its connection to the etcd cluster whenever the first etcd member is unavailable, even though the other members are still up.
This appears to occur both with etcd 3.3.13 (which uses the old load balancer code) and with 3.1.15 (which uses the new gRPC load balancer).
Reproduction steps: https://github.com/jpbetz/etcd/blob/etcd-lb-dnsname-failover/reproduction.md
This can also be reproduced with etcdctl by stopping the 1st etcd and running:
$ etcdctl --cacert ${path-to-checkout-of-this-branch}/integration/fixtures/ca.crt --endpoints=https://member1.etcd.local:2379,https://member2.etcd.local:22379,https://member3.etcd.local:32379 get /
which outputs:
{"level":"warn","ts":"2019-09-24T15:52:25.256-0700","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-3a8f8d78-f6ba-4d72-b3d6-27cd6ce8c943/member1.etcd.local:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate is valid for member2.etcd.local, not member1.etcd.local\""}
Error: context deadline exceeded
xref: db61ee1