
OPENSSL_internal:CERTIFICATE_VERIFY_FAILED #36465

@Noksa


Bug Description

Hello.

I have three primary clusters that were installed using this manual: https://istio.io/latest/docs/setup/install/multicluster/multi-primary/

Everything works as expected, but I have a problem with one service (CouchDB).

For some reason, when it restarts, other workloads can't connect to it because of the error below.

Let's start from the very beginning.

The problem occurs in all clusters, but let's reproduce it in the first cluster.

SVC:

➜ ~ k get svc -n kazoo-db --context first | grep "storage-db-svc"
storage-db-svc                  ClusterIP   10.100.26.163    <none>        5984/TCP,5986/TCP   3d16h

Endpoints as seen from another workload (crossbar):

➜ ~ istioctl pc endpoint --context first crossbar-844cd68fd9-g8q9c | grep -Ei "5984.*storage-db-svc"
10.1.9.200:5984                  HEALTHY     OK                outbound|5984||storage-db-svc.kazoo-db.svc.cluster.local
10.2.1.109:5984                  HEALTHY     OK                outbound|5984||storage-db-svc.kazoo-db.svc.cluster.local
192.168.58.110:5984              HEALTHY     FAILED            outbound|5984||storage-db-svc.kazoo-db.svc.cluster.local
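
To dig deeper into why that one endpoint is marked FAILED, the full endpoint entry can be dumped as JSON and filtered by address (a sketch of the command I would use; --address and -o json are standard istioctl proxy-config flags):

istioctl pc endpoint --context first crossbar-844cd68fd9-g8q9c --address 192.168.58.110 -o json

The JSON form includes the endpoint's health status and metadata, which helps distinguish an outlier-detection ejection from an actual TLS problem.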

The error in the Envoy log of the crossbar pod:

[2021-12-10T12:36:02.063Z] "GET / HTTP/1.1" 503 UF,URX upstream_reset_before_response_started{connection_failure,TLS_error:_268435581:SSL_routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED} - "TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED" 0 195 45 - "-" "hackney/1.6.4" "9b3d6495-fa43-4054-8d0c-c6f51308b288" "storage-db-svc.kazoo-db:5984" "192.168.58.110:5984" outbound|5984||storage-db-svc.kazoo-db.svc.cluster.local - 10.100.26.163:5984 192.168.46.122:43165 - default
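
If the certificate on the CouchDB side were really the problem, one way to verify it would be to dump the secrets its sidecar holds (a sketch; the storage-db pod name below is a placeholder for the actual pod backing 192.168.58.110):

istioctl pc secret --context first -n kazoo-db <storage-db-pod-name>

This lists the default workload certificate and the ROOTCA that Envoy has loaded, along with their validity and expiry, so an expired or missing certificate would show up here.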

So it looks like we have a bad certificate, right? But why? Other workloads work fine across clusters...

Moreover, the 192.168.58.110:5984 address is in the same cluster as crossbar-844cd68fd9-g8q9c, so this isn't even cross-cluster traffic; both workloads are located in the same cluster.

And I have the following DestinationRule (DR) for the service:

spec:
  host: storage-db-svc.kazoo-db.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      localityLbSetting:
        enabled: true
        failoverPriority:
        - topology.istio.io/network
        - topology.kubernetes.io/region
        - topology.kubernetes.io/zone
        - topology.istio.io/subzone
    outlierDetection:
      baseEjectionTime: 30s
      consecutive5xxErrors: 1
      interval: 15s
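
Given the outlierDetection settings above (a single 5xx ejects an endpoint for 30s), it also seemed worth checking whether the FAILED status comes from outlier ejection rather than from TLS itself. A sketch of how that can be inspected from the client sidecar's Envoy admin interface (the crossbar namespace below is a placeholder):

kubectl exec --context first -n <crossbar-namespace> crossbar-844cd68fd9-g8q9c -c istio-proxy -- pilot-agent request GET clusters | grep 192.168.58.110

The clusters output lists per-endpoint health flags, e.g. health_flags::/failed_outlier_check for an ejected endpoint.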

So, to fix it, I have the following options:

  • I can remove any other service from the first cluster, and storage-db-svc will work perfectly from that point on
  • Disable mTLS for storage-db-svc (see the sketch right after this list)
  • I can delete any pod from any cluster in the mesh, and then storage-db-svc will work
  • I can add any other service to the mesh, and then storage-db-svc will work
  • and so on... any change triggers storage-db-svc to start working properly
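
For reference, the "disable mTLS" option above can be expressed as a client-side TLS setting on the DestinationRule. This is only a sketch of that workaround (in practice it would be merged into the existing DR shown earlier, and it assumes the server side accepts plaintext, e.g. PERMISSIVE mode), since it gives up mutual TLS for this service:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: storage-db-svc-plaintext
  namespace: kazoo-db
spec:
  host: storage-db-svc.kazoo-db.svc.cluster.local
  trafficPolicy:
    tls:
      mode: DISABLE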

Let's remove another, unrelated service from the first cluster:

➜ ~ istioctl pc endpoint --context first crossbar-844cd68fd9-g8q9c | grep -Ei "5984.*storage-db-svc"
10.1.9.200:5984                  HEALTHY     OK                outbound|5984||storage-db-svc.kazoo-db.svc.cluster.local
10.2.1.109:5984                  HEALTHY     OK                outbound|5984||storage-db-svc.kazoo-db.svc.cluster.local
192.168.58.110:5984              HEALTHY     FAILED            outbound|5984||storage-db-svc.kazoo-db.svc.cluster.local
➜ ~ k delete svc -n smb-cluster1 --context first web-mgmt
service "web-mgmt" deleted
➜ ~ istioctl pc endpoint --context first crossbar-844cd68fd9-g8q9c | grep -Ei "5984.*storage-db-svc"
10.1.9.200:5984                  HEALTHY     OK                outbound|5984||storage-db-svc.kazoo-db.svc.cluster.local
10.2.1.109:5984                  HEALTHY     OK                outbound|5984||storage-db-svc.kazoo-db.svc.cluster.local
192.168.58.110:5984              HEALTHY     OK                outbound|5984||storage-db-svc.kazoo-db.svc.cluster.local

So, as you can see, removing another service that is not related to any of those pods/endpoints helped...

The crossbar pod now sends requests to the closest storage endpoint, as expected:

[2021-12-10T12:47:57.646Z] "GET /accounts/_design/accounts/_view/listing_by_descendants?endkey=%5b%227f466bd0c4bd7dffb5914222e0cd0987%22%2c%7b%7d%5d&limit=51&startkey=%5b%227f466bd0c4bd7dffb5914222e0cd0987%22%2c%22%22%5d HTTP/1.1" 200 - via_upstream - "-" 0 302 141 140 "-" "hackney/1.6.4" "d9c1f39f-2c33-4780-8934-a30a287d2a09" "storage-db-svc.kazoo-db:5984" "192.168.58.110:5984" outbound|5984||storage-db-svc.kazoo-db.svc.cluster.local 192.168.46.122:48840 10.100.26.163:5984 192.168.46.122:55931 - default
[2021-12-10T12:47:58.938Z] "GET / HTTP/1.1" 200 - via_upstream - "-" 0 182 18 18 "-" "hackney/1.6.4" "f7dbd329-7ccd-4aea-a069-6a407eb1d4b0" "storage-db-svc.kazoo-db:5984" "192.168.58.110:5984" outbound|5984||storage-db-svc.kazoo-db.svc.cluster.local 192.168.46.122:48910 10.100.26.163:5984 192.168.46.122:34079 - default

Could you please explain why it works this way? What have I missed?

Version

➜ ~ istioctl version
client version: 1.12.0
control plane version: 1.12.0
data plane version: 1.12.0 (15 proxies)
➜ ~ k version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.0", GitCommit:"ab69524f795c42094a6630298ff53f3c3ebab7f4", GitTreeState:"clean", BuildDate:"2021-12-07T18:08:39Z", GoVersion:"go1.17.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.2-eks-06eac09", GitCommit:"5f6d83fe4cb7febb5f4f4e39b3b2b64ebbbe3e97", GitTreeState:"clean", BuildDate:"2021-09-13T14:20:15Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.23) and server (1.21) exceeds the supported minor version skew of +/-1
➜ ~ h version
version.BuildInfo{Version:"v3.7.2", GitCommit:"663a896f4a815053445eec4153677ddc24a0a361", GitTreeState:"clean", GoVersion:"go1.17.3"}


Additional Information

No response
