Description
We are running Concourse 3.9 on Kubernetes using the official helm chart, with 2 workers. For the past few hours, our tasks and checks have been failing intermittently. When checking our git resources, we often (but not always) get:
resource script '/opt/resource/check []' failed: exit status 128
stderr:
Identity added: /tmp/git-resource-private-key (/tmp/git-resource-private-key)
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
and sometimes also:
resource script '/opt/resource/check []' failed: exit status 137
Inside our tasks (when they do manage to get triggered), we can see the same git resource check failure, but we often also get:
Error response from daemon: Get https://<our-docker-registry>: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Furthermore, some checks now appear to be stuck: they report "checked 45m 54s ago" even though they should run every minute.
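For reference, a stuck check can in principle be forced manually with something like the following (target, pipeline, and resource names are placeholders):
# force an immediate check of a single resource (names are placeholders)
fly -t target check-resource -r pipeline/resource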
Looking at the worker logs, we can see things like:
{"timestamp":"1525762694.813524485","source":"guardian","message":"guardian.api.garden-server.attach.failed","log_level":2,"data":{"error":"does not exist","handle":"7f35e2e7-dca2-48c5-5af0-9027a92cc53c","session":"3.1.15"}}
as well as
{"timestamp":"1525762730.415373564","source":"tsa","message":"tsa.connection.channel.forward-worker.register.start","log_level":1,"data":{"remote":"10.240.0.223:47690","session":"171.1.1.5","worker-address":"10.240.0.10:41187","worker-platform":"linux","worker-tags":""}}
{"timestamp":"1525762730.415424109","source":"guardian","message":"guardian.create-global-iptables-chains.create-started","log_level":1,"data":{"session":"2"}}
2018/05/08 06:58:50 failed to forward remote connection: dial tcp 127.0.0.1:7777: getsockopt: connection refused
{"timestamp":"1525762730.417540312","source":"tsa","message":"tsa.connection.channel.forward-worker.register.failed-to-fetch-containers","log_level":2,"data":{"error":"Get http://api/containers: EOF","remote":"10.240.0.223:47690","session":"171.1.1.5"}}
{"timestamp":"1525762730.420288801","source":"tsa","message":"tsa.connection.channel.forward-worker.register.failed-to-reach-worker","log_level":1,"data":{"baggageclaim-took":"2.558173ms","garden-took":"2.020758ms","remote":"10.240.0.223:47690","session":"171.1.1.5"}}
We have also tried a pipeline that uses no credentials from Vault, but it runs into the same issues.
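That Vault-free test pipeline is roughly of this shape (a sketch only; the resource name, URI, and pipeline name are placeholders, not our actual config):
# minimal pipeline with a single public git resource and no credentials (placeholders)
cat > no-vault-test.yml <<'EOF'
resources:
- name: repo
  type: git
  source:
    uri: https://github.com/concourse/git-resource.git
jobs:
- name: just-get
  plan:
  - get: repo
    trigger: true
EOF
fly -t target set-pipeline -p no-vault-test -c no-vault-test.yml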
Running fly -t target intercept -c pipeline/resource
and then doing ping www.google.com
successfully resolves the IP address, but no packets actually get through; after interrupting, ping reports 100% packet loss:
bash-4.4# ping www.google.com
PING www.google.com (172.217.20.100): 56 data bytes
^C
--- www.google.com ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
Furthermore, traceroute shows that traffic doesn't get past the first hop:
bash-4.4# traceroute www.google.com
traceroute to www.google.com (172.217.20.100), 30 hops max, 46 byte packets
1 10.254.0.209 (10.254.0.209) 0.014 ms 0.012 ms 0.010 ms
2 * * *
3 * *^C
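For completeness, the cluster deployment itself was done with the official chart, roughly as follows (a sketch; the worker.replicas value name is an assumption based on the chart's documented values, and the real install uses a fuller values file):
# helm 2 style install of the official chart (value name is an assumption)
helm install stable/concourse --name concourse --set worker.replicas=2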
Can this be tied to any known issues, and are there any suggestions for resolving it?