-
-
Notifications
You must be signed in to change notification settings - Fork 867
Closed
Labels
Milestone
Description
Due to the nature of radar
and gc
running on separate intervals, and resourceConfigCheckSessionLifecycle
's being based on an (arbitrary?) 1 hour expiry, we've observed a lot of errors from Garden bubbling up into the resource UI:
The logs from papertrail show the following:
Jul 25 11:41:26 worker-1 garden: {"timestamp":"1501008086.000833035","source":"guardian","message":"guardian.create.containerizer-create.watch.watching","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160563.2.4"}}
Jul 25 11:41:26 worker-1 garden: {"timestamp":"1501008086.225612640","source":"guardian","message":"guardian.create.network.started","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160563.5","spec":""}}
Jul 25 11:41:26 worker-1 garden: {"timestamp":"1501008086.227276087","source":"guardian","message":"guardian.create.network.config-create","log_level":1,"data":{"config":{"ContainerHandle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","HostIntf":"wl2a6qv376u8-0","ContainerIntf":"wl2a6qv376u8-1","IPTablePrefix":"w--","IPTableInstance":"l2a6qv376u8","BridgeName":"wbrdg-0afe00a0","BridgeIP":"10.254.0.161","ContainerIP":"10.254.0.162","ExternalIP":"10.1.0.7","Subnet":{"IP":"10.254.0.160","Mask":"/////A=="},"Mtu":1460,"PluginNameservers":null,"OperatorNameservers":[],"AdditionalNameservers":[]},"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160563.5","spec":""}}
Jul 25 11:41:27 web-1 atc: {"timestamp":"1501008086.437911987","source":"atc","message":"atc.container-collector.run.found-containers-for-deletion","log_level":0,"data":{"created-containers":["83342e40-817a-44d6-757a-59ae1f582d6e","379e2a87-b12f-480a-4098-a2c3e2eed421","458e2243-64dc-4f9f-7043-3117a7f5a9c6","ca52ef17-d524-4638-4178-13abd51688a7","51599ba6-72fc-4887-6229-38df8fd93665","8f91adff-525d-4e6c-69f8-d671d3446044","76db54da-2214-4bf4-4679-ba23a99a8ee9","cd77d11f-6bb9-4f05-7e31-60610b334e00","899060b0-6096-4b6c-7579-0342903f4f48","0d6746bb-95f2-490b-6afc-7ed8404b51dc","a4ea3083-caf4-4a7e-4968-fd89a245b948","987c44dc-41ff-49f8-4394-518fa47eb3b3","63c0e7d3-a7e9-47c7-7a1e-2e6927dd9a84","69fed7fd-23e1-4a19-6444-f241170951ca","c577dd54-c4b1-41d3-480e-fbdc57b63af0","93a65c82-7fd5-4125-4338-7444f03e4d1b","62d0df0e-4484-4888-4bff-ab7fe404a193","1ea15f7c-121b-4537-4a74-038610dfd851","612897d2-2b11-4c06-5578-0ec9f68c63c8","70f47618-3a56-4bb9-6c42-766671e0fb32","2a9c4487-8845-4c5b-62ab-086307a4adab","fe96b1a7-86f1-4a17-7b24-cb56e5014a34"],"creating-containers":["01b3a397-d260-424c-5a66-ea2457414b3e","7ed8dd88-2176-41ff-7b84-72433bce7e69","4e6c0dd1-acc5-4abd-54b6-255f7ecae5f5","d4813953-976c-4727-42cf-903db202a063"],"destroying-containers":["9b83e7a6-ef82-4c63-61ea-a0c306117466","a612b873-29db-49cf-5458-ca2f86f992fa","162ad91b-60e2-45e1-7e9d-8e02c6b67e80","f6200543-f046-4a7e-4997-b0981db99783"],"session":"31.2291"}}
Jul 25 11:41:27 web-1 atc: {"timestamp":"1501008086.528923988","source":"atc","message":"atc.container-collector.run.mark-created-as-destroying.start","log_level":0,"data":{"container":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"31.2291.55","worker":"4fd44d6a-a6c9-4c27-8ef7-259d7c1f6294"}}
Jul 25 11:41:27 worker-1 garden: {"timestamp":"1501008086.551643133","source":"guardian","message":"guardian.destroy.start","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160568"}}
Jul 25 11:41:27 worker-1 garden: {"timestamp":"1501008086.552307606","source":"guardian","message":"guardian.destroy.destroy.started","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160568.1"}}
Jul 25 11:41:27 worker-1 garden: {"timestamp":"1501008086.736886263","source":"guardian","message":"guardian.destroy.destroy.state","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160568.1","state":{"Pid":438672,"Status":"created"}}}
Jul 25 11:41:27 worker-1 garden: {"timestamp":"1501008086.737044334","source":"guardian","message":"guardian.destroy.destroy.delete.started","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160568.1.2"}}
Jul 25 11:41:27 worker-1 garden: {"timestamp":"1501008087.111113310","source":"guardian","message":"guardian.create.containerizer-create.watch.done","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160563.2.4"}}
Jul 25 11:41:27 worker-1 garden: {"timestamp":"1501008087.147542953","source":"guardian","message":"guardian.destroy.destroy.delete.finished","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160568.1.2"}}
Jul 25 11:41:27 worker-1 garden: {"timestamp":"1501008087.147641897","source":"guardian","message":"guardian.destroy.destroy.finished","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160568.1"}}
Jul 25 11:41:27 worker-1 garden: {"timestamp":"1501008087.237886906","source":"guardian","message":"guardian.create.network.finished","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160563.5","spec":""}}
Jul 25 11:41:27 worker-1 garden: {"timestamp":"1501008087.238155127","source":"guardian","message":"guardian.create.create-failed-cleaningup.start","log_level":1,"data":{"cause":"open /proc/438672/ns/net: no such file or directory","handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160563.6"}}
Jul 25 11:41:27 worker-1 garden: {"timestamp":"1501008087.238301277","source":"guardian","message":"guardian.create.create-failed-cleaningup.destroy.started","log_level":1,"data":{"cause":"open /proc/438672/ns/net: no such file or directory","handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160563.6.1"}}
Jul 25 11:41:27 worker-1 garden: {"timestamp":"1501008087.301817417","source":"guardian","message":"guardian.create.create-failed-cleaningup.destroy.state-failed-skipping-delete","log_level":1,"data":{"cause":"open /proc/438672/ns/net: no such file or directory","error":"runc state: runc create: exit status 1: container \"7ed8dd88-2176-41ff-7b84-72433bce7e69\" does not exist\n","handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160563.6.1"}}
Jul 25 11:41:27 worker-1 garden: {"timestamp":"1501008087.301863670","source":"guardian","message":"guardian.create.create-failed-cleaningup.destroy.finished","log_level":1,"data":{"cause":"open /proc/438672/ns/net: no such file or directory","handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160563.6.1"}}
Jul 25 11:41:29 worker-1 garden: {"timestamp":"1501008089.391334057","source":"guardian","message":"guardian.create.create-failed-cleaningup.volume-creator.destroy.start","log_level":1,"data":{"cause":"open /proc/438672/ns/net: no such file or directory","handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","id":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160563.6.2.1"}}
Jul 25 11:41:29 worker-1 garden: {"timestamp":"1501008089.391528845","source":"guardian","message":"guardian.create.create-failed-cleaningup.volume-creator.destroy.layer-already-deleted-skipping","log_level":1,"data":{"cause":"open /proc/438672/ns/net: no such file or directory","error":"could not find image: no such id: d25fa5243abb4e6cd9e99d4c06061443519a5185e2f65d6daf08c54de60cd1d5","graphID":"7ed8dd88-2176-41ff-7b84-72433bce7e69","handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","id":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160563.6.2.1"}}
Jul 25 11:41:29 worker-1 garden: {"timestamp":"1501008089.391585350","source":"guardian","message":"guardian.create.create-failed-cleaningup.volume-creator.destroy.finished","log_level":1,"data":{"cause":"open /proc/438672/ns/net: no such file or directory","handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","id":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160563.6.2.1"}}
Jul 25 11:41:29 worker-1 garden: {"timestamp":"1501008089.391625881","source":"guardian","message":"guardian.create.create-failed-cleaningup.depot.destroy.started","log_level":1,"data":{"cause":"open /proc/438672/ns/net: no such file or directory","handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160563.6.3.1"}}
Jul 25 11:41:29 worker-1 garden: {"timestamp":"1501008089.392038584","source":"guardian","message":"guardian.create.create-failed-cleaningup.depot.destroy.finished","log_level":1,"data":{"cause":"open /proc/438672/ns/net: no such file or directory","handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160563.6.3.1"}}
Jul 25 11:41:29 worker-1 garden: {"timestamp":"1501008089.392084122","source":"guardian","message":"guardian.create.create-failed-cleaningup.cleanedup","log_level":1,"data":{"cause":"open /proc/438672/ns/net: no such file or directory","handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160563.6"}}
Jul 25 11:41:29 worker-1 garden: {"timestamp":"1501008089.392116547","source":"guardian","message":"guardian.api.garden-server.create.failed","log_level":2,"data":{"error":"open /proc/438672/ns/net: no such file or directory","request":{"Handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","GraceTime":0,"RootFSPath":"raw:///var/vcap/data/baggageclaim/volumes/live/9159facc-b11d-4e56-762c-e2d4c5c9ba38/volume","BindMounts":[{"src_path":"/var/vcap/data/baggageclaim/volumes/live/e7ce916f-e1e5-49ab-5fe2-6bd71488608e/volume","dst_path":"/scratch","mode":1}],"Network":"","Privileged":true,"Limits":{"bandwidth_limits":{},"cpu_limits":{},"disk_limits":{},"memory_limits":{},"pid_limits":{}}},"session":"4.1.224969"}}
Jul 25 11:41:29 worker-1 garden: {"timestamp":"1501008089.450836182","source":"guardian","message":"guardian.destroy.volume-creator.destroy.start","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","id":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160568.2.1"}}
Jul 25 11:41:29 worker-1 garden: {"timestamp":"1501008089.450944424","source":"guardian","message":"guardian.destroy.volume-creator.destroy.layer-already-deleted-skipping","log_level":1,"data":{"error":"could not find image: no such id: d25fa5243abb4e6cd9e99d4c06061443519a5185e2f65d6daf08c54de60cd1d5","graphID":"7ed8dd88-2176-41ff-7b84-72433bce7e69","handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","id":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160568.2.1"}}
Jul 25 11:41:29 worker-1 garden: {"timestamp":"1501008089.450988770","source":"guardian","message":"guardian.destroy.volume-creator.destroy.finished","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","id":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160568.2.1"}}
Jul 25 11:41:29 worker-1 garden: {"timestamp":"1501008089.451024294","source":"guardian","message":"guardian.destroy.depot.destroy.started","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160568.3.1"}}
Jul 25 11:41:29 worker-1 garden: {"timestamp":"1501008089.452475548","source":"guardian","message":"guardian.destroy.depot.destroy.finished","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160568.3.1"}}
Jul 25 11:41:29 worker-1 garden: {"timestamp":"1501008089.452504158","source":"guardian","message":"guardian.destroy.finished","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160568"}}
Jul 25 11:41:29 worker-1 garden: {"timestamp":"1501008089.452531815","source":"guardian","message":"guardian.api.garden-server.destroy.destroyed","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"4.1.224975"}}
Jul 25 11:41:30 web-1 atc: {"timestamp":"1501008089.391695499","source":"atc","message":"atc.syncer.resources:radar.scan-resource.interval-runner.tick.failed-to-create-container-in-garden","log_level":2,"data":{"container":"7ed8dd88-2176-41ff-7b84-72433bce7e69","error":"open /proc/438672/ns/net: no such file or directory","pipeline-scoped-name":"resources:resource:time-resource-image","session":"18.3.412.1.307"}}
Jul 25 11:41:30 web-1 atc: {"timestamp":"1501008089.392815113","source":"atc","message":"atc.syncer.resources:radar.scan-resource.interval-runner.tick.release.released","log_level":0,"data":{"container":"7ed8dd88-2176-41ff-7b84-72433bce7e69","id":[5,3147651],"pipeline-scoped-name":"resources:resource:time-resource-image","session":"18.3.412.1.307.5"}}
Yikes, that's a whole lotta logs going on.. lets break it down..
From this set of logs, we can observe the following events:
- Garden is creating the container
7ed8dd88-2176-41ff-7b84-72433bce7e69
- GC kicks in and rudely interrupts by finding a "creating" container to delete;
7ed8dd88-2176-41ff-7b84-72433bce7e69
- Garden starts and succeeds in destroying
7ed8dd88-2176-41ff-7b84-72433bce7e69
- Garden fails to create
7ed8dd88-2176-41ff-7b84-72433bce7e69
- Radar's scanning for the resource fails because the container fails to create in Garden.. its all like: "Where's the Container?"
Acceptance Criteria:
Tests:
Integration test which :
- runs
gc
'sNewContainerCollector
- uses
radar
'sNewResourceScanner
to check for resource versions - reproduces the problem of
resource_config_check_session_id
's beingnil
on containers which are imminently being used to check for new versions of a resource. (see https://github.com/concourse/atc/tree/wip-1297)
AND/OR
Topgun test which:
- interacts with the DB and Garden to reproduce the error we see from Garden (
open /proc/123456/ns/net: no such file or directory
) in the resource UI
Fix:
make the test pass by :
- make it explicit that these containers going away is okay, since radar will pick them back up, and somehow handle the error that happens inside of radar
- make it so that GC doesn't remove the containers radar is about to use, maybe by checking for some kind of lock related to the resource config check session?
Refactoring Opportunities:
- db.ContainerFactory isn't a factory, its a filter
- The ContainerFactory Unit tests rely on the implicit interaction of
resourceConfigCheckSessionLifecycle
being run and cleaning up expired ones, rather than just explicitly removing them from the db. The test should be more clear about this behavior. - Look for ways to make this implicit relationship between radar and GC more explicit