Skip to content

The containers Radar is about to use should not be reaped from underneath it by GC #1413

@topherbullock

Description

@topherbullock

Due to the nature of radar and gc running on separate intervals, and resourceConfigCheckSessionLifecycle's being based on an (arbitrary?) 1 hour expiry, we've observed a lot of errors from Garden bubbling up into the resource UI:

The logs from papertrail show the following:

Jul 25 11:41:26 worker-1 garden:  {"timestamp":"1501008086.000833035","source":"guardian","message":"guardian.create.containerizer-create.watch.watching","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160563.2.4"}}
Jul 25 11:41:26 worker-1 garden:  {"timestamp":"1501008086.225612640","source":"guardian","message":"guardian.create.network.started","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160563.5","spec":""}}
Jul 25 11:41:26 worker-1 garden:  {"timestamp":"1501008086.227276087","source":"guardian","message":"guardian.create.network.config-create","log_level":1,"data":{"config":{"ContainerHandle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","HostIntf":"wl2a6qv376u8-0","ContainerIntf":"wl2a6qv376u8-1","IPTablePrefix":"w--","IPTableInstance":"l2a6qv376u8","BridgeName":"wbrdg-0afe00a0","BridgeIP":"10.254.0.161","ContainerIP":"10.254.0.162","ExternalIP":"10.1.0.7","Subnet":{"IP":"10.254.0.160","Mask":"/////A=="},"Mtu":1460,"PluginNameservers":null,"OperatorNameservers":[],"AdditionalNameservers":[]},"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160563.5","spec":""}}
Jul 25 11:41:27 web-1 atc:  {"timestamp":"1501008086.437911987","source":"atc","message":"atc.container-collector.run.found-containers-for-deletion","log_level":0,"data":{"created-containers":["83342e40-817a-44d6-757a-59ae1f582d6e","379e2a87-b12f-480a-4098-a2c3e2eed421","458e2243-64dc-4f9f-7043-3117a7f5a9c6","ca52ef17-d524-4638-4178-13abd51688a7","51599ba6-72fc-4887-6229-38df8fd93665","8f91adff-525d-4e6c-69f8-d671d3446044","76db54da-2214-4bf4-4679-ba23a99a8ee9","cd77d11f-6bb9-4f05-7e31-60610b334e00","899060b0-6096-4b6c-7579-0342903f4f48","0d6746bb-95f2-490b-6afc-7ed8404b51dc","a4ea3083-caf4-4a7e-4968-fd89a245b948","987c44dc-41ff-49f8-4394-518fa47eb3b3","63c0e7d3-a7e9-47c7-7a1e-2e6927dd9a84","69fed7fd-23e1-4a19-6444-f241170951ca","c577dd54-c4b1-41d3-480e-fbdc57b63af0","93a65c82-7fd5-4125-4338-7444f03e4d1b","62d0df0e-4484-4888-4bff-ab7fe404a193","1ea15f7c-121b-4537-4a74-038610dfd851","612897d2-2b11-4c06-5578-0ec9f68c63c8","70f47618-3a56-4bb9-6c42-766671e0fb32","2a9c4487-8845-4c5b-62ab-086307a4adab","fe96b1a7-86f1-4a17-7b24-cb56e5014a34"],"creating-containers":["01b3a397-d260-424c-5a66-ea2457414b3e","7ed8dd88-2176-41ff-7b84-72433bce7e69","4e6c0dd1-acc5-4abd-54b6-255f7ecae5f5","d4813953-976c-4727-42cf-903db202a063"],"destroying-containers":["9b83e7a6-ef82-4c63-61ea-a0c306117466","a612b873-29db-49cf-5458-ca2f86f992fa","162ad91b-60e2-45e1-7e9d-8e02c6b67e80","f6200543-f046-4a7e-4997-b0981db99783"],"session":"31.2291"}}
Jul 25 11:41:27 web-1 atc:  {"timestamp":"1501008086.528923988","source":"atc","message":"atc.container-collector.run.mark-created-as-destroying.start","log_level":0,"data":{"container":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"31.2291.55","worker":"4fd44d6a-a6c9-4c27-8ef7-259d7c1f6294"}}
Jul 25 11:41:27 worker-1 garden:  {"timestamp":"1501008086.551643133","source":"guardian","message":"guardian.destroy.start","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160568"}}
Jul 25 11:41:27 worker-1 garden:  {"timestamp":"1501008086.552307606","source":"guardian","message":"guardian.destroy.destroy.started","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160568.1"}}
Jul 25 11:41:27 worker-1 garden:  {"timestamp":"1501008086.736886263","source":"guardian","message":"guardian.destroy.destroy.state","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160568.1","state":{"Pid":438672,"Status":"created"}}}
Jul 25 11:41:27 worker-1 garden:  {"timestamp":"1501008086.737044334","source":"guardian","message":"guardian.destroy.destroy.delete.started","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160568.1.2"}}
Jul 25 11:41:27 worker-1 garden:  {"timestamp":"1501008087.111113310","source":"guardian","message":"guardian.create.containerizer-create.watch.done","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160563.2.4"}}
Jul 25 11:41:27 worker-1 garden:  {"timestamp":"1501008087.147542953","source":"guardian","message":"guardian.destroy.destroy.delete.finished","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160568.1.2"}}
Jul 25 11:41:27 worker-1 garden:  {"timestamp":"1501008087.147641897","source":"guardian","message":"guardian.destroy.destroy.finished","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160568.1"}}
Jul 25 11:41:27 worker-1 garden:  {"timestamp":"1501008087.237886906","source":"guardian","message":"guardian.create.network.finished","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160563.5","spec":""}}
Jul 25 11:41:27 worker-1 garden:  {"timestamp":"1501008087.238155127","source":"guardian","message":"guardian.create.create-failed-cleaningup.start","log_level":1,"data":{"cause":"open /proc/438672/ns/net: no such file or directory","handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160563.6"}}
Jul 25 11:41:27 worker-1 garden:  {"timestamp":"1501008087.238301277","source":"guardian","message":"guardian.create.create-failed-cleaningup.destroy.started","log_level":1,"data":{"cause":"open /proc/438672/ns/net: no such file or directory","handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160563.6.1"}}
Jul 25 11:41:27 worker-1 garden:  {"timestamp":"1501008087.301817417","source":"guardian","message":"guardian.create.create-failed-cleaningup.destroy.state-failed-skipping-delete","log_level":1,"data":{"cause":"open /proc/438672/ns/net: no such file or directory","error":"runc state: runc create: exit status 1: container \"7ed8dd88-2176-41ff-7b84-72433bce7e69\" does not exist\n","handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160563.6.1"}}
Jul 25 11:41:27 worker-1 garden:  {"timestamp":"1501008087.301863670","source":"guardian","message":"guardian.create.create-failed-cleaningup.destroy.finished","log_level":1,"data":{"cause":"open /proc/438672/ns/net: no such file or directory","handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160563.6.1"}}
Jul 25 11:41:29 worker-1 garden:  {"timestamp":"1501008089.391334057","source":"guardian","message":"guardian.create.create-failed-cleaningup.volume-creator.destroy.start","log_level":1,"data":{"cause":"open /proc/438672/ns/net: no such file or directory","handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","id":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160563.6.2.1"}}
Jul 25 11:41:29 worker-1 garden:  {"timestamp":"1501008089.391528845","source":"guardian","message":"guardian.create.create-failed-cleaningup.volume-creator.destroy.layer-already-deleted-skipping","log_level":1,"data":{"cause":"open /proc/438672/ns/net: no such file or directory","error":"could not find image: no such id: d25fa5243abb4e6cd9e99d4c06061443519a5185e2f65d6daf08c54de60cd1d5","graphID":"7ed8dd88-2176-41ff-7b84-72433bce7e69","handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","id":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160563.6.2.1"}}
Jul 25 11:41:29 worker-1 garden:  {"timestamp":"1501008089.391585350","source":"guardian","message":"guardian.create.create-failed-cleaningup.volume-creator.destroy.finished","log_level":1,"data":{"cause":"open /proc/438672/ns/net: no such file or directory","handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","id":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160563.6.2.1"}}
Jul 25 11:41:29 worker-1 garden:  {"timestamp":"1501008089.391625881","source":"guardian","message":"guardian.create.create-failed-cleaningup.depot.destroy.started","log_level":1,"data":{"cause":"open /proc/438672/ns/net: no such file or directory","handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160563.6.3.1"}}
Jul 25 11:41:29 worker-1 garden:  {"timestamp":"1501008089.392038584","source":"guardian","message":"guardian.create.create-failed-cleaningup.depot.destroy.finished","log_level":1,"data":{"cause":"open /proc/438672/ns/net: no such file or directory","handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160563.6.3.1"}}
Jul 25 11:41:29 worker-1 garden:  {"timestamp":"1501008089.392084122","source":"guardian","message":"guardian.create.create-failed-cleaningup.cleanedup","log_level":1,"data":{"cause":"open /proc/438672/ns/net: no such file or directory","handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160563.6"}}
Jul 25 11:41:29 worker-1 garden:  {"timestamp":"1501008089.392116547","source":"guardian","message":"guardian.api.garden-server.create.failed","log_level":2,"data":{"error":"open /proc/438672/ns/net: no such file or directory","request":{"Handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","GraceTime":0,"RootFSPath":"raw:///var/vcap/data/baggageclaim/volumes/live/9159facc-b11d-4e56-762c-e2d4c5c9ba38/volume","BindMounts":[{"src_path":"/var/vcap/data/baggageclaim/volumes/live/e7ce916f-e1e5-49ab-5fe2-6bd71488608e/volume","dst_path":"/scratch","mode":1}],"Network":"","Privileged":true,"Limits":{"bandwidth_limits":{},"cpu_limits":{},"disk_limits":{},"memory_limits":{},"pid_limits":{}}},"session":"4.1.224969"}}
Jul 25 11:41:29 worker-1 garden:  {"timestamp":"1501008089.450836182","source":"guardian","message":"guardian.destroy.volume-creator.destroy.start","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","id":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160568.2.1"}}
Jul 25 11:41:29 worker-1 garden:  {"timestamp":"1501008089.450944424","source":"guardian","message":"guardian.destroy.volume-creator.destroy.layer-already-deleted-skipping","log_level":1,"data":{"error":"could not find image: no such id: d25fa5243abb4e6cd9e99d4c06061443519a5185e2f65d6daf08c54de60cd1d5","graphID":"7ed8dd88-2176-41ff-7b84-72433bce7e69","handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","id":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160568.2.1"}}
Jul 25 11:41:29 worker-1 garden:  {"timestamp":"1501008089.450988770","source":"guardian","message":"guardian.destroy.volume-creator.destroy.finished","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","id":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160568.2.1"}}
Jul 25 11:41:29 worker-1 garden:  {"timestamp":"1501008089.451024294","source":"guardian","message":"guardian.destroy.depot.destroy.started","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160568.3.1"}}
Jul 25 11:41:29 worker-1 garden:  {"timestamp":"1501008089.452475548","source":"guardian","message":"guardian.destroy.depot.destroy.finished","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160568.3.1"}}
Jul 25 11:41:29 worker-1 garden:  {"timestamp":"1501008089.452504158","source":"guardian","message":"guardian.destroy.finished","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"160568"}}
Jul 25 11:41:29 worker-1 garden:  {"timestamp":"1501008089.452531815","source":"guardian","message":"guardian.api.garden-server.destroy.destroyed","log_level":1,"data":{"handle":"7ed8dd88-2176-41ff-7b84-72433bce7e69","session":"4.1.224975"}}
Jul 25 11:41:30 web-1 atc:  {"timestamp":"1501008089.391695499","source":"atc","message":"atc.syncer.resources:radar.scan-resource.interval-runner.tick.failed-to-create-container-in-garden","log_level":2,"data":{"container":"7ed8dd88-2176-41ff-7b84-72433bce7e69","error":"open /proc/438672/ns/net: no such file or directory","pipeline-scoped-name":"resources:resource:time-resource-image","session":"18.3.412.1.307"}}
Jul 25 11:41:30 web-1 atc:  {"timestamp":"1501008089.392815113","source":"atc","message":"atc.syncer.resources:radar.scan-resource.interval-runner.tick.release.released","log_level":0,"data":{"container":"7ed8dd88-2176-41ff-7b84-72433bce7e69","id":[5,3147651],"pipeline-scoped-name":"resources:resource:time-resource-image","session":"18.3.412.1.307.5"}}

Yikes, that's a whole lotta logs going on.. lets break it down..

From this set of logs, we can observe the following events:

  • Garden is creating the container 7ed8dd88-2176-41ff-7b84-72433bce7e69
  • GC kicks in and rudely interrupts by finding a "creating" container to delete; 7ed8dd88-2176-41ff-7b84-72433bce7e69
  • Garden starts and succeeds in destroying 7ed8dd88-2176-41ff-7b84-72433bce7e69
  • Garden fails to create 7ed8dd88-2176-41ff-7b84-72433bce7e69
  • Radar's scanning for the resource fails because the container fails to create in Garden.. its all like: "Where's the Container?"

Acceptance Criteria:

Tests:

Integration test which :

  • runs gc's NewContainerCollector
  • uses radar's NewResourceScanner to check for resource versions
  • reproduces the problem of resource_config_check_session_id's being nil on containers which are imminently being used to check for new versions of a resource. (see https://github.com/concourse/atc/tree/wip-1297)

AND/OR

Topgun test which:

  • interacts with the DB and Garden to reproduce the error we see from Garden (open /proc/123456/ns/net: no such file or directory) in the resource UI

Fix:

make the test pass by :

  • make it explicit that these containers going away is okay, since radar will pick them back up, and somehow handle the error that happens inside of radar
  • make it so that GC doesn't remove the containers radar is about to use, maybe by checking for some kind of lock related to the resource config check session?

Refactoring Opportunities:

  • db.ContainerFactory isn't a factory, its a filter
  • The ContainerFactory Unit tests rely on the implicit interaction of resourceConfigCheckSessionLifecycle being run and cleaning up expired ones, rather than just explicitly removing them from the db. The test should be more clear about this behavior.
  • Look for ways to make this implicit relationship between radar and GC more explicit

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions