Closed
Description
Issue: failed to pull image when tracker pod is cycled
Steps to reproduce:
I have Kraken set up on a cluster that uses Consul for service discovery.
When a tracker pod is killed and brought back up (i.e. the tracker's IP address has changed), the agent still tries to connect to the dead pod's IP, causing the following error:
"transferer download: scheduler: create torrent: download metainfo: network error: Get <>/metainfo: dial tcp <deadpod>:80: connect: no route to host"
Thoughts: I looked at the code, and it looks like the agent holds a PassiveRing of trackers, and `func (r *dnsResolver) resolve() (stringset.Set, error)`, which refreshes the set of hosts, doesn't get called again after the initialization step. The issue persists as long as the agent pod lives. I added `c.ring.Refresh()` to https://github.com/uber/kraken/blob/master/tracker/metainfoclient/client.go#L54. It refreshes the tracker hash ring and fixes the issue. Should we add a Monitor to refresh periodically?