
Conversation

eloycoto
Member

@eloycoto eloycoto commented Jan 24, 2019

The reason for this commit is to avoid a large list of identities in FQDN policy,
and to implement a cleanup process that removes expired IPs and avoids long
policyCalculation times.

A cleanup map is used in dnsCache to avoid looping over all the DNS cache
entries: entries are binned by second, and a controller cleans the entries
whose IPs have expired at that point in time.

A controller is used to avoid having a goroutine with a timer.Tick firing each
second and keeping everything in the same goroutine. To avoid leaks, instead
of passing time.Now(), the cleanup uses cache.lastCleanup to get all the
entries since the last run; policy regeneration can take more than 40 seconds
with some huge policies, and with this method the per-second entries in the
cleanup map will not leak.

A callback function was also tried, but it can block the cache during long
policyCalculation times, and then the DNS proxy will not resolve/respond
correctly. A channel was also tried, but it needs a buffer, and the Cilium way
is to use a controller.

This PR will not work if #6692 is not merged.

Signed-off-by: Eloy Coto eloy.coto@gmail.com



@eloycoto eloycoto added pending-review area/daemon Impacts operation of the Cilium daemon. labels Jan 24, 2019
@eloycoto eloycoto requested a review from a team January 24, 2019 09:41
daemon/fqdn.go Outdated
@@ -100,6 +101,17 @@ func (d *Daemon) bootstrapFQDN(restoredEndpoints *endpointRestoreState) (err err
fqdn.StartDNSPoller(d.dnsPoller)
}

// Controller to cleanup TTL expired entries from the DNS policies.
controller.NewManager().UpdateController("dns-ttl-cleanup-job", controller.ControllerParams{
RunInterval: time.Second,
Member

I'm probably the wrong person to review this PR, but do we need to run this controller every second? Can't this be event-based, driven by the TTL of the entry?

Member Author

No, the cleanup needs to happen each second to clean the table.

Member

But the entries don't have a TTL for every second.

Member Author

Multiple DNS queries have multiple DNS TTLs, so each second we need to clean the TTLs that have expired by that point.

Contributor

The challenge is knowing when to wake up to expire a specific IP and regenerate policy. We know the expiration time of each entry, and setting a timer for the next one to expire is straightforward. Figuring out the expire time for the one after the next is annoying. We would need something like a priority queue (with O(log n) best-case update). In theory, most updates will add a new time to the end of this queue (i.e. a new latest time). Still, we can have situations where new entries will expire somewhere in the middle.
Binning the entries into seconds seems like the lowest overhead, since we have as many map entries as we have IPs (or fewer). The lookup is also constant time for a specific second.
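To make the binning concrete, here is a minimal, self-contained sketch of the idea; the `cleanupMap` type and its methods are illustrative, not the PR's exact code:

```go
package main

import (
	"fmt"
	"time"
)

// cleanupMap bins DNS names by the Unix second in which they expire, so a
// GC pass only visits the seconds elapsed since its last run instead of
// scanning every cache entry.
type cleanupMap map[int64][]string

// add bins name under its expiration second.
func (c cleanupMap) add(name string, expiration time.Time) {
	c[expiration.Unix()] = append(c[expiration.Unix()], name)
}

// expire drains every bin between lastCleanup and now, returning the
// names whose TTL has elapsed in that window.
func (c cleanupMap) expire(lastCleanup, now time.Time) (expired []string) {
	for t := lastCleanup.Unix(); t <= now.Unix(); t++ {
		expired = append(expired, c[t]...)
		delete(c, t)
	}
	return expired
}

func main() {
	c := cleanupMap{}
	now := time.Now()
	c.add("cilium.io", now.Add(1*time.Second))
	c.add("github.com", now.Add(90*time.Second))
	// Only cilium.io falls inside the 5-second window.
	fmt.Println(c.expire(now, now.Add(5*time.Second)))
}
```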

Member

I doubt that we need 1-second accuracy for the TTL. Isn't it enough to expire every 15 seconds?

Contributor

@raybejjani raybejjani left a comment

One thing to change, otherwise LGTM!

// CleanupExpiredEntries cleans all the expired entries from the last time it
// ran to the given time. It locks the struct until it retrieves all the data.
// It returns the list of names that need to be deleted from the policies.
func (c *DNSCache) CleanupExpiredEntries(expires time.Time) []string {
Contributor

Let's return lastCleanup as well, this way it will be clear what the time range was for logging etc.

expiredEntries = append(expiredEntries, entries...)
delete(c.cleanup, key)
}
return expiredEntries
Contributor

Since we now do the actual removal of cacheEntry objects here, we need to do a:

	for _, name := range expiredEntries {
		if entries, exists := c.forward[name]; exists {
			entries.removeExpired(expiredTime)
		}
	}

We also might want to sort and unique the names. If we do both, the loop above already does the uniquing because of the exists check (if we add to a list only when exists == true).

lastCleanup time.Time

// cleanup is the map of TTL expiration time and the entries that are going
// to expire at that second.
Contributor

Mention that the expiration time is in seconds, and that it is expected that we will delete entries in CleanupExpiredEntries.

// When lookupTime is much earlier than time.Now(), we may not expire all
// entries that should be expired, leaving more work for .Lookup.
c.removeExpired(entries, time.Now())
c.addNameToCleanup(entry)
Contributor

This line needs to be in updateWithEntryIPs. This is because we're tracking by IP, but store the whole DNS lookup. The entry may not actually go into the cache if, for a particular IP, there is another lookup that expires later.
In the cases where, for an IP, we are replacing an existing entry, we might need to also replace the cleanup entry. This is because that IP won't actually expire at the original time anymore. It is mostly harmless, though.

daemon/fqdn.go Outdated
@@ -100,6 +101,17 @@ func (d *Daemon) bootstrapFQDN(restoredEndpoints *endpointRestoreState) (err err
fqdn.StartDNSPoller(d.dnsPoller)
}

// Controller to cleanup TTL expired entries from the DNS policies.
controller.NewManager().UpdateController("dns-ttl-cleanup-job", controller.ControllerParams{
Contributor

You were right: putting this here makes things a little complicated. We now need a controller for the Endpoint.DNSHistory caches (unless some of the changes below around replacing cleanup entries go in, in which case it might end up with a steady memory usage).


c.removeExpired(entries, c.lastCleanup)
}
}
return uniqExpires, c.lastCleanup
Contributor

Oh, I meant the original lastCleanup at the start of this function, not the final one (which is going to be roughly expires).
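A sketch of that, assuming the fields shown in the surrounding snippets (cleanup, lastCleanup, KeepUniqueNames); this is illustrative, not the merged code:

```go
func (c *DNSCache) CleanupExpiredEntries(expires time.Time) ([]string, time.Time) {
	c.Lock()
	defer c.Unlock()

	// Capture the start of the covered range before advancing it, so the
	// caller can log "expired entries between startTime and expires".
	startTime := c.lastCleanup
	expiredEntries := []string{}
	for t := startTime.Unix(); t <= expires.Unix(); t++ {
		expiredEntries = append(expiredEntries, c.cleanup[t]...)
		delete(c.cleanup, t)
	}
	c.lastCleanup = expires
	return KeepUniqueNames(expiredEntries), startTime
}
```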

func (c *DNSCache) addNameToCleanup(entry *cacheEntry) {
expiration := int64(entry.TTL) + time.Now().Unix()
expiredEntries, exists := c.cleanup[expiration]
if !exists {
Contributor

I just realised that the append means we can avoid this `if`, if you prefer.
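For reference, a sketch of that simplification; it works because indexing a map with a missing key yields the zero value (a nil slice), which append will allocate:

```go
func (c *DNSCache) addNameToCleanup(entry *cacheEntry) {
	expiration := int64(entry.TTL) + time.Now().Unix()
	c.cleanup[expiration] = append(c.cleanup[expiration], entry.Name)
}
```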

// delete the entry from the policy when it expires.
// Need to be called with the lock
func (c *DNSCache) addNameToCleanup(entry *cacheEntry) {
expiration := int64(entry.TTL) + time.Now().Unix()
Contributor

I just realised that this isn't right. We already set the ExpirationTime in the entry and use that everywhere else. We should probably use it here too, so it is consistent. It will also apply the TTL correctly relative to the lookup time of the data, instead of time.Now() (which will be within the same second most of the time so it would mostly work this way).
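A sketch of the suggested change, assuming cacheEntry carries the ExpirationTime set at lookup time as described:

```go
// Derive the cleanup bin from the entry's own expiration, which was
// computed from the lookup time, rather than from time.Now().
expiration := entry.ExpirationTime.Unix()
c.cleanup[expiration] = append(c.cleanup[expiration], entry.Name)
```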

@@ -268,6 +323,7 @@ func (c *DNSCache) updateWithEntryIPs(entries ipEntries, entry *cacheEntry) {
c.upsertReverse(ipStr, entry)
}
}
c.addNameToCleanup(entry)
Contributor

This needs to be called when we call c.upsertReverse (just above) otherwise we introduce expirations that aren't actually happening in the cache.


daemon/fqdn.go Outdated
d.dnsRuleGen.ForceGenerateDNS(namesToClean)
return nil
},
StopFunc: func() error { return nil },
Member

This can be omitted
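For reference, a sketch of the registration without the no-op StopFunc. Here `cache` stands in for however the daemon reaches the DNS cache at this point (hypothetical), and CleanupExpiredEntries is shown with the two-value return suggested earlier:

```go
controller.NewManager().UpdateController("dns-ttl-cleanup-job", controller.ControllerParams{
	RunInterval: time.Second,
	DoFunc: func() error {
		// Drain the per-second cleanup bins and regenerate policy for
		// the affected names.
		namesToClean, _ := cache.CleanupExpiredEntries(time.Now())
		d.dnsRuleGen.ForceGenerateDNS(namesToClean)
		return nil
	},
	// StopFunc omitted: nothing to tear down here.
})
```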

@eloycoto eloycoto requested review from a team as code owners January 30, 2019 17:17
@@ -165,3 +165,17 @@ func sortedIPsAreEqual(a, b []net.IP) bool {
func prepareMatchName(matchName string) string {
return strings.ToLower(dns.Fqdn(matchName))
}

func KeepUniqueNames(names []string) []string {


exported function KeepUniqueNames should have comment or be unexported

Contributor

I have a slight preference for only using the slice instead of making a map. In any case, however we do it, we should document the function and explain whether the output slice is the same one or a different one.
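A slice-only sketch (hypothetical, not the merged code; it needs the sort package and sorts the input in place):

```go
// KeepUniqueNames returns a new slice containing each name in names
// exactly once. It sorts names in place as a side effect.
func KeepUniqueNames(names []string) []string {
	sort.Strings(names)
	unique := make([]string, 0, len(names))
	for i, name := range names {
		if i == 0 || name != names[i-1] {
			unique = append(unique, name)
		}
	}
	return unique
}
```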


// NewDNSCache returns an initialized DNSCache and set the max host limit to


comment on exported function NewDNSCacheWithLimit should be of the form "NewDNSCacheWithLimit ..."

@@ -26,6 +26,9 @@ import (
"github.com/cilium/cilium/pkg/lock"
)

// defines the maximum number of IPs to maintain for each FQDN name


comment on exported var FQDNMaxIPperHost should be of the form "FQDNMaxIPperHost ..."

daemon/fqdn.go Outdated
for i, ep := range endpoints {
cfg.Cache.UpdateFromCache(ep.DNSHistory, namesToClean, i == 0)
}
log.Infof("FQDN garbage collector work deleted %d name entries", len(namesToClean))
Contributor

this is probably a debug... I'm not sure how relevant it is without the list of names deleted

Member Author

Just an info message, to be able to track this down in the future if there's any issue.

Contributor

shrug. People keep complaining to me about log lines.
Anyway, can you make it a scopedLog with the controller name or whatever we usually put into the controllers? That will make it easier to filter for it.
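A sketch of such a scoped logger, assuming the daemon's logrus-based `log`; the field name is illustrative:

```go
scopedLog := log.WithField("controller", "dns-ttl-cleanup-job")
scopedLog.Infof("FQDN garbage collector deleted %d name entries", len(namesToClean))
```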

daemon/fqdn.go Outdated
}
// A second loop is needed to update the global cache from the
// endpoints cache
for i, ep := range endpoints {
Contributor

Also mention: the first update from an endpoint will clear the data in cfg.Cache for each of namesToClean. Looping this way is generally safe despite not locking; if a new lookup happens during these updates, the new DNS data will be reinserted from the endpoint.DNSHistory cache that made the request.

cleanup map[int64][]string

// overLimit is a map with the host names that has reached the ips limits
// defined by the DNSCache
Contributor

defined by limit, no?

@@ -154,16 +181,124 @@ func (c *DNSCache) updateWithEntry(entry *cacheEntry) {
c.forward[entry.Name] = entries
}
c.updateWithEntryIPs(entries, entry)
if len(entries) > FQDNMaxIPperHost {
Contributor

this now needs to use c.limit

overLimit map[string]bool

// limit is the number of maximum number of IP per host.
limit int
Contributor

perHostLimit will be clearer when using it later.


// cleanupOverLimitEntries returns the names that have reached the max number
// of IPs per host.
// Internally the function sorts the entries by the TTL.
Contributor

We should say sorted by expiration time. TTL might be ambiguous and mean the actual TTL seconds value.
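A sketch of sorting by expiration time, assuming a slice of *cacheEntry with an ExpirationTime field (needs the sort package):

```go
sort.Slice(entries, func(i, j int) bool {
	return entries[i].ExpirationTime.Before(entries[j].ExpirationTime)
})
```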

func (c *DNSCache) UpdateFromCache(update *DNSCache) {
// individually. If any item in the names slice will only update that names.
// If the clean argument is set to true, all entries will be clean.
func (c *DNSCache) UpdateFromCache(update *DNSCache, names []string, clean bool) {
Contributor

names might be better as namesToUpdate or something.

The comment is a little unclear, maybe:
When names is non-nil, only those names are updated from update; otherwise all DNS names in update are used.
When clean is true, the data in c for each specific name is cleared before the update.

@@ -260,6 +260,10 @@ const (
// ToFQDNsEmitPollerEvents controls if poller lookups are sent as monitor events
ToFQDNsEnablePollerEvents = "tofqdns-enable-poller-events"

// ToFQDNsMaxIPsPerHost defines the maximum number of IPs to maintain
// for each FQDN name in an endpoint's FQDN cache
ToFQDNsMaxIPsPerHost = "tofqdns-max-ip-per-hostname"
Contributor

tofqdns-endpoint-max-ip-per-hostname might be a little clearer, but it is longer.

@eloycoto eloycoto force-pushed the FQDNCacheCleanb branch 5 times, most recently from 73f5cc3 to d9184e2 on January 31, 2019 13:53
@coveralls

coveralls commented Jan 31, 2019

Coverage Status

Coverage increased (+0.02%) to 42.98% when pulling 0e90cfd on eloycoto:FQDNCacheCleanb into 0b3ed95 on cilium:master.

@eloycoto
Member Author

test-me-please


// UpdateFromCache is a utility function that allows updating a DNSCache
// instance with all the internal entries of another. Latest-Expiration still
// applies, thus the merged outcome is consistent with adding the entries
// individually.
func (c *DNSCache) UpdateFromCache(update *DNSCache) {
// When namesToUpdate is non-nil only those names are updated from update, otherwise
Contributor

I guess this isn't wholly true. "When namesToUpdate has non-zero length" might be more correct (since someone might pass in an empty list thinking it will be a no-op).

@@ -173,8 +306,19 @@ func (c *DNSCache) UpdateFromCache(update *DNSCache) {

c.Lock()
defer c.Unlock()

for _, newEntries := range update.forward {
if len(namesToUpdate) == 0 {
Contributor

I wonder if it's better to convert the string slice to a map when len(namesToUpdate) != 0 instead. This way, the default case just uses update.forward without doing extra work. It probably won't matter too much in the end.
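A sketch of that approach: build a set only when names were passed in, so the default path iterates update.forward without extra work (variable names are illustrative):

```go
var toUpdate map[string]struct{}
if len(namesToUpdate) != 0 {
	toUpdate = make(map[string]struct{}, len(namesToUpdate))
	for _, name := range namesToUpdate {
		toUpdate[name] = struct{}{}
	}
}

for name, newEntries := range update.forward {
	// A nil set means "update everything"; otherwise skip unrequested names.
	if toUpdate != nil {
		if _, requested := toUpdate[name]; !requested {
			continue
		}
	}
	for _, newEntry := range newEntries {
		c.updateWithEntry(newEntry)
	}
}
```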

continue
}
if clean {
c.removeExpired(newEntries, time.Now(), time.Now())
Contributor

@raybejjani raybejjani Jan 31, 2019

I realised that this is wrong. c.removeExpired should only work on entries in c, and not on those from update. The effect here would be that removeReverse inside removeExpired etc. wouldn't work correctly.
There's also another problem with the assumption I made yesterday: if the first endpoint.DNSHistory doesn't have name then we won't ever clean it. Another problem is that the time.Now() here leaves a bit of a race.
So I think we are better off not having clean in this function, and instead calling ForceExpire before we call this function in the controller. This should keep things relatively consistent (at the cost of a regex compile).

}
if clean {
c.removeExpired(newEntries, time.Now(), time.Now())
}
for _, newEntry := range newEntries {
c.updateWithEntry(newEntry)
Contributor

Not related to this PR, but could you add a comment like:
using newEntry from update violates encapsulation, but these objects are immutable and so it should be harmless to do this (but still a bad idea).

Alternatively, we can copy newEntry (the type is shallow-copyable in general, unless we also play with IP slices, and if we do we deserve bugs).

}
toClean := []string{}

// For global cache the limit maybe is not used at all.
Contributor

Without knowing about the various instances of this type that we use, this is a little confusing. If you wanted to make sure it was clear, maybe add this comment where we instantiate fqdn.DefaultDNSCache?

}
}

}

// removeExpired removes expired (or nil) cacheEntry pointers from entries, an
Contributor

Since we know the entry we are expiring, we can also remove it from cleanup. This way if someone forgets to run GC we won't leak memory.
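A sketch of that bookkeeping, assuming the cleanup bin is keyed by the entry's expiration second (illustrative, not the merged code):

```go
// Drop the expired entry's name from its cleanup bin as well, so the
// cleanup map cannot grow even if the GC controller never runs.
key := entry.ExpirationTime.Unix()
names := c.cleanup[key]
for i, name := range names {
	if name == entry.Name {
		c.cleanup[key] = append(names[:i], names[i+1:]...)
		break
	}
}
if len(c.cleanup[key]) == 0 {
	delete(c.cleanup, key)
}
```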


// cleanup is the map of TTL expiration time and the entries that are going
// to expire will be deleted by CleanupExpiredEntries.
cleanup map[int64][]string
Contributor

@raybejjani raybejjani Jan 31, 2019

We should explain how cleanup and overLimit are being used more completely here. The difference between them is a little unclear. It's possible something like expirationTimes is a better name, but it is longer.
For cleanup:
cleanup maps the TTL expiration times (in seconds since the epoch) to DNS names that expire in that second. On every new insertion where the new data is actually inserted into the cache (i.e. it expires later than an existing entry) cleanup will be updated. CleanupExpiredEntries cleans up these entries on demand.
Note: Lookup functions will not return expired entries, and this is used to proactively enforce expirations.
Note: It is important to periodically call CleanupExpiredEntries otherwise this map will grow forever.
(this last note isn't needed if we remove entries from cleanup during removeExpired)

For overLimit:
overLimit is a set of DNS names that were over the per-host configured limit when they received an update. The excess IPs will be removed when cleanupOverLimitEntries is called, but will continue to be returned by Lookup until then.
Note: It is important to periodically call GC otherwise this map will grow forever (it is very bounded, however).
(this last note isn't needed if we remove entries from overLimit when we insert IPs)

// When lookupTime is much earlier than time.Now(), we may not expire all
// entries that should be expired, leaving more work for .Lookup.
c.removeExpired(entries, time.Now(), time.Time{})
}

// AddNameToCleanup adds the IP with the given TTL to the cleanup map to
// delete the entry from the policy when it expires.
// Need to be called with the lock
Contributor

with a write lock (I think we make the distinction elsewhere)

reverse: make(map[string]nameEntries),
lastCleanup: time.Now(),
cleanup: map[int64][]string{},
overLimit: map[string]bool{},
Contributor

since we set every other field, perhaps we should also set perHostLimit. It is redundant but makes it clear that it was intended. shrug.

@eloycoto eloycoto requested a review from a team as a code owner January 31, 2019 15:52
@raybejjani raybejjani closed this Feb 4, 2019
@raybejjani raybejjani reopened this Feb 4, 2019
@raybejjani
Contributor

raybejjani commented Feb 4, 2019

test-me-please (rebased with master)

@raybejjani
Contributor

sorry @eloycoto I accidentally pressed close!

@raybejjani raybejjani added this to the 1.4-feature milestone Feb 4, 2019
@eloycoto
Member Author

eloycoto commented Feb 4, 2019

test-me-please

@tgraf
Member

tgraf commented Feb 4, 2019

Needs a quick refresh:

00:14:45.987     runtime: 104a105
00:14:45.987     runtime: >       --tofqdns-endpoint-max-ip-per-hostname int    Maximum number of IPs to maintain per FQDN name for each endpoint (default 50)
00:14:45.987     runtime: Detected a difference in the cmdref directory
00:14:45.987     runtime: diff -r: diff -r ./Documentation/cmdref/cilium-agent.md /tmp/tmp.20iWrMlkhK/cilium-agent.md
00:14:45.987     runtime: 104a105
00:14:45.987     runtime: >       --tofqdns-endpoint-max-ip-per-hostname int    Maximum number of IPs to maintain per FQDN name for each endpoint (default 50)
00:14:45.987     runtime: Only in ./Documentation/cmdref: cli_index.rst
00:14:45.987     runtime: Only in ./Documentation/cmdref: index.rst
00:14:45.987     runtime: Only in ./Documentation/cmdref: kvstore.rst
00:14:45.987     runtime: Please rerun 'make -C Documentation cmdref' and commit your changes

@eloycoto
Member Author

eloycoto commented Feb 4, 2019

test-me-please

Member

@ianvernon ianvernon left a comment

LGTM for CLI

Signed-off-by: Eloy Coto <eloy.coto@gmail.com>
On `updateIPsForName`, if the new data has already expired, or the old
data has also expired, it can be an issue in regeneration.

Fix cilium#6796

Signed-off-by: Eloy Coto <eloy.coto@gmail.com>
@eloycoto
Member Author

eloycoto commented Feb 5, 2019

test-me-please

@eloycoto
Member Author

eloycoto commented Feb 5, 2019

Fixed conflict
