Skip to content

Conversation

jrajahalme
Copy link
Member

@jrajahalme jrajahalme commented Oct 3, 2024

Current version of a versioned.Value is the latest committed version. This version should never be cleaned off, as then no value would be available. Currently this can happen if there is a new transaction in progress due to a missing check between the oldest handle version (for the new transaction) and the current version. Add this check using 'min()'.

This fixes the following warning log:
2024-10-03T06:14:48.362931495Z time="2024-10-03T06:14:48.361323126Z" level=warning msg="GetVersionHandle: Handle to a stale version requested, returning oldest valid version instead" oldVersion=1 stacktrace="<stacktrace redacted>" subsys=policy version=2

In the above log 'oldVersion' (1) was the current version that was cleaned off due to the oldest version handle being for a new, non-committed transaction for version 2.

When this line was logged, a handle for version 2 was returned, resulting in a race between the use of that version and that version being committed. Never cleaning off the current version makes sure this does not happen.

Fixes: #34205

Current version of a versioned.Value is the latest committed
version. This version should never be cleaned off, as then no value would
be available. Currently this can happen if there is a new transaction in
progress due to a missing check between the oldest handle version (for
the new transaction) and the current version. Add this check using
'min()'.

This fixes the following warning log:
      2024-10-03T06:14:48.362931495Z time="2024-10-03T06:14:48.361323126Z" level=warning msg="GetVersionHandle: Handle to a stale version requested, returning oldest valid version instead" oldVersion=1 stacktrace="<stacktrace redacted>" subsys=policy version=2

In the above log 'oldVersion' (1) was the current version that was
cleaned off due to the oldest version handle being for a new,
non-committed transaction for version 2.

When this line was logged, a handle for version 2 was returned, resulting
in a race between the use of that version and that version being
committed. Never cleaning off the current version makes sure this does
not happen.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
@jrajahalme jrajahalme added kind/bug This is a bug in the Cilium logic. release-note/misc This PR makes changes that have no direct user impact. labels Oct 3, 2024
@jrajahalme jrajahalme requested a review from a team as a code owner October 3, 2024 07:46
@jrajahalme jrajahalme requested a review from ovidiutirla October 3, 2024 07:46
@jrajahalme
Copy link
Member Author

This happened in https://github.com/cilium/cilium/actions/runs/11156777508/job/31009969527, with this stack trace:

github.com/hashicorp/go-hclog.Stacktrace
	/go/src/github.com/cilium/cilium/vendor/github.com/hashicorp/go-hclog/stacktrace.go:51
github.com/cilium/cilium/pkg/container/versioned.(*Coordinator).getVersionHandleLocked
	/go/src/github.com/cilium/cilium/pkg/container/versioned/value.go:322
github.com/cilium/cilium/pkg/container/versioned.(*Coordinator).GetVersionHandle
	/go/src/github.com/cilium/cilium/pkg/container/versioned/value.go:343
github.com/cilium/cilium/pkg/policy.(*SelectorCache).GetVersionHandle
	/go/src/github.com/cilium/cilium/pkg/policy/selectorcache.go:106
github.com/cilium/cilium/daemon/cmd.(*Daemon).GetDNSRules
	/go/src/github.com/cilium/cilium/daemon/cmd/endpoint.go:1166
github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regenerateBPF
	/go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:463
github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regenerate
	/go/src/github.com/cilium/cilium/pkg/endpoint/policy.go:465
github.com/cilium/cilium/pkg/endpoint.(*EndpointRegenerationEvent).Handle
	/go/src/github.com/cilium/cilium/pkg/endpoint/events.go:56
github.com/cilium/cilium/pkg/eventqueue.(*EventQueue).run.func1
	/go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:244
sync.(*Once).doSlow
	/usr/local/go/src/sync/once.go:76
sync.(*Once).Do
	/usr/local/go/src/sync/once.go:67
github.com/cilium/cilium/pkg/eventqueue.(*EventQueue).run
	/go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:232

@jrajahalme
Copy link
Member Author

/test

@jrajahalme jrajahalme added the release-blocker/1.17 This issue will prevent the release of the next version of Cilium. label Oct 3, 2024
Copy link
Member

@bimmlerd bimmlerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

took me a while to understand, but I think the crucial point is that acquiring a version handle adds a versionhandle to the coordinators tracked versions, but only the commit of said transaction changes coordinator.version - in this window, the coordinator.version can be cleaned, which is wrong and fixed in this commit.

@jrajahalme jrajahalme added this pull request to the merge queue Oct 3, 2024
Merged via the queue into cilium:main with commit 1b78e91 Oct 3, 2024
68 of 69 checks passed
@jrajahalme jrajahalme deleted the versioned-value-never-clean-off-current-version branch October 3, 2024 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug This is a bug in the Cilium logic. release-blocker/1.17 This issue will prevent the release of the next version of Cilium. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants