Skip to content

conntrack: prevent potential memory leak #1058

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

aroradaman
Copy link
Contributor

Currently, the ConntrackDeleteFilters captures all flow entries it fails to delete and reports them as errors. This behavior can potentially lead to memory leaks in high-traffic systems, where thousands of conntrack flow entries are cleared in a single batch. With this commit, instead of returning all the un-deleted flow entries, we now return a single error message for all of them.

ref: kubernetes/kubernetes#129982

@aroradaman aroradaman force-pushed the conntrack-delete-potential-memory-leak-fix branch from 4a0682e to e97da78 Compare February 5, 2025 16:50
@aroradaman aroradaman force-pushed the conntrack-delete-potential-memory-leak-fix branch from e97da78 to f875a5d Compare February 6, 2025 10:41
@aroradaman aroradaman requested a review from aojea February 6, 2025 10:43
Currently, the ConntrackDeleteFilters captures all flow entries
it fails to delete and reports them as errors. This behavior
can potentially lead to memory leaks in high-traffic systems,
where thousands of conntrack flow entries are cleared in a single
batch. With this commit, instead of returning all the un-deleted
flow entries, we now return a single error message for all of them.

Signed-off-by: Daman Arora <aroradaman@gmail.com>
@aroradaman aroradaman force-pushed the conntrack-delete-potential-memory-leak-fix branch from f875a5d to f87286a Compare February 6, 2025 11:59
@aroradaman aroradaman requested a review from aojea February 6, 2025 12:00
@@ -178,19 +179,20 @@ func (h *Handle) ConntrackDeleteFilters(table ConntrackTableType, family InetFam
req2 := h.newConntrackRequest(table, family, nl.IPCTNL_MSG_CT_DELETE, unix.NLM_F_ACK)
// skip the first 4 byte that are the netfilter header, the newConntrackRequest is adding it already
req2.AddRawData(dataRaw[4:])
if _, err = req2.Execute(unix.NETLINK_NETFILTER, 0); err == nil {
if _, err = req2.Execute(unix.NETLINK_NETFILTER, 0); err == nil || errors.Is(err, fs.ErrNotExist) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fore reference, this is how golang std lib recommends to do this check https://pkg.go.dev/os#IsNotExist

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I compared the err with syscall.ENONET, fs.ErrNotExist and unix.ENONET. Only got a match on fs.ErrNotExist .

https://github.com/vishvananda/netlink/compare/main...aroradaman:netlink:debug?expand=1

####################################################### Netlink ConntrackDeleteFilters
error occurred no such file or directory false true false
length 1808537
####################################################### Netlink ConntrackDeleteFilters

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is how golang exposes them ... I found it out recently too aojea/kindnet#158

if len(errMsgs) > 0 {
return matched, fmt.Errorf(strings.Join(errMsgs, "; "))
if totalFilterErrors > 0 {
finalErr = errors.Join(finalErr, fmt.Errorf("failed to delete %d conntrack flows with %d filters", totalFilterErrors, len(filters)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was added in 1.20 https://pkg.go.dev/errors#Join

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this breaks the module;

go 1.12

For this one we may be able to just use the old approach (at least, I think it would give mostly the same for now)
(there was another change also requiring newer go versions)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, but 1.20 was EOL one year ago devcontainers/images#944 ,

@aojea
Copy link
Contributor

aojea commented Feb 6, 2025

/lgtm

/assign @aboch

PTAL this is causing disruption and a big impact on performance and scalability kubernetes/kubernetes#129982

@aboch
Copy link
Collaborator

aboch commented Feb 6, 2025

LGTM

@aboch aboch merged commit 62fb240 into vishvananda:main Feb 6, 2025
2 checks passed
@aojea
Copy link
Contributor

aojea commented Feb 6, 2025

Thanks 🙇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants