Skip to content

Limit FQDNS match name and pattern expressions to 255 characters #35577

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 28, 2024

Conversation

rudrakhp
Copy link
Contributor

@rudrakhp rudrakhp commented Oct 28, 2024

DNS names are limited to 255 chars: https://datatracker.ietf.org/doc/html/rfc1035#section-2.3.4
DNS match pattern rules that are excessively long could cause Cilium agent to take a long time to process the rules. This change adds a strict limit of 255 to matchName and configurable limit to matchPattern with default 63.

Ref #21491

Please ensure your pull request adheres to the following guidelines:

  • For first time contributors, read Submitting a pull request
  • All code is covered by unit and/or runtime tests where feasible.
  • All commits contain a well written commit description including a title,
    description and a Fixes: #XXX line if the commit addresses a particular
    GitHub issue.
  • If your commit description contains a Fixes: <commit-id> tag, then
    please add the commit author[s] as reviewer[s] to this issue.
  • All commits are signed off. See the section Developer’s Certificate of Origin
  • Provide a title or release-note blurb suitable for the release notes.
  • Are you a user of Cilium? Please add yourself to the Users doc
  • Thanks for contributing!

Fixes: #21491

Limit FQDNS matchName and matchPattern length to 255 characters

@rudrakhp rudrakhp requested review from a team as code owners October 28, 2024 05:06
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Oct 28, 2024
@github-actions github-actions bot added sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. kind/community-contribution This was a contribution made by a community member. labels Oct 28, 2024
Copy link
Member

@bimmlerd bimmlerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @joestringer who opened the original issue

I'm a bit worried that the limit on match patterns constitutes a breaking change in the current form - if we don't offer some safe upgrade path, it's conceivable that this starts rejecting policies that were previously accepted. Or is there a reason why match patterns would be limited to 63 chars in practice anyway?

@rudrakhp rudrakhp force-pushed the fqdns_match_limits branch 2 times, most recently from 91d693f to bd10bf5 Compare October 28, 2024 08:48
@rudrakhp rudrakhp requested a review from a team as a code owner October 28, 2024 08:48
@rudrakhp rudrakhp requested a review from youngnick October 28, 2024 08:48
@rudrakhp
Copy link
Contributor Author

@bimmlerd thanks for the comments! Do you think a higher default (say 255) would help here? I would expect regex to be smaller than the actual DNS for most cases. It would make sense to limit the pattern length given DNS name itself has a hard limit.
Regarding upgrade path, user can set the flag along with the upgrade to make sure existing policies don't fail. Although a breaking change, do you see this being a hard blocker?

@bimmlerd
Copy link
Member

Do you think a higher default (say 255) would help here?

I'm uncertain. It seems possible to construct contrived examples of matchPatterns which are longer but still valid - and we can't break those without making this a breaking change which would need a bit more consideration.

Regarding upgrade path, user can set the flag along with the upgrade to make sure existing policies don't fail.

Or we could make the matchPattern validation flag default to 0, which would allow any length and warn on very long patterns, i.e. issue a deprecation. Then flip the flag to opt-out or remove the flag outright in a future release. I'd prefer this, because then the users which don't know about this feature/don't read the upgrade guide carefully don't end up with potentially broken policies, which would be bad.

@rudrakhp
Copy link
Contributor Author

Or we could make the matchPattern validation flag default to 0

This makes sense, I have defaulted to 0 and run the validation only if value is non-zero. I guess documenting it as an available option should also work for those who are looking for this specifically to improve performance.

@joestringer
Copy link
Member

Yeah upgrade is the tricky aspect. So generally speaking it should be fine to set a maxLength of 255 for matchPatterns in the CRDs as well, for the same reason that matchNames can only be 255: RFC 1035 Section 2.3.4 states this as the maximum number of octets for a name. It doesn't make sense that a matchPattern would be longer if the actual FQDN is shorter than the matchpattern.

Then yeah having a configurable limit seems fine. Then again, we should really be able to convince ourselves there's a future where that setting will be configured at a lower value and will support the majority of users. Otherwise we're just adding yet one more flag that will be rarely used.

The other angle on upgrade is this suggestion from the original issue, we could potentially do something like this (doesn't need to be done in this PR):

Extend the preflight check to validate whether any existing CNPs or CCNPs have matchpattern / matchnames that exceed the default limits. If yes, highlight the statements. Instruct the user to configure the above Cilium flag to raise the limits to match the policies they use in their environment.

I don't recall where I came up with the idea of limiting matchPatterns to 63 characters. If my original motivation there is that Cilium might take a long time to process such rules, then maybe instead of putting a low limit like 63, Cilium should just export some metric or something that would allow users to identify when the worst case policy calculation takes a long time due to FQDN rules. Or warn on "bad" patterns - long patterns like @bimmlerd suggested, or maybe if there's a matchPattern with excessive wildcards (we could pick a threshold, say 5 or more * characters or something).

@rudrakhp
Copy link
Contributor Author

rudrakhp commented Nov 1, 2024

it should be fine to set a maxLength of 255 for matchPatterns in the CRDs as well

Didn't want to do this as this won't allow users with specific needs to configure it higher.

Then again, we should really be able to convince ourselves there's a future where that setting will be configured at a lower value and will support the majority of users. Otherwise we're just adding yet one more flag that will be rarely used.

To ensure this we could have warnings that inform users to use shorter DNS matchPatterns if length is > 255 to improve performance. Having this knob definitely allows cluster admins to expose cilium functionality while also ensuring bad configurations don't affect overall performance. Not sure about how to do warnings though, let me have a look.

@bimmlerd @joestringer do let me know if anything else makes more sense here. I have already disabled the matchPatterns check by default (value 0).

@joestringer
Copy link
Member

it should be fine to set a maxLength of 255 for matchPatterns in the CRDs as well

Didn't want to do this as this won't allow users with specific needs to configure it higher.

Could you expand more on the use case? If DNS names can't be longer than that, and matchPatterns inherently have a wildcard pattern which expands to one or more characters, then I don't understand how it's possible to use a higher value. Am I missing something?

Then again, we should really be able to convince ourselves there's a future where that setting will be configured at a lower value and will support the majority of users. Otherwise we're just adding yet one more flag that will be rarely used.

To ensure this we could have warnings that inform users to use shorter DNS matchPatterns if length is > 255 to improve performance. Having this knob definitely allows cluster admins to expose cilium functionality while also ensuring bad configurations don't affect overall performance. Not sure about how to do warnings though, let me have a look.

Bear with me, as I'm developing my thinking on this topic as we discuss it. Here's where I'm currently thinking as a direction: We should provide the tooling for interested admins to measure the impact and take action. So metrics totally make sense for this. The more I think about it, the less I'm convinced that cluster admins are going to be able to come up with some reasonable configuration for this flag that realistically limits the performance impact. If there is a specific pattern that causes high performance impact it would be nice to know, but then the solution is to check the policy and change it. Probably a pattern as simple as *.*.*.*.* could have high impact due to the number of wildcards, yet the actual length of the pattern is not that long. (Note I didn't try this out at all, just speculating. Maybe it takes 10 wildcard patterns to trigger bad behaviour, not sure. I think the original idea / concern about performance of regex matching came from reports from oss-fuzz).

The other thing I'm thinking is.. if we are going to impose a limit, then it is nice to have a (hidden) configuration for this option so that in the worst case it is easy to tweak if the new limit causes problems. But if the default is 0 then I'm not sure whether any user will actually configure the option. We have hundreds of flags in Cilium, so if users aren't specifically asking for a new flag for this configuration then maybe we don't need to be exporting an option to them.

@joestringer
Copy link
Member

I don't recall where I came up with the idea of limiting matchPatterns to 63 characters.

Some context here: K8s limits individual label strings to 63 characters. If we ever take a DNS name and try to put it into a k8s API field with that limit, it could cause problems. I don't know exactly whether we ever do that or how that limit might come up, so I couldn't argue to set that limit at this point. Just highlighting this context that I remembered after that post.

@youngnick
Copy link
Contributor

@rudrakhp ping?

@rudrakhp
Copy link
Contributor Author

@youngnick will get back to this, have been busy elsewhere. From what I gather as per the comments:

If DNS names can't be longer than that, and matchPatterns inherently have a wildcard pattern which expands to one or more characters, then I don't understand how it's possible to use a higher value.

@joestringer patterns convoluted enough can lead to the patterns themselves exceeding the 255 limit, although practically that shouldn't be the case. If everyone is of the same opinion I can have a hard limit of 255 for regex as well.

The more I think about it, the less I'm convinced that cluster admins are going to be able to come up with some reasonable configuration for this flag that realistically limits the performance impact. If there is a specific pattern that causes high performance impact it would be nice to know, but then the solution is to check the policy and change it.

Ok this makes sense, length of the regular expression will seldom affect the performance, it's more about how complex the final expression is.
Limiting the scope of this PR, will just have two hard limits on DNS name and patterns. PR for emitting performance metrics for bad patterns can be a good follow up WDYT?

@joestringer
Copy link
Member

Limiting the scope of this PR, will just have two hard limits on DNS name and patterns. PR for emitting performance metrics for bad patterns can be a good follow up WDYT?

Yeah, I think that would be fine. 👍

@rudrakhp rudrakhp force-pushed the fqdns_match_limits branch 2 times, most recently from 0ce744b to e97c97c Compare November 21, 2024 04:55
@joestringer joestringer added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Nov 21, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Nov 21, 2024
@joestringer joestringer changed the title Limits for FQDNS match name and pattern Limit FQDNS match name and pattern expressions to 255 characters Nov 21, 2024
@joestringer
Copy link
Member

/test

Copy link
Contributor

@derailed derailed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rudrakhp Thank you for this update!

@joestringer
Copy link
Member

Many tests were failing, looks like due to ImagePullBackOff issue with cilium-envoy, which I believe was a transient failure on main earlier this week. I clicked the button to rebase the PR to tip of main so we can re-run the testsuite.

@joestringer
Copy link
Member

/test

Copy link
Contributor

@learnitall learnitall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@joestringer
Copy link
Member

/test

Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>
auto-merge was automatically disabled November 28, 2024 04:41

Head branch was pushed to by a user without write access

@aanm aanm enabled auto-merge November 28, 2024 13:04
@aanm
Copy link
Member

aanm commented Nov 28, 2024

/test

@aanm aanm added this pull request to the merge queue Nov 28, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Nov 28, 2024
Merged via the queue into cilium:main with commit abe646f Nov 28, 2024
64 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/community-contribution This was a contribution made by a community member. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Limit DNS matchname/matchpattern rule lengths
7 participants