Skip to content

Conversation

vipul-21
Copy link
Contributor

This PR is following the work done in #38669

  • The PR adds the cilium agent handling of client(sdp) connection.
  • On each request on the StreamPolicyState, CA stores the stream and reuse the same stream to send the policy rules to the client.
  • Cilium agent also waits for the ack from the client for the data it sends.
  • In case of any error on the stream, it is the responsibility of the client to recreate the connection with CA.

For overall flow, please go through the #30984 (comment) and overall SDP PR: #37836

In general the flow is:
The grpc server starts running on a default port if l7proxy and enable-standalone-dns-proxy is true
Cilium agent sends the data to SDP(standalone dns proxy) during endpoint regeneration specifically after the policy cache is updated. There is lock mechanism in the policyCache as well as the FqdnDataServer to make sure that the updates are a snapshot of the current policy state.(and in serialized manner)
StreamPolicyState is called by the client(SDP) on start and cilium agent sents it the current policy rules it has. Otherwise, it stores the stream and write the rules on the same stream for the client to receive.
UpdateMappingRequest is used by the client to send the data(DNS Response mappings) to cilium agent.

Fixes: #30984
Related PRs: #36214, #36215
CFP: cilium/design-cfps#54

@vipul-21 vipul-21 requested a review from a team as a code owner April 28, 2025 23:06
@vipul-21 vipul-21 requested a review from doniacld April 28, 2025 23:06
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Apr 28, 2025
@github-actions github-actions bot added the sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. label Apr 28, 2025
@vipul-21
Copy link
Contributor Author

cc: @bimmlerd

@bimmlerd
Copy link
Member

Hi, unfortunately swamped at the moment, and on PTO starting tomorrow; until Monday - I'll review once I'm back (otherwise, you can always request feedback from the folks who engaged with the first PR!)

@vipul-21 vipul-21 requested review from doniacld, joamaki and squeed May 8, 2025 02:48
Copy link
Contributor

@joamaki joamaki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few nits, but looking good from my perspective. Let's still figure out the UpdateSDP part, but at least to me it'd be fine to do that in a follow-up.

@joestringer joestringer added the release-note/misc This PR makes changes that have no direct user impact. label May 23, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 23, 2025
@joestringer
Copy link
Member

I added the release-note/misc label since the feature needs several PRs together to be functional. I'd like the overall feature to be releases-note/major in the release notes assuming that all relevant PRs go in on time, but we probably just need to make that call closer to the time.

@vipul-21 vipul-21 force-pushed the singhvipul/cilium-standalone-2 branch from 28ece30 to 379f654 Compare May 27, 2025 00:07
@vipul-21 vipul-21 requested review from joamaki and squeed May 27, 2025 01:07
@vipul-21
Copy link
Contributor Author

@joestringer Can we add the v1.18 release label to the SDP PRs ? Trying to push this in the v1.18 release.

@joestringer
Copy link
Member

I can put it in the v1.18 milestone, but that only expresses intent. It's still up to the contributors and reviewers to collaborate to meet that date. We don't change release timelines for specific features.

@joestringer joestringer added this to the 1.18 milestone May 28, 2025
@doniacld doniacld removed their request for review June 4, 2025 15:03
@vipul-21
Copy link
Contributor Author

vipul-21 commented Jun 4, 2025

@squeed Friendly for re-review on this. Thanks !

@vipul-21 vipul-21 requested a review from squeed June 9, 2025 15:17
@joestringer joestringer removed this from the 1.18 milestone Jun 13, 2025
@vipul-21 vipul-21 force-pushed the singhvipul/cilium-standalone-2 branch from 379f654 to 2623377 Compare June 14, 2025 20:12
@squeed
Copy link
Contributor

squeed commented Jun 25, 2025

Much cleaner! This is a lot easier to follow :-).

A few more things to take care of. Please squash to one or two commits once you've got this PR where you like. But this is getting close.

@vipul-21 vipul-21 force-pushed the singhvipul/cilium-standalone-2 branch from d049e71 to 8d2f234 Compare June 26, 2025 00:03
@vipul-21 vipul-21 requested review from joamaki and squeed June 26, 2025 00:24
@joestringer joestringer removed the dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs label Jul 1, 2025
@squeed
Copy link
Contributor

squeed commented Jul 2, 2025

/test

- The PR adds the cilium agent handling of client(sdp) connection.
- Using state db to send updates to client with for each connection. Each stream
  has a watch on the policy rules getting updated. On each update, the
  agent send the snapshot of the rules to the client.
- policyRulesTable will be populated as the agent learns about the rules. Each
  write is used as trigger to send the updates to the client.
- In case of any error on the stream, it is the responsibility of the
  client to recreate the connection with CA.
- Move identityToIP map to statedb

Signed-off-by: Vipul Singh <singhvipul@microsoft.com>
@vipul-21 vipul-21 force-pushed the singhvipul/cilium-standalone-2 branch from 8d2f234 to 4319858 Compare July 2, 2025 18:52
@vipul-21
Copy link
Contributor Author

vipul-21 commented Jul 3, 2025

/test

@vipul-21
Copy link
Contributor Author

/ci-clustermesh

@vipul-21
Copy link
Contributor Author

/ci-integration

@vipul-21
Copy link
Contributor Author

/ci-clustermesh

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jul 15, 2025
@joestringer
Copy link
Member

@vipul-21 Why were the tests failing? I see several re-runs of CI, but no evidence of investigation of the failures.

The CI Failure Triage page in the docs describes the expected pattern when encountering a failure. If the tests are repeatedly re-run on a PR, then the default assumption would be that the PR itself is introducing failures into the tree, so we should avoid merging until we know that the PR is not destabilizing the tree.

@joestringer joestringer removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jul 16, 2025
@vipul-21
Copy link
Contributor Author

Hey @joestringer sorry about that. Here's the investigation on why I retriggered.
/test by Casey initially - This was before the squashing of the commits. So had to run /test after squashing the commits which lead to two failures.

--- FAIL: TestSpireDelegateClient_GetCertificateForIdentity (0.05s)
--- FAIL: TestSpireDelegateClient_GetCertificateForIdentity/get_certificate_for_numeric_identity (0.00s)

On searching the existing issues, found the open issue and a flaky label attached to it. #40130. So re ran the integration test only which passed on the second run.

Since the test failures were not related to the code changes in the PR and same issues were seen elsewhere too, I didn;t investigate further and re ran them.

@joestringer
Copy link
Member

@vipul-21 OK thanks for reporting back. Your explanation makes sense. I'd encourage reporting in a similar way if you observe failures unrelated to your PRs again, as this helps to highlight which CI issues we need to focus on, and also makes it clear to reviewers/committers whether there might be reasons to be concerned about how your PR affects CI stability.

It turns out that #40130 was pretty easy to reproduce, so I dug in a bit and proposed a fix. #39855 is already being investigated by someone so hopefully we'll resolve that soon enough.

@joestringer joestringer added this pull request to the merge queue Jul 17, 2025
Merged via the queue into cilium:main with commit b6ba886 Jul 17, 2025
68 checks passed
@maintainer-s-little-helper maintainer-s-little-helper bot added ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Jul 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CFP: Introduce HA mode for DNS proxy
6 participants