Skip to content

Conversation

timebertt
Copy link
Member

How to categorize this PR?

/area dev-productivity testing
/kind enhancement

What this PR does / why we need it:

This PR implements the Bastion controller in provider-local. For this, it deploys another pod with the local node image and a corresponding LoadBalancer service. As with other LoadBalancer services in the local setup, there is some network magic involved (see the provider-local service controller).
This PR also fixes user login sessions on local shoot nodes (required for SSH).

Finally, the PR adds e2e tests for the newly added Bastion functionality. With this, we extend the test coverage of local e2e tests to previously untested parts of the codebase.
The e2e tests use a newly introduced package (pkg/utils/ssh) that simplifies SSH connection handling for our use cases (e2e test and later on in gardenadm bootstrap).

This prepares the Bastion usage in gardenadm bootstrap, see GEP-28.

Which issue(s) this PR fixes:
Part of #2906

Special notes for your reviewer:

/cc @ScheererJ @rfranzke @maboehm

Release note:

The provider-local extension implements the `Bastion` resource now. With this, you can use `gardenctl ssh` in the local setup.

@gardener-prow gardener-prow bot requested a review from ScheererJ June 20, 2025 09:19
@gardener-prow gardener-prow bot added the area/dev-productivity Developer productivity related (how to improve development) label Jun 20, 2025
@gardener-prow gardener-prow bot requested a review from rfranzke June 20, 2025 09:19
@gardener-prow gardener-prow bot added the area/testing Testing related label Jun 20, 2025
Copy link
Contributor

gardener-prow bot commented Jun 20, 2025

@timebertt: GitHub didn't allow me to request PR reviews from the following users: maboehm.

Note that only gardener members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

How to categorize this PR?

/area dev-productivity testing
/kind enhancement

What this PR does / why we need it:

This PR implements the Bastion controller in provider-local. For this, it deploys another pod with the local node image and a corresponding LoadBalancer service. As with other LoadBalancer services in the local setup, there is some network magic involved (see the provider-local service controller).
This PR also fixes user login sessions on local shoot nodes (required for SSH).

Finally, the PR adds e2e tests for the newly added Bastion functionality. With this, we extend the test coverage of local e2e tests to previously untested parts of the codebase.
The e2e tests use a newly introduced package (pkg/utils/ssh) that simplifies SSH connection handling for our use cases (e2e test and later on in gardenadm bootstrap).

This prepares the Bastion usage in gardenadm bootstrap, see GEP-28.

Which issue(s) this PR fixes:
Part of #2906

Special notes for your reviewer:

/cc @ScheererJ @rfranzke @maboehm

Release note:

The provider-local extension implements the `Bastion` resource now. With this, you can use `gardenctl ssh` in the local setup.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@gardener-prow gardener-prow bot added kind/enhancement Enhancement, improvement, extension cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 20, 2025
@timebertt timebertt force-pushed the provider-local-bastion branch 10 times, most recently from c5ecd5c to 937614b Compare June 24, 2025 06:55
@timebertt
Copy link
Member Author

After a few iterations with the e2e tests, I believe I tackled all scenarios 😄

@ScheererJ
Copy link
Member

/assign

@timebertt timebertt force-pushed the provider-local-bastion branch from 937614b to ae81b4e Compare June 24, 2025 12:54
@gardener-prow gardener-prow bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 24, 2025
@timebertt timebertt force-pushed the provider-local-bastion branch from ae81b4e to 5bf1db4 Compare June 24, 2025 14:52
@gardener-prow gardener-prow bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 24, 2025
@timebertt
Copy link
Member Author

Rebased to resolve conflicts

@rfranzke
Copy link
Member

/assign

Copy link
Member

@ScheererJ ScheererJ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for implementing the Bastion for provider-local to drive the GEP-28 story forward.

@timebertt timebertt requested a review from ScheererJ June 25, 2025 13:23
@gardener-prow gardener-prow bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 26, 2025
The `logrotate` binary is not installed in provider-local machine images.
Hence, the `containerd-logrotate.service` is failing:

```
root@machine-shoot--local--local-local-57666-r5xnh:/# systemctl status containerd-logrotate.service
× containerd-logrotate.service - Rotate and Compress System Logs
     Loaded: loaded (/etc/systemd/system/containerd-logrotate.service; enabled; preset: enabled)
     Active: failed (Result: exit-code) since Sat 2025-06-14 22:10:04 UTC; 1min 22s ago
   Duration: 2ms
TriggeredBy: ● containerd-logrotate.timer
    Process: 3692 ExecStart=/usr/sbin/logrotate -s /var/lib/containerd-logrotate.status /etc/systemd/containerd.conf (code=exited, status=203/EXEC)
   Main PID: 3692 (code=exited, status=203/EXEC)
        CPU: 465us

Jun 14 22:10:04 machine-shoot--local--local-local-57666-r5xnh (ogrotate)[3692]: containerd-logrotate.service: Failed at step EXEC spawning /usr/sbin/logrotate: No such file or directory
Jun 14 22:10:04 machine-shoot--local--local-local-57666-r5xnh systemd[1]: Started containerd-logrotate.service - Rotate and Compress System Logs.
Jun 14 22:10:04 machine-shoot--local--local-local-57666-r5xnh systemd[1]: containerd-logrotate.service: Main process exited, code=exited, status=203/EXEC
Jun 14 22:10:04 machine-shoot--local--local-local-57666-r5xnh systemd[1]: containerd-logrotate.service: Failed with result 'exit-code'.
```

Because of this, the system never reaches `multi-user.target`.
Hence, `systemd-user-sessions.service` never runs and never removes `/run/nologin`.

```
root@machine-shoot--local--local-local-57666-r5xnh:/# cat /run/nologin
"System is booting up. Unprivileged users are not permitted to log in yet. Please come back later. For technical details, see pam_nologin(8)."
```

This prevents non-privileged users (e.g., `gardener`) from logging into the machine, e.g., via SSH.
For some reason, `systemd-user-sessions.service` is not activated automatically in local machine pods.
Without this, non-privileged users are not able to log into the machine (e.g., via SSH).
As a workaround, add a drop-in to trigger it explicitly.
@timebertt timebertt force-pushed the provider-local-bastion branch from f44bab2 to 5f53b85 Compare June 26, 2025 09:40
@gardener-prow gardener-prow bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 26, 2025
Copy link
Member

@rfranzke rfranzke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love the PR :)

@timebertt timebertt requested a review from rfranzke June 26, 2025 13:43
@rfranzke
Copy link
Member

/lgtm

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Jun 26, 2025
Copy link
Contributor

gardener-prow bot commented Jun 26, 2025

LGTM label has been added.

Git tree hash: fbb999d050e2b6618180821635a3adc3a9b2b515

Copy link
Member

@ScheererJ ScheererJ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

Copy link
Contributor

gardener-prow bot commented Jun 30, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ScheererJ

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gardener-prow gardener-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 30, 2025
@timebertt
Copy link
Member Author

/retest

@gardener-prow gardener-prow bot merged commit 3519875 into gardener:master Jun 30, 2025
19 checks passed
@timebertt timebertt deleted the provider-local-bastion branch June 30, 2025 23:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/dev-productivity Developer productivity related (how to improve development) area/testing Testing related cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. kind/enhancement Enhancement, improvement, extension lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants