[stable/concourse] Replaces worker StatefulSet by Deployment #9668
Conversation
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.
Since you're using a
Yes, this is so awesome and moves toward an ideal view of operating Concourse on k8s: if a worker dies, a new one can come up without any expectation of state left behind by the previous incarnation. I would only be comfortable with removing the StatefulSet+PVC option if we ensured that space on the node is reclaimed when the worker restarts. We need to test that out on recent versions of k8s using each baggageclaim driver. Early on (late 2017) I noticed that space was never being reclaimed, and I eventually ran out of disk on the node, causing trouble for any other pods running on it; the space wasn't reclaimable until I restarted the host VM. Having a PVC made it possible to deal with that reality by tossing the disk when it became full.
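For context on the pattern @william-tran describes, this is roughly what the per-replica PVC of the current worker StatefulSet looks like. A minimal sketch with illustrative names, image, and sizes, not the chart's actual manifest:

```yaml
# Minimal sketch: each StatefulSet replica gets its own PVC for the work dir,
# so a full disk can be dealt with by deleting the PVC and letting a fresh one
# be provisioned. Names, image, and sizes below are illustrative only.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: concourse-worker
spec:
  serviceName: concourse-worker        # headless service required by StatefulSets
  replicas: 2
  selector:
    matchLabels:
      app: concourse-worker
  template:
    metadata:
      labels:
        app: concourse-worker
    spec:
      containers:
        - name: concourse-worker
          image: concourse/concourse
          args: ["worker"]
          volumeMounts:
            - name: concourse-work-dir
              mountPath: /concourse-work-dir
  volumeClaimTemplates:
    - metadata:
        name: concourse-work-dir
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 20Gi
```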
/hold
Let's rebase this and get it ready to merge!
That's a pretty valid concern, @william-tran, especially when it comes to the btrfs baggageclaim driver. I'll rebase the PR for now but still consider it on hold until our test suite has a test that verifies such behavior and ensures that, for either driver, space on the node gets reclaimed when a worker restarts.

Please let me know if there are any further scenarios you think would be interesting to cover before we move forward! thx!
We are facing this issue (concourse/concourse#1780) in the latest version of Concourse. Would it be possible that, after introducing this feature, the issue gets solved? If so, could you help with testing it as well, @cirocosta?
Hey,

We created an issue under

Another possible issue that I can see is when making use of local disks instead of a persistent dir (with btrfs) - without the cleanup that

The idea is to include https://github.com/concourse/baggageclaim/blob/1bd5c2e7660df90ebef1c27ea8b7bdcb4840f85a/fs/btrfs.go#L67-L101 in the lifecycle (at termination) so that when

The only sad part of it is that these changes would arrive only in the 5.x series (which is close to being released but not there yet).

thx!
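A rough sketch of how that termination-time cleanup could be wired into the worker pod as a `preStop` hook (the commands, flags, and paths below are assumptions for illustration; the real cleanup would reuse baggageclaim's btrfs teardown linked above):

```yaml
# Illustrative preStop hook on the worker container: retire the worker so the
# ATC stops scheduling on it, then tear down the btrfs loopback volume so the
# node's disk space is actually reclaimed. Paths and flags are assumptions.
lifecycle:
  preStop:
    exec:
      command:
        - /bin/sh
        - -c
        - |
          concourse retire-worker --name "${HOSTNAME}" || true
          umount /concourse-work-dir/volumes || true
          rm -f /concourse-work-dir/volumes.img
```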
Previously, `StatefulSet`s were used in order to work around some issues that users had with pods registering & deregistering from the ATC. After the introduction of `--ephemeral`, that logic can be simplified so that we can get back to the best way of deploying Concourse: assuming that the workers are ephemeral in all ways (including state).

Now, `--ephemeral` is set by default and persistence is disabled for workers (PostgreSQL still defaults to being persistent). Given that we're no longer using StatefulSets, there's also no need for a headless service anymore.

Signed-off-by: Ciro S. Costa <cscosta@pivotal.io>
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cirocosta

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

This issue is being automatically closed due to inactivity.
What this PR does / why we need it:
Previously, StatefulSets were used in order to work around some issues that users had with pods registering & deregistering from the ATC (see concourse/concourse#1457).

From what I understand, we were not even making use of the idea of carrying state across restarts / newly created pods (see charts/stable/concourse/templates/worker-statefulset.yaml, line 46 in 43a94e6).

After the introduction of `--ephemeral`, the logic of automatically pruning a worker that has gone away is super simplified: after `retire`, it'll be automatically pruned. That said, I guess we can go back to the best way of deploying Concourse: assuming that the workers are ephemeral in all ways (including state).
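To make the end state concrete, here is a minimal sketch of the worker as a stateless Deployment: `--ephemeral` plus an `emptyDir` work dir and no headless service. This is an illustrative manifest, not the chart's rendered output:

```yaml
# Illustrative sketch of the stateless worker (not the rendered chart template):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: concourse-worker
spec:
  replicas: 2
  selector:
    matchLabels:
      app: concourse-worker
  template:
    metadata:
      labels:
        app: concourse-worker
    spec:
      containers:
        - name: concourse-worker
          image: concourse/concourse
          args: ["worker", "--ephemeral"]     # gone workers are pruned automatically
          volumeMounts:
            - name: concourse-work-dir
              mountPath: /concourse-work-dir
      volumes:
        - name: concourse-work-dir
          emptyDir: {}                        # disposable state; no PVC needed
```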
As a result of these changes, `--ephemeral` comes set to `true` by default.

Which issue this PR fixes:
Special notes for your reviewer:
I think in the near future we can have `StatefulSet`s back in order to have persistence, but I'd say that for now (especially with 4.2.1, which doesn't include some GC improvements) we are much better off going with a simple `Deployment` that is totally stateless.
Another argument for keeping the StatefulSet would be being able to increase the size of the disk where the worker state lives if needed. Maybe there's a way around this?
Maybe we should not completely ditch the StatefulSet, and should instead aim for a setup where we can have either a StatefulSet or a Deployment (see the hypothetical values sketch below)?
Update: Actually, now I think it's pretty important to keep support for the StatefulSet so we can cover scenarios where disk IO is the highest priority.
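For instance, a hypothetical values toggle along these lines (the keys below are invented for illustration and are not part of this PR or the chart's current values):

```yaml
## Hypothetical values.yaml excerpt -- key names are made up to sketch the idea
## of supporting both modes, not what the chart actually exposes.
worker:
  kind: Deployment      # or StatefulSet, for workloads where local disk IO matters
  ephemeral: true       # workers prune themselves after retiring
persistence:
  worker:
    enabled: false      # only meaningful when kind is StatefulSet
    size: 20Gi
```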
Wdyt?
Checklist