Skip to content

Conversation

andy-paine
Copy link
Contributor

Signed-off-by: Andy Paine andy.paine@engineerbetter.com

What does this PR accomplish?

Bug Fix | Feature | Documentation

Helps (partially) with #4340

Changes proposed by this PR:

Rather than tracking the active container count via the heartbeat, just use the containers table instead. This gives a more accurate representation of how many containers there are from the ATC's point of view since this is what is used for scheduling decisions and brings the fly workers and fly containers numbers in sync. This data is refreshed from the workers slightly less frequently than the heartbeats but is updated by the ATC as soon as a container is created so has faster initial feedback which may help with scheduling decisions.

Notes to reviewer:

Release Note

  • The number of containers in the fly workers output will now match the containers returned by fly containers

Contributor Checklist

Reviewer Checklist

  • Code reviewed
  • Tests reviewed
  • Documentation reviewed
  • Release notes reviewed
  • PR acceptance performed
  • New config flags added? Ensure that they are added to the
    BOSH and
    Helm packaging; otherwise, ignored for
    the integration
    tests

    (for example, if they are Garden configs that are not displayed in the
    --help text).

@clarafu
Copy link
Contributor

clarafu commented Jan 28, 2021

Hi! I think I'm a little hesitant of changing the output of fly containers from the containers that are coming directly from the worker to the containers that are known on the database side. This is because the current representation of the containers from the worker is the only method that users can easily take a look at the current state of the containers within the worker. There are times when the containers within the workers gets out of sync with the containers in the database, and the users can use the fly containers command to try to figure out what is going on within the workers that are causing issues that they are seeing within Concourse. I think rather than changing the output of this fly command, maybe it would be more useful to add another command or flag that will show you a representation of what ATC thinks is the current state of your containers are?

@andy-paine
Copy link
Contributor Author

I don't think this PR changes the fly containers command - it just brings the count of containers output of fly workers in line with fly containers. Given that this is purely a number with no extra details, this output doesn't really help that much with debugging.

The only real use right now for it is "fly containers reports far more containers than fly workers - something in the ATC/DB must be wrong." To find out which containers are in the DB and shouldn't be, you still need to somehow exec onto the worker and query Garden directly. I could maybe see a place for expected_containers and reported_containers count fields in the worker API though?

@clarafu
Copy link
Contributor

clarafu commented Feb 1, 2021

Oh sorry about that! I misread your pull request description and I had a wrong understanding of the fly containers command, totally my bad. 😓 I do agree that having the two container counts be different between fly workers and fly containers is confusing. I think your idea of the expected_containers and reported_containers is a great idea though! Because I think the current active_containers on the workers column is a more accurate representation of the container count on the worker since it is updated more frequently. If we display both container counts with a descriptive label it might be useful for users.

@clarafu
Copy link
Contributor

clarafu commented Feb 5, 2021

@andy-paine I was just wondering if you would be interested in adding the two columns rather than replacing the existing column with new data? Sorry if it sounds like I'm rushing you, I just wanted to know if this PR is in a finished state or if you wanted to add that extra part. It should be just keeping the existing column data and renaming it, then adding a new column with the new data that you added! Thanks! :)

aoldershaw added a commit that referenced this pull request Feb 28, 2021
there's one behavioural difference currently - fewest-build-containers
previously fetched ActiveContainers from the containers table, rather
than the heartbeated value. I implemented it more simply using
db.Worker.ActiveContainers since #6469 will take care of this

Signed-off-by: Aidan Oldershaw <aoldershaw@pivotal.io>
aoldershaw added a commit that referenced this pull request Feb 28, 2021
there's one behavioural difference currently - fewest-build-containers
previously fetched ActiveContainers from the containers table, rather
than the heartbeated value. I implemented it more simply using
db.Worker.ActiveContainers since #6469 will take care of this

Signed-off-by: Aidan Oldershaw <aoldershaw@pivotal.io>
aoldershaw added a commit that referenced this pull request Mar 2, 2021
there's one behavioural difference currently - fewest-build-containers
previously fetched ActiveContainers from the containers table, rather
than the heartbeated value. I implemented it more simply using
db.Worker.ActiveContainers since #6469 will take care of this

Signed-off-by: Aidan Oldershaw <aoldershaw@pivotal.io>
aoldershaw added a commit that referenced this pull request Mar 14, 2021
there's one behavioural difference currently - fewest-build-containers
previously fetched ActiveContainers from the containers table, rather
than the heartbeated value. I implemented it more simply using
db.Worker.ActiveContainers since #6469 will take care of this

Signed-off-by: Aidan Oldershaw <aoldershaw@pivotal.io>
@BooleanCat BooleanCat force-pushed the pull-active-containers-from-containers-table branch from 22c2491 to b6c4c18 Compare April 9, 2021 17:05
@andy-paine andy-paine requested a review from a team as a code owner April 9, 2021 17:05
aoldershaw added a commit that referenced this pull request Apr 9, 2021
there's one behavioural difference currently - fewest-build-containers
previously fetched ActiveContainers from the containers table, rather
than the heartbeated value. I implemented it more simply using
db.Worker.ActiveContainers since #6469 will take care of this

Signed-off-by: Aidan Oldershaw <aoldershaw@pivotal.io>
aoldershaw added a commit that referenced this pull request Apr 9, 2021
there's one behavioural difference currently - fewest-build-containers
previously fetched ActiveContainers from the containers table, rather
than the heartbeated value. I implemented it more simply using
db.Worker.ActiveContainers since #6469 will take care of this

Signed-off-by: Aidan Oldershaw <aoldershaw@pivotal.io>
@BooleanCat BooleanCat force-pushed the pull-active-containers-from-containers-table branch 2 times, most recently from c13b802 to 950eae5 Compare April 12, 2021 11:48
@BooleanCat
Copy link
Contributor

Hey @clarafu - I'm Tom from EngineerBetter. I'll make the changes Andy and yourself suggested here (renaming the existing column and adding a new column).

@BooleanCat
Copy link
Contributor

I've started with a shallow rename of the DB fields. Just to check, are we suggesting a deep rename of interfaces here such as this interface too? Happy to do this but just wanted to point out that that will touch a few areas of the codebase wherever this interface is used and I'm not sure if that's considered a "breaking change" with versioning considerations.

@clarafu
Copy link
Contributor

clarafu commented Apr 13, 2021

Thanks for picking this PR up! I think renaming the DB fields along with the interfaces would be best, I don't think we need to consider this a breaking change because it is just adding a new column in the DB and renaming a field on the fly workers command. We usually consider something a breaking change when there is some large behavioural changes that users would need to change in their current workflow in order to match the new behaviour. The only case that I can think of that would make this breaking is if there was a user that depended on the specific containers label to be outputted by the fly workers command. But we don't have official support for fly or our API so I don't think we need to stress about that 👍

@BooleanCat BooleanCat force-pushed the pull-active-containers-from-containers-table branch from 0e0ae37 to 1765ca6 Compare April 15, 2021 13:43
@BooleanCat
Copy link
Contributor

Renamed active_containers to expected_containers. I'll work on adding back Andy's change as reported_containers when I'm back at work next week 👍

aoldershaw added a commit that referenced this pull request Apr 16, 2021
there's one behavioural difference currently - fewest-build-containers
previously fetched ActiveContainers from the containers table, rather
than the heartbeated value. I implemented it more simply using
db.Worker.ActiveContainers since #6469 will take care of this

Signed-off-by: Aidan Oldershaw <aoldershaw@pivotal.io>
@BooleanCat BooleanCat force-pushed the pull-active-containers-from-containers-table branch from 7e0fa6a to 68648a1 Compare April 19, 2021 13:35
andy-paine and others added 3 commits April 20, 2021 09:12
Rather than tracking the active container count via the heartbeat, just
use the containers table instead. This gives a more accurate
representation of how many containers there are from the ATC's point of
view since this is what is used for scheduling decisions and brings the
`fly workers` and `fly containers` numbers in sync. This data is
refreshed from the workers slightly less frequently than the heartbeats
but is updated by the ATC as soon as a container is created so has
faster initial feedback which may help with scheduling decisions.

Signed-off-by: Andy Paine <andy.paine@engineerbetter.com>
This reverts commit 950eae5.

Signed-off-by: Tom Godkin <tom.godkin@engineerbetter.com>
Signed-off-by: Tom Godkin <tom.godkin@engineerbetter.com>
@BooleanCat BooleanCat force-pushed the pull-active-containers-from-containers-table branch 2 times, most recently from 39d52e7 to c429bd3 Compare April 20, 2021 09:31
Signed-off-by: Tom Godkin <tom.godkin@engineerbetter.com>
@46bit 46bit force-pushed the pull-active-containers-from-containers-table branch from c429bd3 to 50d353b Compare April 20, 2021 10:20
@BooleanCat
Copy link
Contributor

@clarafu We've added the the reported_containers value to fly workers. A few things to note:

  1. We were unsure how the tsa/cmd/tsa tests work and couldn't figure out a meaningful way to test our change in there
  2. We've left the tabular output of fly workers the same as it was
  3. We've introduced the new value in the --json output only
  4. If you merge, could you please squash and merge? We kept the history so we could refer to Andy's change

Copy link
Contributor

@clarafu clarafu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For your question about tsa/cmd/tsa, I don't think you need to add any tests and your changes seem appropriate where it is just renaming the field. I think you don't need to add any new tests because we arn't changing any behaviour, since we are just renaming a field. As long as there are existing tests that test the new renamed field, I think that is sufficient!

I also see there are some merge conflicts, would you be able to merge or rebase your branch ontop of master?

Thanks! 😄

@@ -0,0 +1 @@
ALTER TABLE workers RENAME COLUMN active_containers TO expected_containers;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we actually needed to rename the active_containers column to reported_containers. This is because whenever a worker heartbeats to the ATC, it will report the number of containers that are currently on the worker itself. This is the number of containers that is reported directly from the worker. The other container count that we want to add is expected_containers and this is the number of containers that the ATC itself thinks "should be" on the worker. This should be the number of containers in the containers table. Sorry for the confusion, I should have clarified it properly before you made all these changes :(

@@ -140,7 +140,7 @@ func (command *WorkersCommand) tableFor(workers []worker) ui.Table {
for _, w := range workers {
row := ui.TableRow{
{Contents: w.Name},
{Contents: strconv.Itoa(w.ActiveContainers)},
{Contents: strconv.Itoa(w.ExpectedContainers)},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can try to add a new column in the fly workers output by just adding a new TableRow here and on like 120. We can also rename the containers column to reported containers and then add a new column for expected containers!

@@ -57,7 +58,8 @@ var workersQuery = psql.Select(`
w.ephemeral
`).
From("workers w").
LeftJoin("teams t ON w.team_id = t.id")
LeftJoin("teams t ON w.team_id = t.id").
LeftJoin("(SELECT worker_name, COUNT(handle) reported_containers FROM containers GROUP BY worker_name) cs ON w.name = cs.worker_name")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be populating the expected_containers count rather than reported_containers because the reported_containers should be containing the number of containers that the worker heartbeating returns back. (but this is the same as what I mentioned in my previous review comment)

I was thinking here that maybe instead of having the expected_containers count be queried out everytime we select for a worker, we could have a method on the db.Worker interface called ExpectedWorker() (int, error) and it will run a query to find the number of containers for that worker in the containers table. The reason for this is that we can separately run the method to find the reported containers when we need it. If we put the query within the workersQuery then everytime we fetch out a db.Worker object it will run this query even in places that we don't care about the expected_containers (for example, grabbing out all the workers for container placement). The only time we will need the expected_containers is in the api so we can just have the api run the method to grab out the number of expected_containers. Similar to

activeTasks, err := workerInfo.ActiveTasks()
if err != nil {
activeTasks = 0
}
. Let me know what you think!

aoldershaw added a commit that referenced this pull request Apr 26, 2021
there's one behavioural difference currently - fewest-build-containers
previously fetched ActiveContainers from the containers table, rather
than the heartbeated value. I implemented it more simply using
db.Worker.ActiveContainers since #6469 will take care of this

Signed-off-by: Aidan Oldershaw <aoldershaw@pivotal.io>
aoldershaw added a commit that referenced this pull request Apr 30, 2021
there's one behavioural difference currently - fewest-build-containers
previously fetched ActiveContainers from the containers table, rather
than the heartbeated value. I implemented it more simply using
db.Worker.ActiveContainers since #6469 will take care of this

Signed-off-by: Aidan Oldershaw <aoldershaw@pivotal.io>
aoldershaw added a commit that referenced this pull request May 5, 2021
there's one behavioural difference currently - fewest-build-containers
previously fetched ActiveContainers from the containers table, rather
than the heartbeated value. I implemented it more simply using
db.Worker.ActiveContainers since #6469 will take care of this

Signed-off-by: Aidan Oldershaw <aoldershaw@pivotal.io>
aoldershaw added a commit that referenced this pull request May 14, 2021
there's one behavioural difference currently - fewest-build-containers
previously fetched ActiveContainers from the containers table, rather
than the heartbeated value. I implemented it more simply using
db.Worker.ActiveContainers since #6469 will take care of this

Signed-off-by: Aidan Oldershaw <aoldershaw@pivotal.io>
aoldershaw added a commit that referenced this pull request May 27, 2021
there's one behavioural difference currently - fewest-build-containers
previously fetched ActiveContainers from the containers table, rather
than the heartbeated value. I implemented it more simply using
db.Worker.ActiveContainers since #6469 will take care of this

Signed-off-by: Aidan Oldershaw <aoldershaw@pivotal.io>
aoldershaw added a commit that referenced this pull request Jun 2, 2021
there's one behavioural difference currently - fewest-build-containers
previously fetched ActiveContainers from the containers table, rather
than the heartbeated value. I implemented it more simply using
db.Worker.ActiveContainers since #6469 will take care of this

Signed-off-by: Aidan Oldershaw <aoldershaw@pivotal.io>
aoldershaw added a commit that referenced this pull request Jun 3, 2021
there's one behavioural difference currently - fewest-build-containers
previously fetched ActiveContainers from the containers table, rather
than the heartbeated value. I implemented it more simply using
db.Worker.ActiveContainers since #6469 will take care of this

Signed-off-by: Aidan Oldershaw <aoldershaw@pivotal.io>
aoldershaw added a commit that referenced this pull request Jun 10, 2021
there's one behavioural difference currently - fewest-build-containers
previously fetched ActiveContainers from the containers table, rather
than the heartbeated value. I implemented it more simply using
db.Worker.ActiveContainers since #6469 will take care of this

Signed-off-by: Aidan Oldershaw <aoldershaw@pivotal.io>
aoldershaw added a commit that referenced this pull request Jun 22, 2021
there's one behavioural difference currently - fewest-build-containers
previously fetched ActiveContainers from the containers table, rather
than the heartbeated value. I implemented it more simply using
db.Worker.ActiveContainers since #6469 will take care of this

Signed-off-by: Aidan Oldershaw <aoldershaw@pivotal.io>
aoldershaw added a commit that referenced this pull request Jun 24, 2021
there's one behavioural difference currently - fewest-build-containers
previously fetched ActiveContainers from the containers table, rather
than the heartbeated value. I implemented it more simply using
db.Worker.ActiveContainers since #6469 will take care of this

Signed-off-by: Aidan Oldershaw <aoldershaw@pivotal.io>
aoldershaw added a commit that referenced this pull request Jul 6, 2021
there's one behavioural difference currently - fewest-build-containers
previously fetched ActiveContainers from the containers table, rather
than the heartbeated value. I implemented it more simply using
db.Worker.ActiveContainers since #6469 will take care of this

Signed-off-by: Aidan Oldershaw <aoldershaw@pivotal.io>
aoldershaw added a commit that referenced this pull request Jul 9, 2021
there's one behavioural difference currently - fewest-build-containers
previously fetched ActiveContainers from the containers table, rather
than the heartbeated value. I implemented it more simply using
db.Worker.ActiveContainers since #6469 will take care of this

Signed-off-by: Aidan Oldershaw <aoldershaw@pivotal.io>
aoldershaw added a commit that referenced this pull request Jul 13, 2021
there's one behavioural difference currently - fewest-build-containers
previously fetched ActiveContainers from the containers table, rather
than the heartbeated value. I implemented it more simply using
db.Worker.ActiveContainers since #6469 will take care of this

Signed-off-by: Aidan Oldershaw <aoldershaw@pivotal.io>
aoldershaw added a commit that referenced this pull request Jul 15, 2021
there's one behavioural difference currently - fewest-build-containers
previously fetched ActiveContainers from the containers table, rather
than the heartbeated value. I implemented it more simply using
db.Worker.ActiveContainers since #6469 will take care of this

Signed-off-by: Aidan Oldershaw <aoldershaw@pivotal.io>
aoldershaw added a commit that referenced this pull request Aug 9, 2021
there's one behavioural difference currently - fewest-build-containers
previously fetched ActiveContainers from the containers table, rather
than the heartbeated value. I implemented it more simply using
db.Worker.ActiveContainers since #6469 will take care of this

Signed-off-by: Aidan Oldershaw <aoldershaw@pivotal.io>
aoldershaw added a commit that referenced this pull request Aug 10, 2021
there's one behavioural difference currently - fewest-build-containers
previously fetched ActiveContainers from the containers table, rather
than the heartbeated value. I implemented it more simply using
db.Worker.ActiveContainers since #6469 will take care of this

Signed-off-by: Aidan Oldershaw <aoldershaw@pivotal.io>
aoldershaw added a commit that referenced this pull request Aug 16, 2021
there's one behavioural difference currently - fewest-build-containers
previously fetched ActiveContainers from the containers table, rather
than the heartbeated value. I implemented it more simply using
db.Worker.ActiveContainers since #6469 will take care of this

Signed-off-by: Aidan Oldershaw <aoldershaw@pivotal.io>
@xtremerui
Copy link
Contributor

closing the PR for now. Feel free to reopen it.

@xtremerui xtremerui closed this Jan 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants