Skip to content

Conversation

0xaravindh
Copy link
Member

@0xaravindh 0xaravindh commented Mar 21, 2025

What type of PR is this?

Uncomment only one /kind <> line, press enter to put that in a new line, and remove leading whitespace from that line:

/kind breaking
/kind bug

/kind cleanup

/kind documentation
/kind feature
/kind hotfix
/kind release

What this PR does / Why we need it:

Added logging of Chain ID to the event message when a ChainPolicy is applied in the FleetAutoscaler controller.

Which issue(s) this PR fixes:

Closes #3953

Special notes for your reviewer:

@0xaravindh 0xaravindh self-assigned this Mar 21, 2025
@github-actions github-actions bot added the kind/cleanup Refactoring code, fixing up documentation, etc label Mar 21, 2025
@gongmax gongmax requested a review from vicentefb March 21, 2025 19:00
@vicentefb
Copy link
Collaborator

@indexjoseph fyi

@0xaravindh
Copy link
Member Author

0xaravindh commented Mar 21, 2025

Below is an example of the relevant event output:

karavindh@karavindh:~$ k describe fleetautoscaler chain-fleet-autoscaler 
Name:         chain-fleet-autoscaler
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  autoscaling.agones.dev/v1
Kind:         FleetAutoscaler
Metadata:
  Creation Timestamp:  2025-03-21T18:27:33Z
  Generation:          1
  Resource Version:    1742581683798655009
  UID:                 668b9aa1-3905-40c1-989d-28c256ca7b16
Spec:
  Fleet Name:  fleet-example
  Policy:
    Chain:
      Id:  weekday
      Schedule:
        Active Period:
          Duration:    5h
          Start Cron:  0 1 * * 0
          Timezone:    America/New_York
        Between:
          End:    2100-02-23T21:04:04Z
          Start:  2100-02-20T21:04:04Z
        Policy:
          Buffer:
            Buffer Size:   50
            Max Replicas:  2000
            Min Replicas:  100
          Type:            Buffer
      Type:                Schedule
      Id:                  weekend
      Schedule:
        Active Period:
          Duration:    7h
          Start Cron:  0 1 * * 0
          Timezone:    America/New_York
        Between:
          End:    2100-02-26T21:04:05Z
          Start:  2100-02-24T21:04:05Z
        Policy:
          Counter:
            Buffer Size:   10
            Key:           rooms
            Max Capacity:  1000
            Min Capacity:  500
          Type:            Counter
      Type:                Schedule
      Buffer:
        Buffer Size:   5
        Max Replicas:  2000
        Min Replicas:  100
      Id:              default
      Type:            Buffer
    Type:              Chain
  Sync:
    Fixed Interval:
      Seconds:  30
    Type:       FixedInterval
Status:
  Able To Scale:     true
  Current Replicas:  100
  Desired Replicas:  100
  Last Scale Time:   2025-03-21T18:27:33Z
  Scaling Limited:   true
Events:
  Type     Reason              Age                  From                        Message
  ----     ------              ----                 ----                        -------
  Normal   AutoScalingFleet    10m                  fleetautoscaler-controller  Scaling fleet fleet-example from 2 to 100
  Warning  ScalingLimited      9m43s (x2 over 10m)  fleetautoscaler-controller  Scaling fleet fleet-example was limited to minimum size of 100
  Normal   ChainPolicyApplied  13s (x21 over 10m)   fleetautoscaler-controller  FleetAutoscaler 'chain-fleet-autoscaler' applied ChainPolicy | ID: default | Type: Buffer
karavindh@karavindh:~$ 
karavindh@karavindh:~$ k logs -n agones-system agones-controller-7ff77d548b-kszbx | grep "chain-fleet-autoscaler"
{"fasKey":"default/chain-fleet-autoscaler","message":"Processing","queue":"autoscaling.agones.dev.FleetAutoscalerController","severity":"debug","source":"*fleetautoscalers.Controller","time":"2025-03-21T18:28:03.777824323Z"}
{"fasKey":"default/chain-fleet-autoscaler","message":"Synchronising","severity":"debug","source":"*fleetautoscalers.Controller","time":"2025-03-21T18:28:03.777959318Z"}
{"fasKey":"chain-fleet-autoscaler","message":"Fleet autoscaler check: Schedule not active for fleet fleet-example","severity":"debug","source":"*fleetautoscalers.Controller","time":"2025-03-21T18:28:03.7780303Z"}
{"fasKey":"chain-fleet-autoscaler","message":"Fleet autoscaler check: Schedule not active for fleet fleet-example","severity":"debug","source":"*fleetautoscalers.Controller","time":"2025-03-21T18:28:03.778073074Z"}
{"fasKey":"chain-fleet-autoscaler","message":"Fleet Autoscaler operation completed for fleet: fleet-example, with BufferPolicy: {0 5 }","severity":"debug","source":"*fleetautoscalers.Controller","time":"2025-03-21T18:28:03.778110277Z"}
{"fasKey":"default/chain-fleet-autoscaler","message":"Computed desired fleet size: 100, Scaling limited: true","severity":"debug","source":"*fleetautoscalers.Controller","time":"2025-03-21T18:28:03.778172408Z"}

@agones-bot
Copy link
Collaborator

Build Failed 😭

Build Id: 49128080-b1a1-49fb-86b0-0ef305887429

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Copy link
Collaborator

Build Failed 😭

Build Id: 8c857142-ea5f-496f-ab18-168358f4ed0e

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@igooch
Copy link
Collaborator

igooch commented Mar 24, 2025

/gcbrun

@agones-bot
Copy link
Collaborator

Build Failed 😭

Build Id: b0da549d-cefb-4b56-ba9d-b0b91a2ff116

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@indexjoseph
Copy link
Contributor

indexjoseph commented Mar 24, 2025

I think this all looks good except for one thing - it seems like the current functionality we're logging on every sync cycle. I'm not sure if it matters that much but I think we should only log when there's a change within the chain. For instance, we'd log when the FleetAutoscaler switches from applying the default policy to applying the weekend policy.

@indexjoseph
Copy link
Contributor

I'm also curious if this bug can be consolidated with #3955.

@0xaravindh
Copy link
Member Author

#3955

I think this all looks good except for one thing - it seems like the current functionality we're logging on every sync cycle. I'm not sure if it matters that much but I think we should only log when there's a change within the chain. For instance, we'd log when the FleetAutoscaler switches from applying the default policy to applying the weekend policy.

https://github.com/0xaravindh/agones/blob/09ca94005be43ae0159631177d0f51f086418530/pkg/fleetautoscalers/fleetautoscalers.go#L465-L470

You want to stop logging the default state. right?

@agones-bot
Copy link
Collaborator

Build Succeeded 🥳

Build Id: 875a1333-609a-4c35-b15c-d34208dac51e

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/4131/head:pr_4131 && git checkout pr_4131
helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.48.0-dev-09ca940

@indexjoseph
Copy link
Contributor

indexjoseph commented Mar 25, 2025

#3955

I think this all looks good except for one thing - it seems like the current functionality we're logging on every sync cycle. I'm not sure if it matters that much but I think we should only log when there's a change within the chain. For instance, we'd log when the FleetAutoscaler switches from applying the default policy to applying the weekend policy.

https://github.com/0xaravindh/agones/blob/09ca94005be43ae0159631177d0f51f086418530/pkg/fleetautoscalers/fleetautoscalers.go#L465-L470

You want to stop logging the default state. right?

Not necessarily, I'm pointing out that the current implementation logs the applied policy and fleet size on every synchronization cycle of the FleetAutoscaler. This results in repetitive event entries, even when there's no actual change in the applied policy or the fleet's size.

My suggestion is to modify the logging behavior so that it only occurs when the FleetAutoscaler switches from applying one policy to another (e.g., from 'default' to 'weekend'). Although I'm unsure if this is possible without adding logic to hold state, as of right now there's no way to compare the previous policy within a chain against the current policy and determine whether a new log entry is needed, it's up to you if you think it's necessary.

@agones-bot
Copy link
Collaborator

Build Failed 😭

Build Id: 51940118-42fc-4173-9090-f79f0d634ad3

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@0xaravindh
Copy link
Member Author

#3955

I think this all looks good except for one thing - it seems like the current functionality we're logging on every sync cycle. I'm not sure if it matters that much but I think we should only log when there's a change within the chain. For instance, we'd log when the FleetAutoscaler switches from applying the default policy to applying the weekend policy.

https://github.com/0xaravindh/agones/blob/09ca94005be43ae0159631177d0f51f086418530/pkg/fleetautoscalers/fleetautoscalers.go#L465-L470
You want to stop logging the default state. right?

Not necessarily, I'm pointing out that the current implementation logs the applied policy and fleet size on every synchronization cycle of the FleetAutoscaler. This results in repetitive event entries, even when there's no actual change in the applied policy or the fleet's size.

My suggestion is to modify the logging behavior so that it only occurs when the FleetAutoscaler switches from applying one policy to another (e.g., from 'default' to 'weekend'). Although I'm unsure if this is possible without adding logic to hold state, as of right now there's no way to compare the previous policy within a chain against the current policy and determine whether a new log entry is needed, it's up to you if you think it's necessary.

@indexjoseph I added a LastAppliedPolicy field to track the previously applied policy. Now, logs will only show when the policy actually changes (e.g., from default to weekend). This should prevent the repeated event entries you pointed out. Let me know if that works!

karavindh@karavindh:~$ kubectl describe fleetautoscaler chain-fleet-autoscaler
Name:         chain-fleet-autoscaler
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  autoscaling.agones.dev/v1
Kind:         FleetAutoscaler
Metadata:
  Creation Timestamp:  2025-04-16T07:11:28Z
  Generation:          1
  Resource Version:    1744787518525023009
  UID:                 8a835d61-5578-4a84-85e1-861570bd61ab
Spec:
  Fleet Name:  fleet-example
  Policy:
    Chain:
      Id:  weekday
      Schedule:
        Active Period:
          Duration:    5h
          Start Cron:  0 0 * * *
          Timezone:    America/Los_Angeles
        Between:
          End:    2025-07-01T07:00:00Z
          Start:  2025-04-01T07:00:00Z
        Policy:
          Buffer:
            Buffer Size:   50
            Max Replicas:  120
            Min Replicas:  30
          Type:            Buffer
      Type:                Schedule
      Id:                  weekend
      Schedule:
        Active Period:
          Duration:    5h
          Start Cron:  0 0 * * *
          Timezone:    America/Los_Angeles
        Between:
          End:    2025-07-01T07:00:00Z
          Start:  2025-04-01T07:00:00Z
        Policy:
          Counter:
            Buffer Size:   10
            Key:           rooms
            Max Capacity:  120
            Min Capacity:  30
          Type:            Counter
      Type:                Schedule
      Buffer:
        Buffer Size:   5
        Max Replicas:  100
        Min Replicas:  20
      Id:              default
      Type:            Buffer
    Type:              Chain
  Sync:
    Fixed Interval:
      Seconds:  30
    Type:       FixedInterval
Status:
  Able To Scale:        true
  Current Replicas:     50
  Desired Replicas:     50
  Last Applied Policy:  Chain:weekday:Schedule
  Last Scale Time:      2025-04-16T07:11:28Z
  Scaling Limited:      false
Events:
  Type    Reason            Age   From                        Message
  ----    ------            ----  ----                        -------
  Normal  ChainPolicy       20m   fleetautoscaler-controller  FleetAutoscaler 'chain-fleet-autoscaler' successfully applied ChainPolicy | ID: weekday | Type: Schedule
  Normal  AutoScalingFleet  20m   fleetautoscaler-controller  Scaling fleet fleet-example from 2 to 50
karavindh@karavindh:~$ 

@0xaravindh
Copy link
Member Author

/gcbrun

@agones-bot
Copy link
Collaborator

Build Failed 😭

Build Id: 33faf044-500e-40ea-a8f9-9ee8107a6924

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Copy link
Collaborator

Build Failed 😭

Build Id: a1dbfb6e-ac25-4330-8bb3-b1e515370288

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Copy link
Collaborator

Build Failed 😭

Build Id: 34084e68-3cd6-4a26-a0a7-d20a65fb7a89

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Copy link
Collaborator

Build Succeeded 🥳

Build Id: f48d9bb5-7330-4b04-8caa-a89bf80a44e9

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/4131/head:pr_4131 && git checkout pr_4131
helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.49.0-dev-a8111e2

@indexjoseph
Copy link
Contributor

@indexjoseph I added a LastAppliedPolicy field to track the previously applied policy. Now, logs will only show when the policy actually changes (e.g., from default to weekend). This should prevent the repeated event entries you pointed out. Let me know if that works!

Yep, LGTM!

@agones-bot
Copy link
Collaborator

Build Succeeded 🥳

Build Id: 65e690b7-f645-4e19-bed9-f788c13751bc

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/4131/head:pr_4131 && git checkout pr_4131
helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.49.0-dev-490d764

Copy link
Collaborator

@vicentefb vicentefb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some questions

Copy link
Collaborator

@igooch igooch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a unit or e2e test that checks that the last applied policy works as expected? And throws handles errors as expected?

Comment on lines +83 to +85
// WithLastAppliedPolicy sets the LastAppliedPolicy field in the declarative configuration to the given value
// and returns the receiver, so that objects can be built by chaining "With" function invocations.
// If called multiple times, the LastAppliedPolicy field is set to the value of the last call.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first sentence of this comment is confusing. Could you reword?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can't edit this file. please correct me if i'm wrong

// This code was autogenerated. Do not edit directly.
// Code generated by applyconfiguration-gen. DO NOT EDIT.

@0xaravindh
Copy link
Member Author

Is there a unit or e2e test that checks that the last applied policy works as expected? And throws handles errors as expected?

Yes, I added unit and e2e tests to verify LastAppliedPolicy and error handling.

@agones-bot
Copy link
Collaborator

Build Failed 😭

Build Id: 021e7380-1dac-4f66-a4aa-4f55999e1da2

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@0xaravindh
Copy link
Member Author

/gcbrun

@agones-bot
Copy link
Collaborator

Build Failed 😭

Build Id: 36aa6bc7-1611-45c3-80a6-9626df52343b

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Copy link
Collaborator

Build Succeeded 🥳

Build Id: a873a5c6-a422-4c16-9822-4a72c2aa4579

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/4131/head:pr_4131 && git checkout pr_4131
helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.49.0-dev-a8c8835

@@ -1301,6 +1301,20 @@ func TestCounterAutoscalerAllocatedMultipleNamespaces(t *testing.T) {
return fas.Status.CurrentReplicas > 0
})

// Wait for LastAppliedPolicy to be set to CounterPolicyType for fasA
framework.WaitForFleetAutoScalerCondition(t, fasA, func(log *logrus.Entry, fas *autoscalingv1.FleetAutoscaler) bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now this is under the test TestCounterAutoscalerAllocatedMultipleNamespaces. Could you move either to its own test, or under TestChainAutoscaler?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added couple of testcase for TestChainAutoscaler and TestAutoscalerWebhook

Copy link
Collaborator

@vicentefb vicentefb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, i believe Ivy requested changes

@agones-bot
Copy link
Collaborator

Build Succeeded 🥳

Build Id: 70386160-7a67-44f0-91bc-251ae2eaf588

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/4131/head:pr_4131 && git checkout pr_4131
helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.50.0-dev-f2594db

Copy link
Collaborator

@igooch igooch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@igooch igooch merged commit 7f92611 into googleforgames:main May 13, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/cleanup Refactoring code, fixing up documentation, etc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add chain ID status to fleetautoscaler event logger
5 participants