markmandel
Collaborator

What type of PR is this?

/kind bug

What this PR does / Why we need it:

When sidecar containers are enabled, if a Pod exits cleanly with exit(0), it ends up in the Succeeded phase, and the GameServer it backs is never deleted.

This introduces a new controller to check for this Pod state and move the GameServer to a Shutdown state.
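
As a rough illustration of the check described above, here is a minimal sketch. It is not the controller code from this PR: the `PodPhase` and `GameServerState` types and the `nextState` helper are hypothetical stand-ins, not the real Kubernetes or Agones APIs.

```go
package main

import "fmt"

// PodPhase and GameServerState are simplified stand-ins for the Kubernetes
// and Agones types. They are assumptions made for this sketch only.
type PodPhase string
type GameServerState string

const (
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"

	StateReady     GameServerState = "Ready"
	StateAllocated GameServerState = "Allocated"
	StateShutdown  GameServerState = "Shutdown"
)

// nextState is a hypothetical helper. Given the phase of the backing Pod and
// the current GameServer state, it returns the state the GameServer should
// have, and whether that differs from the current state.
func nextState(phase PodPhase, current GameServerState) (GameServerState, bool) {
	// A Pod that exited with code 0 is Succeeded; its GameServer should be
	// moved to Shutdown unless it is already there.
	if phase == PodSucceeded && current != StateShutdown {
		return StateShutdown, true
	}
	return current, false
}

func main() {
	state, changed := nextState(PodSucceeded, StateAllocated)
	fmt.Println(state, changed)
}
```

In this sketch a Succeeded Pod behind an Allocated GameServer yields a move to Shutdown, while a Running Pod leaves the GameServer untouched.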

Also updates all references to simple-game-server to the latest version, since the new version is needed to test that this fix works.

Which issue(s) this PR fixes:

Closes #4188

Special notes for your reviewer:

Won't pass until https://us-docker.pkg.dev/agones-images/examples/simple-game-server:0.38 is published.

@github-actions github-actions bot added the kind/bug label Jun 6, 2025
@markmandel markmandel requested review from igooch and peterzhongyi June 6, 2025 07:43
@github-actions github-actions bot added the size/M label Jun 6, 2025
@agones-bot
Collaborator

Build Failed 😭

Build Id: 0d598dd6-57f6-4526-be3c-bf5b43bd7ce6

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@0xaravindh
Member

/gcbrun

@agones-bot
Collaborator

Build Failed 😭

Build Id: 5926551c-2d54-4be0-bf50-f8a1af75f2e8

Status: FAILURE

@markmandel
Collaborator Author

Gah, lint test failure! Rookie mistake! 🤣

@markmandel markmandel force-pushed the example/crash-exist-code branch from aff1ca0 to 354b91d Compare June 11, 2025 11:32
@agones-bot
Collaborator

Build Failed 😭

Build Id: c9e7a4f8-88f8-4606-b11c-938ccebc616b

Status: FAILURE

@markmandel
Collaborator Author

Welp, something definitely went wrong here:

VERBOSE:     gameserver_test.go:326: 
VERBOSE:         	Error Trace:	/go/src/agones.dev/agones/test/e2e/gameserver_test.go:326
VERBOSE:         	Error:      	Received unexpected error:
VERBOSE:         	            	GameServer reached terminal state Shutdown
VERBOSE:         	            	agones.dev/agones/test/e2e/framework.(*Framework).WaitForGameServerState.func1
VERBOSE:         	            		/go/src/agones.dev/agones/test/e2e/framework/framework.go:279
VERBOSE:         	            	k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func2
VERBOSE:         	            		/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/loop.go:87
VERBOSE:         	            	k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext
VERBOSE:         	            		/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/loop.go:88
VERBOSE:         	            	k8s.io/apimachinery/pkg/util/wait.PollUntilContextTimeout
VERBOSE:         	            		/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/poll.go:48
VERBOSE:         	            	agones.dev/agones/test/e2e/framework.(*Framework).WaitForGameServerState
VERBOSE:         	            		/go/src/agones.dev/agones/test/e2e/framework/framework.go:259
VERBOSE:         	            	agones.dev/agones/test/e2e.TestGameServerUnhealthyAfterDeletingPod
VERBOSE:         	            		/go/src/agones.dev/agones/test/e2e/gameserver_test.go:325
VERBOSE:         	            	testing.tRunner
VERBOSE:         	            		/usr/local/go/src/testing/testing.go:1690
VERBOSE:         	            	runtime.goexit
VERBOSE:         	            		/usr/local/go/src/runtime/asm_amd64.s:1700
VERBOSE:         	            	waiting for GameServer 1749643344/game-servergxm9n to be Unhealthy
VERBOSE:         	            	agones.dev/agones/test/e2e/framework.(*Framework).WaitForGameServerState
VERBOSE:         	            		/go/src/agones.dev/agones/test/e2e/framework/framework.go:288
VERBOSE:         	            	agones.dev/agones/test/e2e.TestGameServerUnhealthyAfterDeletingPod
VERBOSE:         	            		/go/src/agones.dev/agones/test/e2e/gameserver_test.go:325
VERBOSE:         	            	testing.tRunner
VERBOSE:         	            		/usr/local/go/src/testing/testing.go:1690
VERBOSE:         	            	runtime.goexit
VERBOSE:         	            		/usr/local/go/src/runtime/asm_amd64.s:1700
VERBOSE:         	Test:       	TestGameServerUnhealthyAfterDeletingPod
VERBOSE: --- FAIL: TestGameServerUnhealthyAfterDeletingPod (9.76s)
VERBOSE: FAIL

When sidecar containers are enabled, if a Pod `exit(0)`s, then it will
end up in a Succeeded state, and the backing GameServer will not get
deleted.

This introduces a new controller to check for this Pod state and move
the `GameServer` to a `Shutdown` state.

Also updates all records of simple-game-server to the latest version,
since we need it to test that this fix works.

Closes googleforgames#4188
@markmandel markmandel force-pushed the example/crash-exist-code branch from 354b91d to a14eb74 Compare June 13, 2025 23:11
@markmandel
Collaborator Author

🤞🏻 I believe that should fix all those issues.

@agones-bot
Collaborator

Build Failed 😭

Build Id: a62d63b0-fb18-4333-803a-9fff2fece65f

Status: FAILURE

@markmandel
Collaborator Author

Gah! #4163 got me again.

Warning: autopilot-default-resources-mutator:Autopilot updated Job default/upgrade-test-runner: defaulted unspecified 'cpu' resource for containers [upgrade-test-controller] (see http://g.co/gke/autopilot-defaults).
job.batch/upgrade-test-runner created
pod/upgrade-test-runner-q6shk condition met
Wait for job upgrade-test-runner to complete or fail on cluster gke-autopilot-upgrade-test-cluster-1-32
SuccessCriteriaMetSuccessCriteriaMetSuccessCriteriaMetSuccessCriteriaMetSuccessCriteriaMet/tmp/tmp.9Mw2PmalbK/standard-upgrade-test-cluster-1-31.log: SuccessCriteriaMet
/tmp/tmp.9Mw2PmalbK/standard-upgrade-test-cluster-1-33.log: SuccessCriteriaMet
/tmp/tmp.9Mw2PmalbK/standard-upgrade-test-cluster-1-32.log: SuccessCriteriaMet
FailureTarget/tmp/tmp.9Mw2PmalbK/gke-autopilot-upgrade-test-cluster-1-31.log: FailureTarget

@markmandel markmandel force-pushed the example/crash-exist-code branch from a14eb74 to c36629c Compare June 14, 2025 02:08
@agones-bot
Collaborator

Build Failed 😭

Build Id: 23ea4f3f-2a99-4a6c-b248-32e540b65d56

Status: FAILURE

@markmandel
Collaborator Author

Ooh, new bug.

Feature flags disabled.

time="2025-06-14 02:56:56.232" level=info msg="checking if GameServer exists" error="<nil>" gs=game-servertlcr2 state=Unhealthy test=TestGameServerPodCompletedAfterCleanExit
time="2025-06-14 02:56:59.231" level=info msg="checking if GameServer exists" error="<nil>" gs=game-servertlcr2 state=Unhealthy test=TestGameServerPodCompletedAfterCleanExit
time="2025-06-14 02:57:02.187" level=info msg="checking if GameServer exists" error="client rate limiter Wait returned an error: context deadline exceeded" gs=game-servertlcr2 state= test=TestGameServerPodCompletedAfterCleanExit
    gameserver_test.go:544: 
        	Error Trace:	/go/src/agones.dev/agones/test/e2e/gameserver_test.go:544
        	Error:      	Received unexpected error:
        	            	client rate limiter Wait returned an error: context deadline exceeded
        	Test:       	TestGameServerPodCompletedAfterCleanExit
--- FAIL: TestGameServerPodCompletedAfterCleanExit (183.19s)

@agones-bot
Collaborator

Build Failed 😭

Build Id: fab54ff8-f77c-497c-9347-ec4805f480c1

Status: FAILURE

@markmandel markmandel force-pushed the example/crash-exist-code branch from b1d1c1a to 8e6055a Compare June 14, 2025 11:36
@agones-bot
Collaborator

Build Failed 😭

Build Id: 1015e240-5015-44bb-84a8-f41488273b27

Status: FAILURE

* Ignore terminating pods.
* Ignore Error or Unhealthy states.
* Minor cleanup of failing e2e test, to help with error reporting.
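
A minimal sketch of how the two filters listed above might combine with the Succeeded check. It is an illustration only, not the code in this commit: `podInfo` and `shouldShutdown` are hypothetical names, and treating a Pod with a deletion timestamp as "terminating" is an assumption based on the usual Kubernetes convention.

```go
package main

import "fmt"

// podInfo is a hypothetical, trimmed-down view of a Pod for this sketch.
type podInfo struct {
	Phase       string // Pod phase, e.g. "Running" or "Succeeded"
	Terminating bool   // assumed true when the Pod has a deletion timestamp
}

// shouldShutdown is a hypothetical helper. It reports whether a GameServer in
// state gsState should be moved to Shutdown because its Pod completed cleanly.
func shouldShutdown(pod podInfo, gsState string) bool {
	// Terminating Pods are already being cleaned up, so they are skipped.
	if pod.Terminating {
		return false
	}
	// GameServers in Error or Unhealthy are left to their existing handling.
	switch gsState {
	case "Error", "Unhealthy":
		return false
	}
	return pod.Phase == "Succeeded"
}

func main() {
	fmt.Println(shouldShutdown(podInfo{Phase: "Succeeded"}, "Allocated"))
	fmt.Println(shouldShutdown(podInfo{Phase: "Succeeded", Terminating: true}, "Allocated"))
	fmt.Println(shouldShutdown(podInfo{Phase: "Succeeded"}, "Unhealthy"))
}
```

Checking the skip conditions first keeps the Succeeded comparison as the single positive case, which makes each ignored situation easy to cover in a test.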
@markmandel
Collaborator Author

Blarg. More submit-upgrade-test-cloud-build flakiness. Unless I'm doing something wrong? 🤔

@markmandel markmandel force-pushed the example/crash-exist-code branch from 8e6055a to 6ec5418 Compare June 14, 2025 12:15
@agones-bot
Collaborator

Build Failed 😭

Build Id: e58b3483-5854-40dc-bcfc-287047224b6e

Status: FAILURE

@markmandel
Collaborator Author

Okay, seems like submit-upgrade-test-cloud-build is failing for a reason, since it's failed twice in a row -- @igooch, @0xaravindh, @peterzhongyi - do you have special abilities to tell me why?

@markmandel
Collaborator Author

eeyyy! it finally passed. Okay, was just bad flakiness.

We should really get that step de-flaked.

@markmandel markmandel requested a review from gongmax June 22, 2025 22:14
@agones-bot
Collaborator

Build Succeeded 🥳

Build Id: afb9221a-7879-48af-9fb4-06b1c7feed0c

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/4201/head:pr_4201 && git checkout pr_4201
helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.51.0-dev-7c1a019

@markmandel
Collaborator Author

bump bump @igooch , @peterzhongyi @gongmax please.

@lacroixthomas
Collaborator

LGTM

Collaborator

@igooch igooch left a comment

LGTM

@igooch igooch merged commit 6b6fb43 into googleforgames:main Jun 27, 2025
4 checks passed
@markmandel markmandel deleted the example/crash-exist-code branch June 30, 2025 03:33
Successfully merging this pull request may close these issues.

Game servers remain Allocated with Completed backing Pods (SidecarContainers feature flag)
5 participants