
bug: create too many nodes when informercache latency is high #1300

@zhuangqh

Description


Describe the bug

The nodeclaim creation phase has three main steps:

  1. Create several nodeclaims.
  2. Wait for these nodeclaims to become ready.
  3. Get the specific node object:
    https://github.com/kaito-project/kaito/blob/main/pkg/workspace/controllers/workspace_controller.go#L482

In step 2 we list nodeclaims from the informer cache. If the local informer cache is inconsistent with the apiserver, we do not find the right nodeclaims to wait for. Step 3 then fails, and the workspace is reconciled again.

If the informer cache remains inconsistent with the apiserver for a while, each reconcile repeats step 1 without seeing the nodeclaims that already exist, so we end up creating many nodeclaims.

I observed this happening in the e2e tests: the informer cache lagged by more than 10 seconds, and the workspace controller created 200+ nodeclaims.
https://github.com/kaito-project/kaito/actions/runs/16444568532

Steps To Reproduce

Expected behavior

Logs

Environment

  • Kubernetes version (use kubectl version):
  • OS (e.g: cat /etc/os-release):
  • Install tools:
  • Others:

Additional context

Metadata

  • Labels: bug
  • Status: Done