Windows Kubelet stats/summary endpoint returns "Internal Error: failed to list pod stats: failed to list all container stats on Windows" #98509

@jsturtevant

What happened:

While reviewing Windows kubelet logs, I noticed errors like:

E0930 20:21:31.218007 5020 handler.go:321] HTTP InternalServerError serving /stats/summary: Internal Error: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = container 46527a779618ed4197babc79889886b540f352d72503656eda3c37c2f247c4a0 encountered an error during Properties: failure in a Windows system call: The requested virtual machine or container operation is not valid in the current state. (0xc0370105)
E0930 20:21:34.026254 5020 handler.go:321] HTTP InternalServerError serving /stats/summary: Internal Error: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = container d8a566cef3d9ed97442c37ee5f26fb4ff51aac19ad9c1b4bb6caf84acaf0afa5 encountered an error during Properties: failure in a Windows system call: Access is denied. (0x5)

What you expected to happen:
The endpoint should return the stats instead of an Internal Error.

How to reproduce it (as minimally and precisely as possible):
This occurs when a pod has just started or is terminating. To simulate the issue:

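# "k" is assumed to be a shell alias for kubectl.
# Run the proxy in the background (or in a separate terminal) so the loop can start.
k proxy &
for (( ; ; ))
do
   curl http://localhost:8001/api/v1/nodes/1241k8s00000000/proxy/stats/summary
done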

Then deploy pod:

k apply -f https://gist.githubusercontent.com/jsturtevant/5f49c3bd9218666af877927a674b7645/raw/fb042ff14b689c308c9404422e08cf85c56b6ff1/deployment.yaml

Anything else we need to know?:

The container is in a Created state but has not started running, so the call to get the stats fails:

stats, err := hcsshim_container.Statistics()
if err != nil {
	return nil, err
}

This is handled in other places like moby:

https://github.com/moby/moby/blob/46cdcd206c56172b95ba5c77b827a722dab426c5/daemon/stats.go#L38-L39

and

cstats, err := fetchContainerStats(c)
if err != nil {
	klog.V(4).Infof("Failed to fetch statistics for container %q with error '%v', continue to get stats for other containers", c.ID, err)
	continue
}
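
To illustrate the difference between the two snippets above, here is a minimal, self-contained sketch of the "log and continue" pattern. It is not the kubelet or dockershim code: containerStats, statsFetcher, listContainerStats, fakeFetch, and errInvalidState are all hypothetical names made up for this example, and the error is only a stand-in for the failure the Windows system call reports for a container that is created but not yet running.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// containerStats is a hypothetical, simplified stand-in for per-container stats.
type containerStats struct {
	ID          string
	MemoryBytes uint64
}

// errInvalidState is a stand-in for the error seen when a container is
// created but not yet running (or is already terminating).
var errInvalidState = errors.New("operation is not valid in the current state")

// statsFetcher is a hypothetical function type that returns stats for one container.
type statsFetcher func(id string) (*containerStats, error)

// listContainerStats collects stats for every container it can. A failure for
// one container is logged and skipped instead of failing the whole listing.
func listContainerStats(ids []string, fetch statsFetcher) []*containerStats {
	result := make([]*containerStats, 0, len(ids))
	for _, id := range ids {
		s, err := fetch(id)
		if err != nil {
			log.Printf("skipping stats for container %q: %v", id, err)
			continue
		}
		result = append(result, s)
	}
	return result
}

// fakeFetch simulates one container whose stats cannot be read yet.
func fakeFetch(id string) (*containerStats, error) {
	if id == "created-not-running" {
		return nil, errInvalidState
	}
	return &containerStats{ID: id, MemoryBytes: 1024}, nil
}

func main() {
	ids := []string{"running-a", "created-not-running", "running-b"}
	for _, s := range listContainerStats(ids, fakeFetch) {
		fmt.Printf("%s: %d bytes\n", s.ID, s.MemoryBytes)
	}
}
```

With this shape, a single container in a transitional state only drops its own entry from the result, so the caller can still serve stats for the remaining containers.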

Environment:

  • Kubernetes version (use kubectl version): 1.18+
  • Cloud provider or hardware configuration: aks-engine
  • OS (e.g: cat /etc/os-release): Windows
  • Kernel (e.g. uname -a): 2019
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others: Dockershim

/sig windows
/priority critical-urgent
/assign
