
Filter by namespace intermittently includes all namespaces #388

@abean-work

Description

When running Popeye against a single namespace with the -n flag, the scan intermittently (roughly half the time) fails because it also picks up resources, and therefore problems, from other namespaces.

To Reproduce
Steps to reproduce the behavior:

  1. Deploy a pod that will cause Popeye to fail (a full setup sketch follows this list): kubectl run fail-pod --image=nonexistent/nonexistentimage:latest -n test
  2. Scan a different namespace that is healthy: popeye -n healthy -l error -f ./spinach.yml
  3. Repeat the scan until it fails.
  • Most scans will return healthy with no issues, e.g.:
    PODS (5 SCANNED)                                                             💥 0 😱 0 🔊 0 ✅ 5 100%
    
    ┅┅┅┅┅┅┅
    · Nothing to report.
    
  • Occasionally, it will fail due to the pod in the other namespace (notice a lot more pods are included in the scan):
    PODS (30 SCANNED)                                                            💥 1 😱 0 🔊 0 ✅ 29 96%
    ┅┅┅┅┅┅┅
    · test/fail-pod...............................................................................💥
      💥 [POP-207] Pod is in an unhappy phase (Pending).
      🐳 fail-pod
        💥 [POP-203] Pod is waiting [0/1] ImagePullBackOff.
    
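For completeness, the prerequisites for steps 1–2 can be set up roughly like this (same namespace names as above; the healthy workload is just an example nginx deployment):

  kubectl create namespace test
  kubectl create namespace healthy
  # Any workload that passes the scan will do; five nginx replicas match the sample output above.
  kubectl create deployment healthy-app --image=nginx --replicas=5 -n healthy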

Using the following (crude) command, I was able to reproduce the error easily:

> repeat 20 { popeye -n healthy -l error -f ./spinach.yml > /dev/null 2>&1; echo $? }
1
0
0
1
0
0
1
0
0
1
1
0
0
0
0
1
1
1
0
1

The exit codes show that, in this instance, 9 of the 20 scans failed because they included resources from other namespaces. Repeating this command, the number of failures has always been between 8 and 12, so roughly half of the scans fail.
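For anyone not on zsh, a roughly equivalent bash loop (same namespace and spinach.yml as above) that counts the failing scans:

  fails=0
  for i in $(seq 1 20); do
    # Exit code 1 means the scan reported an error-level issue.
    popeye -n healthy -l error -f ./spinach.yml > /dev/null 2>&1 || fails=$((fails + 1))
  done
  echo "failed scans: $fails / 20"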

Expected behavior

  1. The -n flag should restrict the Popeye scan to the specified namespace.
  2. Repeated scans should be consistent in the resources they include.

Versions (please complete the following information):

  • OS: OSX 14.7 and Ubuntu 22.04
  • Popeye: 0.21.5
  • K8s: 1.29.8

Additional context

Our team owns/manages a number of namespaces on shared Kubernetes (AKS) clusters, which we scan individually using the -n flag, aggregating the JUnit output afterwards.

These namespaces are looped through, so the scans happen immediately after one another (roughly as sketched below). I've tried adding sleeps between scans, but this didn't help.
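A rough sketch of that loop (namespace names are placeholders, and it assumes Popeye's -o junit output format):

  mkdir -p reports
  for ns in team-ns-1 team-ns-2 team-ns-3; do
    # One scan per namespace; the per-namespace JUnit reports are aggregated afterwards.
    popeye -n "$ns" -o junit -l error -f ./spinach.yml > "reports/$ns.xml"
  done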

This could be related to #314, but I've created a new issue as the namespace filter does work some of the time.

Spinach config:
---
# Popeye configuration using the AKS sample as a base.
# See: https://github.com/derailed/popeye/blob/master/spinach/spinach_aks.yml
popeye:
  allocations:
    cpu:
      # Checks if cpu is under allocated by more than x% at current load.
      underPercUtilization: 200
      # Checks if cpu is over allocated by more than x% at current load.
      overPercUtilization: 50
    memory:
      # Checks if mem is under allocated by more than x% at current load.
      underPercUtilization: 200
      # Checks if mem is over allocated by more than x% at current load.
      overPercUtilization: 50

  # Excludes define rules to exempt resources from sanitization
  excludes:
    global:
      fqns:
        # Exclude kube-system namespace
        - rx:^kube-system/

    linters:
      # Exclude system CRBs
      clusterrolebindings:
        instances:
          - fqns:
              - rx:^aks
              - rx:^omsagent
              - rx:^system

      # Exclude system CRs
      clusterroles:
        instances:
          - fqns:
              - rx:^system
              - admin
              - cluster-admin
              - edit
              - omsagent-reader
              - view
            codes: [400]

      # Exclude unused windows daemonset
      daemonsets:
        instances:
          - fqns: [calico-system/calico-windows-upgrade]
            codes: [508]

      # Exclude due to intermittent false positives
      serviceaccounts:
        codes: ["305"]

  resources:
    # Nodes specific sanitization
    node:
      limits:
        cpu: 90
        memory: 80

    # Pods specific sanitization
    pod:
      limits:
        # Fail if cpu is over x%
        # Set intentionally high to ignore (if you comment it out, it'll default to 80)
        cpu: 250
        # Fail if pod mem is over x%
        # Set intentionally high to ignore (if you comment it out, it'll default to 90)
        memory: 900
      # Fail if more than x restarts on any pods
      restarts: 3
