Runtime awareness of memory consumption to prevent OOM #6364

@berndverst

Description

What: The runtime should be self-aware of its memory consumption and gracefully pause or refuse additional requests.

Why: This is to prevent the sidecar container from being killed for running Out of Memory (OOM).

Who benefits from this? This is especially important when a memory limit has been defined on the Dapr sidecar, or when Dapr runs on systems with little available memory: an IoT / embedded scenario, a Raspberry Pi Kubernetes cluster, or simply a small Kubernetes cluster with very small VM sizes.

How can this be done?

Conceptually, I recommend that the Dapr HTTP and gRPC servers check memory utilization immediately upon each incoming request. If utilization is beyond the currently allowable limit (say, the Kubernetes container resource limit minus 50MB of headroom):

  1. Pause incoming connections for a certain period of time, frequently checking if the request can be processed because the memory utilization has been reduced.
  2. If after a certain period of time the memory utilization is still not in the allowable range, the request should be denied with HTTP 503 (Service Unavailable). This is far better than the sidecar pod being OOM-killed.

Here is an example of how this could work conceptually:

Imagine this were my sidecar definition (the memory limit is exposed to the sidecar using the Kubernetes Downward API):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: fasthttp-example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fasthttp-example
  template:
    metadata:
      labels:
        app: fasthttp-example
    spec:
      containers:
      - name: daprd
        image: daprio/daprd:latest
        resources:
          limits:
            # Without an explicit limit, resourceFieldRef falls back to node allocatable.
            memory: "100Mi"
        env:
        - name: MAX_MEMORY
          valueFrom:
            resourceFieldRef:
              containerName: daprd
              resource: limits.memory
              divisor: "1Mi"
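The Downward API shown above is Kubernetes-specific. On the bare IoT / embedded hosts mentioned earlier, the limit could instead be discovered from the cgroup filesystem. A minimal sketch, assuming Linux cgroups (the helper names are hypothetical):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseCgroupLimit converts the raw contents of a cgroup memory-limit file
// into bytes; cgroup v2 writes the sentinel "max" when no limit is set.
func parseCgroupLimit(raw string) uint64 {
	s := strings.TrimSpace(raw)
	if s == "max" {
		return 0 // no limit configured
	}
	n, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0
	}
	return n
}

// cgroupMemoryLimit probes the usual cgroup v2 and v1 paths and returns the
// first limit it finds, or 0 when none is configured.
func cgroupMemoryLimit() uint64 {
	paths := []string{
		"/sys/fs/cgroup/memory.max",                   // cgroup v2
		"/sys/fs/cgroup/memory/memory.limit_in_bytes", // cgroup v1
	}
	for _, p := range paths {
		if data, err := os.ReadFile(p); err == nil {
			if n := parseCgroupLimit(string(data)); n > 0 {
				return n
			}
		}
	}
	return 0
}

func main() {
	fmt.Println(parseCgroupLimit("134217728\n")) // a 128Mi limit, in bytes
	fmt.Println(parseCgroupLimit("max"))         // unlimited -> 0
	_ = cgroupMemoryLimit                        // used on a real Linux host
}
```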

The HTTP request handler of the Dapr API server could then, for example, work like so:

package main

import (
	"fmt"
	"log"
	"os"
	"runtime"
	"strconv"
	"time"

	"github.com/valyala/fasthttp"
)

// How long to hold a request waiting for memory pressure to ease, and how
// often to re-check while holding.
const holdDuration = 3 * time.Second
const checkInterval = 100 * time.Millisecond

func main() {
	// MAX_MEMORY is injected in MiB via the Kubernetes Downward API
	// (see the Deployment above).
	maxMemoryStr := os.Getenv("MAX_MEMORY")
	if maxMemoryStr == "" {
		log.Fatal("MAX_MEMORY environment variable not set")
	}
	// ParseUint keeps the limit comparable with runtime.MemStats.Alloc (uint64).
	maxMemory, err := strconv.ParseUint(maxMemoryStr, 10, 64)
	if err != nil {
		log.Fatalf("Invalid MAX_MEMORY value: %v", err)
	}
	maxMemory *= 1024 * 1024 // convert MiB to bytes

	server := fasthttp.Server{
		Handler: func(ctx *fasthttp.RequestCtx) {
			start := time.Now()
			for time.Since(start) < holdDuration {
				var m runtime.MemStats
				runtime.ReadMemStats(&m)
				// Below the limit: serve the request immediately.
				if m.Alloc <= maxMemory {
					fmt.Fprintf(ctx, "Hello, world!\n")
					return
				}
				time.Sleep(checkInterval)
			}
			// Memory pressure did not ease within holdDuration: shed the request.
			ctx.SetStatusCode(fasthttp.StatusServiceUnavailable)
			fmt.Fprintf(ctx, "Memory usage exceeded %d bytes\n", maxMemory)
		},
	}
	log.Fatal(server.ListenAndServe(":8080"))
}
