Skip to content

Is there a potential regression in performance of CAS Add() in v1.21.0? #1748

@chlunde

Description

@chlunde

Re. #1661

I wonder if we should consider the value for 10ms for the initial sleep, because in my mind this is a very long time. I worry that callers of Add() would not expect that, and that it could lead to high tail latencies and lower throughput for a normal workflow.

I think the microbenchmark is biased against normal workloads

  • it does not perform any work between calls to Add()
  • In many cases, there is a very high number of goroutines compared to cores. This hides the throughput-problem.

To illustrate the issue, I have drafted what I hope is a more balanced microbenchmark with four goroutines. On my machine it shows regressions compared to 1.20.0. A task that took 3ms before now takes 13ms to 39ms, median values around 25ms. With an initial sleep of 10 µs instead of 10 ms, the execution time is still 3-4ms.

func TestAtomicUpdateFloatLatency(t *testing.T) {
	var val float64 = 0.0
	bits := (*uint64)(unsafe.Pointer(&val))
	var wg sync.WaitGroup
	numGoroutines := 4 // Low contention scenario
	latencies := make([]time.Duration, numGoroutines)
	workData := make([]int, numGoroutines)

	work := func(idx int) {
		// Simulate some work
		j := workData[idx]
		for i := 0; i < 10000; i++ {
			j += i
		}

		workData[idx] += j
	}

	testStart := time.Now()
	for i := 0; i < numGoroutines; i++ {
		wg.Add(1)
		go func(idx int) {
			defer wg.Done()

			for j := 0; j < 1000; j++ {
				// t := time.Now() // uncomment to discover how long it takes to do the work on your machine
				work(idx)
				// fmt.Println("Work took", time.Since(t))

				start := time.Now()
				atomicUpdateFloat(bits, func(f float64) float64 {
					return f + 1.0
				})
				latencies[idx] += time.Since(start)
			}
		}(i)
	}

	wg.Wait()

	// Check if any operation took longer than expected due to backoff
	for i, lat := range latencies {
		if lat >= 5*time.Millisecond {
			t.Errorf("Operation %d took %v, which suggests unnecessary backoff delay", i, lat)
		}
	}

	t.Logf("Test took %v", time.Since(testStart))
}

10 ms

$ go test -count=10 -v -run=TestAtomicUpdateFloatLatency
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:213: Operation 2 took 11.077548ms, which suggests unnecessary backoff delay
    atomic_update_test.go:213: Operation 3 took 10.215797ms, which suggests unnecessary backoff delay
    atomic_update_test.go:217: Test took 13.678618ms
--- FAIL: TestAtomicUpdateFloatLatency (0.01s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:213: Operation 0 took 21.180438ms, which suggests unnecessary backoff delay
    atomic_update_test.go:213: Operation 3 took 10.336778ms, which suggests unnecessary backoff delay
    atomic_update_test.go:217: Test took 24.43109ms
--- FAIL: TestAtomicUpdateFloatLatency (0.02s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:213: Operation 0 took 10.22996ms, which suggests unnecessary backoff delay
    atomic_update_test.go:213: Operation 1 took 20.774034ms, which suggests unnecessary backoff delay
    atomic_update_test.go:213: Operation 2 took 10.588107ms, which suggests unnecessary backoff delay
    atomic_update_test.go:217: Test took 25.983047ms
--- FAIL: TestAtomicUpdateFloatLatency (0.03s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:213: Operation 1 took 10.4611ms, which suggests unnecessary backoff delay
    atomic_update_test.go:213: Operation 2 took 21.187434ms, which suggests unnecessary backoff delay
    atomic_update_test.go:213: Operation 3 took 10.116956ms, which suggests unnecessary backoff delay
    atomic_update_test.go:217: Test took 28.06019ms
--- FAIL: TestAtomicUpdateFloatLatency (0.03s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:213: Operation 0 took 10.948254ms, which suggests unnecessary backoff delay
    atomic_update_test.go:213: Operation 2 took 10.333145ms, which suggests unnecessary backoff delay
    atomic_update_test.go:217: Test took 15.054686ms
--- FAIL: TestAtomicUpdateFloatLatency (0.02s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:213: Operation 0 took 10.308113ms, which suggests unnecessary backoff delay
    atomic_update_test.go:213: Operation 3 took 20.704634ms, which suggests unnecessary backoff delay
    atomic_update_test.go:217: Test took 24.944142ms
--- FAIL: TestAtomicUpdateFloatLatency (0.02s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:213: Operation 0 took 21.336846ms, which suggests unnecessary backoff delay
    atomic_update_test.go:213: Operation 1 took 10.494176ms, which suggests unnecessary backoff delay
    atomic_update_test.go:213: Operation 3 took 31.890589ms, which suggests unnecessary backoff delay
    atomic_update_test.go:217: Test took 36.762684ms
--- FAIL: TestAtomicUpdateFloatLatency (0.04s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:213: Operation 0 took 10.540261ms, which suggests unnecessary backoff delay
    atomic_update_test.go:213: Operation 3 took 21.214645ms, which suggests unnecessary backoff delay
    atomic_update_test.go:217: Test took 29.41141ms
--- FAIL: TestAtomicUpdateFloatLatency (0.03s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:213: Operation 0 took 10.518466ms, which suggests unnecessary backoff delay
    atomic_update_test.go:213: Operation 1 took 22.075644ms, which suggests unnecessary backoff delay
    atomic_update_test.go:213: Operation 2 took 31.846374ms, which suggests unnecessary backoff delay
    atomic_update_test.go:217: Test took 37.270233ms
--- FAIL: TestAtomicUpdateFloatLatency (0.04s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:213: Operation 1 took 21.45678ms, which suggests unnecessary backoff delay
    atomic_update_test.go:213: Operation 2 took 10.683375ms, which suggests unnecessary backoff delay
    atomic_update_test.go:213: Operation 3 took 30.958185ms, which suggests unnecessary backoff delay
    atomic_update_test.go:217: Test took 39.396503ms
--- FAIL: TestAtomicUpdateFloatLatency (0.04s)
FAIL
exit status 1
FAIL	github.com/prometheus/client_golang/prometheus	0.280s

10 µs

go test -count=10 -v -run=TestAtomicUpdateFloatLatency
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:217: Test took 4.733696ms
--- PASS: TestAtomicUpdateFloatLatency (0.00s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:217: Test took 5.024125ms
--- PASS: TestAtomicUpdateFloatLatency (0.01s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:217: Test took 3.034559ms
--- PASS: TestAtomicUpdateFloatLatency (0.00s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:217: Test took 3.877559ms
--- PASS: TestAtomicUpdateFloatLatency (0.00s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:217: Test took 2.856746ms
--- PASS: TestAtomicUpdateFloatLatency (0.00s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:217: Test took 3.730594ms
--- PASS: TestAtomicUpdateFloatLatency (0.00s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:217: Test took 3.782137ms
--- PASS: TestAtomicUpdateFloatLatency (0.00s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:217: Test took 2.988741ms
--- PASS: TestAtomicUpdateFloatLatency (0.00s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:217: Test took 4.279611ms
--- PASS: TestAtomicUpdateFloatLatency (0.00s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:217: Test took 3.001143ms
--- PASS: TestAtomicUpdateFloatLatency (0.00s)
PASS
ok  	github.com/prometheus/client_golang/prometheus	0.041s

No sleep

go test -count=10 -v -run=TestAtomicUpdateFloatLatency
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:217: Test took 3.753508ms
--- PASS: TestAtomicUpdateFloatLatency (0.00s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:217: Test took 3.454005ms
--- PASS: TestAtomicUpdateFloatLatency (0.00s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:217: Test took 3.747001ms
--- PASS: TestAtomicUpdateFloatLatency (0.00s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:217: Test took 3.300866ms
--- PASS: TestAtomicUpdateFloatLatency (0.00s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:217: Test took 3.250745ms
--- PASS: TestAtomicUpdateFloatLatency (0.00s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:217: Test took 2.99001ms
--- PASS: TestAtomicUpdateFloatLatency (0.00s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:217: Test took 3.589476ms
--- PASS: TestAtomicUpdateFloatLatency (0.00s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:217: Test took 3.858731ms
--- PASS: TestAtomicUpdateFloatLatency (0.00s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:217: Test took 4.221545ms
--- PASS: TestAtomicUpdateFloatLatency (0.00s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:217: Test took 3.713847ms
--- PASS: TestAtomicUpdateFloatLatency (0.00s)
PASS
ok  	github.com/prometheus/client_golang/prometheus	0.039s

runtime.Gosched()

go test -count=10 -v -run=TestAtomicUpdateFloatLatency
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:217: Test took 3.875757ms
--- PASS: TestAtomicUpdateFloatLatency (0.00s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:217: Test took 3.412897ms
--- PASS: TestAtomicUpdateFloatLatency (0.00s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:217: Test took 3.197536ms
--- PASS: TestAtomicUpdateFloatLatency (0.00s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:217: Test took 3.328073ms
--- PASS: TestAtomicUpdateFloatLatency (0.00s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:217: Test took 3.137486ms
--- PASS: TestAtomicUpdateFloatLatency (0.00s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:217: Test took 3.119635ms
--- PASS: TestAtomicUpdateFloatLatency (0.00s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:217: Test took 2.965832ms
--- PASS: TestAtomicUpdateFloatLatency (0.00s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:217: Test took 3.105811ms
--- PASS: TestAtomicUpdateFloatLatency (0.00s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:217: Test took 3.154215ms
--- PASS: TestAtomicUpdateFloatLatency (0.00s)
=== RUN   TestAtomicUpdateFloatLatency
    atomic_update_test.go:217: Test took 2.921208ms
--- PASS: TestAtomicUpdateFloatLatency (0.00s)
PASS
ok  	github.com/prometheus/client_golang/prometheus	0.036s

--

edit: replaced Inc() with Add() as Inc is not relevant

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions