Skip to content

Conversation

sthaha
Copy link
Collaborator

@sthaha sthaha commented Jun 12, 2025

This commit fixes a race condition when timer fires ⚡️ and context gets canceled 💥 at the same time. This race condition was detected the the TestCollectionCancellation and happens as follows ...

monitor_collection_test.go:201:
Error Trace:    .../monitor_collection_test.go:201
Error:          Not equal:
                expected: time.Date(2025, time.June, 11, 8, 14, 45, 720156604, time.Local)
                actual  : time.Date(2025, time.June, 11, 8, 14, 45, 732389967, time.Local)
Test:           TestCollectionCancellation
Messages:       No collections should happen after cancellation

Race Sequence

  1. Test Context: TestCollectionCancellation runs with a 100ms timeout context

  2. Timer Setup: Collection goroutines are scheduled with 10ms intervals

  3. Cancellation: After ~50ms, test calls cancel() which triggers pm.collectionCancel()

  4. Run() Exit: The Run() method exits and test captures snapshotAfterRun.Timestamp

  5. Race Window: 💥 There's a brief window where:

    • Context is canceled (pm.collectionCtx.Done() becomes ready)
    • Timer also fires (<-timer becomes ready)
    • Go's select randomly chooses between the two ready channels
  6. Failure Case: When select chooses <-timer:

    • synchronizedPowerRefresh() executes and updates snapshot with new timestamp
    • Test assertion fails because timestamps don't match

Fixes #2144

@github-actions github-actions bot added the fix A bug fix label Jun 12, 2025
Copy link

codecov bot commented Jun 12, 2025

Codecov Report

Attention: Patch coverage is 42.85714% with 4 lines in your changes missing coverage. Please review.

Project coverage is 91.52%. Comparing base (02d572a) to head (99a271e).
Report is 2 commits behind head on reboot.

Files with missing lines Patch % Lines
internal/monitor/monitor.go 42.85% 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           reboot    #2146      +/-   ##
==========================================
+ Coverage   91.50%   91.52%   +0.01%     
==========================================
  Files          36       36              
  Lines        3438     3444       +6     
==========================================
+ Hits         3146     3152       +6     
  Misses        225      225              
  Partials       67       67              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@sthaha sthaha requested review from vimalk78 and vprashar2929 June 12, 2025 01:44
vprashar2929
vprashar2929 previously approved these changes Jun 12, 2025
@sthaha sthaha force-pushed the fix-race-condition branch from b28e5e2 to b5988d7 Compare June 12, 2025 08:02
vimalk78
vimalk78 previously approved these changes Jun 12, 2025
@vimalk78
Copy link
Collaborator

vimalk78 commented Jun 12, 2025

suggestion:
print pm.collectionCtx.Err() in log

		case <-pm.collectionCtx.Done():
			pm.logger.Info("Collection loop terminated")
			return
		}

@sthaha sthaha dismissed stale reviews from vimalk78 and vprashar2929 via f9c7d57 June 12, 2025 08:17
@sthaha sthaha force-pushed the fix-race-condition branch from b5988d7 to f9c7d57 Compare June 12, 2025 08:17
@sthaha sthaha enabled auto-merge June 12, 2025 08:24
This commit fixes a race condition when timer fires ⚡️ and context
gets canceled 💥 at the same time. This race condition was detected the the
TestCollectionCancellation and happens as follows ...

````
monitor_collection_test.go:201:
Error Trace:    .../monitor_collection_test.go:201
Error:          Not equal:
                expected: time.Date(2025, time.June, 11, 8, 14, 45, 720156604, time.Local)
                actual  : time.Date(2025, time.June, 11, 8, 14, 45, 732389967, time.Local)
Test:           TestCollectionCancellation
Messages:       No collections should happen after cancellation
````

Race Sequence
--------------

1. **Test Context**: `TestCollectionCancellation` runs with a 100ms timeout context
2. **Timer Setup**: Collection goroutines are scheduled with 10ms intervals
3. **Cancellation**: After ~50ms, test calls `cancel()` which triggers `pm.collectionCancel()`
4. **Run() Exit**: The `Run()` method exits and test captures `snapshotAfterRun.Timestamp`
5. **Race Window**: 💥 There's a brief window where:
   - Context is canceled (`pm.collectionCtx.Done()` becomes ready)
   - Timer also fires (`<-timer` becomes ready)
   - Go's `select` **randomly** chooses between the two ready channels

6. **Failure Case**: When `select` chooses `<-timer`:
   - `synchronizedPowerRefresh()` executes and updates snapshot with new timestamp
   - Test assertion fails because timestamps don't match

Signed-off-by: Sunil Thaha <sthaha@redhat.com>
@sthaha sthaha force-pushed the fix-race-condition branch from f9c7d57 to 99a271e Compare June 12, 2025 08:27
@sthaha sthaha merged commit 4d24510 into sustainable-computing-io:reboot Jun 12, 2025
16 of 17 checks passed
vimalk78 pushed a commit to vimalk78/kepler that referenced this pull request Jun 12, 2025
…-condition

fix(monitor): race condition between timer and context cancelation
vimalk78 pushed a commit to vimalk78/kepler that referenced this pull request Jun 12, 2025
…-condition

fix(monitor): race condition between timer and context cancelation
vimalk78 pushed a commit to vimalk78/kepler that referenced this pull request Jun 12, 2025
…-condition

fix(monitor): race condition between timer and context cancelation
vimalk78 pushed a commit to vimalk78/kepler that referenced this pull request Jun 12, 2025
…-condition

fix(monitor): race condition between timer and context cancelation
vimalk78 pushed a commit to vimalk78/kepler that referenced this pull request Jun 12, 2025
…-condition

fix(monitor): race condition between timer and context cancelation
vimalk78 pushed a commit to vimalk78/kepler that referenced this pull request Jun 12, 2025
…-condition

fix(monitor): race condition between timer and context cancelation
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix A bug fix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants