
Conversation

joshuay03 (Contributor) commented Dec 30, 2024

Description

Context:

With fork_worker un-commented and the original sleep, the test times out for me locally, but it doesn't on this branch. We go from ~10 restarts to ~3 restarts in a non-fork_worker test like this one, which I think is perfectly reasonable for what these tests are asserting.

Fixing this is necessary to unblock #3297.

cc/ @MSP-Greg

Your checklist for this pull request

  • I have reviewed the guidelines for contributing to this repository.
  • I have added (or updated) appropriate tests if this PR fixes a bug or adds a feature.
  • My pull request is 100 lines added/removed or less so that it can be easily reviewed.
  • If this PR doesn't need tests (docs change), I added [ci skip] to the title of the PR.
  • If this closes any issues, I have added "Closes #issue" to the PR description or my commit messages.
  • I have updated the documentation accordingly.
  • All new and existing tests passed, including Rubocop.

@joshuay03 changed the title to Reduce TestIntegration#restart_does_not_drop_connections restart frequency on Dec 30, 2024
@joshuay03 force-pushed the fix-timing-out-phased-restart-fork-worker-test branch from 1732a15 to ac6bdca on December 30, 2024 02:00
@github-actions bot added the waiting-for-review (Waiting on review from anyone) label on Dec 30, 2024
@joshuay03 force-pushed the fix-timing-out-phased-restart-fork-worker-test branch 4 times, most recently from cfaf63a to d6155e8, on December 30, 2024 03:46
Comment on lines 43 to 46
```diff
  def test_phased_restart_does_not_drop_connections_threads_fork_worker
    restart_does_not_drop_connections num_threads: 10, total_requests: 3_000,
-     signal: :USR1 #, config: 'fork_worker', log: true
+     signal: :USR1, config: 'fork_worker'
  end
```
Member

If this is the only test that needs sleep 1 in restart_does_not_drop_connections, what about passing that in from the test? That would make it clearer why we need to sleep 1 second.

Or does it always make sense to sleep 1 second? I'm not sure what we're waiting for; that should probably be explained better in the comment above the sleep.
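
Something along these lines is what I have in mind; the restart_delay keyword and the helper's parameter list below are only a sketch, not the actual signature in TestIntegration:

```ruby
# Sketch only: the restart_delay keyword and these parameter names are assumed.
def restart_does_not_drop_connections(num_threads:, total_requests:,
                                       signal:, config: nil,
                                       restart_delay: 0.15)
  # ... boot the server, spawn the client threads, then in the restart loop:
  #   sleep restart_delay   # pause before signalling the next restart
end

def test_phased_restart_does_not_drop_connections_threads_fork_worker
  restart_does_not_drop_connections num_threads: 10, total_requests: 3_000,
                                    signal: :USR1, config: 'fork_worker',
                                    restart_delay: 1 # makes the longer wait explicit at the call site
end
```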

Contributor Author

> Or does it always make sense to sleep 1 second? I'm not sure what we're waiting for; that should probably be explained better in the comment above the sleep.

IMO it always makes sense to sleep 1 second. The sleep is just how long we wait before signalling another restart. All the tests that use #restart_does_not_drop_connections set a total_requests of at most 3000. With a sleep of 0.15 we restart ~10 times, so something like 300 requests before each restart. With a sleep of 1, ~3 times, so ~1000 requests before each restart.

Seeing as the tests are to ensure connections aren't dropped:

  1. I don't think the number of restarts really matters, as long as there's at least one
  2. I don't think (hot/phased) restarting puma every 300 requests consecutively is a realistic scenario anyway

Essentially, this is an arbitrary threshold that happens to time out one particular test, and I don't think adjusting it to the new value for all tests affects their accuracy.
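
For reference, a minimal sketch of the loop I have in mind (simplified, not the actual TestIntegration helper): the client threads work through total_requests on their own, and this loop only decides how often another restart gets signalled.

```ruby
# Simplified sketch, not Puma's real helper: `done` is a callable that returns
# true once the client threads have finished their requests.
def drive_restarts(pid, signal, pause:, done:)
  restarts = 0
  until done.call
    Process.kill signal, pid   # :USR1 for a phased restart, :USR2 for a hot restart
    restarts += 1
    sleep pause                # 0.15 => ~10 restarts over 3000 requests; 1 => ~3
  end
  restarts
end
```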

Member

Agree, thanks for explaining!

joshuay03 (Contributor Author) commented Jan 2, 2025

There were still 1–2 macOS failures each run with the increased sleep. I tried increasing it further and also added an assertion on the restart count to ensure there are always at least 2 restarts, and that assertion started failing, i.e. with that long a sleep we only restart once. I instead went for an exponential back-off, which seems to yield a more consistent and desirable result.
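
Roughly the shape of the back-off, for illustration only; the starting delay and factor below are placeholders rather than the exact values in the diff:

```ruby
# Placeholder values; the actual change picks its own starting delay and factor.
def signal_restarts_with_backoff(pid, signal, done:, delay: 0.5, factor: 2)
  restarts = 0
  until done.call
    Process.kill signal, pid
    restarts += 1
    sleep delay
    delay *= factor            # exponential back-off between restart signals
  end
  restarts
end
```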

The remaining failure seems to be unrelated; I'd appreciate it if someone could rerun to confirm.

Member

Nice, thanks.

Yes, unrelated: #3478 (comment)

@joshuay03 force-pushed the fix-timing-out-phased-restart-fork-worker-test branch 8 times, most recently from eb47e82 to e2191b5, on January 2, 2025 09:44
@joshuay03 force-pushed the fix-timing-out-phased-restart-fork-worker-test branch from e2191b5 to b1f5f48 on January 2, 2025 09:46
MSP-Greg (Member) commented Jan 3, 2025

@joshuay03

Good day. Thanks for working on this. The test probably evolved from code tested locally; that can often be an issue with GHA CI. I'm sure you've had test code that ran fine locally but failed when run in GHA.

I did a revamp of the code, where the number of restarts can be specified. See:

master...MSP-Greg:puma:00-hot-restart

I've run it in CI a few times, and it seems more reliable than the current code. Any thoughts? Feel free to use the code. I haven't worked on the fork_worker issue...
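
As a rough paraphrase of the approach (the names and structure below are illustrative, not the branch's actual code), the loop is driven by an explicit restart count rather than a tuned sleep:

```ruby
# Illustrative only: run a fixed number of restarts, waiting for each to finish,
# instead of tuning how long to sleep between signals.
def restart_does_not_drop_connections(pid, restart_count: 2, signal: :USR2)
  restart_count.times do
    Process.kill signal, pid
    sleep 1                    # stand-in for "wait until the restart completes"
  end
  # ... then join the client threads and assert no connection was dropped ...
end
```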

@dentarg mentioned this pull request on Jan 10, 2025
@dentarg merged commit a478d4d into puma:master on Jan 10, 2025
86 of 87 checks passed
@dentarg removed the waiting-for-review (Waiting on review from anyone) label on Jan 10, 2025