-
Notifications
You must be signed in to change notification settings - Fork 334
Tune dragonfly to Remove Old Entries & Use Longtext to fit larger warming jobs #4145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
* use longtext to fit larger clusters * fix scheduler and seed-client register logic
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4145 +/- ##
=======================================
Coverage 32.96% 32.96%
=======================================
Files 346 346
Lines 40922 40918 -4
=======================================
Hits 13490 13490
+ Misses 26538 26534 -4
Partials 894 894
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
@e-ngo I think the change to After the scheduler is started, it will maintain keepalive with the manager. If the scheduler down and the keepalive connection with the manager is disconnected, the scheduler instance status in the database will become inactive. However, for problem A, when the manager down, the scheduler restart cannot notify the manager that it has down. So unable to update the status of scheduler instance in the database to inactive. I think the solution to this problem is to scan the scheduler and seed client tables regularly in database, and automatically delete the row in the database if the scheduler and seed client do not report for more than 2 keepalive intervals. |
Yea, that makes sense. Can make that change |
@e-ngo just do it |
@e-ngo Hi, I think the issue @gaius-qi mentioned will be resolved by #4179, so this PR can be focused on the db column type change, so could you update your PR to only retain the "longtext" change, then we can merge it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@e-ngo We will merge it and then restore the change for removing old entries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
use longtext to fit larger clusters
fix scheduler and seed-client register logic
Description
This change updates the manager peer & scheduler registration logic to use more stable fields, ie. host names as opposed to IPs which can change when a pod is rescheduled.
Also use longtext instead of just text so that warming jobs don't fail in large clusters.
Related Issue
Maybe dragonflyoss/client#1116
Motivation and Context
Pod IPs will change. Pods will be rescheduled to other nodes, etc. When manager returns list of schedulers, it grabs the list in order. Thus it will ALWAYS return a stale scheduler / seed client address. The scheduler/seed reregistration SHOULD over rid the OLD entries.
Additionally jobs may be larger in pieces. This won't fit in text. Fit it in longtext instead.
Screenshots (if appropriate)
Types of changes
Checklist