-
Notifications
You must be signed in to change notification settings - Fork 789
Make NewDeployment wait timeout configurable #1260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make NewDeployment wait timeout configurable #1260
Conversation
details ----------- - add "deploytimeout" flag to specify timeout in seconds
pkg/executor/newdeploy/newdeploy.go
Outdated
return nil, errors.New("failed to create deployment within timeout window") | ||
|
||
// this error appears in the executor pod logs | ||
timeoutError := fmt.Sprintf("failed to create deployment within timeout window of %d seconds", waitTimeOut) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use fmt.Errorf
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I have made the change. Thank you :)
details ---------- - Errorf reads better in this case instead of errors.New
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 4 files reviewed, 5 unresolved discussions (waiting on @bhavin192 and @vadasambar)
pkg/apis/fission.io/v1/typefields.go, line 174 at r2 (raw file):
MaxScale int TargetCPUPercent int Timeout int
The term Timeout
is too general for many cases and may confuse with function response timeout.
It's better to describe the purpose of the timeout at field name.
How about renaming it to SpecializationTimeout
?
pkg/executor/newdeploy/newdeploy.go, line 419 at r2 (raw file):
func (deploy *NewDeploy) waitForDeploy(depl *v1beta1.Deployment, replicas int32, waitTimeOut int) (*v1beta1.Deployment, error) { for i := 0; i < waitTimeOut; i++ {
We should be careful every time we add a field since it may break the compatibility of existing CRDs.
All existing ExecutionStrategy
doesn't contain the timeout setting and the default value of int in Go is 0 , so the value will be zero for all existing function setting and break user service once they upgrade the fission.
Please check the timeout to ensure that it's >=0
. If not then use the default one.
pkg/fission-cli/function.go, line 154 at r2 (raw file):
if c.IsSet("deploytimeout") { timeout = c.Int("deploytimeout") if timeout < 0 {
Consider that kubernetes takes time for creating, pulling images, scheduling pod to nodes and time for fetcher to specialize pod, it will never be zero.
Since this flag is for the functions that take a long specialization and people normally don't need to set this except for the case described in the issue, I suggest for now we should set the minimum specialization timeout to 120 (the original one).
pkg/fission-cli/main.go, line 97 at r2 (raw file):
maxScale := cli.IntFlag{Name: "maxscale", Usage: "Maximum number of pods (Uses resource inputs to configure HPA)"} targetcpu := cli.IntFlag{Name: "targetcpu", Usage: "Target average CPU usage percentage across pods for scaling"} deployTimeoutFlag := cli.IntFlag{Name: "deploytimeout", Usage: "Timeout for newdeploy function creation"}
Since we may need to implement the same thing to poolmanager as well, so if the flag name is something like specializationtimeout, st
then we don't need to add another flag for poolmanager in future.
other -------- - add default newdeploy timeout for rest of the test cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 5 files reviewed, 5 unresolved discussions (waiting on @bhavin192 and @life1347)
pkg/apis/fission.io/v1/typefields.go, line 174 at r2 (raw file):
SpecializationTimeout
This name makes more sense. Let me rename them :)
pkg/executor/newdeploy/newdeploy.go, line 419 at r2 (raw file):
Previously, life1347 (Ta-Ching Chen) wrote…
We should be careful every time we add a field since it may break the compatibility of existing CRDs.
All existingExecutionStrategy
doesn't contain the timeout setting and the default value of int in Go is 0 , so the value will be zero for all existing function setting and break user service once they upgrade the fission.
Please check the timeout to ensure that it's>=0
. If not then use the default one.
I have set the default time out to 120 seconds. Please check DEFAULT_DEPLOY_TIMEOUT
in pkg/fission-cli/function.go
pkg/fission-cli/function.go, line 154 at r2 (raw file):
Previously, life1347 (Ta-Ching Chen) wrote…
Consider that kubernetes takes time for creating, pulling images, scheduling pod to nodes and time for fetcher to specialize pod, it will never be zero.
Since this flag is for the functions that take a long specialization and people normally don't need to set this except for the case described in the issue, I suggest for now we should set the minimum specialization timeout to 120 (the original one).
This piece of code ensures that the timeout set through cli is a non-negative number. Default specialization timeout has been set to 120 in pkg/fission-cli/function.go (please check DEFAULT_DEPLOY_TIMEOUT
)
pkg/fission-cli/main.go, line 97 at r2 (raw file):
Previously, life1347 (Ta-Ching Chen) wrote…
Since we may need to implement the same thing to poolmanager as well, so if the flag name is something like
specializationtimeout, st
then we don't need to add another flag for poolmanager in future.
Makes sense. Let me change the name to specializationtimeout
. Thank you :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 5 files reviewed, 6 unresolved discussions (waiting on @bhavin192, @life1347, and @vadasambar)
pkg/executor/newdeploy/newdeploy.go, line 419 at r2 (raw file):
Previously, vadasambar (Suraj Banakar) wrote…
I have set the default time out to 120 seconds. Please check
DEFAULT_DEPLOY_TIMEOUT
in pkg/fission-cli/function.go
But it only affects the newly created functions, what about the old existing functions?
They don't have timeout field.
pkg/fission-cli/function.go, line 154 at r2 (raw file):
Previously, vadasambar (Suraj Banakar) wrote…
This piece of code ensures that the timeout set through cli is a non-negative number. Default specialization timeout has been set to 120 in pkg/fission-cli/function.go (please check
DEFAULT_DEPLOY_TIMEOUT
)
I know you've added DEFAULT_DEPLOY_TIMEOUT
, but what if users enter --specializationtimeout 1
, isn't it overrides the default one? Then the function will never get up since it doesn't have enough time to do things describe above, right? You have to check whether the input value is a reasonable value and prompt user the correct message.
if timeout < DEFAULT_DEPLOY_TIMEOUT {
return nil, errors.New("deploytimeout must be greater than or equal to 120")
}
pkg/fission-cli/function.go, line 453 at r3 (raw file):
} func fnUpdate(c *cli.Context) error {
We should allow users to update timeout setting in fnUpdate as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 5 files reviewed, 6 unresolved discussions (waiting on @bhavin192, @life1347, and @vadasambar)
pkg/apis/fission.io/v1/typefields.go, line 174 at r2 (raw file):
Previously, vadasambar (Suraj Banakar) wrote…
SpecializationTimeout
This name makes more sense. Let me rename them :)
Please also add the validation to
fission/pkg/apis/fission.io/v1/validation.go
Line 310 in 7bbcaf2
func (is InvokeStrategy) Validate() error { |
details ---------- - this is to make sure existing functions have a default specialization timeout
details ----------- - 120 seconds is the minimum specialization timeout - make changes to reflect the same in tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 6 files reviewed, 6 unresolved discussions (waiting on @bhavin192 and @life1347)
pkg/apis/fission.io/v1/typefields.go, line 174 at r2 (raw file):
Previously, life1347 (Ta-Ching Chen) wrote…
Please also add the validation to
to make sure that the timeout setting that the user provided is a reasonable value.fission/pkg/apis/fission.io/v1/validation.go
Line 310 in 7bbcaf2
func (is InvokeStrategy) Validate() error {
I will add the validation for SpecializationTimeout. Thank you.
pkg/executor/newdeploy/newdeploy.go, line 434 at r1 (raw file):
Previously, vadasambar (Suraj Banakar) wrote…
Makes sense. I have made the change. Thank you :)
Done.
pkg/executor/newdeploy/newdeploy.go, line 419 at r2 (raw file):
Previously, life1347 (Ta-Ching Chen) wrote…
But it only affects the newly created functions, right? What about the old existing functions?
They don't have timeout field.
Makes sense. I will add a default specialization timeout here as well. Something similar to cli flag validation
if specializationTimeout < DEFAULT_SPECIALIZATION_TIMEOUT {
specializationTimeout = DEFAULT_SPECIALIZATION_TIMEOUT
}
Thanks for pointing it out! Please let me know if there's any other change.
pkg/fission-cli/function.go, line 154 at r2 (raw file):
Previously, life1347 (Ta-Ching Chen) wrote…
I know you've added
DEFAULT_DEPLOY_TIMEOUT
, but what if users enter--specializationtimeout 1
, isn't it overrides the default one? Then the function will never get up since it doesn't have enough time to do things describe above, right? You have to check whether the input value is a reasonable value and prompt user the correct message.if timeout < DEFAULT_DEPLOY_TIMEOUT { return nil, errors.New("deploytimeout must be greater than or equal to 120") }
Thank you for pointing it out! I will change the minimum timeout to 120.
pkg/fission-cli/function.go, line 453 at r3 (raw file):
Validate
Makes sense. I will work on this.
pkg/fission-cli/main.go, line 97 at r2 (raw file):
Previously, vadasambar (Suraj Banakar) wrote…
Makes sense. Let me change the name to
specializationtimeout
. Thank you :)
Done.
If there are any more changes, please let me know. Thank you :) |
other --------- - do not allow setting 'specializationtimeout' if executortype is not of the type 'newdeploy'
Travis Build has failed due to network timeout. Can someone please trigger a rebuild? Timeout waiting for network availability. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 6 files reviewed, 3 unresolved discussions (waiting on @bhavin192, @life1347, and @vadasambar)
pkg/executor/newdeploy/newdeploy.go, line 419 at r2 (raw file):
Previously, vadasambar (Suraj Banakar) wrote…
Makes sense. I will add a default specialization timeout here as well. Something similar to cli flag validation
if specializationTimeout < DEFAULT_SPECIALIZATION_TIMEOUT { specializationTimeout = DEFAULT_SPECIALIZATION_TIMEOUT }
Thanks for pointing it out! Please let me know if there's any other change.
I suggest moving the code block in different places
// if no specializationTimeout is set, use default value
if specializationTimeout < DEFAULT_SPECIALIZATION_TIMEOUT {
specializationTimeout = DEFAULT_SPECIALIZATION_TIMEOUT
}
into this function (waitForDeploy
). So that people don't need to write the same thing again before calling waitForDeploy
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 6 files reviewed, 3 unresolved discussions (waiting on @bhavin192 and @life1347)
pkg/executor/newdeploy/newdeploy.go, line 419 at r2 (raw file):
Previously, life1347 (Ta-Ching Chen) wrote…
I suggest moving the code block in different places
// if no specializationTimeout is set, use default value if specializationTimeout < DEFAULT_SPECIALIZATION_TIMEOUT { specializationTimeout = DEFAULT_SPECIALIZATION_TIMEOUT }
into this function. So that people don't need to write the same thing again before calling
waitForDeploy
.
Done
From the test_spec_merge error log, looks like we have to comment out the validation part and shows warning for now to prevent compatibility issue.
Here is the diff I've made, you may be able to apply it directly. Changes in diff
After applying the diff, result becomes
|
details/other ------------------- - this is to prevent test_spec_merge from failing - add warning if specializationtimeout is lower than default value
I could apply the diff without any problems. |
CI result with kubernetes integration test: https://travis-ci.org/fission/fission/builds/571724467 |
pkg/fission-cli/function.go
Outdated
|
||
if c.IsSet("specializationtimeout") { | ||
if c.String("executortype") != types.ExecutorTypeNewdeploy { | ||
log.Fatal("specializationtimeout must be greater than or equal to 120 seconds") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I suddenly notice that the message here is wrong, please correct it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should have noticed that. Thank you for pointing it out. I have fixed it.
pkg/fission-cli/main.go
Outdated
@@ -94,6 +94,7 @@ func NewCliApp() *cli.App { | |||
minScale := cli.IntFlag{Name: "minscale", Usage: "Minimum number of pods (Uses resource inputs to configure HPA)"} | |||
maxScale := cli.IntFlag{Name: "maxscale", Usage: "Maximum number of pods (Uses resource inputs to configure HPA)"} | |||
targetcpu := cli.IntFlag{Name: "targetcpu", Usage: "Target average CPU usage percentage across pods for scaling"} | |||
specializationTimeoutFlag := cli.IntFlag{Name: "specializationtimeout, st", Usage: "Timeout for newdeploy function creation"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usage: "Timeout for newdeploy to wait for function pod creation"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
Merged! @vadasambar Thanks for the PR! And the changes' detail of each commit is great. |
Thank you :) |
details
Fix for #1213
This change is