Skip to content

Conversation

vadasambar
Copy link
Contributor

@vadasambar vadasambar commented Aug 5, 2019

details

  • add "deploytimeout" flag to specify timeout in seconds

Fix for #1213


This change is Reviewable

details
-----------
- add "deploytimeout" flag to specify timeout in seconds
return nil, errors.New("failed to create deployment within timeout window")

// this error appears in the executor pod logs
timeoutError := fmt.Sprintf("failed to create deployment within timeout window of %d seconds", waitTimeOut)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use fmt.Errorf here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I have made the change. Thank you :)

details
----------
- Errorf reads better in this case instead of errors.New
Copy link
Member

@life1347 life1347 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 4 files reviewed, 5 unresolved discussions (waiting on @bhavin192 and @vadasambar)


pkg/apis/fission.io/v1/typefields.go, line 174 at r2 (raw file):

		MaxScale         int
		TargetCPUPercent int
		Timeout          int

The term Timeout is too general for many cases and may confuse with function response timeout.
It's better to describe the purpose of the timeout at field name.
How about renaming it to SpecializationTimeout?


pkg/executor/newdeploy/newdeploy.go, line 419 at r2 (raw file):

func (deploy *NewDeploy) waitForDeploy(depl *v1beta1.Deployment, replicas int32, waitTimeOut int) (*v1beta1.Deployment, error) {
	for i := 0; i < waitTimeOut; i++ {

We should be careful every time we add a field since it may break the compatibility of existing CRDs.
All existing ExecutionStrategy doesn't contain the timeout setting and the default value of int in Go is 0 , so the value will be zero for all existing function setting and break user service once they upgrade the fission.
Please check the timeout to ensure that it's >=0. If not then use the default one.


pkg/fission-cli/function.go, line 154 at r2 (raw file):

		if c.IsSet("deploytimeout") {
			timeout = c.Int("deploytimeout")
			if timeout < 0 {

Consider that kubernetes takes time for creating, pulling images, scheduling pod to nodes and time for fetcher to specialize pod, it will never be zero.
Since this flag is for the functions that take a long specialization and people normally don't need to set this except for the case described in the issue, I suggest for now we should set the minimum specialization timeout to 120 (the original one).


pkg/fission-cli/main.go, line 97 at r2 (raw file):

	maxScale := cli.IntFlag{Name: "maxscale", Usage: "Maximum number of pods (Uses resource inputs to configure HPA)"}
	targetcpu := cli.IntFlag{Name: "targetcpu", Usage: "Target average CPU usage percentage across pods for scaling"}
	deployTimeoutFlag := cli.IntFlag{Name: "deploytimeout", Usage: "Timeout for newdeploy function creation"}

Since we may need to implement the same thing to poolmanager as well, so if the flag name is something like specializationtimeout, st then we don't need to add another flag for poolmanager in future.

other
--------
- add default newdeploy timeout for rest of the test cases
Copy link
Contributor Author

@vadasambar vadasambar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 5 files reviewed, 5 unresolved discussions (waiting on @bhavin192 and @life1347)


pkg/apis/fission.io/v1/typefields.go, line 174 at r2 (raw file):

SpecializationTimeout

This name makes more sense. Let me rename them :)


pkg/executor/newdeploy/newdeploy.go, line 419 at r2 (raw file):

Previously, life1347 (Ta-Ching Chen) wrote…

We should be careful every time we add a field since it may break the compatibility of existing CRDs.
All existing ExecutionStrategy doesn't contain the timeout setting and the default value of int in Go is 0 , so the value will be zero for all existing function setting and break user service once they upgrade the fission.
Please check the timeout to ensure that it's >=0. If not then use the default one.

I have set the default time out to 120 seconds. Please check DEFAULT_DEPLOY_TIMEOUT in pkg/fission-cli/function.go


pkg/fission-cli/function.go, line 154 at r2 (raw file):

Previously, life1347 (Ta-Ching Chen) wrote…

Consider that kubernetes takes time for creating, pulling images, scheduling pod to nodes and time for fetcher to specialize pod, it will never be zero.
Since this flag is for the functions that take a long specialization and people normally don't need to set this except for the case described in the issue, I suggest for now we should set the minimum specialization timeout to 120 (the original one).

This piece of code ensures that the timeout set through cli is a non-negative number. Default specialization timeout has been set to 120 in pkg/fission-cli/function.go (please check DEFAULT_DEPLOY_TIMEOUT)


pkg/fission-cli/main.go, line 97 at r2 (raw file):

Previously, life1347 (Ta-Ching Chen) wrote…

Since we may need to implement the same thing to poolmanager as well, so if the flag name is something like specializationtimeout, st then we don't need to add another flag for poolmanager in future.

Makes sense. Let me change the name to specializationtimeout. Thank you :)

Copy link
Member

@life1347 life1347 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 5 files reviewed, 6 unresolved discussions (waiting on @bhavin192, @life1347, and @vadasambar)


pkg/executor/newdeploy/newdeploy.go, line 419 at r2 (raw file):

Previously, vadasambar (Suraj Banakar) wrote…

I have set the default time out to 120 seconds. Please check DEFAULT_DEPLOY_TIMEOUT in pkg/fission-cli/function.go

But it only affects the newly created functions, what about the old existing functions?
They don't have timeout field.


pkg/fission-cli/function.go, line 154 at r2 (raw file):

Previously, vadasambar (Suraj Banakar) wrote…

This piece of code ensures that the timeout set through cli is a non-negative number. Default specialization timeout has been set to 120 in pkg/fission-cli/function.go (please check DEFAULT_DEPLOY_TIMEOUT)

I know you've added DEFAULT_DEPLOY_TIMEOUT, but what if users enter --specializationtimeout 1, isn't it overrides the default one? Then the function will never get up since it doesn't have enough time to do things describe above, right? You have to check whether the input value is a reasonable value and prompt user the correct message.

if timeout < DEFAULT_DEPLOY_TIMEOUT {
    return nil, errors.New("deploytimeout must be greater than or equal to 120")
}

pkg/fission-cli/function.go, line 453 at r3 (raw file):

}

func fnUpdate(c *cli.Context) error {

We should allow users to update timeout setting in fnUpdate as well.

Copy link
Member

@life1347 life1347 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 5 files reviewed, 6 unresolved discussions (waiting on @bhavin192, @life1347, and @vadasambar)


pkg/apis/fission.io/v1/typefields.go, line 174 at r2 (raw file):

Previously, vadasambar (Suraj Banakar) wrote…

SpecializationTimeout

This name makes more sense. Let me rename them :)

Please also add the validation to

func (is InvokeStrategy) Validate() error {
to make sure that the timeout setting that the user provided is a reasonable value.

Suraj Banakar added 3 commits August 7, 2019 17:30
details
----------
- this is to make sure existing functions have a default specialization timeout
details
-----------
- 120 seconds is the minimum specialization timeout
- make changes to reflect the same in tests
Copy link
Contributor Author

@vadasambar vadasambar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 6 files reviewed, 6 unresolved discussions (waiting on @bhavin192 and @life1347)


pkg/apis/fission.io/v1/typefields.go, line 174 at r2 (raw file):

Previously, life1347 (Ta-Ching Chen) wrote…

Please also add the validation to

func (is InvokeStrategy) Validate() error {
to make sure that the timeout setting that the user provided is a reasonable value.

I will add the validation for SpecializationTimeout. Thank you.


pkg/executor/newdeploy/newdeploy.go, line 434 at r1 (raw file):

Previously, vadasambar (Suraj Banakar) wrote…

Makes sense. I have made the change. Thank you :)

Done.


pkg/executor/newdeploy/newdeploy.go, line 419 at r2 (raw file):

Previously, life1347 (Ta-Ching Chen) wrote…

But it only affects the newly created functions, right? What about the old existing functions?
They don't have timeout field.

Makes sense. I will add a default specialization timeout here as well. Something similar to cli flag validation

if specializationTimeout < DEFAULT_SPECIALIZATION_TIMEOUT {
		specializationTimeout = DEFAULT_SPECIALIZATION_TIMEOUT
	}

Thanks for pointing it out! Please let me know if there's any other change.


pkg/fission-cli/function.go, line 154 at r2 (raw file):

Previously, life1347 (Ta-Ching Chen) wrote…

I know you've added DEFAULT_DEPLOY_TIMEOUT, but what if users enter --specializationtimeout 1, isn't it overrides the default one? Then the function will never get up since it doesn't have enough time to do things describe above, right? You have to check whether the input value is a reasonable value and prompt user the correct message.

if timeout < DEFAULT_DEPLOY_TIMEOUT {
    return nil, errors.New("deploytimeout must be greater than or equal to 120")
}

Thank you for pointing it out! I will change the minimum timeout to 120.


pkg/fission-cli/function.go, line 453 at r3 (raw file):

Validate
Makes sense. I will work on this.


pkg/fission-cli/main.go, line 97 at r2 (raw file):

Previously, vadasambar (Suraj Banakar) wrote…

Makes sense. Let me change the name to specializationtimeout. Thank you :)

Done.

@vadasambar
Copy link
Contributor Author

If there are any more changes, please let me know. Thank you :)

other
---------
- do not allow setting 'specializationtimeout' if executortype is not of the type 'newdeploy'
@vadasambar
Copy link
Contributor Author

Travis Build has failed due to network timeout. Can someone please trigger a rebuild?

Timeout waiting for network availability.

@vishal-biyani vishal-biyani added this to the 1.5 milestone Aug 8, 2019
Copy link
Member

@life1347 life1347 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 6 files reviewed, 3 unresolved discussions (waiting on @bhavin192, @life1347, and @vadasambar)


pkg/executor/newdeploy/newdeploy.go, line 419 at r2 (raw file):

Previously, vadasambar (Suraj Banakar) wrote…

Makes sense. I will add a default specialization timeout here as well. Something similar to cli flag validation

if specializationTimeout < DEFAULT_SPECIALIZATION_TIMEOUT {
		specializationTimeout = DEFAULT_SPECIALIZATION_TIMEOUT
	}

Thanks for pointing it out! Please let me know if there's any other change.

I suggest moving the code block in different places

// if no specializationTimeout is set, use default value
if specializationTimeout < DEFAULT_SPECIALIZATION_TIMEOUT {
	specializationTimeout = DEFAULT_SPECIALIZATION_TIMEOUT
}

into this function (waitForDeploy ). So that people don't need to write the same thing again before calling waitForDeploy.

Copy link
Contributor Author

@vadasambar vadasambar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 6 files reviewed, 3 unresolved discussions (waiting on @bhavin192 and @life1347)


pkg/executor/newdeploy/newdeploy.go, line 419 at r2 (raw file):

Previously, life1347 (Ta-Ching Chen) wrote…

I suggest moving the code block in different places

// if no specializationTimeout is set, use default value
if specializationTimeout < DEFAULT_SPECIALIZATION_TIMEOUT {
	specializationTimeout = DEFAULT_SPECIALIZATION_TIMEOUT
}

into this function. So that people don't need to write the same thing again before calling waitForDeploy.

Done

@life1347
Copy link
Member

life1347 commented Aug 13, 2019

From the test_spec_merge error log, looks like we have to comment out the validation part and shows warning for now to prevent compatibility issue.

test_specs/test_spec_merge/test_spec_merge.sh.log
========== start test_specs/test_spec_merge/test_spec_merge.sh.log ==========
~/gopath/src/github.com/fission/fission/test/tests/test_specs/test_spec_merge ~/gopath/src/github.com/fission/fission
Warning: You have provided both - container spec and pod spec and while merging the pod spec will take precedence.
Warning: You have provided both - container spec and pod spec and while merging the pod spec will take precedence.
Warning: Poolsize can only be configured when environment version equals to 3, default poolsize 3 will be used for creating environment pool.
Fatal error: Failed to validate specs: 1 error occurred:
	* ExecutionStrategy.SpecializationTimeout: Invalid value: 0: [SpecializationTimeout must be a value equal to or greater than 120]
Cleaning up...
~/gopath/src/github.com/fission/fission
Fatal error: Spec directory specs doesn't exist. Please check directory path or run "fission spec init" to create it.

Here is the diff I've made, you may be able to apply it directly. Changes in diff

  1. Comment out specializationtimeout validation
  2. Shows warning when running fission spec apply without aborting the task.
diff --git a/pkg/apis/fission.io/v1/validation.go b/pkg/apis/fission.io/v1/validation.go
index 085a9ca8..4eddaf6d 100644
--- a/pkg/apis/fission.io/v1/validation.go
+++ b/pkg/apis/fission.io/v1/validation.go
@@ -347,9 +347,10 @@ func (es ExecutionStrategy) Validate() error {
 			result = multierror.Append(result, MakeValidationErr(ErrorInvalidValue, "ExecutionStrategy.TargetCPUPercent", es.TargetCPUPercent, "TargetCPUPercent must be a value between 1 - 100"))
 		}
 
-		if es.SpecializationTimeout < 120 {
-			result = multierror.Append(result, MakeValidationErr(ErrorInvalidValue, "ExecutionStrategy.SpecializationTimeout", es.SpecializationTimeout, "SpecializationTimeout must be a value equal to or greater than 120"))
-		}
+		// TODO Add validation warning
+		//if es.SpecializationTimeout < 120 {
+		//	result = multierror.Append(result, MakeValidationErr(ErrorInvalidValue, "ExecutionStrategy.SpecializationTimeout", es.SpecializationTimeout, "SpecializationTimeout must be a value equal to or greater than 120"))
+		//}
 	}
 
 	return result.ErrorOrNil()
diff --git a/pkg/fission-cli/spec.go b/pkg/fission-cli/spec.go
index ec9ef25a..736aa729 100644
--- a/pkg/fission-cli/spec.go
+++ b/pkg/fission-cli/spec.go
@@ -437,6 +437,10 @@ func (fr *FissionResources) validate(c *cli.Context) error {
 		if _, ok := environments[fmt.Sprintf("%s:%s", f.Spec.Environment.Name, f.Spec.Environment.Namespace)]; !ok {
 			log.Warn(fmt.Sprintf("Environment %s is referenced in function %s but not declared in specs", f.Spec.Environment.Name, f.Metadata.Name))
 		}
+		strategy := f.Spec.InvokeStrategy.ExecutionStrategy
+		if strategy.ExecutorType == fv1.ExecutorTypeNewdeploy && strategy.SpecializationTimeout < DEFAULT_SPECIALIZATION_TIMEOUT {
+			log.Warn(fmt.Sprintf("SpecializationTimeout in function spec.InvokeStrategy.ExecutionStrategy should be a value equal to or greater than %v", DEFAULT_SPECIALIZATION_TIMEOUT))
+		}
 	}
 
 	// (ErrorOrNil returns nil if there were no errors appended.)

After applying the diff, result becomes

Warning: You have provided both - container spec and pod spec and while merging the pod spec will take precedence.
Warning: You have provided both - container spec and pod spec and while merging the pod spec will take precedence.
Warning: Poolsize can only be configured when environment version equals to 3, default poolsize 3 will be used for creating environment pool.
Warning: SpecializationTimeout in function spec.InvokeStrategy.ExecutionStrategy should be a value equal to or greater than 120

details/other
-------------------
- this is to prevent test_spec_merge from failing
- add warning if specializationtimeout is lower than default value
@vadasambar
Copy link
Contributor Author

I could apply the diff without any problems.

@life1347
Copy link
Member

CI result with kubernetes integration test: https://travis-ci.org/fission/fission/builds/571724467
LGTM, I will merge this PR later today.


if c.IsSet("specializationtimeout") {
if c.String("executortype") != types.ExecutorTypeNewdeploy {
log.Fatal("specializationtimeout must be greater than or equal to 120 seconds")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I suddenly notice that the message here is wrong, please correct it.

Copy link
Contributor Author

@vadasambar vadasambar Aug 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have noticed that. Thank you for pointing it out. I have fixed it.

@@ -94,6 +94,7 @@ func NewCliApp() *cli.App {
minScale := cli.IntFlag{Name: "minscale", Usage: "Minimum number of pods (Uses resource inputs to configure HPA)"}
maxScale := cli.IntFlag{Name: "maxscale", Usage: "Maximum number of pods (Uses resource inputs to configure HPA)"}
targetcpu := cli.IntFlag{Name: "targetcpu", Usage: "Target average CPU usage percentage across pods for scaling"}
specializationTimeoutFlag := cli.IntFlag{Name: "specializationtimeout, st", Usage: "Timeout for newdeploy function creation"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usage: "Timeout for newdeploy to wait for function pod creation"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@life1347 life1347 merged commit 8091056 into fission:master Aug 16, 2019
@life1347
Copy link
Member

Merged! @vadasambar Thanks for the PR! And the changes' detail of each commit is great.

@vadasambar
Copy link
Contributor Author

Thank you :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants