PoolManager pod specialization doesn't work as expected during a peak of simultaneous requests.

**Fission/Kubernetes version**

<pre>
$ fission --version
client:
  fission/core:
    BuildDate: "2021-12-29T03:57:30Z"
    GitCommit: 327275d1
    Version: v1.15.1
server:
  fission/core:
    BuildDate: "2021-12-29T03:57:30Z"
    GitCommit: 327275d1
    Version: v1.15.1

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.4-eks-6b7464", GitCommit:"6b746440c04cb81db4426842b4ae65c3f7035e53", GitTreeState:"clean", BuildDate:"2021-03-19T19:35:50Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.5", GitCommit:"5c99e2ac2ff9a3c549d9ca665e7bc05a3e18f07e", GitTreeState:"clean", BuildDate:"2021-12-16T08:32:32Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}
</pre>

**Kubernetes platform**: Docker Desktop

**Describe the bug**
The problem occurs when many simultaneous requests hit the same function and the function has not been called before. Fission then specializes several pods for the same function in parallel, instead of serving multiple requests from the same function pod as the `rpp` configuration allows.

**To Reproduce**
- Create an environment (e.g. go).
- Create a function (e.g. go hello world) with PoolManager executor and `rpp=5`.
- Let’s consider the following basic script:
<pre>
cat test.sh
time fission fn test --name f2 &
time fission fn test --name f2 &
time fission fn test --name f2 &
time fission fn test --name f2 &
time fission fn test --name f2 &
</pre>
- Initially the warm pool has 3 pods:
<pre>
kubectl get pods -n fission-function | grep Running
poolmgr-go-default-407982-6b66d6bb45-4lrfz   2/2     Running       0          6m41s
poolmgr-go-default-407982-6b66d6bb45-9vwfk   2/2     Running       0          6m39s
poolmgr-go-default-407982-6b66d6bb45-c59fr   2/2     Running       0          6m37s
</pre>
- The script runs 5 calls to the function in the background:
<pre>
./test.sh
Hello, world!

real    0m0.319s
user    0m0.037s
sys     0m0.036s
Hello, world!

real    0m0.324s
user    0m0.028s
sys     0m0.041s
Hello, world!

real    0m0.330s
user    0m0.072s
sys     0m0.000s
Hello, world!

real    0m2.185s
user    0m0.059s
sys     0m0.017s
Hello, world!

real    0m3.479s
user    0m0.031s
sys     0m0.046s
</pre>
- The 3 pods from the warm pool were specialized, so the first 3 requests did not suffer cold starts (about 0.3 s each). After that no pods were left in the warm pool, so the remaining 2 requests had to wait for new pods to start and suffered cold starts (about 2.2 s and 3.5 s).
<pre>
kubectl get pods -n fission-function | grep Running
poolmgr-go-default-407982-6b66d6bb45-4lrfz   2/2     Running       0          8m14s
poolmgr-go-default-407982-6b66d6bb45-4tckz   2/2     Running       0          41s
poolmgr-go-default-407982-6b66d6bb45-9vwfk   2/2     Running       0          8m12s
poolmgr-go-default-407982-6b66d6bb45-c59fr   2/2     Running       0          8m10s
poolmgr-go-default-407982-6b66d6bb45-djbph   2/2     Running       0          41s
poolmgr-go-default-407982-6b66d6bb45-k67xk   2/2     Running       0          38s
poolmgr-go-default-407982-6b66d6bb45-nwdhk   2/2     Running       0          41s
poolmgr-go-default-407982-6b66d6bb45-qndrn   2/2     Running       0          39s
</pre>
- A total of 8 pods were used: one for each request to the function, plus 3 pods started to keep the warm pool at a size of 3. This behavior was not expected. It happens when multiple requests for the same function arrive at the same time and no pod has been specialized for that function yet. Fission then specializes pods for the same function in parallel and does not honor the `rpp` parameter.

**Expected result**
Requests should be handled honoring the `rpp` setting, with multiple requests sharing the same function pod.
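
As a rough sanity check, the pod count one might expect for this reproduction can be worked out with a little shell arithmetic. This is only a sketch under two assumptions that are not confirmed by Fission's documentation here: that a specialized pod can serve up to `rpp` requests at once, and that each specialized pod is replaced in the warm pool by exactly one new pod. The variable names are illustrative.

```shell
#!/bin/sh
# Values taken from the reproduction steps above.
requests=5    # simultaneous calls made by test.sh
rpp=5         # rpp configured on the function
pool_size=3   # warm pool size

# Assumption: one specialized pod serves up to $rpp concurrent requests,
# so the pods needed is ceil(requests / rpp), computed with integer math.
expected_specialized=$(( (requests + rpp - 1) / rpp ))

# Assumption: the warm pool is refilled to $pool_size after specialization,
# so the total is the warm pool plus the specialized pods.
expected_total=$(( pool_size + expected_specialized ))

# Running pods listed by kubectl after the test.
observed_total=8

echo "expected specialized pods: $expected_specialized"
echo "expected total pods:       $expected_total"
echo "observed total pods:       $observed_total"
```

Under these assumptions a single specialized pod would be enough for the 5 requests, which is well below the 8 pods listed after the test.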

**Actual result**
The pool of pods is exhausted, with just one request per pod.

PoolManager pod specialization doesn't work as expected during a peak of simultaneous requests. #2452
