Skip to content

Perf issue: High longtail response times with KeepAlive=true when running with(or without) workers #3487

@schneems

Description

@schneems

Describe the bug

When using ab with -k or using hey (a go based a/b network testing library) I see very high long tail response times that is an order of magnitude of what I would expect.

AB without keepalive:

$  ab -c 10 -n 200 127.0.0.1:3000/
...
Percentage of the requests served within a certain time (ms)
  50%    420
  66%    421
  75%    422
  80%    422
  90%    423
  95%    424
  98%    426
  99%    428
 100%    432 (longest request)

Longest request took 432ms

With keepalive:

$  ab -c 10 -n 200 -k 127.0.0.1:3000/
...
Percentage of the requests served within a certain time (ms)
  50%    205
  66%    205
  75%    206
  80%    206
  90%   1852
  95%   2258
  98%   2273
  99%   2450
 100%   2477 (longest request)

Longest request took 2,477ms (or 5.7x longer than without keepalive).

Puma config:

$ cat Gemfile
source "https://rubygems.org"

gem "puma"
gem "rack"
⛄️ 3.3.1 🚀 /private/tmp/e3f9bf2d7f5e861b6de95d7e48a1b5b0
$ cat config.ru
require 'rack'

response = ['200', {'Content-Type' => 'text/html'}, ['Hello, World!']]
app = Proc.new do |env|
  sleep 0.2
  response
end

run app
$ bundle exec puma -t5 -p 3000
Puma starting in single mode...
* Puma version: 6.4.2 (ruby 3.3.1-p55) ("The Eagle of Durango")
*  Min threads: 5
*  Max threads: 5
*  Environment: development
*          PID: 7434
* Listening on http://0.0.0.0:3000
Use Ctrl-C to stop

To Reproduce

  • Use the above example app that sleeps for 200ms
  • Boot with bundle exec puma -t5 -p 3000
  • Hit with AB and keepalive enabled or hey
$ hey -c 10 -n 200 http://localhost:3000/

Summary:
  Total:	8.3877 secs
  Slowest:	2.4628 secs
  Fastest:	0.2005 secs
  Average:	0.3612 secs
  Requests/sec:	23.8445

  Total data:	2600 bytes
  Size/request:	13 bytes

Response time histogram:
  0.201 [1]	|
  0.427 [184]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.653 [0]	|
  0.879 [0]	|
  1.105 [0]	|
  1.332 [0]	|
  1.558 [0]	|
  1.784 [0]	|
  2.010 [0]	|
  2.237 [0]	|
  2.463 [15]	|■■■

Extra observations

  • Enabling workers makes the problem go away
  • Disabling keepalive makes the problem go away

In server.rb the codepaths diverge

puma/lib/puma/server.rb

Lines 467 to 469 in 796d8c6

case handle_request(client, requests + 1)
when false
break
and that could be an okay place to look. Also within prepare_response() in lib/puma/request.rb. The return value of true indicates keepalive and false indicates no keepalive.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions