Description
Hi,
So I'm in the process of doing latency benchmarks for a couple of different web server frameworks. One of the frameworks is cohttp. I have the following code for the test:
(* This file is in the public domain *)
open Core.Std
open Async.Std
open Cohttp_async

(* Given filename hello_world.ml, compile with:
   $ corebuild hello_world.native -pkg cohttp.async
*)

let handler ~body:_ _sock req =
  let uri = Cohttp.Request.uri req in
  match Uri.path uri with
  | "/" ->
    Server.respond_with_string
      "CHAPTER I. Down the Rabbit-Hole Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, <and what is the use of a book,> thought Alice <without pictures or conversations?> So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a White Rabbit with pink eyes ran close by her. There was nothing so very remarkable in that; nor did Alice think it so very much out of the way to hear the Rabbit say to itself, <Oh dear! Oh dear! I shall be late!> (when she thought it over afterwards, it occurred to her that she ought to have wondered at this, but at the time it all seemed quite natural); but when the Rabbit actually took a watch out of its waistcoat-pocket, and looked at it, and then hurried on, Alice started to her feet, for it flashed across her mind that she had never before seen a rabbit with either a waistcoat-pocket, or a watch to take out of it, and burning with curiosity, she ran across the field after it, and fortunately was just in time to see it pop down a large rabbit-hole under the hedge. In another moment down went Alice after it, never once considering how in the world she was to get out again. The rabbit-hole went straight on like a tunnel for some way, and then dipped suddenly down, so suddenly that Alice had not a moment to think about stopping herself before she found herself falling down a very deep well. Either the well was very deep, or she fell very slowly, for she had plenty of time as she went down to look about her and to wonder what was going to happen next. 
First, she tried to look down and make out what she was coming to, but it was too dark to see anything; then she looked at the sides of the well, and noticed that they were filled with cupboards......"
  | _ ->
    Server.respond_with_string ~code:`Not_found "Route not found"

let start_server port () =
  eprintf "Listening for HTTP on port %d\n" port;
  eprintf "Try 'curl http://localhost:%d/'\n%!" port;
  Cohttp_async.Server.create ~on_handler_error:`Raise
    (Tcp.on_port port) handler
  >>= fun _ -> Deferred.never ()

let () =
  Command.async_basic
    ~summary:"Start a hello world Async server"
    Command.Spec.(
      empty
      +> flag "-p" (optional_with_default 8080 int)
           ~doc:"int Source port to listen on")
    start_server
  |> Command.run
It is built and run with:
#!/bin/sh
corebuild main.native -pkg cohttp.async
echo "Starting server"
ASYNC_CONFIG='((epoll_max_ready_events 10240) (max_num_open_file_descrs 32000) (max_num_threads 50) (file_descr_watcher Epoll) (max_num_jobs_per_priority_per_cycle 500))' ./main.native
There are two servers, one load-generating and one running the cohttp server, connected over a 1 Gbit/s link (iperf shows 953 Mbit/s throughput) with negligible latency (there is a switch in between, but that is all).
The servers run Linux:
Linux lady-of-pain 3.19.3-3-ARCH #1 SMP PREEMPT Wed Apr 8 14:10:00 CEST 2015 x86_64 GNU/Linux
Both machines are fairly recent Core i7s, with no virtualization, since that would only slow things down.
Some sysctl.conf tuning was necessary:
fs.file-max = 65535
fs.nr_open = 65535
net.core.netdev_max_backlog = 4096
net.core.rmem_max = 16777216
net.core.somaxconn = 65535
net.core.wmem_max = 16777216
net.ipv4.ip_forward = 0
net.ipv4.ip_local_port_range = 1025 65535
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 30
net.ipv4.tcp_max_syn_backlog = 20480
net.ipv4.tcp_max_tw_buckets = 400000
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_syn_retries = 2
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
vm.min_free_kbytes = 65536
vm.overcommit_memory = 1
Async is the recent point release, and I'm forcing epoll support because the test simply fails when it runs in a select(2) loop.
The load generator is wrk2 (github.com/giltene/wrk2), which avoids coordinated omission. Most load generators coordinate with the system under test (SUT): if a connection stalls, no further requests are issued on that connection until the first one completes, so you get only a single "bad" latency sample. In wrk2 the rate is kept stable: new requests are scheduled on the stalling connection at the configured interval, and each request's latency is measured from its scheduled send time, so stalled connections yield more realistic latency numbers.
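To make the distinction concrete, here is a small throwaway sketch (not part of the benchmark; all numbers are illustrative) contrasting a coordinating generator, which records one bad sample per stall, with wrk2-style correction, which measures each request from its intended send time:

```ocaml
(* One connection is supposed to issue a request every [interval] seconds,
   but the server stalls for [stall] seconds on the first request. *)
let interval = 0.333  (* 3 req/s per connection, as in the run below *)
let stall = 2.0       (* the first request takes 2 s to complete *)

(* A coordinating generator sends the next request only after the previous
   response arrives: one bad sample, the rest look fine. *)
let coordinated_samples n =
  List.init n (fun i -> if i = 0 then stall else 0.001)

(* wrk2 measures each request from its *scheduled* send time, so requests
   queued up behind the stall also show their waiting time. *)
let corrected_samples n =
  List.init n (fun i ->
    let scheduled = float_of_int i *. interval in
    let actually_sent = max scheduled stall in
    (actually_sent -. scheduled) +. 0.001)

let mean xs = List.fold_left ( +. ) 0. xs /. float_of_int (List.length xs)

let () =
  Printf.printf "coordinated mean: %.3f s\n" (mean (coordinated_samples 7));
  Printf.printf "corrected mean:   %.3f s\n" (mean (corrected_samples 7))
```

The corrected mean comes out several times larger than the coordinated one for the same stall, which is exactly the effect wrk2's histogram captures.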
We run the following:
#!/bin/sh
exec wrk2 \
-t8 -c10000 -d60S \
--timeout 2000 \
-R 30000 --latency \
-H 'Connection: keep-alive' \
http://172.16.0.1:$1/
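As a quick sanity check on the offered load implied by those flags (-c10000, -R 30000), the per-connection arithmetic:

```ocaml
(* 30_000 req/s spread over 10_000 keep-alive connections gives each
   connection one request every ~333 ms. *)
let connections = 10_000
let rate = 30_000
let per_conn = float_of_int rate /. float_of_int connections

let () =
  Printf.printf "%.1f req/s per connection (one every %.0f ms)\n"
    per_conn (1000. /. per_conn)
```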
That is, 10k connections and a rate of 30k req/s, which means 3 req/s per connection. Such a run yields the following latency histogram:
Running 1m test @ http://172.16.0.1:8080/
8 threads and 10000 connections
Thread calibration: mean lat.: 1359.711ms, rate sampling interval: 4898ms
Thread calibration: mean lat.: 1310.689ms, rate sampling interval: 4870ms
Thread calibration: mean lat.: 1312.983ms, rate sampling interval: 4866ms
Thread calibration: mean lat.: 1378.042ms, rate sampling interval: 4874ms
Thread calibration: mean lat.: 1371.283ms, rate sampling interval: 4898ms
Thread calibration: mean lat.: 1286.566ms, rate sampling interval: 4861ms
Thread calibration: mean lat.: 1276.854ms, rate sampling interval: 4861ms
Thread calibration: mean lat.: 1302.831ms, rate sampling interval: 4870ms
Thread Stats Avg Stdev Max +/- Stdev
Latency 8.88s 5.44s 0.95m 84.10%
Req/Sec 1.59k 151.82 1.84k 71.88%
Latency Distribution (HdrHistogram - Recorded Latency)
50.000% 8.20s
75.000% 10.35s
90.000% 11.79s
99.000% 31.88s
99.900% 0.90m
99.990% 0.94m
99.999% 0.95m
100.000% 0.95m
Detailed Percentile spectrum:
Value Percentile TotalCount 1/(1-Percentile)
2670.591 0.000000 3 1.00
3868.671 0.100000 55526 1.11
5189.631 0.200000 111201 1.25
6111.231 0.300000 166684 1.43
7385.087 0.400000 222341 1.67
8204.287 0.500000 277984 2.00
8716.287 0.550000 305353 2.22
9109.503 0.600000 333100 2.50
9510.911 0.650000 361000 2.86
9953.279 0.700000 389927 3.33
10346.495 0.750000 416615 4.00
10551.295 0.775000 430281 4.44
10764.287 0.800000 444321 5.00
10977.279 0.825000 458356 5.71
11206.655 0.850000 472177 6.67
11517.951 0.875000 485909 8.00
11649.023 0.887500 492901 8.89
11788.287 0.900000 500109 10.00
12017.663 0.912500 506582 11.43
14565.375 0.925000 513515 13.33
16875.519 0.937500 520495 16.00
18087.935 0.943750 523937 17.78
19480.575 0.950000 527398 20.00
20889.599 0.956250 530869 22.86
22167.551 0.962500 534331 26.67
23658.495 0.968750 537824 32.00
24870.911 0.971875 539535 35.56
26443.775 0.975000 541272 40.00
27672.575 0.978125 543007 45.71
28803.071 0.981250 544740 53.33
30031.871 0.984375 546473 64.00
30605.311 0.985938 547358 71.11
31113.215 0.987500 548217 80.00
31604.735 0.989062 549099 91.43
32063.487 0.990625 549946 106.67
32620.543 0.992188 550832 128.00
32980.991 0.992969 551254 142.22
33439.743 0.993750 551684 160.00
34078.719 0.994531 552126 182.86
34668.543 0.995313 552554 213.33
35422.207 0.996094 552984 256.00
35979.263 0.996484 553196 284.44
36863.999 0.996875 553416 320.00
37552.127 0.997266 553636 365.71
38338.559 0.997656 553851 426.67
39518.207 0.998047 554065 512.00
52789.247 0.998242 554171 568.89
53313.535 0.998437 554300 640.00
53477.375 0.998633 554390 731.43
53706.751 0.998828 554505 853.33
53936.127 0.999023 554604 1024.00
54099.967 0.999121 554666 1137.78
54231.039 0.999219 554719 1280.00
54394.879 0.999316 554779 1462.86
54525.951 0.999414 554829 1706.67
54657.023 0.999512 554878 2048.00
54755.327 0.999561 554912 2275.56
54820.863 0.999609 554934 2560.00
54919.167 0.999658 554961 2925.71
55017.471 0.999707 554986 3413.33
55148.543 0.999756 555013 4096.00
55214.079 0.999780 555025 4551.11
55312.383 0.999805 555041 5120.00
55410.687 0.999829 555052 5851.43
55574.527 0.999854 555066 6826.67
56131.583 0.999878 555080 8192.00
56229.887 0.999890 555087 9102.22
56295.423 0.999902 555092 10240.00
56393.727 0.999915 555102 11702.86
56459.263 0.999927 555108 13653.33
56557.567 0.999939 555117 16384.00
56557.567 0.999945 555117 18204.44
56590.335 0.999951 555121 20480.00
56623.103 0.999957 555124 23405.71
56688.639 0.999963 555128 27306.67
56721.407 0.999969 555132 32768.00
56721.407 0.999973 555132 36408.89
56754.175 0.999976 555134 40960.00
56786.943 0.999979 555137 46811.43
56786.943 0.999982 555137 54613.33
56819.711 0.999985 555138 65536.00
56852.479 0.999986 555141 72817.78
56852.479 0.999988 555141 81920.00
56852.479 0.999989 555141 93622.86
56852.479 0.999991 555141 109226.67
56885.247 0.999992 555143 131072.00
56885.247 0.999993 555143 145635.56
56885.247 0.999994 555143 163840.00
56918.015 0.999995 555144 187245.71
56918.015 0.999995 555144 218453.33
56918.015 0.999996 555144 262144.00
56950.783 0.999997 555145 291271.11
56950.783 0.999997 555145 327680.00
56950.783 0.999997 555145 374491.43
56950.783 0.999998 555145 436906.67
56950.783 0.999998 555145 524288.00
56983.551 0.999998 555146 582542.22
56983.551 1.000000 555146 inf
#[Mean = 8879.930, StdDeviation = 5441.720]
#[Max = 56950.784, Total count = 555146]
#[Buckets = 27, SubBuckets = 2048]
----------------------------------------------------------
730491 requests in 1.00m, 1.44GB read
Socket errors: connect 0, read 434, write 92, timeout 12
Requests/sec: 12175.28
Transfer/sec: 24.59MB
There are a few timeout errors, but note that we can't really keep the request rate at the target: it lands at about 12k req/s. The single core on the SUT maxes out, and queuing buildup happens. I've attached the histogram plot (note the x axis, which is compressed quite a lot so you can see the high latencies). It also includes nim, another single-threaded solution, which makes for a fair comparison. The go solution uses all 8 cores, so it has considerably less work to do per core, i.e., that comparison isn't really fair. The wai solution is a Haskell framework, but it also utilizes all 8 cores.
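One way to read the queue buildup (my interpretation, not something the wrk2 report states) is through Little's law, L = λW: plugging in the achieved throughput and mean latency from the run above estimates the average number of requests outstanding, including requests wrk2 has scheduled but not yet managed to send:

```ocaml
(* Little's law: L = lambda * W. Numbers are taken from the wrk2 report:
   achieved rate ~12175 req/s, mean latency ~8.88 s. *)
let lambda = 12175.28  (* achieved requests per second *)
let w = 8.88           (* mean latency in seconds *)
let l = lambda *. w    (* mean number of requests in the system *)

let () =
  Printf.printf "~%.0f requests outstanding on average\n" l;
  (* The offered load exceeds the achieved rate by ~18k req/s, which is
     the rate at which the backlog of scheduled requests grows. *)
  Printf.printf "backlog grows at ~%.0f req/s\n" (30_000. -. lambda)
```

An outstanding count an order of magnitude above the 10k open connections is consistent with a saturated single core and an ever-growing queue of corrected-latency samples.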
I have yet to try an Lwt solution with cohttp. It may perform totally differently, but I'm not sure where to start. I thought about starting from the file server examples by Hannes, but I'm not really sure that's the way to go. Tuning of the OCaml runtime itself is also something I'm very interested in.