# README

Repository: https://github.com/cloudflare/cloudflare-blog.git

## Accept vs Epoll de-queueing order

This experiment demonstrates the FIFO vs LIFO-like behavior of two ways of waiting for new connections on a shared accept queue. First, the balanced behavior of blocking accept():

$ python blocking-accept.py &
$ for i in `seq 6`; do nc localhost 1024; done
2
1
0
2
1
0
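
blocking-accept.py ships alongside this README. A minimal sketch of the idea (worker count and reply format here are assumptions, not the exact script): fork several workers that all block in accept() on one shared listening socket, and have each reply with its worker number:

import os
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(('127.0.0.1', 1024))
sock.listen(16)

for worker in range(3):
    if os.fork() == 0:
        while True:
            # All children sleep in accept() on the same queue;
            # the kernel hands out connections in balanced, FIFO-like order.
            conn, _ = sock.accept()
            conn.sendall(b'%d\n' % worker)
            conn.close()

os.wait()  # keep the parent alive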

Next, the unbalanced, LIFO-like behavior of the blocking-EPOLLEXCLUSIVE-epoll plus non-blocking-accept setup:

$ python epoll-and-accept.py &
$ for i in `seq 6`; do nc localhost 1024; done
0
0
0
0
0
0
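
epoll-and-accept.py is also in this directory. A minimal sketch of the pattern (again, details are assumptions): each worker registers the non-blocking listening socket with EPOLLEXCLUSIVE, so only one waiter is woken per connection. Since the last worker to start waiting tends to be woken first, the most recently idle worker keeps winning, which produces the LIFO-like imbalance above:

import os
import select
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(('127.0.0.1', 1024))
sock.listen(16)
sock.setblocking(False)

for worker in range(3):
    if os.fork() == 0:
        ep = select.epoll()
        ep.register(sock.fileno(), select.EPOLLIN | select.EPOLLEXCLUSIVE)
        while True:
            ep.poll()  # EPOLLEXCLUSIVE: only one worker is woken
            try:
                conn, _ = sock.accept()  # non-blocking, may lose the race
            except BlockingIOError:
                continue
            conn.sendall(b'%d\n' % worker)
            conn.close()

os.wait()  # keep the parent alive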

## Latency of requests in shared-queue vs reuseport nginx setup

In this experiment we show that even though the total work is the same, the latency distribution suffers in a high-load SO_REUSEPORT setup.

In this setup we need to pretend that nginx does some CPU-intensive work. We simulate that by running a busy loop inside the Lua request handler:

local i = 0
for _ = 1, 1000000 do
  i = i + 1
end
ngx.say("<p>hello, world</p>")

Note that we run 12 nginx workers on a machine with 24 logical CPUs. To reproduce the shared-queue latency histogram, run nginx:

$ nginx -c nginx-shared-queue.conf -p $PWD
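
The full configuration is nginx-shared-queue.conf in this directory. A rough sketch of the ingredients it needs (exact values and the pinning mechanism are assumptions; the Lua handler requires OpenResty or the lua-nginx-module):

worker_processes 12;
# workers pinned to CPUs 12-23, e.g. via worker_cpu_affinity

events {
    worker_connections 1024;
}

http {
    server {
        # A single listening socket: all 12 workers share one accept queue.
        listen 8181 backlog=511;
        location / {
            content_by_lua_block {
                local i = 0
                for _ = 1, 1000000 do i = i + 1 end
                ngx.say("<p>hello, world</p>")
            }
        }
    }
}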

We can confirm the CPU pinning and that we indeed have a single shared accept queue:

$ for pid in $(pgrep nginx); do taskset -cp $pid; done
pid 9366's current affinity list: 0-23
pid 9367's current affinity list: 12
pid 9369's current affinity list: 13
pid 9370's current affinity list: 14
pid 9371's current affinity list: 15
pid 9372's current affinity list: 16
pid 9373's current affinity list: 17
pid 9374's current affinity list: 18
pid 9375's current affinity list: 19
pid 9376's current affinity list: 20
pid 9377's current affinity list: 21
pid 9378's current affinity list: 22
pid 9379's current affinity list: 23

$ ss -4nl -t 'sport = :8181' | cat
State      Recv-Q Send-Q        Local Address:Port          Peer Address:Port
LISTEN     0      511                       *:8181                     *:*

Now run the benchmark program from another server:

$ go build benchhttp.go
$ ./benchhttp -n 100000 -c 200 -r target:8181 http://a.a/ | cut -d " " -f 1 | ./mmhistogram -t "Duration in ms (shared queue)"
Duration in ms (shared queue) min:3.61 avg:30.39 med=30.28 max:72.65 dev:1.58 count:100000
Duration in ms (shared queue):
 value |-------------------------------------------------- count
     0 |                                                   0
     1 |                                                   0
     2 |                                                   1
     4 |                                                   16
     8 |                                                   67
    16 |************************************************** 91760
    32 |                                              **** 8155
    64 |                                                   1
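
benchhttp prints one line per request with the duration in milliseconds as the first field, which is what the cut and mmhistogram pipeline consumes. A rough, hypothetical Python equivalent of the measurement loop (the real tool is benchhttp.go; the -r flag's host aliasing is replaced here by connecting to the target directly):

import concurrent.futures
import http.client
import time

TARGET, PORT, N, C = 'target', 8181, 100000, 200  # mirrors -r/-n/-c above

def one_request(_):
    t0 = time.monotonic()
    conn = http.client.HTTPConnection(TARGET, PORT)
    conn.request('GET', '/', headers={'Host': 'a.a'})
    conn.getresponse().read()
    conn.close()
    return (time.monotonic() - t0) * 1000.0

with concurrent.futures.ThreadPoolExecutor(max_workers=C) as pool:
    for ms in pool.map(one_request, range(N)):
        print('%.2f' % ms)  # first field: duration in ms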

The second experiment checks the latency distribution of the SO_REUSEPORT multiple-accept-queue setup:

$ nginx -c nginx-reuseport.conf -p $PWD
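
Presumably the only material difference from the shared-queue config is the reuseport flag on the listen directive, which gives each of the 12 workers its own listening socket and accept queue:

listen 8181 reuseport backlog=511;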

$ ss -4nl -t 'sport = :8181' | cat
State      Recv-Q Send-Q        Local Address:Port          Peer Address:Port
LISTEN     0      511                       *:8181                     *:*
LISTEN     0      511                       *:8181                     *:*
LISTEN     0      511                       *:8181                     *:*
LISTEN     0      511                       *:8181                     *:*
LISTEN     0      511                       *:8181                     *:*
LISTEN     0      511                       *:8181                     *:*
LISTEN     0      511                       *:8181                     *:*
LISTEN     0      511                       *:8181                     *:*
LISTEN     0      511                       *:8181                     *:*
LISTEN     0      511                       *:8181                     *:*
LISTEN     0      511                       *:8181                     *:*
LISTEN     0      511                       *:8181                     *:*

Then run the benchmark against the reuseport (multiple queues) setup:

$ ./benchhttp -n 100000 -c 200 -r target:8181 http://a.a/ | cut -d " " -f 1 | ./mmhistogram -t "Duration in ms (multiple queues)"
Duration in ms (multiple queues) min:1.49 avg:31.37 med=24.67 max:144.55 dev:25.27 count:100000
Duration in ms (multiple queues):
 value |-------------------------------------------------- count
     0 |                                                   0
     1 |                                                 * 1023
     2 |                                         ********* 5321
     4 |                                 ***************** 9986
     8 |                  ******************************** 18443
    16 |    ********************************************** 25852
    32 |************************************************** 27949
    64 |                              ******************** 11368
   128 |                                                   58