github.com/little-bear-labs/libp2p-go-webrtc-benchmarks
module / package, version 0.0.0-20240131165146-9e5a5ea409bf
Repository: https://github.com/little-bear-labs/libp2p-go-webrtc-benchmarks.git
Documentation: pkg.go.dev

# README

WebRTC Transport Benchmarks

This directory contains a benchmarking tool, and instructions on how to use it to measure the performance of the WebRTC transport.

1. Instructions

In this section we'll show you how to run this benchmarking tool on your local (development) machine.

  1. Run a listener
  2. Run a client

What you do next depends on what you're after.

  • Are you using it to get metrics from a standard and well defined cloud run?
  • Are you using it to get metrics from your local machine?
  • Are you using it to (Go) profile one or multiple things?

With that in mind, we'll show you how to do all of the above.

1.1. Listener

Run:

go run ./main.go -metrics csv listen

This should output a multiaddr which the client can use to connect. Transport values supported besides webrtc are: tcp, quic, websocket and webtransport.

The listener will continue to run until you kill it.

1.1.1. Metrics

The metrics can be summarized using the report command:

go run ./main.go report -s 16 metrics_listen_webrtc_c2_s8_e1_p0.csv

This prints the summary to the stdout of your terminal. Alternatively, you can visualize the metrics using the bundled Python script:

./scripts/visualise/visualise.py metrics_listen_webrtc_c2_s8_e1_p0.csv -s 16

This opens a new window with your graph in it.

It is more useful, however, to save the graph to a file so it can be shared. For the WebRTC results of Scenario 1 we might, for example, use the following command:

 ./scripts/visualise/visualise.py \
    -s 10000 \
    -o ./images/s1_webrtc.png \
    ./results/metrics_dial_webrtc_c10_s100_p0.csv \
    ./results/metrics_listen_webrtc_e1_p0.csv

1.2. Client

Run:

go run ./main.go -c 2 -s 8 dial <multiaddr>

You can configure the number of connections and streams opened by the dialer using the -c and -s flags.

The client will continue to run until you kill it.

Tip:

As with the listen command, you can use the -metrics <path>.csv flag to write the metrics to a file.

1.3. Profile

Profiling the benchmark tool is supported using Go's standard pprof tooling.

E.g. you can start your listener (or client) with the -profile 6060 flag to enable profiling over HTTP.

With your listener/client running you can then profile using the standard Go tooling, e.g.:

# get cpu profile
go tool pprof http://localhost:6060/debug/pprof/profile

# get memory (heap) profile
go tool pprof http://localhost:6060/debug/pprof/heap

# check contended mutexes
go tool pprof http://localhost:6060/debug/pprof/mutex

# check why threads block
go tool pprof http://localhost:6060/debug/pprof/block

# check the amount of created goroutines
go tool pprof http://localhost:6060/debug/pprof/goroutine

This opens an interactive session allowing you to inspect the heap/CPU profile, e.g. to see the top offenders in your own code by focusing on the relevant module (e.g. top github.com/libp2p/go-libp2p/p2p/transport/webrtc).

You can also use the -pdf flag to write the profile to a file instead, which you can view in your browser or any other capable PDF viewer.

2. Benchmarks

The goal of this tooling is to benchmark how the WebRTC transport performs on its own as well as compared to other transports such as QUIC and WebTransport. Not all benchmarked scenarios are compatible with every transport, but WebRTC is tested on all of them.

The scenarios described below, and the results you'll find at the end, were run on two c5 large EC2 instances. Each instance has 8 vCPUs and 16 GB RAM. More information can be found at: https://aws.amazon.com/ec2/instance-types/c5/

The dream goal for WebRTC in terms of performance is to consume 2x or fewer resources compared to QUIC. For Scenario 2, the results comparing WebRTC to QUIC are currently as follows:

Scenario 2 — WebRTC and Quic — CPU

Scenario 2 — WebRTC and Quic — Memory

Scenario 1:

  1. Server, on EC2 instance A, listens on a generated multiaddr.
  2. Client, on EC2 instance B, dials 10 connections, with 1000 streams per connection to the server.

Scenario 2:

  1. Server, on EC2 instance A, listens on a generated multiaddr.
  2. Client, on EC2 instance B, dials 100 connections, with 100 streams per connection to the server.

For both scenarios the following holds true:

  • Connections are ramped up at the rate of 1 connection/sec.
  • Streams are created at the rate of 10 streams/sec.
  • This is done to ensure the webrtc transport's inflight request limiting does not start rejecting connections.
  • The client opens streams to the server and runs the echo protocol writing 2KiB/s per stream (1 KiB every 500ms).
  • We let the tests run for about 5 minutes each.

The instances run each scenario variation one by one, so that at any given moment only one benchmark script is running.

2.1. Scenario 1

Server:

go run ./scripts/multirunner listen

Client:

go run ./scripts/multirunner dial

2.1.1. Results

All transports in function of CPU and Memory

Scenario 1 — All CPU

Scenario 1 — All Memory

TCP

Scenario 1 — TCP

| Metric | s1_tcp_dial.csv | s1_tcp_listen.csv |
| --- | --- | --- |
| CPU min (%) | 0 | 0 |
| CPU max (%) | 0 | 3 |
| CPU avg (%) | 0 | 1 |
| Memory Heap min (MiB) | 0.000 | 67.151 |
| Memory Heap max (MiB) | 143.747 | 0.000 |
| Memory Heap avg (MiB) | 0.000 | 103.907 |
| Bytes Read min (KiB) | 2527.000 | 0.000 |
| Bytes Read max (KiB) | 0.000 | 2590.000 |
| Bytes Read avg (KiB) | 0.000 | 2588.290 |
| Bytes Written min (KiB) | 2527.000 | 0.000 |
| Bytes Written max (KiB) | 0.000 | 2590.000 |
| Bytes Written avg (KiB) | 0.000 | 2588.290 |

WebSocket (WS)

Scenario 1 — WebSocket

| Metric | s1_websocket_dial.csv | s1_websocket_listen.csv |
| --- | --- | --- |
| CPU min (%) | 0 | 3 |
| CPU max (%) | 0 | 5 |
| CPU avg (%) | 0 | 3 |
| Memory Heap min (MiB) | 67.891 | 0.000 |
| Memory Heap max (MiB) | 0.000 | 146.493 |
| Memory Heap avg (MiB) | 0.000 | 106.615 |
| Bytes Read min (KiB) | 0.000 | 2473.000 |
| Bytes Read max (KiB) | 0.000 | 2590.000 |
| Bytes Read avg (KiB) | 0.000 | 2587.361 |
| Bytes Written min (KiB) | 0.000 | 2473.000 |
| Bytes Written max (KiB) | 0.000 | 2590.000 |
| Bytes Written avg (KiB) | 0.000 | 2587.361 |

WebRTC

Scenario 1 — WebRTC

| Metric | s1_webrtc_dial.csv | s1_webrtc_listen.csv |
| --- | --- | --- |
| CPU min (%) | 0 | 5 |
| CPU max (%) | 10 | 10 |
| CPU avg (%) | 5 | 6 |
| Memory Heap min (MiB) | 270.316 | 265.111 |
| Memory Heap max (MiB) | 556.074 | 527.543 |
| Memory Heap avg (MiB) | 426.373 | 393.691 |
| Bytes Read min (KiB) | 0.000 | 2398.000 |
| Bytes Read max (KiB) | 2482.000 | 2482.000 |
| Bytes Read avg (KiB) | 2134.703 | 2478.026 |
| Bytes Written min (KiB) | 0.000 | 2398.000 |
| Bytes Written max (KiB) | 2482.000 | 2521.000 |
| Bytes Written avg (KiB) | 2140.396 | 2478.026 |

2.2. Scenario 2

Server:

go run ./scripts/multirunner listen

Client:

go run ./scripts/multirunner -s 1 dial

2.2.1. Results

All transports in function of CPU and Memory

Scenario 2 — All CPU

Scenario 2 — All Memory

TCP

Scenario 2 — TCP

| Metric | s2_tcp_dial.csv | s2_tcp_listen.csv |
| --- | --- | --- |
| CPU min (%) | 1 | 0 |
| CPU max (%) | 7 | 4 |
| CPU avg (%) | 1 | 2 |
| Memory Heap min (MiB) | 23.210 | 22.941 |
| Memory Heap max (MiB) | 126.677 | 143.747 |
| Memory Heap avg (MiB) | 85.917 | 90.692 |
| Bytes Read min (KiB) | 9.000 | 0.000 |
| Bytes Read max (KiB) | 2612.000 | 2580.000 |
| Bytes Read avg (KiB) | 2480.470 | 2473.094 |
| Bytes Written min (KiB) | 9.000 | 0.000 |
| Bytes Written max (KiB) | 2681.000 | 2581.000 |
| Bytes Written avg (KiB) | 2509.758 | 2473.094 |

WebSocket (WS)

Scenario 2 — WebSocket

| Metric | s2_websocket_dial.csv | s2_websocket_listen.csv |
| --- | --- | --- |
| CPU min (%) | 2 | 0 |
| CPU max (%) | 6 | 9 |
| CPU avg (%) | 3 | 4 |
| Memory Heap min (MiB) | 23.790 | 23.415 |
| Memory Heap max (MiB) | 115.189 | 152.205 |
| Memory Heap avg (MiB) | 71.235 | 96.166 |
| Bytes Read min (KiB) | 10.000 | 0.000 |
| Bytes Read max (KiB) | 2590.000 | 2590.000 |
| Bytes Read avg (KiB) | 2484.513 | 2492.197 |
| Bytes Written min (KiB) | 10.000 | 0.000 |
| Bytes Written max (KiB) | 2693.000 | 2590.000 |
| Bytes Written avg (KiB) | 2521.583 | 2484.513 |

WebRTC

Scenario 2 — WebRTC

| Metric | s2_webrtc_dial.csv | s2_webrtc_listen.csv |
| --- | --- | --- |
| CPU min (%) | 0 | 0 |
| CPU max (%) | 10 | 5 |
| CPU avg (%) | 1 | 4 |
| Memory Heap min (MiB) | 27.324 | 24.450 |
| Memory Heap max (MiB) | 281.883 | 184.007 |
| Memory Heap avg (MiB) | 202.546 | 126.187 |
| Bytes Read min (KiB) | 0.000 | 0.000 |
| Bytes Read max (KiB) | 2410.000 | 2468.000 |
| Bytes Read avg (KiB) | 737.878 | 2315.657 |
| Bytes Written min (KiB) | 0.000 | 0.000 |
| Bytes Written max (KiB) | 2467.000 | 2511.000 |
| Bytes Written avg (KiB) | 2315.657 | 740.010 |

QUIC

Scenario 2 — QUIC

| Metric | s2_quic_dial.csv | s2_quic_listen.csv |
| --- | --- | --- |
| CPU min (%) | 0 | 0 |
| CPU max (%) | 11 | 7 |
| CPU avg (%) | 2 | 1 |
| Memory Heap min (MiB) | 27.506 | 24.056 |
| Memory Heap max (MiB) | 197.098 | 185.402 |
| Memory Heap avg (MiB) | 96.260 | 85.729 |
| Bytes Read min (KiB) | 0.000 | 0.000 |
| Bytes Read max (KiB) | 2588.000 | 2598.000 |
| Bytes Read avg (KiB) | 2139.380 | 2484.218 |
| Bytes Written min (KiB) | 0.000 | 0.000 |
| Bytes Written max (KiB) | 2699.000 | 2598.000 |
| Bytes Written avg (KiB) | 2155.807 | 2484.218 |

WebTransport

Scenario 2 — WebTransport

| Metric | s2_webtransport_listen.csv | s2_webtransport_dial.csv |
| --- | --- | --- |
| CPU min (%) | 0 | 2 |
| CPU max (%) | 4 | 8 |
| CPU avg (%) | 0 | 4 |
| Memory Heap min (MiB) | 22.984 | 22.773 |
| Memory Heap max (MiB) | 79.518 | 89.429 |
| Memory Heap avg (MiB) | 47.088 | 55.111 |
| Bytes Read min (KiB) | 0.000 | 11.000 |
| Bytes Read max (KiB) | 2590.000 | 2590.000 |
| Bytes Read avg (KiB) | 290.694 | 1963.886 |
| Bytes Written min (KiB) | 11.000 | 0.000 |
| Bytes Written max (KiB) | 2591.000 | 2692.000 |
| Bytes Written avg (KiB) | 1991.272 | 290.694 |

# Packages

The runner code for running a benchmarking process, either in dial or listen mode. Note that this benchmark runner codebase is not written using best practices, but rather as a quick iteration on the original benchmark code (which itself is based on the echo example), with the goal of being runnable as a single benchmark as well as via the orchestrated runner script used to generate all the results in the README found in the parent folder.