Module: github.com/Shopify/mybench
Version: 0.0.0-20240925011916-fa5e5d722458
Repository: https://github.com/shopify/mybench.git
Documentation: pkg.go.dev

# README

mybench

mybench is a benchmark authoring library that helps you create your own database benchmark with Golang. The central features of mybench include:

  • A library approach to database benchmarking
  • Discretized precise rate control: the rate at which the events run is discretized to a relatively low frequency (default: 50 Hz), as Linux + Golang cannot reliably maintain 100 to 1000 Hz. The number of events run on each iteration is determined by sampling a uniform or Poisson distribution. The rate control is very precise and has achieved standard deviations of <0.2% of the desired rate.
  • Ability to parallelize a single workload into multiple goroutines, each with its own connection.
  • Ability to run multiple workloads simultaneously with data being logged from all workloads.
  • Uses HDR Histogram to keep track of latency online.
  • Web UI for live monitoring throughput and latency of the current benchmark.
  • A simple interface for implementing the data loader (which creates the tables and seeds them with data) and the benchmark driver.
  • A number of built-in data generators, including thread-safe auto incrementing generators.
  • Command line wrapper: A wrapper library to help build command line apps for the benchmark.
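To illustrate what a thread-safe auto-incrementing generator has to guarantee, here is a minimal sketch built on an atomic counter. The `autoIncrement` type and `countDistinct` helper are hypothetical names made up for this example; they are not mybench's actual generator API.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// autoIncrement is a hypothetical stand-in for a thread-safe
// auto-incrementing generator. It is NOT mybench's actual type.
type autoIncrement struct {
	current int64
}

// Next atomically advances the counter and returns the new value,
// so concurrent callers never receive the same value twice.
func (g *autoIncrement) Next() int64 {
	return atomic.AddInt64(&g.current, 1)
}

// countDistinct draws perWorker values from each of `workers` goroutines
// and returns how many distinct values were handed out in total.
func countDistinct(workers, perWorker int) int {
	g := &autoIncrement{}
	seen := make(map[int64]struct{})
	var mu sync.Mutex // protects seen; the generator itself needs no lock
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				v := g.Next()
				mu.Lock()
				seen[v] = struct{}{}
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return len(seen)
}

func main() {
	// 8 goroutines x 1000 values each: every value is unique.
	fmt.Println(countDistinct(8, 1000)) // prints 8000
}
```

Because the increment is a single atomic operation, each worker connection can insert rows with client-generated keys without coordinating through a lock.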

Design

For more details, see the design doc. Some of the information in this section may eventually move there.

There are a few important structs defined in this library, and they are:

  • Benchmark: The main "entrypoint" to running a benchmark. This keeps track of multiple Workloads and performs data aggregation across all the Workloads and their BenchmarkWorkers.
  • WorkloadInterface: an interface that is defined by the end-user who wants to create a benchmark. Notably, the end-user will implement an Event() function that should be called at some specified EventRate (concurrently with a number of goroutines).
  • Workload: Responsible for creating and running the workers (goroutines) to call the Event() function of the WorkloadInterface.
  • BenchmarkWorker: Responsible for setting up the Looper and keeping track of the worker-local statistics (such as the event latency/histograms for the local goroutine).
  • Looper: Responsible for discretizing the desired event rate into something that's achievable on Linux. Actually calls the Event() function. It can also perform complex discretization such as Poisson-distribution based event sampling.
  • BenchmarkDataLoader: A data loader helper that helps you easily load data concurrently by specifying only a few options, such as the number of rows and the type of data generator for each column.
  • BenchmarkApp[T]: A wrapper to help create a command line app for a benchmark.
  • Table: An object that helps you create the database and track a default set of data generators.
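The discretization performed by the Looper can be sketched roughly as follows. This is a simulation under stated assumptions, not mybench's implementation: the function names are hypothetical, Knuth's algorithm is used as one possible Poisson sampler, and the loop counts events instead of sleeping between ticks and calling Event().

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// poisson draws one sample from a Poisson distribution with mean lambda
// using Knuth's multiplication algorithm (fine for modest lambda values).
func poisson(rng *rand.Rand, lambda float64) int {
	limit := math.Exp(-lambda)
	k := 0
	p := rng.Float64()
	for p > limit {
		k++
		p *= rng.Float64()
	}
	return k
}

// achievedRate simulates a looper that wakes up looperRate times per second
// and, on each wakeup, runs a Poisson-distributed batch of events whose mean
// is eventRate/looperRate. It returns the average events per second.
// Hypothetical helper: a real looper would sleep until the next tick and
// invoke the workload's Event() once per event in the batch.
func achievedRate(seed int64, eventRate, looperRate float64, seconds int) float64 {
	rng := rand.New(rand.NewSource(seed))
	meanBatch := eventRate / looperRate
	ticks := int(looperRate) * seconds

	total := 0
	for i := 0; i < ticks; i++ {
		total += poisson(rng, meanBatch)
	}
	return float64(total) / float64(seconds)
}

func main() {
	// 3000 events/s discretized to a 50 Hz loop: about 60 events per wakeup.
	fmt.Printf("achieved rate: %.1f events/s\n", achievedRate(1, 3000, 50, 100))
}
```

Batching events at a low wakeup frequency keeps the long-run rate on target even though individual wakeups cannot be scheduled precisely.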

Data collection and flow

The benchmark system mainly collects data about the throughput and latency of the Event() function call, which contains custom logic (usually MySQL calls). Since Event() can be called from a large number of BenchmarkWorkers, each BenchmarkWorker collects its own statistics for performance reasons. The data collected by the BenchmarkWorkers are:

  • The count and rate of Event()
  • The latency distribution of Event() as tracked via the HDR Histogram.
  • Unimplemented:
    • How long the worker spent in "saturation" (i.e. Event() is slower than the requested event rate). This is probably an important metric for later.
    • The amount of time spent sleeping (could be useful to debug saturation problem in case the looper is incorrectly implemented).
    • Everything in OuterLoopStat: wakeup latency, event batch size. This is probably less important than the above.

Having all this data in hundreds of independent goroutines (BenchmarkWorkers) is not particularly useful. The data must be aggregated. This aggregation is done at the workload level by the Workload, and the results are then aggregated at the Benchmark level via the data logger. This description may make it sound like the BenchmarkWorkers initiate data collection, but they do not. Instead, every few seconds, the data logger calls the appropriate functions to aggregate data. During data collection, a lock is taken for each BenchmarkWorker, which allows for the safe reading of data. This is fine, as each BenchmarkWorker has its own mutex and there is never a lot of contention. If this becomes a problem, lockless programming may be a better approach.
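The pull-based aggregation described above can be sketched like this. All names here (`workerStats`, `snapshotAndReset`, `aggregate`, `runDemo`) are hypothetical, and the sketch aggregates once after the workers finish, whereas the data logger does so every few seconds while they run.

```go
package main

import (
	"fmt"
	"sync"
)

// workerStats holds worker-local statistics guarded by the worker's own
// mutex. Hypothetical type for illustration; not mybench's actual struct.
type workerStats struct {
	mu    sync.Mutex
	count int64
}

// record is called by the worker goroutine after each event.
func (w *workerStats) record() {
	w.mu.Lock()
	w.count++
	w.mu.Unlock()
}

// snapshotAndReset is called by the aggregator, not by the worker.
// It briefly takes the worker's lock, copies the data, and resets it.
func (w *workerStats) snapshotAndReset() int64 {
	w.mu.Lock()
	defer w.mu.Unlock()
	c := w.count
	w.count = 0
	return c
}

// aggregate pulls data from every worker and sums it, mirroring the
// workload-level aggregation step.
func aggregate(workers []*workerStats) int64 {
	var total int64
	for _, w := range workers {
		total += w.snapshotAndReset()
	}
	return total
}

// runDemo starts one goroutine per worker, lets each record a fixed number
// of events, and then aggregates the per-worker counts.
func runDemo(numWorkers, eventsPerWorker int) int64 {
	workers := make([]*workerStats, numWorkers)
	var wg sync.WaitGroup
	for i := range workers {
		workers[i] = &workerStats{}
		wg.Add(1)
		go func(w *workerStats) {
			defer wg.Done()
			for e := 0; e < eventsPerWorker; e++ {
				w.record()
			}
		}(workers[i])
	}
	wg.Wait()
	return aggregate(workers)
}

func main() {
	fmt.Println(runDemo(4, 100)) // prints 400
}
```

Each mutex is shared only between one worker and the aggregator, which is why contention stays low.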

Run a benchmark

  • Shopify orders benchmark: make examplebench && build/examplebench -host mysql-1 -user sys.admin_rw -pass hunter2 -bench -eventrate 3000
    • Change the host
    • Change the event rate. The command above specifies 3000 events/s.
  • Go to https://localhost:8005 to see the monitoring web UI.

Write your own benchmark

See benchmarks for examples and read the docs.

# Functions

See NewHistogramDistribution for documentation of the arguments for this function.
Creates a histogram distribution which is used by Rand to generate random numbers.
See NewHistogramDistribution for documentation of the arguments for this function.
See NewHistogramDistribution for documentation of the arguments for this function.
See NewHistogramDistribution for documentation of the arguments for this function.
Creates a new Rand object.
length is the length of the string to be generated; min and current are the integer values used to generate the strings.
NewUuidGenerator: only version 1 (time-based) and version 4 (random) UUIDs are supported.
Runs a custom defined benchmark that implements the BenchmarkInterface.

# Structs

Atomically generate an auto incrementing value from the client-side.
A single goroutine worker that loops and benchmarks MySQL.
A thin wrapper around https://pkg.go.dev/github.com/go-mysql-org/go-mysql/client#Conn for now.
The database config object that can be turned into a single connection (without connection pooling).
Generates values from a discrete set of possible values.
Extends the HDR histogram so it can track the start time and the underflow and overflow counts.
Generates a fixed number of unique strings with uniform distribution.
This generates float64 values based on a discrete probability distribution (represented via a histogram) via the inverse transform sampling algorithm (https://en.wikipedia.org/wiki/Inverse_transform_sampling).
Generates floating point values according to a histogram distribution.
Generates integers according to a histogram distribution.
Generates a random string with length selected by a histogram distribution.
Generates the same JSON document every time.
This is a double buffer implemented using a lock.
This is a convenience type defined to indicate that there's no context data.
Generates a floating point number with a given normal distribution.
Generates a random integer value according to a normal distribution.
A boring generator that generates only null values.
A terrible implementation of a ring, based on the Golang ring, which is neither thread-safe nor offers a nice API.
This struct provides helpers for creating and seeding a table.
Generates a fixed number of unique strings with uniform distribution.
Generates a date time value in two modes: 1.
TODO: can this be folded into the UniformFloatGenerator? Generates a random decimal value. Sampling from existing is the same as generation, which means it is not guaranteed to generate an existing value if the number of rows in the database is small or the decimal has a large precision.
Generates a random floating point value according to a uniform distribution between min (inclusive) and max (exclusive).
Generates an integer value in the range between min (inclusive) and max (exclusive) with a uniform distribution.
Generates a random string with length selected between the min and max specified with uniform probability.
Generates a unique string with a fixed length every time Generate is called.
Generates UUIDs. SampleFromExisting is basically broken, as this should only very rarely generate a duplicate UUID.
This is the object type that holds the thread-local context data for each benchmark worker.
The actual benchmark struct for a single workload.
Config used to create the Workload.
Merges the IntervalData with other data.
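
The inverse transform sampling mentioned for the histogram distribution above can be sketched as follows. This is an illustrative assumption of how such a sampler might work, not the library's code: `histSampler`, `newHistSampler`, and `sample` are hypothetical names, and values are interpolated linearly within a bin.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// histSampler is a hypothetical sampler for a piecewise-uniform
// distribution described by a histogram. It is NOT mybench's type.
type histSampler struct {
	edges []float64 // bin end points, len(edges) == len(cdf)+1
	cdf   []float64 // cumulative probability at the right edge of each bin
}

// newHistSampler normalizes the bin frequencies into a cumulative
// distribution function (CDF).
func newHistSampler(edges, frequency []float64) *histSampler {
	total := 0.0
	for _, f := range frequency {
		total += f
	}
	cdf := make([]float64, len(frequency))
	running := 0.0
	for i, f := range frequency {
		running += f
		cdf[i] = running / total
	}
	return &histSampler{edges: edges, cdf: cdf}
}

// sample maps a uniform value u in [0, 1) through the inverse CDF:
// find the bin whose cumulative probability first reaches u, then
// interpolate linearly inside that bin.
func (h *histSampler) sample(u float64) float64 {
	i := sort.SearchFloat64s(h.cdf, u)
	if i >= len(h.cdf) {
		i = len(h.cdf) - 1
	}
	lo := 0.0
	if i > 0 {
		lo = h.cdf[i-1]
	}
	width := h.cdf[i] - lo
	if width == 0 { // zero-probability bin: avoid dividing by zero
		return h.edges[i]
	}
	frac := (u - lo) / width
	return h.edges[i] + frac*(h.edges[i+1]-h.edges[i])
}

func main() {
	// Two bins: [0, 10) with weight 1 and [10, 20) with weight 3.
	s := newHistSampler([]float64{0, 10, 20}, []float64{1, 3})
	fmt.Println(s.sample(0.625)) // prints 15

	rng := rand.New(rand.NewSource(1))
	fmt.Println(s.sample(rng.Float64())) // a random draw from the histogram
}
```

With these weights, roughly three quarters of the random draws land in the second bin.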

# Interfaces

We want the workload to be templated so the context data can be transparently passed from the workload to the Event() function without going through runtime type selection.
This is the interface that the benchmark application needs to implement.
An interface for the data generator.
An interface for implementing the workload.
