github.com/Shopify/mybench

0.0.0-20240925011916-fa5e5d722458

Repository: https://github.com/shopify/mybench.git

Documentation: pkg.go.dev

# README

`mybench`

mybench is a benchmark authoring library that helps you create your own database benchmark with Golang. The central features of mybench include:

- A library approach to database benchmarking.
- Discretized precise rate control: the rate at which the events run is discretized to a relatively low frequency (default: 50 Hz), because Linux and Golang cannot reliably maintain loop rates of 100 to 1000 Hz. The number of events run on each iteration is determined by sampling a uniform or Poisson distribution. The rate control is very precise and has achieved standard deviations of <0.2% of the desired rate.
- Ability to parallelize a single workload into multiple goroutines, each with its own connection.
- Ability to run multiple workloads simultaneously, with data being logged from all workloads.
- Uses HDR Histogram to keep track of latency online.
- Web UI for live monitoring of the throughput and latency of the current benchmark.
- A simple interface for implementing the data loader (which creates the tables and seeds them with data) and the benchmark driver.
- A number of built-in data generators, including thread-safe auto-incrementing generators.
- Command line wrapper: a wrapper library to help build command line apps for the benchmark.
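
The discretized rate control described in the list above can be illustrated with a minimal sketch. This is not mybench's Looper implementation: the function names (`expectedEventsPerTick`, `poissonSample`, `simulate`) are hypothetical, and Knuth's multiplication method is used here only as one simple way to draw Poisson samples. At 3000 events/s and a 50 Hz outer loop, each iteration runs 60 events on average.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// expectedEventsPerTick returns the mean number of events each outer-loop
// iteration must run so that eventRate events/s is met at tickHz iterations/s.
func expectedEventsPerTick(eventRate, tickHz float64) float64 {
	return eventRate / tickHz
}

// poissonSample draws one value from a Poisson distribution with the given
// mean using Knuth's multiplication method. It is only suitable for moderate
// means, because exp(-mean) underflows to zero for very large means.
func poissonSample(r *rand.Rand, mean float64) int {
	limit := math.Exp(-mean)
	k := 0
	p := r.Float64()
	for p > limit {
		k++
		p *= r.Float64()
	}
	return k
}

// simulate runs the given number of ticks without sleeping and returns the
// total number of events that would have been executed.
func simulate(seed int64, eventRate, tickHz float64, ticks int) int {
	r := rand.New(rand.NewSource(seed))
	mean := expectedEventsPerTick(eventRate, tickHz)
	total := 0
	for i := 0; i < ticks; i++ {
		total += poissonSample(r, mean)
	}
	return total
}

func main() {
	// 50 ticks at 50 Hz correspond to one simulated second at 3000 events/s.
	fmt.Println("mean events per tick:", expectedEventsPerTick(3000, 50))
	fmt.Println("events in one simulated second:", simulate(1, 3000, 50, 50))
}
```

A real looper would also sleep until the next tick and compensate for late wakeups; the sketch only shows how the per-iteration batch size is chosen.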

Design

For more details, see the design doc. Some of the information in this section may eventually move there.

There are a few important types defined in this library:

- Benchmark: The main "entrypoint" to running a benchmark. This keeps track of multiple Workloads and performs data aggregation across all the Workloads and their BenchmarkWorkers.
- WorkloadInterface: An interface that is implemented by the end user who wants to create a benchmark. Notably, the end user implements an Event() function that is called at some specified EventRate (concurrently from a number of goroutines).
- Workload: Responsible for creating and running the workers (goroutines) that call the Event() function of the WorkloadInterface.
- BenchmarkWorker: Responsible for setting up the Looper and keeping track of the worker-local statistics (such as the event latency/histograms for the local goroutine).
- Looper: Responsible for discretizing the desired event rate into something that is achievable on Linux. This is what actually calls the Event() function. It can also perform complex discretization such as Poisson-distribution-based event sampling.
- BenchmarkDataLoader: A data loader helper that lets you load data concurrently by specifying only a few options, such as the number of rows and the type of data generator for each column.
- BenchmarkApp[T]: A wrapper to help create a command line app for a benchmark.
- Table: An object that helps you create the database and track a default set of data generators.
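
The relationship between a user-defined workload and the goroutines that call its Event() function can be sketched as follows. This is an illustration only: `eventRunner`, `countingWorkload`, and `runWorkers` are hypothetical names, and the real WorkloadInterface and Workload in mybench have different (richer) signatures and also apply rate control, which is omitted here.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// eventRunner is a hypothetical stand-in for the user-implemented workload.
// It is NOT mybench's WorkloadInterface.
type eventRunner interface {
	Event() error
}

// countingWorkload is a toy workload whose Event only counts invocations.
// A real workload would issue database queries here.
type countingWorkload struct {
	calls atomic.Int64
}

func (w *countingWorkload) Event() error {
	w.calls.Add(1)
	return nil
}

// runWorkers starts the given number of goroutines, each calling Event
// eventsPerWorker times, and waits for all of them to finish. It returns
// one of the errors reported by the workers, or nil if there were none.
func runWorkers(r eventRunner, workers, eventsPerWorker int) error {
	var wg sync.WaitGroup
	errs := make(chan error, workers) // buffered: one slot per worker
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < eventsPerWorker; j++ {
				if err := r.Event(); err != nil {
					errs <- err
					return
				}
			}
		}()
	}
	wg.Wait()
	close(errs)
	return <-errs // receiving from a closed, empty channel yields nil
}

func main() {
	w := &countingWorkload{}
	if err := runWorkers(w, 4, 250); err != nil {
		panic(err)
	}
	fmt.Println("events run:", w.calls.Load()) // prints "events run: 1000"
}
```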

Data collection and flow

The benchmark system mainly collects data about the throughput and latency of the Event() function call, which contains custom logic (usually MySQL calls). Since Event() can be called from a large number of BenchmarkWorkers, each BenchmarkWorker collects its own statistics for performance reasons. The data collected by the BenchmarkWorkers are:

- The count and rate of Event().
- The latency distribution of Event(), as tracked via the HDR Histogram.
- Unimplemented:
  - How long the worker spent in "saturation" (i.e. Event() is slower than the requested event rate). This is probably an important metric for later.
  - The amount of time spent sleeping (could be useful to debug saturation problems in case the looper is incorrectly implemented).
  - Everything in OuterLoopStat: wakeup latency, event batch size. This is probably less important than the above.

Having all this data in hundreds of independent goroutines (BenchmarkWorkers) is not particularly useful; the data must be aggregated. Aggregation is done at the workload level by the Workload, and the result is then aggregated at the Benchmark level via the data logger. This description may make it sound like the data collection is initiated by the BenchmarkWorkers, but it is not. Instead, every few seconds, the data logger calls the appropriate functions to aggregate data. During data collection, a lock is taken for each BenchmarkWorker, which allows the data to be read safely. This is fine because each BenchmarkWorker has its own mutex, so there is never a lot of contention. If this becomes a problem, lock-free programming may be a better approach.
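
The pull-based aggregation described above, where an aggregator briefly locks each worker's own mutex to read its statistics, can be sketched as follows. The types and functions here (`workerStats`, `aggregate`, `runAndAggregate`) are hypothetical and track only an event count; mybench's BenchmarkWorker also tracks latency histograms and the data logger runs periodically rather than once.

```go
package main

import (
	"fmt"
	"sync"
)

// workerStats holds the statistics of a single worker, guarded by the
// worker's own mutex. This is a hypothetical type, not part of mybench.
type workerStats struct {
	mu     sync.Mutex
	events int64
}

// record is called by the owning worker after each event.
func (s *workerStats) record() {
	s.mu.Lock()
	s.events++
	s.mu.Unlock()
}

// read returns the current event count under the worker's lock.
func (s *workerStats) read() int64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.events
}

// aggregate is what a data logger would call: it visits every worker,
// holding only that worker's lock while reading, and sums the counts.
func aggregate(workers []*workerStats) int64 {
	var total int64
	for _, w := range workers {
		total += w.read()
	}
	return total
}

// runAndAggregate starts numWorkers goroutines that each record eventsEach
// events, waits for them to finish, and returns the aggregated count.
func runAndAggregate(numWorkers, eventsEach int) int64 {
	workers := make([]*workerStats, numWorkers)
	var wg sync.WaitGroup
	for i := range workers {
		workers[i] = &workerStats{}
		wg.Add(1)
		go func(s *workerStats) {
			defer wg.Done()
			for j := 0; j < eventsEach; j++ {
				s.record()
			}
		}(workers[i])
	}
	wg.Wait()
	return aggregate(workers)
}

func main() {
	fmt.Println("total events:", runAndAggregate(8, 1000)) // prints "total events: 8000"
}
```

Because each lock is shared only between one worker and the aggregator, a worker never waits on another worker, which is why contention stays low.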

Run a benchmark

Shopify orders benchmark: `make examplebench && build/examplebench -host mysql-1 -user sys.admin_rw -pass hunter2 -bench -eventrate 3000`
- Change the host
- Change the event rate. The command above specifies 3000 events/s.
Go to https://localhost:8005 to see the monitoring web UI.

Write your own benchmark

See benchmarks for examples and read the docs.

# Packages

benchmarks

No description provided by the author

examples

No description provided by the author

# Functions

InitializeTable

No description provided by the author

NewAutoIncrementGenerator

No description provided by the author

NewAutoIncrementGeneratorFromDatabase

No description provided by the author

NewBenchmark

No description provided by the author

NewBenchmarkConfig

No description provided by the author

NewBenchmarkWorker

No description provided by the author

NewDataLogger

No description provided by the author

NewEnumGenerator

No description provided by the author

NewExtendedHdrHistogram

No description provided by the author

NewHistogramCardinalityStringGenerator

See NewHistogramDistribution for documentation of the arguments for this function.

NewHistogramDistribution

Creates a histogram distribution which is used by Rand to generate random numbers.

NewHistogramFloatGenerator

See NewHistogramDistribution for documentation of the arguments for this function.

NewHistogramIntGenerator

See NewHistogramDistribution for documentation of the arguments for this function.

NewHistogramLengthStringGenerator

See NewHistogramDistribution for documentation of the arguments for this function.

NewHttpServer

No description provided by the author

NewJSONGenerator

No description provided by the author

NewLockedDoubleBuffer

No description provided by the author

NewNormalFloatGenerator

No description provided by the author

NewNormalIntGenerator

No description provided by the author

NewNowGenerator

No description provided by the author

NewNullGenerator

No description provided by the author

NewOnlineHistogram

No description provided by the author

NewRand

Creates a new Rand object.

NewRing

No description provided by the author

NewUniformCardinalityStringGenerator

No description provided by the author

NewUniformDatetimeGenerator

No description provided by the author

NewUniformDecimalGenerator

No description provided by the author

NewUniformFloatGenerator

No description provided by the author

NewUniformHistogram

No description provided by the author

NewUniformIntGenerator

No description provided by the author

NewUniformLengthStringGenerator

No description provided by the author

NewUniqueStringGenerator

length is the length of the string to be generated. min and current are the integer values used to generate the strings.

NewUniqueStringGeneratorFromDatabase

No description provided by the author

NewUuidGenerator

Only version 1 (time-based) and version 4 (random) UUIDs are supported.

NewWorkload

No description provided by the author

QuestionMarksStringList

No description provided by the author

Run

Runs a custom defined benchmark that implements the BenchmarkInterface.

# Constants

LooperTypePoisson

No description provided by the author

LooperTypeUniform

No description provided by the author

# Variables

VersionString

No description provided by the author

# Structs

AutoIncrementGenerator

Atomically generate an auto incrementing value from the client-side.
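
A client-side atomic auto-increment can be sketched with `sync/atomic`. The `autoIncrement` type and its methods below are hypothetical and do not reflect the actual fields or methods of AutoIncrementGenerator; the sketch only shows why an atomic add yields unique values across goroutines.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// autoIncrement is a hypothetical client-side counter, not mybench's type.
type autoIncrement struct {
	current atomic.Int64
}

// newAutoIncrement creates a counter whose first generated value is start+1.
func newAutoIncrement(start int64) *autoIncrement {
	g := &autoIncrement{}
	g.current.Store(start)
	return g
}

// next atomically increments the counter and returns the new value, so no
// two callers can ever observe the same value.
func (g *autoIncrement) next() int64 {
	return g.current.Add(1)
}

// allUnique draws perGoroutine values from each of the given number of
// goroutines and reports whether every drawn value was distinct.
func allUnique(g *autoIncrement, goroutines, perGoroutine int) bool {
	var (
		mu   sync.Mutex
		seen = make(map[int64]struct{}, goroutines*perGoroutine)
		wg   sync.WaitGroup
	)
	for i := 0; i < goroutines; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perGoroutine; j++ {
				v := g.next()
				mu.Lock()
				seen[v] = struct{}{}
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return len(seen) == goroutines*perGoroutine
}

func main() {
	g := newAutoIncrement(0)
	fmt.Println("all unique:", allUnique(g, 8, 1000)) // prints "all unique: true"
	fmt.Println("next value:", g.next())              // prints "next value: 8001"
}
```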

Benchmark

No description provided by the author

BenchmarkConfig

No description provided by the author

BenchmarkWorker

A single goroutine worker that loops and benchmarks MySQL.

Column

No description provided by the author

Connection

A thin wrapper around https://pkg.go.dev/github.com/go-mysql-org/go-mysql/client#Conn for now.

DatabaseConfig

The database config object that can be turned into a single connection (without connection pooling).

DataLogger

No description provided by the author

DataSnapshot

No description provided by the author

DatetimeInterval

No description provided by the author

DiscretizedLooper

No description provided by the author

EnumGenerator

Generates values from a discrete set of possible values.

EventStat

No description provided by the author

ExtendedHdrHistogram

This extends the HDR histogram so it can track the start time and the underflow and overflow counts.

HistogramCardinalityStringGenerator

Generates a fixed number of unique strings with uniform distribution.

HistogramDistribution

This generates float64 values based on a discrete probability distribution (represented via a histogram) via the inverse transform sampling algorithm (https://en.wikipedia.org/wiki/Inverse_transform_sampling).
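
Inverse transform sampling from a histogram can be sketched as follows: build the cumulative distribution from the bin weights, draw a uniform number u in [0, 1), find the bin whose cumulative probability first reaches u, and map u to a value in that bin. The `histogramDist` type is hypothetical, and interpolating linearly within the selected bin is an assumption of this sketch; HistogramDistribution may handle bins differently.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"sort"
)

// histogramDist is a hypothetical histogram-backed distribution.
type histogramDist struct {
	edges []float64 // ascending bin edges, len(cdf)+1 entries
	cdf   []float64 // cumulative probability at the upper edge of each bin
}

func newHistogramDist(edges, weights []float64) (*histogramDist, error) {
	if len(weights) == 0 || len(edges) != len(weights)+1 {
		return nil, errors.New("need at least one bin and len(edges) == len(weights)+1")
	}
	sum := 0.0
	for _, w := range weights {
		if w < 0 {
			return nil, errors.New("weights must be non-negative")
		}
		sum += w
	}
	if sum <= 0 {
		return nil, errors.New("weights must not all be zero")
	}
	cdf := make([]float64, len(weights))
	running := 0.0
	for i, w := range weights {
		running += w
		cdf[i] = running / sum
	}
	cdf[len(cdf)-1] = 1 // guard against floating point rounding
	return &histogramDist{edges: edges, cdf: cdf}, nil
}

// quantile maps a uniform value u in [0, 1] to a value of the distribution.
func (h *histogramDist) quantile(u float64) float64 {
	i := sort.SearchFloat64s(h.cdf, u) // first bin with cdf[i] >= u
	if i >= len(h.cdf) {
		i = len(h.cdf) - 1
	}
	lo := 0.0
	if i > 0 {
		lo = h.cdf[i-1]
	}
	width := h.cdf[i] - lo
	if width == 0 {
		return h.edges[i]
	}
	frac := (u - lo) / width // position of u inside the selected bin
	return h.edges[i] + frac*(h.edges[i+1]-h.edges[i])
}

// fractionBelow draws n samples and returns the fraction below threshold.
func fractionBelow(h *histogramDist, seed int64, n int, threshold float64) float64 {
	r := rand.New(rand.NewSource(seed))
	below := 0
	for i := 0; i < n; i++ {
		if h.quantile(r.Float64()) < threshold {
			below++
		}
	}
	return float64(below) / float64(n)
}

func main() {
	// Two bins: [0, 10) with weight 1 and [10, 20) with weight 3.
	h, err := newHistogramDist([]float64{0, 10, 20}, []float64{1, 3})
	if err != nil {
		panic(err)
	}
	fmt.Println(h.quantile(0.125), h.quantile(0.625))
	fmt.Println("fraction below 10:", fractionBelow(h, 1, 100000, 10))
}
```

With these weights, roughly a quarter of the samples should fall in the first bin.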

HistogramFloatGenerator

Generates floating point values according to a histogram distribution.

HistogramIntGenerator

Generates integers according to a histogram distribution.

HistogramLengthStringGenerator

Generates a random string with length selected by a histogram distribution.

HttpServer

No description provided by the author

IntervalData

No description provided by the author

JSONGenerator

Generates the same JSON document every time.

LockedDoubleBuffer

This is a double buffer implemented using a lock.
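
The general idea of a lock-based double buffer can be sketched as follows: writers append to the active buffer while a reader swaps the two buffers under the same lock and then processes the filled one. The `doubleBuffer` type and its `write` and `swap` methods are hypothetical and are not the API of LockedDoubleBuffer, whose methods are not documented here.

```go
package main

import (
	"fmt"
	"sync"
)

// doubleBuffer is a hypothetical generic double buffer guarded by one mutex.
type doubleBuffer[T any] struct {
	mu     sync.Mutex
	active []T // buffer currently receiving writes
	spare  []T // buffer handed out by the previous swap, reused on the next
}

// write appends a value to the active buffer.
func (b *doubleBuffer[T]) write(v T) {
	b.mu.Lock()
	b.active = append(b.active, v)
	b.mu.Unlock()
}

// swap returns everything written since the previous swap and makes the
// emptied spare buffer active. The returned slice is only valid until the
// next call to swap, because its backing array is reused afterwards.
func (b *doubleBuffer[T]) swap() []T {
	b.mu.Lock()
	defer b.mu.Unlock()
	full := b.active
	b.active = b.spare[:0]
	b.spare = full
	return full
}

func main() {
	var buf doubleBuffer[int]
	for i := 1; i <= 3; i++ {
		buf.write(i)
	}
	fmt.Println(buf.swap()) // prints "[1 2 3]"
	buf.write(4)
	fmt.Println(buf.swap()) // prints "[4]"
}
```

The lock is held only for the append or the pointer swap, so the reader can process the returned data without blocking writers.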

NoContextData

This is a convenience type defined to indicate that there's no context data.

NormalFloatGenerator

Generates a floating point number with a given normal distribution.

NormalIntGenerator

Generates a random integer value according to a normal distribution.

NullGenerator

A boring generator that generates only null values.

OnlineHistogram

No description provided by the author

OuterLoopStat

No description provided by the author

Rand

No description provided by the author

RateControlConfig

No description provided by the author

Ring

A terrible implementation of a ring, based on the Golang ring, which is neither thread-safe nor offers a nice API.

StatusData

No description provided by the author

Table

This struct provides helpers for creating and seeding a table.

UniformCardinalityStringGenerator

Generates a fixed number of unique strings with uniform distribution.

UniformDatetimeGenerator

Generates a date time value in two modes: 1.

UniformDecimalGenerator

TODO: can this be folded into the UniformFloatGenerator? Generates a random decimal value. Sampling from existing is the same as generation, which means it is not guaranteed to generate an existing value if the number of rows in the database is small or the decimal has a large precision.

UniformFloatGenerator

Generates a random floating point value according to a uniform distribution between min (inclusive) and max (exclusive).

UniformHistogram

No description provided by the author

UniformIntGenerator

Generates an integer value in the range between min (inclusive) and max (exclusive) with a uniform distribution.

UniformLengthStringGenerator

Generates a random string with length selected between the min and max specified with uniform probability.

UniqueStringGenerator

Generates a unique string with a fixed length every time Generate is called.

UuidGenerator

Generates UUIDs. SampleFromExisting is basically broken, as this should only very rarely generate a duplicate UUID.

VisualizationConfig

No description provided by the author

WorkerContext

This is the object type that holds the thread-local context data for each benchmark worker.

Workload

The actual benchmark struct for a single workload.

WorkloadConfig

Config used to create the Workload.

WorkloadDataSnapshot

Merges the IntervalData with other data.

# Interfaces

AbstractWorkload

We want the workload to be templated so the context data can be transparently passed from the workload to the Event() function without going through runtime type selection.

BenchmarkInterface

This is the interface that the benchmark application needs to implement.

DataGenerator

An interface for the data generator.

WorkloadInterface

An interface for implementing the workload.

# Type aliases

LooperType

No description provided by the author