arenaskl

Fast, lock-free, arena-based Skiplist implementation in Go that supports iteration in both directions.

Since this is pebble's internal lib, matrixone copied it and modified it to support some features and make it easier to use in matrixone.

Advantages

Arenaskl offers several advantages over other skiplist implementations:

High performance that linearly scales with the number of cores. This is achieved by allocating from a fixed-size arena and by avoiding locks.
Iterators that can be allocated on the stack and easily cloned by value.
Simple-to-use and low overhead model for detecting and handling race conditions with other threads.
Support for iterating in reverse (i.e. previous links).

Limitations

The advantages come at a cost that prevents arenaskl from being a general-purpose skiplist implementation:

The size of the arena sets a hard upper bound on the combined size of skiplist nodes, keys, and values. This limit includes even the size of deleted nodes, keys, and values.
Deletion is not supported. Instead, higher-level code is expected to add deletion tombstones and needs to process those tombstones appropriately.

Pedigree

This code is based on Andy Kimball's arenaskl code:

https://github.com/andy-kimball/arenaskl

The arenaskl code is based on the skiplist found in Badger, a Go-based KV store:

https://github.com/dgraph-io/badger/tree/master/skl

The skiplist in Badger is itself based on a C++ skiplist built for Facebook's RocksDB:

https://github.com/facebook/rocksdb/tree/master/memtable

Benchmarks

The benchmarks consist of a mix of reads and writes executed in parallel. The fraction of reads is indicated in the run name: "frac_X" indicates a run where X percent of the operations are reads.

The results are much better than skiplist and slist.

name                  time/op
ReadWrite/frac_0-8     470ns ±11%
ReadWrite/frac_10-8    462ns ± 3%
ReadWrite/frac_20-8    436ns ± 2%
ReadWrite/frac_30-8    410ns ± 2%
ReadWrite/frac_40-8    385ns ± 2%
ReadWrite/frac_50-8    360ns ± 4%
ReadWrite/frac_60-8    386ns ± 1%
ReadWrite/frac_70-8    352ns ± 2%
ReadWrite/frac_80-8    306ns ± 3%
ReadWrite/frac_90-8    253ns ± 4%
ReadWrite/frac_100-8  28.1ns ± 2%

Note that the above numbers are for concurrent operations using 8x parallelism. The same benchmarks without concurrency (use these numbers when comparing vs batchskl):

name                time/op
ReadWrite/frac_0    1.53µs ± 1%
ReadWrite/frac_10   1.46µs ± 2%
ReadWrite/frac_20   1.39µs ± 3%
ReadWrite/frac_30   1.28µs ± 3%
ReadWrite/frac_40   1.21µs ± 2%
ReadWrite/frac_50   1.11µs ± 3%
ReadWrite/frac_60   1.23µs ±17%
ReadWrite/frac_70   1.16µs ± 4%
ReadWrite/frac_80    959ns ± 3%
ReadWrite/frac_90    738ns ± 5%
ReadWrite/frac_100  81.9ns ± 2%

Forward and backward iteration are also fast:

name                time/op
IterNext            3.97ns ± 5%
IterPrev            3.88ns ± 3%

# README

arenaskl

Advantages

Limitations

Pedigree

Benchmarks