Categorygithub.com/CAFxX/fastrand
repositorypackage
0.1.0
Repository: https://github.com/cafxx/fastrand.git
Documentation: pkg.go.dev

# README

fastrand

Build status codecov Go Report Go Reference :warning: API is not stable yet.

Some fast, non-cryptographic PRNG sources, in three variants:

  • Plain - the basic implementation. Fastest, but can not be used concurrently without external synchronization.
  • Atomic - implementation using atomic operations. Non-locking, can be used concurrently, but a bit slower (especially at high concurrency).
  • Sharded - implementation using per-thread (P) sharding. Non-locking, can be used concurrently, fast (even at high concurrency), but does not support explicit seeding.

PRNGs currently implemented:

NameState (bits)Output (bits)PeriodVariants
SplitMix646464264Plain, Atomic, Sharded
PCG-XSH-RR6432264Plain, Atomic, Sharded
Xoshiro256**256642256-1Plain, Sharded

Planned additions include:

NameState (bits)Output (bits)PeriodVariants
PCG-XSL-RR128642128Plain, Atomic3, Sharded
xorshift128+128642128-1Plain, Atomic3, Sharded

Performance

Tests run on a Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz with Turbo Boost disabled. Lower is better.

GOMAXPROCS=1

PRNGPlainAtomicSharded
SplitMix642.02ns8.72ns7.33ns
PCG-XSH-RR3.17ns11.90ns7.33ns
Xoshiro256**4.57ns-112.40ns
math/rand7.06ns24.20ns2-

GOMAXPROCS=8

PRNGPlainAtomicSharded
SplitMix640.29ns26.20ns1.33ns
PCG-XSH-RR0.41ns13.20ns1.34ns
Xoshiro256**0.81ns-12.12ns
math/rand1.19ns72.40ns2-

Usage notes

Atomic variants

The atomic variant currently relies on unsafe to improve the performance of its CAS loops. It does so by calling the unexported procyield function in package runtime. This dependency will be removed in a future release. Usage of unsafe can be disabled by setting the fastrand_nounsafe build tag, at the cost of lower performance.

The state of the atomic variants is not padded/aligned to fill the cacheline: if needed users should pad the structure to avoid false sharing of the cacheline.

Sharded variants

Sharded variants rely on unsafe to implement sharding. They do so by calling the unexported procPin and procUnpin functions in package runtime. These functions are used by other packages (e.g. sync) for the same purpose, so they are unlikely to disappear/change. Usage of unsafe can be disabled by setting the fastrand_nounsafe build tag, at the cost of lower performance.

Sharded variants detect the value of GOMAXPROCS when they are instantiated (with NewShardedXxx). If GOMAXPROCS is increased after a sharded PRNG is instantiated it will yield suboptimal performance, as it may dynamically fallback to the corresponding atomic variant.

Sharded variants use more memory for the state than the other variants: in general they use at least GOMAXPROCS * 64 bytes. This is done to avoid false sharing of cachelines between shards.

Sharded variants do not allow explicit seeding since there is no easy way for a user to obtain a deterministic sequence from these variants (because, in general, goroutines can migrate between threads at any time).

License

MIT


1 there is no atomic variant for Xoshiro256** because its large state is not amenable to a performant atomic implementation.

2 the math/rand atomic variant is not a pure non-locking implementation, since it is implemented by guarding a rand.Rand using a sync.Mutex.

3 only for platforms where 128 bit CAS primitives are supported.