Module: github.com/dgraph-io/badger/v4
Version: v4.5.1
Repository: https://github.com/dgraph-io/badger.git
Documentation: pkg.go.dev

# README

BadgerDB


BadgerDB is an embeddable, persistent and fast key-value (KV) database written in pure Go. It is the underlying database for Dgraph, a fast, distributed graph database. It's meant to be a performant alternative to non-Go-based key-value stores like RocksDB.

Project Status

Badger is stable and is being used to serve data sets worth hundreds of terabytes. Badger supports concurrent ACID transactions with serializable snapshot isolation (SSI) guarantees. A Jepsen-style bank test runs nightly for 8 hours with the --race flag to verify that transactional guarantees are maintained. Badger has also been tested against filesystem-level anomalies to ensure persistence and consistency. Badger is used by a number of projects, including Dgraph, Jaeger Tracing, UsenetExpress, and many more.

The list of projects using Badger can be found here.

Badger v1.0 was released in Nov 2017, and the latest version that is data-compatible with v1.0 is v1.6.0.

Badger v2.0 was released in Nov 2019 with a new storage format that is not compatible with the v1.x format. Badger v2.0 supports compression and encryption, and uses a cache to speed up lookups.

Badger v3.0 was released in January 2021. This release improves compaction performance.

Please consult the Changelog for more detailed information on releases.

For more details on our version naming schema please read Choosing a version.


Getting Started

Installing

To start using Badger, install Go 1.21 or above. Badger v3 and above need Go modules. From your project, run the following command:

$ go get github.com/dgraph-io/badger/v4

This will retrieve the library.

Installing Badger Command Line Tool

Badger provides a CLI tool that can perform certain operations such as offline backup and restore. To install the Badger CLI, clone the repository and check out the desired version. Then run

$ cd badger
$ go install .

This will install the badger command line utility into your $GOBIN path.

Choosing a version

BadgerDB is unusual in that the most important changes we can make to it are not to its API but to how data is stored on disk.

This is why we follow a version naming schema that differs from Semantic Versioning.

  • New major versions are released when the data format on disk changes in an incompatible way.
  • New minor versions are released whenever the API changes but data compatibility is maintained. Note that, unlike Semantic Versioning, API changes in a minor release may be backward-incompatible.
  • New patch versions are released when there are no changes to the data format or the API.

Following these rules:

  • v1.5.0 and v1.6.0 can be used on top of the same files without any concerns, as their major version is the same, therefore the data format on disk is compatible.
  • v1.6.0 and v2.0.0 are data incompatible as their major version implies, so files created with v1.6.0 will need to be converted into the new format before they can be used by v2.0.0.
  • v2.x.x and v3.x.x are data incompatible as their major version implies, so files created with v2.x.x will need to be converted into the new format before they can be used by v3.0.0.

For a longer explanation on the reasons behind using a new versioning naming schema, you can read VERSIONING.

Badger Documentation

Badger Documentation is available at https://dgraph.io/docs/badger

Resources

Blog Posts

  1. Introducing Badger: A fast key-value store written natively in Go
  2. Make Badger crash resilient with ALICE
  3. Badger vs LMDB vs BoltDB: Benchmarking key-value databases in Go
  4. Concurrent ACID Transactions in Badger

Design

Badger was written with these design goals in mind:

  • Write a key-value database in pure Go.
  • Use latest research to build the fastest KV database for data sets spanning terabytes.
  • Optimize for SSDs.

Badger’s design is based on a paper titled WiscKey: Separating Keys from Values in SSD-conscious Storage.

Comparisons

| Feature | Badger | RocksDB | BoltDB |
| --- | --- | --- | --- |
| Design | LSM tree with value log | LSM tree only | B+ tree |
| High read throughput | Yes | No | Yes |
| High write throughput | Yes | Yes | No |
| Designed for SSDs | Yes (with latest research 1) | Not specifically 2 | No |
| Embeddable | Yes | Yes | Yes |
| Sorted KV access | Yes | Yes | Yes |
| Pure Go (no Cgo) | Yes | No | Yes |
| Transactions | Yes, ACID, concurrent with SSI 3 | Yes (but non-ACID) | Yes, ACID |
| Snapshots | Yes | Yes | Yes |
| TTL support | Yes | Yes | No |
| 3D access (key-value-version) | Yes 4 | No | No |

1 The WISCKEY paper (on which Badger is based) saw big wins with separating values from keys, significantly reducing the write amplification compared to a typical LSM tree.

2 RocksDB is an SSD optimized version of LevelDB, which was designed specifically for rotating disks. As such RocksDB's design isn't aimed at SSDs.

3 SSI: Serializable Snapshot Isolation. For more details, see the blog post Concurrent ACID Transactions in Badger

4 Badger provides direct access to value versions via its Iterator API. Users can also specify how many versions to keep per key via Options.

Benchmarks

We have run comprehensive benchmarks against RocksDB, Bolt and LMDB. The benchmarking code and the detailed logs can be found in the badger-bench repo. More explanation, including graphs, can be found in the blog posts (linked above).

Projects Using Badger

Below is a list of known projects that use Badger:

  • Dgraph - Distributed graph database.
  • Jaeger - Distributed tracing platform.
  • go-ipfs - Go client for the InterPlanetary File System (IPFS), a new hypermedia distribution protocol.
  • Riot - An open-source, distributed search engine.
  • emitter - Scalable, low latency, distributed pub/sub broker with message storage, uses MQTT, gossip and badger.
  • OctoSQL - Query tool that allows you to join, analyse and transform data from multiple databases using SQL.
  • Dkron - Distributed, fault tolerant job scheduling system.
  • smallstep/certificates - Step-ca is an online certificate authority for secure, automated certificate management.
  • Sandglass - distributed, horizontally scalable, persistent, time sorted message queue.
  • TalariaDB - Grab's Distributed, low latency time-series database.
  • Sloop - Salesforce's Kubernetes History Visualization Project.
  • Usenet Express - Serving over 300TB of data with Badger.
  • gorush - A push notification server written in Go.
  • 0-stor - Single device object store.
  • Dispatch Protocol - Blockchain protocol for distributed application data analytics.
  • GarageMQ - AMQP server written in Go.
  • RedixDB - A real-time persistent key-value store with the same redis protocol.
  • BBVA - Raft backend implementation using BadgerDB for Hashicorp raft.
  • Fantom - aBFT Consensus platform for distributed applications.
  • decred - An open, progressive, and self-funding cryptocurrency with a system of community-based governance integrated into its blockchain.
  • OpenNetSys - Create useful dApps in any software language.
  • HoneyTrap - An extensible and opensource system for running, monitoring and managing honeypots.
  • Insolar - Enterprise-ready blockchain platform.
  • IoTeX - The next generation of the decentralized network for IoT powered by scalability- and privacy-centric blockchains.
  • go-sessions - The sessions manager for Go net/http and fasthttp.
  • Babble - BFT Consensus platform for distributed applications.
  • Tormenta - Embedded object-persistence layer / simple JSON database for Go projects.
  • BadgerHold - An embeddable NoSQL store for querying Go types built on Badger
  • Goblero - Pure Go embedded persistent job queue backed by BadgerDB
  • Surfline - Serving global wave and weather forecast data with Badger.
  • Cete - Simple and highly available distributed key-value store built on Badger. Makes it easy to bring up a cluster of Badger nodes with the Raft consensus algorithm via hashicorp/raft.
  • Volument - A new take on website analytics backed by Badger.
  • KVdb - Hosted key-value store and serverless platform built on top of Badger.
  • Terminotes - Self hosted notes storage and search server - storage powered by BadgerDB
  • Pyroscope - Open source continuous profiling platform built with BadgerDB
  • Veri - A distributed feature store optimized for Search and Recommendation tasks.
  • bIter - A library and Iterator interface for working with the badger.Iterator, simplifying from-to, and prefix mechanics.
  • ld - (Lean Database) A very simple gRPC-only key-value database, exposing BadgerDB with key-range scanning semantics.
  • Souin - An RFC-compliant HTTP cache with a lot of other features, based on Badger for storage. Compatible with all existing reverse proxies.
  • Xuperchain - A highly flexible blockchain architecture with great transaction performance.
  • m2 - A simple http key/value store based on the raft protocol.
  • chaindb - A blockchain storage layer used by Gossamer, a Go client for the Polkadot Network.
  • vxdb - Simple schema-less Key-Value NoSQL database with simplest API interface.
  • Opacity - Backend implementation for the Opacity storage project
  • Vephar - A minimal key/value store using hashicorp-raft for cluster coordination and Badger for data storage.
  • gowarcserver - Open-source server for warc files. Can be used in conjunction with pywb
  • flow-go - A fast, secure, and developer-friendly blockchain built to support the next generation of games, apps and the digital assets that power them.
  • Wrgl - A data version control system that works like Git but specialized to store and diff CSV.
  • Loggie - A lightweight, cloud-native data transfer agent and aggregator.
  • raft-badger - raft-badger implements LogStore and StableStore Interface of hashcorp/raft. it is used to store raft log and metadata of hashcorp/raft.
  • DVID - A dataservice for branched versioning of a variety of data types. Originally created for large-scale brain reconstructions in Connectomics.
  • KVS - A library for making it easy to persist, load and query full structs into BadgerDB, using an ownership hierarchy model.
  • LLS - LLS is an efficient URL Shortener that can be used to shorten links and track link usage. Support for BadgerDB and MongoDB. Improved performance by more than 30% when using BadgerDB
  • lakeFS - lakeFS is an open-source data version control that transforms your object storage to Git-like repositories. lakeFS uses BadgerDB for its underlying local metadata KV store implementation.

If you are using Badger in a project please send a pull request to add it to the list.

Contributing

If you're interested in contributing to Badger see CONTRIBUTING.



# Functions

DefaultOptions sets a list of recommended options for good performance.
LSMOnlyOptions follows from DefaultOptions, but sets a higher ValueThreshold so values would be collocated with the LSM tree, with value log largely acting as a write-ahead log only.
NewEntry creates a new entry with key and value passed in args.
Open returns a new DB object.
OpenKeyRegistry opens the key registry if it exists; otherwise it creates and returns a new one.
OpenManaged returns a new DB, which allows more control over setting transaction timestamps, aka managed mode.
ReplayManifestFile reads the manifest file and constructs two manifest objects.
WriteKeyRegistry rewrites the existing key registry file with a new one.

# Constants

KeyRegistryFileName is the file name for the key registry file.
KeyRegistryRewriteFileName is the file name for the rewrite key registry file.
ManifestFilename is the filename for the manifest file.
ValueThresholdLimit is the maximum permissible value of opt.ValueThreshold.

# Variables

DefaultIteratorOptions contains default options when iterating over Badger key-value stores.
ErrBannedKey is returned if the read/write key belongs to any banned namespace.
ErrBlockedWrites is returned if the user called DropAll.
ErrConflict is returned when a transaction conflicts with another transaction.
ErrDBClosed is returned when a get operation is performed after closing the DB.
ErrDiscardedTxn is returned if a previously discarded transaction is re-used.
ErrEmptyKey is returned if an empty key is passed on an update function.
ErrEncryptionKeyMismatch is returned when the storage key is not matched with the key previously given.
ErrGCInMemoryMode is returned when db.RunValueLogGC is called in in-memory mode.
ErrInvalidDataKeyID is returned if the datakey id is invalid.
ErrInvalidDump is returned if a data dump made previously cannot be loaded into the database.
ErrInvalidEncryptionKey is returned if length of encryption keys is invalid.
ErrInvalidKey is returned if the key has a special !badger! prefix, reserved for internal usage.
ErrInvalidRequest is returned if the user request is invalid.
ErrKeyNotFound is returned when key isn't found on a txn.Get.
ErrManagedTxn is returned if the user tries to use an API which isn't allowed due to external management of transactions, when using ManagedDB.
ErrNamespaceMode is returned if the user tries to use an API which is allowed only when NamespaceOffset is non-negative.
ErrNilCallback is returned when subscriber's callback is nil.
ErrNoRewrite is returned if a call for value log GC doesn't result in a log file rewrite.
ErrPlan9NotSupported is returned when opt.ReadOnly is used on Plan 9.
ErrReadOnlyTxn is returned if an update function is called on a read-only transaction.
ErrRejected is returned if a value log GC is called either while another GC is running, or after DB::Close has been called.
ErrThresholdZero is returned if threshold is set to zero, and value log GC is called.
ErrTruncateNeeded is returned when the value log gets corrupt, and requires truncation of corrupt data to allow Badger to run properly.
ErrTxnTooBig is returned if too many writes are batched into a single transaction.
ErrValueLogSize is returned when opt.ValueLogFileSize option is not within the valid range.
ErrWindowsNotSupported is returned when opt.ReadOnly is used on Windows.
ErrZeroBandwidth is returned if the user passes in zero bandwidth for sequence.

# Structs

DB provides the various functions required to interact with Badger.
Entry provides Key, Value, UserMeta and ExpiresAt.
Item is returned during iteration.
Iterator helps iterating over the KV pairs in a lexicographically sorted order.
IteratorOptions is used to set options when iterating over Badger key-value stores.
KeyRegistry used to maintain all the data keys.
KVLoader is used to write KVList objects in to badger.
Manifest represents the contents of the MANIFEST file in a Badger store.
MergeOperator represents a Badger merge operator.
Options are params for creating DB object.
Sequence represents a Badger sequence.
Stream provides a framework to concurrently iterate over a snapshot of Badger, pick up key-values, batch them up and call Send.
StreamWriter is used to write data coming from multiple streams.
TableInfo represents the information about a table.
TableManifest contains information about a specific table in the LSM tree.
Txn represents a Badger transaction.
WriteBatch holds the necessary info to perform batched writes.

# Interfaces

Logger is implemented by any logging system that is used for standard logs.

# Type aliases

KVList contains a list of key-value pairs.
MergeFunc accepts two byte slices, one representing an existing value, and another representing a new value that needs to be ‘merged’ into it.