Categorygithub.com/hashicorp/raft
modulepackage
0.1.0
Repository: https://github.com/hashicorp/raft.git
Documentation: pkg.go.dev

# README

raft Build Status

raft is a Go library that manages a replicated log and can be used with an FSM to manage replicated state machines. It is a library for providing consensus.

The use cases for such a library are far-reaching as replicated state machines are a key component of many distributed systems. They enable building Consistent, Partition Tolerant (CP) systems, with limited fault tolerance as well.

Building

If you wish to build raft you'll need Go version 1.2+ installed.

Please check your installation with:

go version

Documentation

For complete documentation, see the associated Godoc.

To prevent complications with cgo, the primary backend MDBStore is in a separate repository, called raft-mdb. That is the recommended implementation for the LogStore and StableStore.

A pure Go backend using BoltDB is also available called raft-boltdb. It can also be used as a LogStore and StableStore.

Tagged Releases

As of September 2017, Hashicorp will start using tags for this library to clearly indicate major version updates. We recommend you vendor your application's dependency on this library.

  • v0.1.0 is the original stable version of the library that was in master and has been maintained with no breaking API changes. This was in use by Consul prior to version 0.7.0.

  • v1.0.0 takes the changes that were staged in the library-v2-stage-one branch. This version manages server identities using a UUID, so introduces some breaking API changes. It also versions the Raft protocol, and requires some special steps when interoperating with Raft servers running older versions of the library (see the detailed comment in config.go about version compatibility). You can reference https://github.com/hashicorp/consul/pull/2222 for an idea of what was required to port Consul to these new interfaces.

    This version includes some new features as well, including non voting servers, a new address provider abstraction in the transport layer, and more resilient snapshots.

Protocol

raft is based on "Raft: In Search of an Understandable Consensus Algorithm"

A high level overview of the Raft protocol is described below, but for details please read the full Raft paper followed by the raft source. Any questions about the raft protocol should be sent to the raft-dev mailing list.

Protocol Description

Raft nodes are always in one of three states: follower, candidate or leader. All nodes initially start out as a follower. In this state, nodes can accept log entries from a leader and cast votes. If no entries are received for some time, nodes self-promote to the candidate state. In the candidate state nodes request votes from their peers. If a candidate receives a quorum of votes, then it is promoted to a leader. The leader must accept new log entries and replicate to all the other followers. In addition, if stale reads are not acceptable, all queries must also be performed on the leader.

Once a cluster has a leader, it is able to accept new log entries. A client can request that a leader append a new log entry, which is an opaque binary blob to Raft. The leader then writes the entry to durable storage and attempts to replicate to a quorum of followers. Once the log entry is considered committed, it can be applied to a finite state machine. The finite state machine is application specific, and is implemented using an interface.

An obvious question relates to the unbounded nature of a replicated log. Raft provides a mechanism by which the current state is snapshotted, and the log is compacted. Because of the FSM abstraction, restoring the state of the FSM must result in the same state as a replay of old logs. This allows Raft to capture the FSM state at a point in time, and then remove all the logs that were used to reach that state. This is performed automatically without user intervention, and prevents unbounded disk usage as well as minimizing time spent replaying logs.

Lastly, there is the issue of updating the peer set when new servers are joining or existing servers are leaving. As long as a quorum of nodes is available, this is not an issue as Raft provides mechanisms to dynamically update the peer set. If a quorum of nodes is unavailable, then this becomes a very challenging issue. For example, suppose there are only 2 peers, A and B. The quorum size is also 2, meaning both nodes must agree to commit a log entry. If either A or B fails, it is now impossible to reach quorum. This means the cluster is unable to add, or remove a node, or commit any additional log entries. This results in unavailability. At this point, manual intervention would be required to remove either A or B, and to restart the remaining node in bootstrap mode.

A Raft cluster of 3 nodes can tolerate a single node failure, while a cluster of 5 can tolerate 2 node failures. The recommended configuration is to either run 3 or 5 raft servers. This maximizes availability without greatly sacrificing performance.

In terms of performance, Raft is comparable to Paxos. Assuming stable leadership, committing a log entry requires a single round trip to half of the cluster. Thus performance is bound by disk I/O and network latency.

# Packages

No description provided by the author

# Functions

AddUniquePeer is used to add a peer to a list of existing peers only if it is not already contained.
DefaultConfig returns a Config with usable defaults.
ExcludePeer is used to exclude a single peer from a list of peers.
NewDiscardSnapshotStore is used to create a new DiscardSnapshotStore.
NewFileSnapshotStore creates a new FileSnapshotStore based on a base directory.
NewFileSnapshotStoreWithLogger creates a new FileSnapshotStore based on a base directory.
NewInmemAddr returns a new in-memory addr with a randomly generate UUID as the ID.
NewInmemStore returns a new in-memory backend.
NewInmemTransport is used to initialize a new transport and generates a random local address if none is specified.
NewJSONPeers creates a new JSONPeers store.
NewLogCache is used to create a new LogCache with the given capacity and backend store.
NewNetworkTransport creates a new network transport with the given dialer and listener.
NewNetworkTransportWithLogger creates a new network transport with the given dialer and listener.
NewObserver creates a new observer that can be registered to make observations on a Raft instance.
NewRaft is used to construct a new Raft node.
NewTCPTransport returns a NetworkTransport that is built on top of a TCP streaming transport layer.
NewTCPTransportWithLogger returns a NetworkTransport that is built on top of a TCP streaming transport layer, with log output going to the supplied Logger.
PeerContained checks if a given peer is contained in a list.
ValidateConfig is used to validate a sane configuration.

# Constants

Candidate is one of the valid states of a Raft node.
256KB.
Follower is the initial state of a Raft node.
Leader is one of the valid states of a Raft node.
LogAddPeer is used to add a new peer.
LogBarrier is used to ensure all preceding operations have been applied to the FSM.
LogCommand is applied to a user FSM.
LogNoop is used to assert leadership.
LogRemovePeer is used to remove an existing peer.
Shutdown is the terminal state of a Raft node.

# Variables

ErrEnqueueTimeout is returned when a command fails due to a timeout.
ErrKnownPeer is returned when trying to add a peer to the configuration that already exists.
ErrLeader is returned when an operation can't be completed on a leader node.
ErrLeadershipLost is returned when a leader fails to commit a log entry because it's been deposed in the process.
ErrLogNotFound indicates a given log entry is not available.
ErrNothingNewToSnapshot is returned when trying to create a snapshot but there's nothing new commited to the FSM since we started.
ErrNotLeader is returned when an operation can't be completed on a follower or candidate node.
ErrPipelineReplicationNotSupported can be returned by the transport to signal that pipeline replication is not supported in general, and that no error message should be produced.
ErrPipelineShutdown is returned when the pipeline is closed.
ErrRaftShutdown is returned when operations are requested against an inactive Raft.
ErrTransportShutdown is returned when operations on a transport are invoked after it's been terminated.
ErrUnknownPeer is returned when trying to remove a peer from the configuration that doesn't exist.

# Structs

AppendEntriesRequest is the command used to append entries to the replicated log.
AppendEntriesResponse is the response returned from an AppendEntriesRequest.
Config provides any necessary configuration to the Raft server.
No description provided by the author
DiscardSnapshotStore is used to successfully snapshot while always discarding the snapshot.
FileSnapshotSink implements SnapshotSink with a file.
FileSnapshotStore implements the SnapshotStore interface and allows snapshots to be made on the local disk.
InmemStore implements the LogStore and StableStore interface.
InmemTransport Implements the Transport interface, to allow Raft to be tested in-memory without going over a network.
InstallSnapshotRequest is the command sent to a Raft peer to bootstrap its log (and state machine) from a snapshot on another peer.
InstallSnapshotResponse is the response returned from an InstallSnapshotRequest.
JSONPeers is used to provide peer persistence on disk in the form of a JSON file.
LeaderObservation is used in Observation.Data when leadership changes.
Log entries are replicated to all members of the Raft cluster and form the heart of the replicated state machine.
LogCache wraps any LogStore implementation to provide an in-memory ring buffer.
NetworkTransport provides a network based transport that can be used to communicate with Raft on remote machines.
Observation is sent along the given channel to observers when an event occurs.
Observer describes what to do with a given observation.
Raft implements a Raft node.
RequestVoteRequest is the command used by a candidate to ask a Raft peer for a vote in an election.
RequestVoteResponse is the response returned from a RequestVoteRequest.
RPC has a command, and provides a response mechanism.
RPCResponse captures both a response and a potential error.
SnapshotMeta is for metadata of a snapshot.
StaticPeers is used to provide a static list of peers.
TCPStreamLayer implements StreamLayer interface for plain TCP.

# Interfaces

AppendFuture is used to return information about a pipelined AppendEntries request.
AppendPipeline is used for pipelining AppendEntries requests.
ApplyFuture is used for Apply() and may return the FSM response.
FSM provides an interface that can be implemented by clients to make use of the replicated log.
FSMSnapshot is returned by an FSM in response to a Snapshot It must be safe to invoke FSMSnapshot methods with concurrent calls to Apply.
Future is used to represent an action that may occur in the future.
LogStore is used to provide an interface for storing and retrieving logs in a durable fashion.
LoopbackTransport is an interface that provides a loopback transport suitable for testing e.g.
PeerStore provides an interface for persistent storage and retrieval of peers.
SnapshotSink is returned by StartSnapshot.
SnapshotStore interface is used to allow for flexible implementations of snapshot storage and retrieval.
StableStore is used to provide stable storage of key configurations to ensure safety.
StreamLayer is used with the NetworkTransport to provide the low level stream abstraction.
Transport provides an interface for network transports to allow Raft to communicate with other nodes.
WithClose is an interface that a transport may provide which allows a transport to be shut down cleanly when a Raft instance shuts down.
WithPeers is an interface that a transport may provide which allows for connection and disconnection.

# Type aliases

FilterFn is a function that can be registered in order to filter observations.
LogType describes various types of log entries.
RaftState captures the state of a Raft node: Follower, Candidate, Leader, or Shutdown.