Categorygithub.com/securecollc/raft
modulepackage
1.0.1
Repository: https://github.com/securecollc/raft.git
Documentation: pkg.go.dev

# README

raft Build Status

raft is a Go library that manages a replicated log and can be used with an FSM to manage replicated state machines. It is a library for providing consensus.

The use cases for such a library are far-reaching as replicated state machines are a key component of many distributed systems. They enable building Consistent, Partition Tolerant (CP) systems, with limited fault tolerance as well.

Building

If you wish to build raft you'll need Go version 1.2+ installed.

Please check your installation with:

go version

Documentation

For complete documentation, see the associated Godoc.

To prevent complications with cgo, the primary backend MDBStore is in a separate repository, called raft-mdb. That is the recommended implementation for the LogStore and StableStore.

A pure Go backend using BoltDB is also available called raft-boltdb. It can also be used as a LogStore and StableStore.

Tagged Releases

As of September 2017, HashiCorp will start using tags for this library to clearly indicate major version updates. We recommend you vendor your application's dependency on this library.

  • v0.1.0 is the original stable version of the library that was in master and has been maintained with no breaking API changes. This was in use by Consul prior to version 0.7.0.

  • v1.0.0 takes the changes that were staged in the library-v2-stage-one branch. This version manages server identities using a UUID, so introduces some breaking API changes. It also versions the Raft protocol, and requires some special steps when interoperating with Raft servers running older versions of the library (see the detailed comment in config.go about version compatibility). You can reference https://github.com/hashicorp/consul/pull/2222 for an idea of what was required to port Consul to these new interfaces.

    This version includes some new features as well, including non voting servers, a new address provider abstraction in the transport layer, and more resilient snapshots.

Protocol

raft is based on "Raft: In Search of an Understandable Consensus Algorithm"

A high level overview of the Raft protocol is described below, but for details please read the full Raft paper followed by the raft source. Any questions about the raft protocol should be sent to the raft-dev mailing list.

Protocol Description

Raft nodes are always in one of three states: follower, candidate or leader. All nodes initially start out as a follower. In this state, nodes can accept log entries from a leader and cast votes. If no entries are received for some time, nodes self-promote to the candidate state. In the candidate state nodes request votes from their peers. If a candidate receives a quorum of votes, then it is promoted to a leader. The leader must accept new log entries and replicate to all the other followers. In addition, if stale reads are not acceptable, all queries must also be performed on the leader.

Once a cluster has a leader, it is able to accept new log entries. A client can request that a leader append a new log entry, which is an opaque binary blob to Raft. The leader then writes the entry to durable storage and attempts to replicate to a quorum of followers. Once the log entry is considered committed, it can be applied to a finite state machine. The finite state machine is application specific, and is implemented using an interface.

An obvious question relates to the unbounded nature of a replicated log. Raft provides a mechanism by which the current state is snapshotted, and the log is compacted. Because of the FSM abstraction, restoring the state of the FSM must result in the same state as a replay of old logs. This allows Raft to capture the FSM state at a point in time, and then remove all the logs that were used to reach that state. This is performed automatically without user intervention, and prevents unbounded disk usage as well as minimizing time spent replaying logs.

Lastly, there is the issue of updating the peer set when new servers are joining or existing servers are leaving. As long as a quorum of nodes is available, this is not an issue as Raft provides mechanisms to dynamically update the peer set. If a quorum of nodes is unavailable, then this becomes a very challenging issue. For example, suppose there are only 2 peers, A and B. The quorum size is also 2, meaning both nodes must agree to commit a log entry. If either A or B fails, it is now impossible to reach quorum. This means the cluster is unable to add, or remove a node, or commit any additional log entries. This results in unavailability. At this point, manual intervention would be required to remove either A or B, and to restart the remaining node in bootstrap mode.

A Raft cluster of 3 nodes can tolerate a single node failure, while a cluster of 5 can tolerate 2 node failures. The recommended configuration is to either run 3 or 5 raft servers. This maximizes availability without greatly sacrificing performance.

In terms of performance, Raft is comparable to Paxos. Assuming stable leadership, committing a log entry requires a single round trip to half of the cluster. Thus performance is bound by disk I/O and network latency.

# Packages

No description provided by the author
No description provided by the author

# Functions

BootstrapCluster initializes a server's storage with the given cluster configuration.
DefaultConfig returns a Config with usable defaults.
HasExistingState returns true if the server has any existing state (logs, knowledge of a current term, or any snapshots).
NewDiscardSnapshotStore is used to create a new DiscardSnapshotStore.
NewFileSnapshotStore creates a new FileSnapshotStore based on a base directory.
NewFileSnapshotStoreWithLogger creates a new FileSnapshotStore based on a base directory.
NewInmemAddr returns a new in-memory addr with a randomly generate UUID as the ID.
NewInmemSnapshotStore creates a blank new InmemSnapshotStore.
NewInmemStore returns a new in-memory backend.
NewInmemTransport is used to initialize a new transport and generates a random local address if none is specified.
NewInmemTransportWithTimeout is used to initialize a new transport and generates a random local address if none is specified.
NewLogCache is used to create a new LogCache with the given capacity and backend store.
NewNetworkTransport creates a new network transport with the given dialer and listener.
NewNetworkTransportWithConfig creates a new network transport with the given config struct.
NewNetworkTransportWithLogger creates a new network transport with the given logger, dialer and listener.
NewObserver creates a new observer that can be registered to make observations on a Raft instance.
NewRaft is used to construct a new Raft node.
NewTCPTransport returns a NetworkTransport that is built on top of a TCP streaming transport layer.
NewTCPTransportWithConfig returns a NetworkTransport that is built on top of a TCP streaming transport layer, using the given config struct.
NewTCPTransportWithLogger returns a NetworkTransport that is built on top of a TCP streaming transport layer, with log output going to the supplied Logger.
ReadConfigJSON reads a new-style peers.json and returns a configuration structure.
ReadPeersJSON consumes a legacy peers.json file in the format of the old JSON peer store and creates a new-style configuration structure.
RecoverCluster is used to manually force a new configuration in order to recover from a loss of quorum where the current configuration cannot be restored, such as when several servers die at the same time.
ValidateConfig is used to validate a sane configuration.

# Constants

AddNonvoter makes a server Nonvoter unless its Staging or Voter.
AddStaging makes a server Staging unless its Voter.
Candidate is one of the valid states of a Raft node.
256KB.
DemoteVoter makes a server Nonvoter unless its absent.
Follower is the initial state of a Raft node.
Leader is one of the valid states of a Raft node.
LogAddPeer is used to add a new peer.
LogBarrier is used to ensure all preceding operations have been applied to the FSM.
LogCommand is applied to a user FSM.
LogConfiguration establishes a membership change configuration.
LogNoop is used to assert leadership.
LogRemovePeer is used to remove an existing peer.
Nonvoter is a server that receives log entries but is not considered for elections or commitment purposes.
Promote is created automatically by a leader; it turns a Staging server into a Voter.
No description provided by the author
No description provided by the author
RemoveServer removes a server entirely from the cluster membership.
Shutdown is the terminal state of a Raft node.
No description provided by the author
No description provided by the author
Staging is a server that acts like a nonvoter with one exception: once a staging server receives enough log entries to be sufficiently caught up to the leader's log, the leader will invoke a membership change to change the Staging server to a Voter.
Voter is a server whose vote is counted in elections and whose match index is used in advancing the leader's commit index.

# Variables

ErrAbortedByRestore is returned when a leader fails to commit a log entry because it's been superseded by a user snapshot restore.
ErrCantBootstrap is returned when attempt is made to bootstrap a cluster that already has state present.
ErrEnqueueTimeout is returned when a command fails due to a timeout.
ErrLeader is returned when an operation can't be completed on a leader node.
ErrLeadershipLost is returned when a leader fails to commit a log entry because it's been deposed in the process.
ErrLogNotFound indicates a given log entry is not available.
ErrNothingNewToSnapshot is returned when trying to create a snapshot but there's nothing new commited to the FSM since we started.
ErrNotLeader is returned when an operation can't be completed on a follower or candidate node.
ErrPipelineReplicationNotSupported can be returned by the transport to signal that pipeline replication is not supported in general, and that no error message should be produced.
ErrPipelineShutdown is returned when the pipeline is closed.
ErrRaftShutdown is returned when operations are requested against an inactive Raft.
ErrTransportShutdown is returned when operations on a transport are invoked after it's been terminated.
ErrUnsupportedProtocol is returned when an operation is attempted that's not supported by the current protocol version.

# Structs

AppendEntriesRequest is the command used to append entries to the replicated log.
AppendEntriesResponse is the response returned from an AppendEntriesRequest.
Config provides any necessary configuration for the Raft server.
Configuration tracks which servers are in the cluster, and whether they have votes.
No description provided by the author
DiscardSnapshotStore is used to successfully snapshot while always discarding the snapshot.
FileSnapshotSink implements SnapshotSink with a file.
FileSnapshotStore implements the SnapshotStore interface and allows snapshots to be made on the local disk.
InmemSnapshotSink implements SnapshotSink in memory.
InmemSnapshotStore implements the SnapshotStore interface and retains only the most recent snapshot.
InmemStore implements the LogStore and StableStore interface.
InmemTransport Implements the Transport interface, to allow Raft to be tested in-memory without going over a network.
InstallSnapshotRequest is the command sent to a Raft peer to bootstrap its log (and state machine) from a snapshot on another peer.
InstallSnapshotResponse is the response returned from an InstallSnapshotRequest.
LeaderObservation is used for the data when leadership changes.
Log entries are replicated to all members of the Raft cluster and form the heart of the replicated state machine.
LogCache wraps any LogStore implementation to provide an in-memory ring buffer.
NetworkTransport provides a network based transport that can be used to communicate with Raft on remote machines.
NetworkTransportConfig encapsulates configuration for the network transport layer.
Observation is sent along the given channel to observers when an event occurs.
Observer describes what to do with a given observation.
Raft implements a Raft node.
RequestVoteRequest is the command used by a candidate to ask a Raft peer for a vote in an election.
RequestVoteResponse is the response returned from a RequestVoteRequest.
RPC has a command, and provides a response mechanism.
RPCHeader is a common sub-structure used to pass along protocol version and other information about the cluster.
RPCResponse captures both a response and a potential error.
Server tracks the information about a single server in a configuration.
SnapshotMeta is for metadata of a snapshot.
TCPStreamLayer implements StreamLayer interface for plain TCP.

# Interfaces

AppendFuture is used to return information about a pipelined AppendEntries request.
AppendPipeline is used for pipelining AppendEntries requests.
ApplyFuture is used for Apply and can return the FSM response.
ConfigurationFuture is used for GetConfiguration and can return the latest configuration in use by Raft.
FSM provides an interface that can be implemented by clients to make use of the replicated log.
FSMSnapshot is returned by an FSM in response to a Snapshot It must be safe to invoke FSMSnapshot methods with concurrent calls to Apply.
Future is used to represent an action that may occur in the future.
IndexFuture is used for future actions that can result in a raft log entry being created.
LogStore is used to provide an interface for storing and retrieving logs in a durable fashion.
LoopbackTransport is an interface that provides a loopback transport suitable for testing e.g.
No description provided by the author
SnapshotFuture is used for waiting on a user-triggered snapshot to complete.
SnapshotSink is returned by StartSnapshot.
SnapshotStore interface is used to allow for flexible implementations of snapshot storage and retrieval.
StableStore is used to provide stable storage of key configurations to ensure safety.
StreamLayer is used with the NetworkTransport to provide the low level stream abstraction.
Transport provides an interface for network transports to allow Raft to communicate with other nodes.
WithClose is an interface that a transport may provide which allows a transport to be shut down cleanly when a Raft instance shuts down.
WithPeers is an interface that a transport may provide which allows for connection and disconnection.
WithRPCHeader is an interface that exposes the RPC header.

# Type aliases

ConfigurationChangeCommand is the different ways to change the cluster configuration.
FilterFn is a function that can be registered in order to filter observations.
LogType describes various types of log entries.
These are the versions of the protocol (which includes RPC messages as well as Raft-specific log entries) that this server can _understand_.
RaftState captures the state of a Raft node: Follower, Candidate, Leader, or Shutdown.
ServerAddress is a network address for a server that a transport can contact.
ServerID is a unique string identifying a server for all time.
ServerSuffrage determines whether a Server in a Configuration gets a vote.
These are versions of snapshots that this server can _understand_.