Viewstamped Replication for Elixir
A distributed consensus system implementing the Viewstamped Replication (VSR) protocol, providing fault-tolerant state machine replication with automatic failure recovery.
✅ Core VSR Protocol
- Primary-backup replication with view changes
- Automatic primary failure detection and recovery
- Log-based operation ordering and consistency
- Quorum-based consensus decisions
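As a rough illustration of what "quorum-based" means for a cluster of 2f + 1 replicas, the arithmetic can be sketched as below. `QuorumSketch` and its functions are hypothetical names chosen for this example; they are not part of the vsr package API, and the sketch assumes acknowledgements are counted per distinct replica, including the primary's own.

```elixir
defmodule QuorumSketch do
  @moduledoc "Hypothetical helper for illustration; not part of the vsr package."

  # Smallest number of replicas that forms a majority of the cluster.
  def quorum(cluster_size) when is_integer(cluster_size) and cluster_size > 0 do
    div(cluster_size, 2) + 1
  end

  # Number of replica failures the cluster can tolerate (f in 2f + 1).
  def max_failures(cluster_size) when is_integer(cluster_size) and cluster_size > 0 do
    div(cluster_size - 1, 2)
  end

  # True once acknowledgements from distinct replicas reach a quorum.
  # Duplicate acknowledgements from the same replica are counted once.
  def committed?(acks, cluster_size) do
    MapSet.size(MapSet.new(acks)) >= quorum(cluster_size)
  end
end
```

With `cluster_size: 3`, as in the usage example in this README, a quorum is two replicas and one failure can be tolerated.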
✅ Implemented Components
- Sequential operation validation with gap detection
- Heartbeat mechanism for failure detection
- Automatic view change triggering on primary timeout
- Memory management (cleanup of committed operation metadata)
- Pluggable state machines, log storage, and communication layers
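The sequential validation with gap detection listed above can be pictured with a minimal sketch like the following. It assumes operations carry an integer operation number and that a replica knows the last operation number in its log; `GapCheckSketch.check/2` is a hypothetical function for illustration and may not match how the library structures this check.

```elixir
defmodule GapCheckSketch do
  @moduledoc "Hypothetical illustration; not the vsr package's implementation."

  # Classifies an incoming operation number against the last one in the local log.
  def check(last_op, incoming_op) when is_integer(last_op) and is_integer(incoming_op) do
    cond do
      # The next operation in sequence can be appended.
      incoming_op == last_op + 1 -> :ok
      # Already in the log; safe to ignore or re-acknowledge.
      incoming_op <= last_op -> :duplicate
      # One or more operations are missing; report the missing range.
      true -> {:gap, (last_op + 1)..(incoming_op - 1)}
    end
  end
end
```

A replica that sees `{:gap, range}` would typically need to fetch the missing operations before accepting the new one.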
✅ Observability
- Comprehensive telemetry instrumentation following Erlang/Elixir conventions
- Leadership span tracking (when nodes are primary/leader)
- Protocol event tracking (prepare, commit, view changes)
- State machine operation spans with duration metrics
- Timer and heartbeat event tracking
- See TELEMETRY_EVENTS.md for complete event documentation
- Workaround: Implement request deduplication at the application layer using unique request IDs
- Future Work: Full write operation deduplication requires propagating client identifiers through the entire VSR protocol
- Limitation: Cluster size and membership must be determined at startup and cannot be changed during operation
- Workaround: Plan cluster capacity ahead of time to accommodate expected load
- Future Work: Implement the reconfiguration protocol described in "Viewstamped Replication Revisited"
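The request deduplication workaround mentioned above could look roughly like this at the application layer. This is a sketch under assumptions: `DedupSketch` is a hypothetical module, request IDs are assumed to be unique per logical request, and the table of seen IDs is kept as a plain map that grows without bound. For deduplication to hold across replicas, the table would likely need to live inside the replicated state machine's own state.

```elixir
defmodule DedupSketch do
  @moduledoc "Hypothetical application-layer deduplication; not part of the vsr package."

  # Empty table mapping request_id => cached result.
  def new, do: %{}

  # Runs `fun` only the first time `request_id` is seen.
  # Returns {result, updated_table}; repeats return the cached result unchanged.
  def apply_once(seen, request_id, fun) when is_map(seen) and is_function(fun, 0) do
    case Map.fetch(seen, request_id) do
      {:ok, cached} ->
        {cached, seen}

      :error ->
        result = fun.()
        {result, Map.put(seen, request_id, result)}
    end
  end
end
```

A client retry that reuses the same request ID then receives the original result instead of applying the write a second time.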
If available in Hex, the package can be installed
by adding vsr to your list of dependencies in mix.exs:
    def deps do
      [
        {:vsr, "~> 0.1.0"}
      ]
    end

    # Start a VSR replica with a key-value state machine
    {:ok, replica} = Vsr.start_link(
      log: [],
      state_machine: VsrKv,
      cluster_size: 3
    )

    # Perform operations
    VsrKv.put(replica, "key", "value")
    result = VsrKv.get(replica, "key") # Returns "value"

    mix test

Test Status: 106/106 tests passing
VSR includes integration with Jepsen Maelstrom, a workbench for learning distributed systems by writing your own implementations and testing them against fault injection.
- Download the latest Maelstrom release:

      wget https://github.com/jepsen-io/maelstrom/releases/download/v0.2.3/maelstrom.tar.bz2
      tar -xjf maelstrom.tar.bz2

- Or download and extract in a single command:

      curl -L https://github.com/jepsen-io/maelstrom/releases/download/v0.2.3/maelstrom.tar.bz2 | tar -xj
The repository includes a convenience script for running linearizable key-value tests:
    ./maelstrom-kv

This runs the lin-kv workload, which tests:
- Linearizable key-value operations (read, write, cas)
- Fault tolerance with network partitions
- Consistency under concurrent operations
You can also run Maelstrom tests manually:
cd maelstrom
java -jar maelstrom.jar test \
-w lin-kv \
--bin ../run-vsr-node \
--node-count 3 \
--time-limit 10 \
--concurrency 6

Workload Options:
- lin-kv: Linearizable key-value store (read, write, cas operations)
- --node-count: Number of VSR replicas to run
- --time-limit: Duration of the test in seconds
- --concurrency: Number of concurrent client operations
After a test run, check:
- Test results: Maelstrom will report if linearizability was maintained
- Logs: Found in store/lin-kv/latest/
  - jepsen.log: Test runner logs and errors
  - node-logs/n*.log: Individual node logs
Success criteria:
- All operations must satisfy linearizability
- Minimal network timeouts (some expected during partitions)
- No crashes or protocol violations
See SPECIFICATION.md for detailed VSR protocol specification.
Documentation can be generated with ExDoc and published on HexDocs. Once published, the docs can be found at https://hexdocs.pm/vsr.