A practical approach to read write quorum systems [Part 2]
I published the post "A practical approach to read-write quorum systems" a few months ago. The post refers to the paper Read-Write Quorum Systems Made Practical - Michael Whittaker, Aleksey Charapko, Joseph M. Hellerstein, Heidi Howard, Ion Stoica. It illustrates the implementation of "Quoracle". Quoracle provides the optimal quorums in respect of either the load
, network
or latency
.
I have decided to rewrite the tool in Golang to explore the ecosystem and the tooling of the language. This article goes through the Golang implementation of the original Python library.
Quorum expressions definition
First of all, let's discuss the implementation of the expressions. The original paper uses expressions and nodes to describe quorums:
a, b, c = Node("a"), Node("b"), Node("c")
majority = QuorumSystem(reads=a*b + b*c + a*c)
The example above builds the majority quorums using the following pairs: [a,b], [b,c], [a,c]
. Python can represent the expression above by overloading the *
and the +
operations. The original quoracle library uses the operations overload approach.
Go advocates simplicity, and it does not embrace operator overloading. So, it is necessary to proceed with a different approach.
In Golang, these methods describe the operations:
The ExprOperator
interface defines the operations between two logical expressions. A logical expression between nodes represents many quorums.
Thus, it is possible to describe quorum as follows:
a, b, c :=
NewNode("a"), NewNode("b"), NewNode("c")
// (a * b) + (b * c) + (a * c)
majority := NewQuorumSystemWithReads(a.Multiply(b).Add(b.Multiply(c)).Add(a.Multiply(c)))
The ExprOperator
interface provides the same functionalities as the original library. In the example above, the pairs: [a,b], [b,c], [a,c]
are majority quorums. The next section goes through the Golang definition of a QuorumSystem
and how to use it.
Quorum system overview definition
Now that we know how to define an Expr
of nodes, we can declare a read-write quorum system. The below implementation describes the QuorumSystem
struct used in the library.
The reads
and writes
fields represent the quorums. The nameToNode
field keeps track of the different nodes in the quorum system. The QuorumSystem
struct has the methods for calculating the quorum system's capacity, latency, load, and network load.
The StrategyOptions
parameter struct represents the configurations for the strategy optimisation:
The Optimize
property points to the optimisation target. The LoadLimit
, NetworkLimit
, LatencyLimit
define an optional limit on the load, the network, and latency. The ReadFraction
and the WriteFraction
determine the workload distribution of the read and write operations.
The F
field represents the resilience of the quorum. A quorum is F-resilient for some integer f, despite removing any f nodes from r, r is still a read/write quorum[1].
The library defines the initialization functions for a new QuorumSystem
:
The above code omits some methods implementations for brevity. If the caller provides either the read or writes quorum, the constructor computes the logical dual operation of the other quorum and initialise a new QuorumSystem
struct. If the caller provides both read and write quorums, the constructor checks the validity of the quorums and returns a new QuorumSystem
struct with the corresponding quorums.
The following section shows how to translate the optimal strategy problem into a linear programming problem.
Optimal strategy problem definition
The original python implementation of quoracle uses the PuLP library and coin-or. The previous blog post looked at how to use PuLP for optimisation problems in a python runtime.
The Golang implementation uses a library called lanl/clp. Also lanl/clp relies on coin-or for solving linear programming optimization problems.
The codebase defines a helper struct to build a linear programming problem:
A lpDefinition
struct contains the variables needed to describe a linear programming problem.
Let's suppose that we have three six-faced dices. Two dice are not allowed to have the same value. The goal is to find a difference between the 1st and 2nd-largest dice smaller than the 2nd and 3rd dice. The following lpDefinition
represents the problem:
The lpDefinition
above stores a value in the Vars
array for each dice. The Constraints
matrix represents the dice range constraint, from 1 to 6. The Objective
matrix contains the two goals of the problem:
- Each dice must be different from the others (line 10 and 11);
- The difference between the 1st and 2nd largest dice must be smaller than the one between the 2nd and the 3rd dice.
The example above shows how to use the helper struct in the codebase to build a minimisation problem. Next, we will take a detailed look at the implementation for optimising the metrics. Depending on the optimisation target, the problem definition builds a different lpDefinition
struct.
Load optimization definition
Let's start by describing how the load optimization problem is implemented. To recap, the formula for the load defined in the paper[1] is:
$$ \frac{f_r}{cap_R(x)} \sum_{{r \in R | x \in r }} p_r + \frac{1 - f_r}{cap_W(x)} \sum_{{w \in W | x \in w }} p_w \leq L_{f_r} $$
The following snippet of code implements the above formula, and it builds the LP problem:
The snippet omits some code for brevity. The buildLoadDef
local function encapsulates the logic for building the load optimisation problem.
The function initialises the Vars
and the Constraints
from the lpVariable
. Next, it adds the l
variable and constraint that indicates the load(\(L_{fr}\)) for the specific read fraction. It builds the load formula for every lpVariable
in the problem. For each Node
in the quorum system, it applies the following expression in case of a read quorum:
tmp[v.Index] += fr * v.Value / float64(*qs.GetNodeByName(n.Name).ReadCapacity)
otherwise, in the case of a write quorum, it proceeds by using:
tmp[v.Index] += (1 - fr) * v.Value / float64(*qs.GetNodeByName(n.Name).WriteCapacity)
The ReadCapacity
and the WriteCapacity
are configurable for each Node
.
The code needs to maintain the same order in the Vars
and the Objectives
arrays. Thus, each lpVariable
uses an Index
to refer to the exact position of each element in the arrays.
Network load optimization definition
This section describes the implementation of the network load. Let's start by refreshing the formula[1]:
$$ f_r ( \sum_{r \in R} p_r \cdot |r|) + (1 - f_r) ( \sum_{w \in W} p_w \cdot |w|) $$
\(|r|\) and \(|w|\) are the length of the read and write quorums sets. The library builds the network load minimization problem as follow:
The above code initialises a new Vars
and the Constraints
fields for each quorum. Then, it applies the network load formula by multiplying the length of the quorum with fr
. Also, the implementation adds a row in the Objectives
matrix in case we specify a network limit.
Latency optimization definition
The last optimization target is the latency. The formula described in the paper defines the latency as:
$$ f_r ( \sum_{r \in R} p_r \cdot latency(r)) + (1 - f_r) ( \sum_{w \in W} p_w \cdot latency(w)) $$
Below is the optimization definition of the latency:
The implementation creates a new lpDefinition
populating the Vars
and the Constraints
. Then, it retrieves the latency for each readQuourmVars
and writeQuorumVars
. The latency of a quorum is the shortest time required to form a quorum after contacting the nodes in that quorum. The code calculates l
using the readQuorumLatency
and the writeQuorumLatency
methods.
Next, it continues by applying the formula of the latency for the read quorums:
obj[v.Index] = fr * v.Value * float64(l)
the code takes the same approach for the write quorums using the opposite workload:
obj[v.Index] = (1 - fr) * v.Value * float64(l)
The following section describes how to translate the optimisation result into a new Strategy
. Also, it shows how to execute the LP optimisation using the definitions seen in this section.
Strategy initialisation
The previous section described how to build the problem definition. Now we can proceed by executing the optimisation. The snippet of code below describes the optimisation execution and the initialisation of the strategy:
As a first step, the code initialises a NewSimplex
. The simplex algorithm is a popular linear programming algorithm. We want to find the least objective of our problem. Thus the code sets the optimisation direction as clp. Minimise
.
The code proceeds by creating a new lpDefinition
based on the optimisation definitions seen in the previous section. For example, if the optimisation target is Network
, the code calls the buildNetworkDef
function.
On top of the optimisation target, the code needs to add another objective: the total sum of the read and write probabilities must be 1. The getTotalProbabilityObjectives
method takes care of that. It returns a new objective array where the read and write probabilities targets 1
.
The code executes the optimisation and checks that the resulting status
is optimal. If the operation succeeds, the code gets back the optimal solution. Then, each quorum initialises a new SigmaRecord
with the quorum and its probability of being selected.
Finally, it creates a new Strategy
with the SigmaR
(array of SigmaRecord
for the read quorums) and the SigmaW
(array of SigmaRecord
for the write quorums).
Let's refresh the definition of strategy as mentioned in the paper:
$$ \sigma = (\sigma_R, \sigma_W) $$
\(\sigma_R\) and \(\sigma_W\) are probabilities distributions: \(\sigma_R(r)\) and \(\sigma_W(w)\) are respectively the probabilities that the strategy chooses a read quorum \(r\) and a write quorum \(w\).
quoracle-go represents a Strategy
in a similar way using the following structs:
The SigmaR
and SigmaW
corresponds to the strategy's probability of choosing a specific quorum. Also, the Strategy
struct maintains a hashmap storing the node and its likelihood of being selected.
The Strategy
struct exposes some methods that retrieve some metrics and information. Below is the list of methods provided with the Strategy
struct:
The code above omits the implementation of the functions for brevity. The GetReadQuorum
, GetWriteQuorum
uses a probability distribution to return the quorums. The Load
, Capacity
, NetworkLoad
and Latency
methods return the respective metrics for a given read or write workload. The NodeLoad
, NodeUtilization
and NodeThroughput
methods target specific Node
in the quorum system. The implementations use the probability of a node getting selected to calculate the respective node metric.
Searching for the optimal quorum strategy
We have seen how the codebase leverages linear programming to find the optimal strategy.
Let's reiterate one of the primary purposes of quoracle. Given the node's details, an optimisation target and workload distribution, it returns the optimised strategy.
The Search
method implements the rule mentioned above. It calculates the optimal strategy by trying all the combinations of quorums. Whenever the optimal strategy for a given valid quorum returns a better target metric, the Search
function saves that.
Below is the code implementation of the Search
function:
The dupFreeExprs
and the doSearch
functions encapsulate the core logic.
The dupFreeExprs
returns all the possible combinations of quorums composed using a list of nodes.
The doSearch
uses the dupFreeExprs
outcome to initialise a new quorum system and find an optimal strategy. The implementation keeps track of the Strategy
and QuorumSystem
with the most optimised metric.
The search process is time measured. When the operation reaches a specified timeout, then the search stops. The timeout-controlled function prevents long-hanging processes.
Wrap up
This post went through the Golang port of quoracle, describing the main components implemented in the codebase. The code is available at samueleresca/quoracle-go.
The project had two primary purposes. First, to put into practice the core concepts described in the paper[1]. Secondly, to explore the Golang ecosystem and tooling.
Some of the concepts might seem very theoretical, but it is essential to know the basics. Quorums are the foundation of distributed systems topics such as replication and consensus.
References
[1]Read-Write Quorum Systems Made Practical - Michael Whittaker, Aleksey Charapko, Joseph M. Hellerstein, Heidi Howard, Ion Stoica
[2]lanl/clp provides a Go interface to the COIN-OR Linear Programming (CLP) library
[3]github.com/mwhittaker/quoracle