A practical approach to read write quorum systems [Part 2]



I published the post "A practical approach to read-write quorum systems" a few months ago. The post refers to the paper Read-Write Quorum Systems Made Practical - Michael Whittaker, Aleksey Charapko, Joseph M. Hellerstein, Heidi Howard, Ion Stoica. It illustrates the implementation of "Quoracle". Quoracle provides the optimal quorums in respect of either the load, network or latency.

I have decided to rewrite the tool in Golang to explore the ecosystem and the tooling of the language. This article goes through the Golang implementation of the original Python library.

Quorum expressions definition

First of all, let's discuss the implementation of the expressions. The original paper uses expressions and nodes to describe quorums:

a, b, c = Node("a"), Node("b"), Node("c")
majority = QuorumSystem(reads=a*b + b*c + a*c)

The example above builds the majority quorums using the following pairs: [a,b], [b,c], [a,c]. Python can represent the expression above by overloading the * and the + operations. The original quoracle library uses the operations overload approach.

Go advocates simplicity, and it does not embrace operator overloading. So, it is necessary to proceed with a different approach.

In Golang, these methods describe the operations:

The ExprOperator interface defines the operations between two logical expressions. A logical expression between nodes represents many quorums.

Thus, it is possible to describe quorum as follows:

a, b, c :=
NewNode("a"), NewNode("b"), NewNode("c")

// (a * b) + (b * c) + (a * c)
majority := NewQuorumSystemWithReads(a.Multiply(b).Add(b.Multiply(c)).Add(a.Multiply(c)))

The ExprOperator interface provides the same functionalities as the original library. In the example above, the pairs: [a,b], [b,c], [a,c] are majority quorums. The next section goes through the Golang definition of a QuorumSystem and how to use it.

Quorum system overview definition

Now that we know how to define an Expr of nodes, we can declare a read-write quorum system. The below implementation describes the QuorumSystem struct used in the library.

The reads and writes fields represent the quorums. The nameToNode field keeps track of the different nodes in the quorum system. The QuorumSystem struct has the methods for calculating the quorum system's capacity, latency, load, and network load.

The StrategyOptions parameter struct represents the configurations for the strategy optimisation:

The Optimize property points to the optimisation target. The LoadLimit, NetworkLimit, LatencyLimit define an optional limit on the load, the network, and latency. The ReadFraction and the WriteFraction determine the workload distribution of the read and write operations.
The F field represents the resilience of the quorum. A quorum is F-resilient for some integer f, despite removing any f nodes from r, r is still a read/write quorum[1].

The library defines the initialization functions for a new QuorumSystem:

The above code omits some methods implementations for brevity. If the caller provides either the read or writes quorum, the constructor computes the logical dual operation of the other quorum and initialise a new QuorumSystem struct. If the caller provides both read and write quorums, the constructor checks the validity of the quorums and returns a new QuorumSystem struct with the corresponding quorums.

The following section shows how to translate the optimal strategy problem into a linear programming problem.

Optimal strategy problem definition

The original python implementation of quoracle uses the PuLP library and coin-or. The previous blog post looked at how to use PuLP for optimisation problems in a python runtime.

The Golang implementation uses a library called lanl/clp. Also lanl/clp relies on coin-or for solving linear programming optimization problems.

The codebase defines a helper struct to build a linear programming problem:

A lpDefinition struct contains the variables needed to describe a linear programming problem.
Let's suppose that we have three six-faced dices. Two dice are not allowed to have the same value. The goal is to find a difference between the 1st and 2nd-largest dice smaller than the 2nd and 3rd dice. The following lpDefinition represents the problem:

The lpDefinition above stores a value in the Vars array for each dice. The Constraints matrix represents the dice range constraint, from 1 to 6. The Objective matrix contains the two goals of the problem:

  1. Each dice must be different from the others (line 10 and 11);
  2. The difference between the 1st and 2nd largest dice must be smaller than the one between the 2nd and the 3rd dice.

The example above shows how to use the helper struct in the codebase to build a minimisation problem. Next, we will take a detailed look at the implementation for optimising the metrics. Depending on the optimisation target, the problem definition builds a different lpDefinition struct.

Load optimization definition

Let's start by describing how the load optimization problem is implemented. To recap, the formula for the load defined in the paper[1] is:

$$ \frac{f_r}{cap_R(x)} \sum_{{r \in R | x \in r }} p_r + \frac{1 - f_r}{cap_W(x)} \sum_{{w \in W | x \in w }} p_w \leq L_{f_r} $$

The following snippet of code implements the above formula, and it builds the LP problem:

The snippet omits some code for brevity. The buildLoadDef local function encapsulates the logic for building the load optimisation problem.
The function initialises the Vars and the Constraints from the lpVariable. Next, it adds the l variable and constraint that indicates the load(\(L_{fr}\)) for the specific read fraction. It builds the load formula for every lpVariable in the problem. For each Node in the quorum system, it applies the following expression in case of a read quorum:

tmp[v.Index] += fr * v.Value / float64(*qs.GetNodeByName(n.Name).ReadCapacity)

otherwise, in the case of a write quorum, it proceeds by using:

tmp[v.Index] += (1 - fr) * v.Value / float64(*qs.GetNodeByName(n.Name).WriteCapacity)

The ReadCapacity and the WriteCapacity are configurable for each Node.
The code needs to maintain the same order in the Vars and the Objectives arrays. Thus, each lpVariable uses an Index to refer to the exact position of each element in the arrays.

Network load optimization definition

This section describes the implementation of the network load. Let's start by refreshing the formula[1]:

$$ f_r ( \sum_{r \in R} p_r \cdot |r|) + (1 - f_r) ( \sum_{w \in W} p_w \cdot |w|) $$

\(|r|\) and \(|w|\) are the length of the read and write quorums sets. The library builds the network load minimization problem as follow:

The above code initialises a new Vars and the Constraints fields for each quorum. Then, it applies the network load formula by multiplying the length of the quorum with fr. Also, the implementation adds a row in the Objectives matrix in case we specify a network limit.

Latency optimization definition

The last optimization target is the latency. The formula described in the paper defines the latency as:

$$ f_r ( \sum_{r \in R} p_r \cdot latency(r)) + (1 - f_r) ( \sum_{w \in W} p_w \cdot latency(w)) $$

Below is the optimization definition of the latency:

The implementation creates a new lpDefinition populating the Vars and the Constraints. Then, it retrieves the latency for each readQuourmVars and writeQuorumVars. The latency of a quorum is the shortest time required to form a quorum after contacting the nodes in that quorum. The code calculates l using the readQuorumLatency and the writeQuorumLatency methods.

Next, it continues by applying the formula of the latency for the read quorums:

obj[v.Index] = fr * v.Value * float64(l)

the code takes the same approach for the write quorums using the opposite workload:

obj[v.Index] = (1 - fr) * v.Value * float64(l)

The following section describes how to translate the optimisation result into a new Strategy. Also, it shows how to execute the LP optimisation using the definitions seen in this section.

Strategy initialisation

The previous section described how to build the problem definition. Now we can proceed by executing the optimisation. The snippet of code below describes the optimisation execution and the initialisation of the strategy:

As a first step, the code initialises a NewSimplex. The simplex algorithm is a popular linear programming algorithm. We want to find the least objective of our problem. Thus the code sets the optimisation direction as clp. Minimise.

The code proceeds by creating a new lpDefinition based on the optimisation definitions seen in the previous section. For example, if the optimisation target is Network, the code calls the buildNetworkDef function.

On top of the optimisation target, the code needs to add another objective: the total sum of the read and write probabilities must be 1. The getTotalProbabilityObjectives method takes care of that. It returns a new objective array where the read and write probabilities targets 1.

The code executes the optimisation and checks that the resulting status is optimal. If the operation succeeds, the code gets back the optimal solution. Then, each quorum initialises a new SigmaRecord with the quorum and its probability of being selected.

Finally, it creates a new Strategy with the SigmaR (array of SigmaRecord for the read quorums) and the SigmaW (array of SigmaRecord for the write quorums).

Let's refresh the definition of strategy as mentioned in the paper:

$$ \sigma = (\sigma_R, \sigma_W) $$

\(\sigma_R\) and \(\sigma_W\) are probabilities distributions: \(\sigma_R(r)\) and \(\sigma_W(w)\) are respectively the probabilities that the strategy chooses a read quorum \(r\) and a write quorum \(w\).

quoracle-go represents a Strategy in a similar way using the following structs:

The SigmaR and SigmaW corresponds to the strategy's probability of choosing a specific quorum. Also, the Strategy struct maintains a hashmap storing the node and its likelihood of being selected.

The Strategy struct exposes some methods that retrieve some metrics and information. Below is the list of methods provided with the Strategy struct:

The code above omits the implementation of the functions for brevity. The GetReadQuorum, GetWriteQuorum uses a probability distribution to return the quorums. The Load, Capacity, NetworkLoad and Latency methods return the respective metrics for a given read or write workload. The NodeLoad, NodeUtilization and NodeThroughput methods target specific Node in the quorum system. The implementations use the probability of a node getting selected to calculate the respective node metric.

Searching for the optimal quorum strategy

We have seen how the codebase leverages linear programming to find the optimal strategy.

Let's reiterate one of the primary purposes of quoracle. Given the node's details, an optimisation target and workload distribution, it returns the optimised strategy.

The Search method implements the rule mentioned above. It calculates the optimal strategy by trying all the combinations of quorums. Whenever the optimal strategy for a given valid quorum returns a better target metric, the Search function saves that.

Below is the code implementation of the Search function:

The dupFreeExprs and the doSearch functions encapsulate the core logic.
The dupFreeExprs returns all the possible combinations of quorums composed using a list of nodes.
The doSearch uses the dupFreeExprs outcome to initialise a new quorum system and find an optimal strategy. The implementation keeps track of the Strategy and QuorumSystem with the most optimised metric.

The search process is time measured. When the operation reaches a specified timeout, then the search stops. The timeout-controlled function prevents long-hanging processes.

Wrap up

This post went through the Golang port of quoracle, describing the main components implemented in the codebase. The code is available at samueleresca/quoracle-go.
The project had two primary purposes. First, to put into practice the core concepts described in the paper[1]. Secondly, to explore the Golang ecosystem and tooling.
Some of the concepts might seem very theoretical, but it is essential to know the basics. Quorums are the foundation of distributed systems topics such as replication and consensus.


[1]Read-Write Quorum Systems Made Practical - Michael Whittaker, Aleksey Charapko, Joseph M. Hellerstein, Heidi Howard, Ion Stoica
[2]lanl/clp provides a Go interface to the COIN-OR Linear Programming (CLP) library