This post is an attempt to create a demo/example for consistent hashing in .Net/C#. This is not an in-depth analysis of consistent hashing as a concept. There is a plethora of excellent articles online that does that. Below are a few that helped me to grasp the concept.
This is essentially a walk through of the consistent hashing concept from a .Net developer’s perspective. As a developer, it has always been very helpful for me to grasp an idea when I create a proof of concept myself from a rudimentary analysis. The aim is to create a consistent hashing algorithm implementation that might help a .Net/C# developer to visualize the process and gain some insight into its inner mechanics.
In a nutshell, consistent hashing is a solution for the rehashing problem in a load distribution process. In a typical rehashing process, the target node for a key is determined by taking the mod of the key hash value, the divisor being the total number of nodes.
For example, if the key hash value is 32 and there are 5 nodes in total, then the target node is calculated as 32 % 5 = 2. It is quite apparent from the process that any change in the total number of nodes will change the target node value for all data keys. This results in the rearrangement of the keys across the nodes space for any addition and removal of nodes from the nodes cluster.
Consistent hashing attempts to solve this problem by minimizing the key rearrangement process so that only a small fraction of the keys needs reassignment. The core idea of consistent hashing is to map all values in a ring-shaped space. It can be considered as a visual representation of the modular arithmetic process utilized by the consistent hashing algorithm.
In the current example, the following approach is followed:
· Ring position is calculated for both node and data key by taking the mod of their individual hash value with the ring space as divisor. The ring space value should be large, greater than the total node count.
· The nodes and keys are mapped on the ring based on the calculated ring position.
· Keys are assigned to the next node in the ring in clock-wise direction. The direction is just a convention.
The best way to understand the process is to download the code from GitHub and analyze it.
Three main methods are defined.
- AddNode — Add a new node to the hash space. This implies that we can reassign some of the data keys from a different node to the newly created node.
In this case, the first node on the ring after the node to be added in the clockwise direction is identified. We do this by comparing the ring position. Now we have to identify the keys that needs to be assigned to the new node.
It is obvious that all keys to be reassigned are a subset of keys assigned to the next immediate node. All we have to do is to identify only those keys with the ring address less than the node to be added.
The identified set of keys are then reassigned to the new node.
Note that rest of the keys in the hash space remains unimpacted in this operation. This is the unique advantage of consistent hashing.
2. RemoveNode — Remove a node from the hash space. This implies that data keys assigned to the node to be removed will become orphan and needs to be reassigned.
The first node on the ring after the node to be removed in the clockwise direction is identified as the target node. We do this by comparing the ring position.
Now we simply reassign the keys belonging to the removed node to the target node.
Finally, the node in question is removed from the hash space.
As before, rest of the keys in the hash space remains unimpacted. Only the keys assigned to the node to be removed is affected.
3. AddKey — Add a new key to the hash space.
We start by calculating the hash value and ring position of the current key.
We identify the node in the hash space that is strictly greater than the ring position of the key to be added. As per convention, this node happens to be the first node in clockwise direction from the ring position of the key to be added.
Finally, we assign the current key to this node.
SearchNodes is a slightly modified binary search utility. Given a node position, It returns the exact match/strictly larger or strictly smaller node based on the node position from a sorted list of nodes.
SetNodes is a utility method which arranges a collection of given node and data keys into a dictionary collection of nodes and assigned keys as a preset for the subsequent operations. The arrangement of nodes can be random or equally spaced.
Solution structure and Testing
The ConsistentHashing solution contains the following two projects:
ConsistentHashingLib — The actual implementation of the consistent hashing algorithm. This a .Net library project. Code available here:
ConsistentHashing — A windows form project to visualize the process. System.Drawing namespace is used to graphically represent the hash space ring. Code available here:
After downloading the codebase, add a reference of the ConsistentHashingLib library to the ConsistentHashing project. ConsistentHashing project utilizes the System.Drawing namespace to graphically render the consistent hashing ring space. A sample representation from the project is given below.
Here, nodes are represented in orange and keys are in green. The node labels are in blue. Node labels contain the node name and the relative ring position in parenthesis. The key labels contain the key ring position and the parent node in parenthesis. The ring space is chosen as 360 so that the difference between the ring positions are identical to their angular difference.