Consistent hashing

Consider a situation like this. We have a distributed system containing n servers to store a huge hashMap. We use a simple method k%n to decide which server to store our key-value pair. (k is the hash key and n is the number of servers). The draw back of this method is that when you are adding a new server, you have to recalculate all the keys that stored in your distributed system because k%n+1 will be different for all the keys. This is not very desirable for the system because 1. recalculating all the keys are a lot of effort, and adding or removing servers in a distributed system is quite frequent. 2. You need to schedule a down time for the whole system to perform the task.

The consistent hashing was used to solve this problem. If using consistent hashing, only k/n keys at most need to be rehashed where n is the number of the servers and k is the number of keys. The idea is to put all the keys in the hash function range in a circle, and put the servers along the circle. To decide which server to put the key, simply calculate the hashcode of the key, map it to the circle, then walk along the circle to find the first server you met. That is the server you will save your key. When a new server comes in. When a new server is added, only the keys that seated in between the new added server and the server ahead of it needs to be rehashed. This method also allows easy balancing the load between the servers even if the keys are not evenly distributed.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s