Graduate Networks, UCSD

CSE222 – Spring 2009

Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications May 29, 2009

The Chord system is designed to be a distributed service which provides a very simple API: mapping a key to a node.

The main points of the paper were as follows:

  1. The Chord service only needs to provide a very basic primitive in order to be functional. While the service only provides key-to-node mappings, it is straightforward to implement richer services on top of it. For example, it is possible to implement a distributed file system by making the node which holds the mapping for a given key to contain the file whose name generates that key. Therefore, when a host wants to find a given file, it generates a key based on the filename and queries the node that Chord provides for that key, which responds with the file. By maintaining a general interface, it is possible to use Chord for many different, seemingly disparate applications without having to modify the underlying protocol
  2. Consistent hashing allows the system to be very efficient. The hashing system is designed in such a way that nodes joining or leaving the network cause a minimum amount of disturbance. A joining node can only steal a subset of the keys mapped to its successor, and when a node leaves the network, the mappings automatically fail over to its successor. The basic idea of a shared ring is quite simple, but much of the paper describes extensions designed to make lookups much faster (essentially by storing exponential probes with each node so that it is possible to search in log(n) instead of n time).
  3. Operating in the face of concurrency affects both correctness and performance. In order to deal with this fact, they add support for a stabilization protocol, which keeps the correctness information up to date. The fingers, which are used to provide the logarithmic searching performance, take longer to update in the face of a rapidly-changing network, but this is acceptable because they only provide performance, and as long as the stabilization algorithm keeps the successor pointers up to date, searches will always succeed, even if it takes longer than logarithmic time.

The main weakness of this paper is that since it only provides such a limited set of features, it is possible that services built on top of it will have to be more sophisticated. For example, a file system built on top of this will have to provide some kind of replication, because an unexpected node failure will leave the node’s successor as the holder of a given file without actually passing it the file data. While it may be argued that replication is easy to build into the system, doing it in such a way that it is reliable enough to be used for a long-term distributed file system is likely a serious undertaking.

Future research into Chord should provide a library on top of it to provide the things that were left out of the initial design, such as replication and security. This will simplify writing applications which are based on Chord by allowing them to a validated set of operations larger than just mapping a key to a node.

 

Chord: A scalable Peer-to-peer Lookup Service for Internet Applications May 29, 2009

This paper introduces Chord, a lookup service using peer-to-peer hosts without reliance on hierarchy. It has applications in all kinds of distributed systems which have no central naming service and are distributing information on a large number of hosts. The contributions of this paper are as following:

The main service performed by Chord is a mapping service that maps keys onto nodes, which are supposed to be providing the value associated to the key. This is realized by using the consistent hashing scheme to distribute the content (which is formed by all values) equally on all nodes participating in the network. By using hashing, a node location (for example its IP address) is hashed to some node identifier. When a new value enters the network, its key hash value (=key identifier) is of the same length as the the node location hash (=node identifier), and the value is put on the closest node, e.g. with the node identifier being the closest successor to the key identifier.

The next contribution is the lookup function used to determine the location of the value in the network. The underlying assumption is that due to scalability concerns, a single node knows only the identifiers of a small subset of nodes in the network. Since a consistent hashing scheme can be thought of as a ring of hashes, a single node always keeps at least the location of his successor in memory for correctness, and for performance improvements also the location of 2-to-the-ith nodes for some small values of i. This helps to break down the search space exponentially with each jump.

The last contribution is a scheme for adding and removing hosts from this network, which means performing the necessary updates on the lookup tables at some hosts and by redistributing some of the content in the network. Correctness only depends on maintaining the successor node correctly. The stabilization operation also handles the case of multiple node joins or multiple leaving/failing nodes.

The most glaring problem of this work seems to me that the consistent hashing scheme just places the nodes arbitrarily on the ring, not leveraging any neighborhood information about hosts. If this system would really provide world-wide lookup services on globe-spanning peer-to-peer networks, not leveraging this information leads potentially to largely inefficient communication patterns.

Future work should as indicated in the paper present a security study on this type of protocol, to make sure that it is resilient to attacks by malicious peers at least to some extent.

 

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications May 29, 2009

(i) Three most important things

1. Peer-to-peer systems and applications should be distributed systems without any centralized control or hierarchical organization, where the software running at each node is equivalent in functionality. The paper presents a scalable lookup protocol known as Chord that is fully distributed so that no node is more important than any other which improves robustness and makes Chord appropriate for loosely organized peer-to-peer applications.

2.  The load between nodes in peer-to-peer systems and applications needs to be balanced. The Chord protocol maps keys to nodes. A node might be responsible for storing a value associated with a key. Chord uses consistent hashing to assign keys to nodes and consistent hashing tends to spread keys evenly over the nodes, providing load balance.

3.  Lookup service for peer-to-peer systems and applications should be scalable to a large number of nodes. Previous work on consistent hashing assumed that nodes were aware of most other nodes in the system but that would be difficult to maintain for a large network. Each Chord node only needs routing information about only a few other nodes because the routing table is distributed so a node can resolve the hash function by communicating with a few other nodes through messages.

(ii) Most glaring problem

The most glaring problem would be that the paper doesn’t consider that data items associated with a key could be substantially larger than others. Chord load balancing ensures no node will carry more than k/n items (k being the number of keys and n the number of nodes) but a node could still be storing a much heavier load than the other nodes.

(iii) Future Research Directions

Future research directions for this work would be to improve lookup latency and robustness against malicious users. A malicious set of Chord users could present an incorrect view of the Chord ring or an attacker could insert a node into the Chord ring with an ID immediately following the item’s key and have the node return errors when asked to retrieve the data.