Graduate Networks, UCSD

CSE222 – Spring 2009

Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications May 29, 2009

The Chord system is designed to be a distributed service which provides a very simple API: mapping a key to a node.

The main points of the paper were as follows:

  1. The Chord service only needs to provide a very basic primitive in order to be functional. While the service only provides key-to-node mappings, it is straightforward to implement richer services on top of it. For example, it is possible to implement a distributed file system by making the node which holds the mapping for a given key to contain the file whose name generates that key. Therefore, when a host wants to find a given file, it generates a key based on the filename and queries the node that Chord provides for that key, which responds with the file. By maintaining a general interface, it is possible to use Chord for many different, seemingly disparate applications without having to modify the underlying protocol
  2. Consistent hashing allows the system to be very efficient. The hashing system is designed in such a way that nodes joining or leaving the network cause a minimum amount of disturbance. A joining node can only steal a subset of the keys mapped to its successor, and when a node leaves the network, the mappings automatically fail over to its successor. The basic idea of a shared ring is quite simple, but much of the paper describes extensions designed to make lookups much faster (essentially by storing exponential probes with each node so that it is possible to search in log(n) instead of n time).
  3. Operating in the face of concurrency affects both correctness and performance. In order to deal with this fact, they add support for a stabilization protocol, which keeps the correctness information up to date. The fingers, which are used to provide the logarithmic searching performance, take longer to update in the face of a rapidly-changing network, but this is acceptable because they only provide performance, and as long as the stabilization algorithm keeps the successor pointers up to date, searches will always succeed, even if it takes longer than logarithmic time.

The main weakness of this paper is that since it only provides such a limited set of features, it is possible that services built on top of it will have to be more sophisticated. For example, a file system built on top of this will have to provide some kind of replication, because an unexpected node failure will leave the node’s successor as the holder of a given file without actually passing it the file data. While it may be argued that replication is easy to build into the system, doing it in such a way that it is reliable enough to be used for a long-term distributed file system is likely a serious undertaking.

Future research into Chord should provide a library on top of it to provide the things that were left out of the initial design, such as replication and security. This will simplify writing applications which are based on Chord by allowing them to a validated set of operations larger than just mapping a key to a node.

 

Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications May 29, 2009

(i) the three most important things the paper says

  1. One of the most important things mentioned in this paper is the fact that, in a distributed peer-to-peer system, no single node needs to know about more than two other nodes in order for that system to function properly (assuming a ring-type topology, and ignoring the requirements to remove a node).  Although slow, this interconnection scheme is the most scalable (in terms of adding nodes) and will continue to maintain an all-inclusive state within the system.  Another item mentioned on the same topic is that a reasonable lookup time can be achieved in this system by only requiring each node to remember the locations of log N nodes (where N is the number of nodes in the system).
  2. Another important observation is the fact that the system will always converge (possibly barring any severe partitions) from any number of joins (whether concurrent or not) using the stabilize machinery given the proper amount of time.  Basically, the authors guarantee that no additions to the system (joins) can ruin its consistency.  All that is required is time for the proper stabilize commands to complete and distribute the predecessor and successor information to each node.
  3. A third important observation of the paper is the fact that probability and statistics of typical usage are on the side of the system’s design.  The authors bring up the point that this system will work well based upon a typical failure rate for hosts on the internet.  I would say that if this system was used for distributing data from server to server, this scheme would never encounter problems with performance that would require it to reduce to minimal performance (because of the sheer number of failures that it would have to deal with).

(ii) the most glaring problem with the paper

Although mentioned in the “Future Work” section, the lack of security to this protocol is a bit disconcerting.  Even though security can be added later, I would think that for a protocol whose main deployment is on the Internet security would be a main concern (and most likely integrated directly into the protocol design and not just patched in later).  It may be that this entire protocol could be invalidated just because of the fact that it would be difficult to guard against a host spoofing the system for malicious purposes.

(iii) the future research directions of the work

I feel that it would be very interesting for the authors to investigate the effect of changing the distance between the entries in each for the finger tables per server (instead of log N).  Another study could display the empirical results of these comparisons alongside the theoretical results, just for the sake of accuracy on both ends.

 

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications May 29, 2009

(i) Three most important things

1. Peer-to-peer systems and applications should be distributed systems without any centralized control or hierarchical organization, where the software running at each node is equivalent in functionality. The paper presents a scalable lookup protocol known as Chord that is fully distributed so that no node is more important than any other which improves robustness and makes Chord appropriate for loosely organized peer-to-peer applications.

2.  The load between nodes in peer-to-peer systems and applications needs to be balanced. The Chord protocol maps keys to nodes. A node might be responsible for storing a value associated with a key. Chord uses consistent hashing to assign keys to nodes and consistent hashing tends to spread keys evenly over the nodes, providing load balance.

3.  Lookup service for peer-to-peer systems and applications should be scalable to a large number of nodes. Previous work on consistent hashing assumed that nodes were aware of most other nodes in the system but that would be difficult to maintain for a large network. Each Chord node only needs routing information about only a few other nodes because the routing table is distributed so a node can resolve the hash function by communicating with a few other nodes through messages.

(ii) Most glaring problem

The most glaring problem would be that the paper doesn’t consider that data items associated with a key could be substantially larger than others. Chord load balancing ensures no node will carry more than k/n items (k being the number of keys and n the number of nodes) but a node could still be storing a much heavier load than the other nodes.

(iii) Future Research Directions

Future research directions for this work would be to improve lookup latency and robustness against malicious users. A malicious set of Chord users could present an incorrect view of the Chord ring or an attacker could insert a node into the Chord ring with an ID immediately following the item’s key and have the node return errors when asked to retrieve the data.

 

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications May 29, 2009

1. Decentralized lookup with minimal hops
Chord realizes a distributed framework mapping a piece of information to the node corresponding to the information. This allows a node on the network to identify the node maintaining the information. The “information” being spoken of could be any value expressible as a key with a hash function. The mapping is achieved by using a common hash function to map each node identified by its IP address to a key. The information being accessed is mapped to a key as well. Knowing the two keys, a node on the network can identify the node that maintains the piece of information. The framework also supports portability of data over the network and

2. Efficient implementation of the lookup algorithm
In the Chord implementation, a node A seeking information maintained by node B accesses an intermediate node C which maintains the mapping. The hash of the information provides node A with a key which can be used to identify node C. On access node C, node A can determine the node maintaining the information of interest (node B). Chord realizes this while minimizing the amount of mapping entries that need to be maintained by each node and the number of hops that node A would take reach node C, both of which are conflicting requirements. Chord realizes an algorithm that allows node A to identify node C in minimal number of hops.

3. Dynamic and scalable.
The solution provided by the paper supports nodes joining and leaving the network on the fly. The complexity, information to be maintained and delay in lookup increases logarithmically with increasing nodes. Moreover, addition or removal of a node from the network results in the maximum of O(K/N) entries migrating, where K is the number of keys and N the number of nodes on the network.

Drawback

Introduces considerable overhead in maintaining mapping information. Each piece of information of interest, introduces a new entry into the mapping table.
It also expects the node interested in access the piece of information to know a priori the key corresponding to the piece of information.
Might result in triangle routing, wherein the node being used for lookup is further away than the node of interest (i.e. the node maintaining the information).

Future Work
A solution to minimizing the possibility of triangle routing could be worked out. This would take into consideration location of nodes in addition to the keys they generate. Caching could be introduced at different stages to minimize latencies and repeated lookups.