Graduate Networks, UCSD

CSE222 – Spring 2009

Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications May 29, 2009

Paper presents Chord which is a distributed lookup protocol that addresses the problem of locating the node that stores a particular data item. It provides this functionality by just one operation, mapping a key onto a node. Data location can be implemented on top of Chord by associating a key with each data item and sorting the key/data item pair at the node to which the key maps. It adapts naturally as more nodes join or leave the system. It uses a variant of consistent hashing to assign keys to Chord nodes. Consistent hashing automatically balances the load since each node receives roughly the same number of keys and it involves very little movement of keys when a node joins or leaves the system.

One major difference from previous approach is the fact that it does not assume that nodes are aware of other nodes in the system. This property enables it to scale. It needs routing information only for few other nodes. Due to distributive nature of routing a node resolves the hash function by communicating with a few other nodes. In steady state of an N node system, each node maintains information only about other O(logN) nodes and resolves all all lookup via O(logN) messages to other nodes.  It also maintains its routing information as nodes join and leave the system with high probability each such events result in no more than O(log^2N) messages.

Only one piece of information per node needs to be correct for chord to guarantee correct (though slow) routing of queries. Chord can be effectively used for cooperative web caching, with data mapped to caches by consistent hashing functions. It can be used to find the cache that contains the desired web data, with URI as the key for the DHT.

One shortcoming of the Chord is the fact that though it ensures that no node will have to carry more than k/n items (where k is the number of keys and n is the number of nodes), it does not consider the fact that some data items could be substantially larger than others. In web caching scenario this situation might arise due to embedded video data or any large object.

Though Chord has disadvantage n terms of run time while looking for key to node mapping due to distributive nature compared to the centralized solution which is fast. But this disadvantage is outweighed by the advantage provided by the distributive nature resulting in higher reliability and fault tolerance.

 

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications May 29, 2009

This paper talks about a distributed protocol, Chord, which addresses the problem of efficiently locating the node that stores a particular data item in peer-to-peer applications. The protocol also works in the scenario of dynamic disappearance and joining of nodes in peer-to-peer network. Therefore, Chord can answer queries even if the system is continuously changing. The basic Chord protocol provides fast distributed computation of a hash function mapping keys to nodes responsible for them and uses consistent hashing. The consistent hashing assigns each node and key an m-bit identifier using a base hash function. They choose a node’s identifier by hashing its IP address and key identifier by hashing the key. The protocol assumes that the length of m is large enough so that two nodes or keys do not hash to the same identifier. The protocol also needs very small amount of routing information to implement consistent hashing in a distributed environment since each node need only aware of its successor node on the circle. A portion of the Chord protocol also maintains the successors pointers to ensure that all lookups are resolved correctly. The chord protocol makes sure that each node’s successor is correctly maintained and for every key, node successor(k) is responsible for that key in providing the dynamic node joining capability. They maintain finger tables at each node. Chord needs to deal with nodes joining the system concurrently and with nodes that fail or leave voluntarily.  Each Chord node maintains a “successor-list” of its r nearest successors on the Chord ring to account for failures and replication.

 

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications May 29, 2009

Important Things:
1) Chord addresses a general problem faced by P2P applications by implementing a conceptually simple and generally useful mechanism (distributed lookup) which is nontrivial to implement. A wide variety of applications can be built on it, and it removes much of the complexity from P2P applications.
2) Chord is scalable, with the scale of the Internet in mind. Time and storage complexity is O(log N).
3) Chord can handle nodes joining and leaving (or failing) gracefully.

Problems:
The problem of a Chord network that becomes partitioned is not solved. Not only is there no recovery mechanism, but there is also no mechanism to detect the partition. If data is replicated, then the application could continue, each partition having different copies of data, and from that point on their views of the universe can become inconsistent. They did mention the posibility of recovering from this in the future, but then what happens when you try to resolve inconsistencies that develop in the application?

Future Work:
Future work should address problems of how to resolve inconsistencies and security issues that may appear in decentralized applications where no central authority exists.

 

Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications May 29, 2009

This paper presents a beautiful idea for easy name lookup in a distributed system. It asks the question: how can peer-to-peer applications efficiently locate the node that stores a particular data item? Highlights are:

  1. The solution uses a variant of consistent hashing to map key to nodes. Each key and IP address of a node is hashed to an identifier; each node is responsible for maintaining the data items associated with the keys occuring after its predecessor node on a circular number line. Each node maintains information  for O(log N) nodes in a finger table. It is proved that lookup resolution can happen within O(log N) messages.
  2. The solution tackles many difficult issues in distributed systems design, namely, a) load balancing by use of consistent hash function to spread keys evenly over the nodes, b) decentralization: no node is more important than the others, c) scalability: lookup cost grows as log of the number of nodes in the system, d) availability: automatically adjusts internal tables as the system changes, e) flexible naming: no constraints on structure of the keys.
  3. The paper provides an inkling of the applications that can immensely benefit from usage of Chord’s approach to naming. Chord can be used effectively for cooperative mirroring and search engines. Unlike DNS, Chord requires no centralized components which provides effective fault tolerance.

I see a minor security loophole in Chord. It could be the case that it has been fixed subsequently. The paper mentions that the Chord software on the node notifies the application of all changes to the set of keys that node is responsible for. Wouldn’t it be the case that the application is informed of the physical location of a data item that it is not interested in? This information could be misused by an adversarial application and presents a security threat.

Future work should look into the possible reasons why Chord has not been widely employed in many file-sharing systems. Is it the case that name-lookup in a centralized fashion in O(1) time is considered viable instead of O(log N) time here? Since DNS-type lookup is extremely popular, do the benefits of Chord outweigh the overhead of replacing DNS in such systems? Will mere replication of DNS entries ensure better fault tolerance than Chord? These questions need to be looked into.

Chord can be used effectively for cooperative
web caching, with data mapped to caches by consistent hashing functions and Chord used to find
the cache that contains the desired web data, with its URI as the key for the DHT
 

Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications May 29, 2009

The primary contribution of the paper is the Chord algorithm. The algorithm addresses a very fundamental problem in peer-to-peer networks – location of services.In a very large peer-to-peer system, it is not possible for a single node to store the location information for all the services (eg. files in a peer-to-peer file system). So the location information needs to be distributed across different nodes in the system and efficiently looked up. The Chord algorithm provides this service which can be used a low level primitive that is commonly used by several higher level applications built for different purposes.

Another contribution of the paper is the numerous improvements/optimizations suggested to improve the performance of the lookups and the proofs of correctness for all these optimizations. The paper also provides upper bounds on the lookup time in the common case, failure cases etc which is very useful to theoretically evaluate how well the algorithm would perform. The basic idea is that if all the location information is stored in one node (complexity O(N)), then the lookup can be performed in O(1) time. At the other extreme, if each node stores only O(1) location information, then the lookup can be performed in O(N) time. So Chord achieves a balance between the two by storing O(log N) location information in each node and using O(log N) time for a lookup.

The final contribution of the paper is the extensive simulation results which show that the performance of Chord is as expected. These results validate their claims that Chord offers good performance at large scale where some number of nodes would simultaneously be joining or leaving the system at any point of time.

One flaw with Chord (mentioned in the paper as well) is that it does not take advantage of locality. In a large scale peer-to-peer system spread across the Internet, it would be beneficial to place services such that the node with a particular piece of data or service and the node that uses the data are close together. However, Chord does not have a mechanism to support this. In fact, Chord actually explicitly tries to make a uniform distribution in a random fashion which make locality hard to achieve. Another problem with Chord is that the distribution of nodes and keys around the Chord ring need not actually be uniform. If some nodes got lucky and some others got lucky, it is possible that some nodes end up with a very large number of keys that they are responsible to store. The authors claim that this can be taken care of by having several virtual nodes per single physical node, which will increase the probability that the nodes get more evenly spread out in the ring. This however does not solve the problem of keys being unevenly distributed. If the keys are unevenly distributed, even if additional virtual nodes are used, information might not get evenly distributed. Also, it appears like the virtual nodes technique will help when all the nodes in the system introduce several virtual nodes rather than just have a single node. But the paper does not describe how performance can be affected by some physical nodes introducing more virtual nodes than others etc. Another crucial aspect is security. The paper says that application level security is required to take care of that although it is not explicit that handling security at the application layer would not affect performance. For example to avoid DOS attacks, if security is handled entirely at the application level, then the lower Chord layer still does a lot of work to service each of the requests and bring it up to the application layer. This is a lot of overhead. The other problem as stated by the authors is that the algorithm as described in the paper cannot recover from network partitions.

Future research in this direction can look at making the algorithm deal with network partitions better. It also appears like there are some parallels between Chord and IP anycast. They serve similar goal of identifying a node that offers a particular service. Anycast tries to find the closest node while Chord finds ‘the’ node that contains that information. It would be interesting to see if the two ideas can be integrated to achieve better exploitation of locality with Chord.

 

Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications May 29, 2009

The Chord system is designed to be a distributed service which provides a very simple API: mapping a key to a node.

The main points of the paper were as follows:

  1. The Chord service only needs to provide a very basic primitive in order to be functional. While the service only provides key-to-node mappings, it is straightforward to implement richer services on top of it. For example, it is possible to implement a distributed file system by making the node which holds the mapping for a given key to contain the file whose name generates that key. Therefore, when a host wants to find a given file, it generates a key based on the filename and queries the node that Chord provides for that key, which responds with the file. By maintaining a general interface, it is possible to use Chord for many different, seemingly disparate applications without having to modify the underlying protocol
  2. Consistent hashing allows the system to be very efficient. The hashing system is designed in such a way that nodes joining or leaving the network cause a minimum amount of disturbance. A joining node can only steal a subset of the keys mapped to its successor, and when a node leaves the network, the mappings automatically fail over to its successor. The basic idea of a shared ring is quite simple, but much of the paper describes extensions designed to make lookups much faster (essentially by storing exponential probes with each node so that it is possible to search in log(n) instead of n time).
  3. Operating in the face of concurrency affects both correctness and performance. In order to deal with this fact, they add support for a stabilization protocol, which keeps the correctness information up to date. The fingers, which are used to provide the logarithmic searching performance, take longer to update in the face of a rapidly-changing network, but this is acceptable because they only provide performance, and as long as the stabilization algorithm keeps the successor pointers up to date, searches will always succeed, even if it takes longer than logarithmic time.

The main weakness of this paper is that since it only provides such a limited set of features, it is possible that services built on top of it will have to be more sophisticated. For example, a file system built on top of this will have to provide some kind of replication, because an unexpected node failure will leave the node’s successor as the holder of a given file without actually passing it the file data. While it may be argued that replication is easy to build into the system, doing it in such a way that it is reliable enough to be used for a long-term distributed file system is likely a serious undertaking.

Future research into Chord should provide a library on top of it to provide the things that were left out of the initial design, such as replication and security. This will simplify writing applications which are based on Chord by allowing them to a validated set of operations larger than just mapping a key to a node.

 

Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications May 29, 2009

(i) The three most important things the paper says:

1. The most important feature of Chord is that it is constrained. The authors tailored the protocol quickly and simply locate a key in a network and maintain a stabilized network for doing so.  Having the problem be concise and efficient means it can be easily adapted by any applications that can make use of it. There is always a risk that a service will be over designed which doesn’t allow for the flexibility necessary for it to be widely deployed and useful.

2. The goal of the project was to provide quick and efficient lookups of where keys are stored in a distributed hash. They provably achieved this goal by deciding on an algorithm that will tell you which of you neighbors is closest to the key and then passing the request on to him. This makes the algorithm take only logarithmic time since you are dividing the possible hosts that could have the key by some factor at each hop. This was the best they could hope to do with distributed keys.

3. The other key challenge upon implementing Chord was to make the system stable enough to still function even nodes join and leave (by choice or by crashing) the network. The authors have the system periodically running a stabilize function which updates each ones list of which nodes it can access and passes its distance to different hosts to its neighbors. This means that even a request that is in route should be rerouted around faults while in flight.

(ii) The most glaring problem with the paper:

I felt that the paper did not adequately address how to use chord to create a redundant array of data. Where data has copies and if the data storing a certain key goes down ideally it would provide a mechanism for the request to be rerouted to another copy of the data. In many peer to peer applications you wish to locate multiple people with the same data (how does chord facilitate this being that you can only have one of each key?). Given that they had a clear target for the protocol I feel it would have been wise to give application programmers ideas of how to make the best use of their network architecture.

(iii) The future research directions of the work:

Follow up on how Chord performs in real systems will be very useful information. The author mentions that it is unclear if separate loops can form in practice so the deployment should demonstrate whether or not these can occur. They also only simulated the recursive implementation of Chord so it would be nice to have data on the iterative version as well.

 

Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications May 29, 2009

This paper introduces Chord, whose main functionality is to efficiently locate a node in a peer to peer (P2P) network. Given a key, Chord maps it onto a node. It adapts efficiently with nodes leaving and joining the network frequently. Main contributions are:

  1. A modified version of consistent hashing: Chord uses consistent hashing to map keys to nodes. Consistent hashing provides load balancing by evenly distributing keys to nodes (each node is assigned roughly K/N keys), and it also has the nice property that keys don’t change too much as nodes join or leave the network (only O(1/N) fraction of the keys are moved to a new location as a result).  One modification that the authors made is to remove the need to be aware of all other nodes in the network. Based on their implementation, a node needs to know about only O(log N) other nodes in a network of N nodes. Therefore, lookups require O(log N) messages to other nodes. Chord can still function with awareness of fewer nodes, but its performance would suffer (each node only needs to know about one other node, its successor node).
  2. Notion of successor nodes, and finger tables: Chord is able to achieve its scalability and good properties through the use of finger tables. First, the keys are arranged in order on an identifier circle. A function such as SHA1 could be used to hash node IP addresses to keys on the identifier circle. The successor of a key is then the first node on the identifier circle with key equal or greater than the search key. Finger tables are maintained on each node n. The table is a list of successors of n in distances exponential in 2: the first key on the the table is the successor of n. The i-th entry on the table is the id of the first node that succceeds n by at least 2^i-1 on the identifier circle.  To lookup a key, the current node compares the key with its own id and if it’s not a match, it uses the finger table to find the node with id closest to the search key, and forwards the search to that node. In this manner, the distance to the search key halves each time a node is contacted, and thus at most O(log N) nodes are contacted before reaching the destination node.
  3. Chord simplifies P2P systems by providing: load balancing, decentralization, logarithmic scalability, high availability, and flexible naming. Chord avoids a single point of failure, since all nodes in the network have equal responsibility. To join a P2P network, a node only needs to know about one other Chord node in the network; there is no servers that are required. This, along with the fact that keys are distributed evenly on the nodes, provides availability and decentralization. Further, nodes keep track of several successors, so in case of failure they can update the finger tables easily. Scalability is achieved without requiring hierarchy, because nodes only need to keep track of O(log N) other nodes; lookups run in predictable time. Chord requires no naming structure, as opposed to services like DNS. So it offers a very flexible naming scheme. Another nice thing about the design is that it is simple and handles concurrent join and failures very well.

Chord has a very clever design. While it’s hard to critique it, I feel like frequent node join/leaves could hinder its performance. The keys of one or more nodes need to be redistributed once a node joins or leaves the network. If the joins/leaves are too frequent, then the nodes have to redistribute keys too frequently, thus hindering their performance.

Future research: it would be good to take into accoun the distance/latency between the nodes, and rather than selecting a successor based on closest hash, take latency into account as well

 

Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications May 29, 2009

(i) the three most important things the paper says

  1. One of the most important things mentioned in this paper is the fact that, in a distributed peer-to-peer system, no single node needs to know about more than two other nodes in order for that system to function properly (assuming a ring-type topology, and ignoring the requirements to remove a node).  Although slow, this interconnection scheme is the most scalable (in terms of adding nodes) and will continue to maintain an all-inclusive state within the system.  Another item mentioned on the same topic is that a reasonable lookup time can be achieved in this system by only requiring each node to remember the locations of log N nodes (where N is the number of nodes in the system).
  2. Another important observation is the fact that the system will always converge (possibly barring any severe partitions) from any number of joins (whether concurrent or not) using the stabilize machinery given the proper amount of time.  Basically, the authors guarantee that no additions to the system (joins) can ruin its consistency.  All that is required is time for the proper stabilize commands to complete and distribute the predecessor and successor information to each node.
  3. A third important observation of the paper is the fact that probability and statistics of typical usage are on the side of the system’s design.  The authors bring up the point that this system will work well based upon a typical failure rate for hosts on the internet.  I would say that if this system was used for distributing data from server to server, this scheme would never encounter problems with performance that would require it to reduce to minimal performance (because of the sheer number of failures that it would have to deal with).

(ii) the most glaring problem with the paper

Although mentioned in the “Future Work” section, the lack of security to this protocol is a bit disconcerting.  Even though security can be added later, I would think that for a protocol whose main deployment is on the Internet security would be a main concern (and most likely integrated directly into the protocol design and not just patched in later).  It may be that this entire protocol could be invalidated just because of the fact that it would be difficult to guard against a host spoofing the system for malicious purposes.

(iii) the future research directions of the work

I feel that it would be very interesting for the authors to investigate the effect of changing the distance between the entries in each for the finger tables per server (instead of log N).  Another study could display the empirical results of these comparisons alongside the theoretical results, just for the sake of accuracy on both ends.

 

Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications May 29, 2009

Three Important Things:

  • Chord utilizes the consistent hashing algorithm in order to assign responsibility of keys to individual nodes. This is an important and decision and a good choice, because of the properties of the algorithm. The algorithm distributes keys evenly across nodes, ensures minimum disruption when a node joins or leaves the network (efficient rebalancing), and provides that a node needs only a small amount of routing information about other nodes. The paper presents these features as the motivation for picking the algorithm, as well as short description and a reference to the paper that introduced it.
  • Due to the unpredictable behaviors of nodes leaving and joining simultaneously, chord required that every node periodically run a stabilization routine that reconciles the network.
  • The first part of the paper makes a number of theoretical performance claims about the algorithms it utilizes. Because these numbers are essential for demonstrating load balance and scalability, the paper dedicates a section to evaluate these claims. The performance section shows that the path length for a search is indeed on the order of log(n), meaning that as the number of nodes and keys grows the protocol is scalable.

Glaring Problem:

The paper mainly focuses on the description and evaluation of the chord protocol in the interest of constructing a p2p network. It would have been helpful to also include a discussion of some actual applications that utilize the protocol, in order to demonstrate how easily they were constructed and how generally the protocol can applied.

Future Work:

The principles developed and refined by Chord are useful in other areas of routing. Specifically in protected or private networks such as data centers, a scalable flat routing scheme based on Chord would help facilitate useful features such as host mobility and VM migration.

 

Chord a scalable peer to peer lookup service for internet applications May 29, 2009

(i) The three most important things the paper says:

1) By leveraging consistent hashing, Chord is not only able to resolve many of peer to peer’s most difficult problems (load balancing, scalability) , but do it in efficient time.  In addition to the fact that, Stoicia et al proved that this solution allowed nodes to enter and leave the network with minimal disruption as well as to have searches complete in O(log(n)).  I felt this was important, since the discovery does not involve further increasing complexity (e.g. CAN using d-dimensional Cartesian coordinate space) and thus easier from a minimalistic and security perspective.

2) The development of scalable key location in Chord allows it to only need to know about a few number of nodes.  In addition, this was further accelerated by keeping some additional routing information, so it doesn’t need to go through all the successors.       I thought keeping this additional information was important, because not only did it provide an optimization but it also added a way to check for correctness in case the successor information was not kept correctly.  Furthermore, since each node only stores a small bit of redundant information, scalability and robustness is naturally built in.

3) Chord handles failures very gracefully which contributes to the overall robustness and quickness of the system.  Whenever a node fails, a lookup quickly recovers by using information in the successors-list.  This feature, also has the benefits of allow lookups to proceed even during periods of stabilization.  I felt this is important, because without this tid-bit of insight, the failure during instability would be even exponential (currently linear).   This is yet another important feature to ensure that Chord is scalable to n nodes.

(ii) The most glaring problem with the paper:

The biggest problem with the paper is its failure to address all of the lookup failures during stabilization.  According to Figure 12, it shows that as the number of nodes increase, the number of failure due to instability increases linearly.  And although it isn’t an exponential increase, it doesn’t scale logarithmically as the tout in other areas of Chord.  This problem would be further exacerbated in today’s large networks which would spend most of the time in a state of instability.

(iii) The future research directions of the work:

The future research of the work would be to provide anonymity in Chord.  One of the most appealing uses of peer to peer is to share and spread data (e.g. bit torrent).  And these types of systems benefit the most when there are more nodes in the network.  I believe that folks would be more incline to join a peer to peer network, if there was no threat on their identity (think of all the identity theft these days).  Therefore, this system with all its benefits may never be adopted over a less efficient system (e.g. Freenet), if it doesn’t provide anonymity to its users.

 

Chord: A scalable Peer-to-peer Lookup Service for Internet Applications May 29, 2009

This paper introduces Chord, a lookup service using peer-to-peer hosts without reliance on hierarchy. It has applications in all kinds of distributed systems which have no central naming service and are distributing information on a large number of hosts. The contributions of this paper are as following:

The main service performed by Chord is a mapping service that maps keys onto nodes, which are supposed to be providing the value associated to the key. This is realized by using the consistent hashing scheme to distribute the content (which is formed by all values) equally on all nodes participating in the network. By using hashing, a node location (for example its IP address) is hashed to some node identifier. When a new value enters the network, its key hash value (=key identifier) is of the same length as the the node location hash (=node identifier), and the value is put on the closest node, e.g. with the node identifier being the closest successor to the key identifier.

The next contribution is the lookup function used to determine the location of the value in the network. The underlying assumption is that due to scalability concerns, a single node knows only the identifiers of a small subset of nodes in the network. Since a consistent hashing scheme can be thought of as a ring of hashes, a single node always keeps at least the location of his successor in memory for correctness, and for performance improvements also the location of 2-to-the-ith nodes for some small values of i. This helps to break down the search space exponentially with each jump.

The last contribution is a scheme for adding and removing hosts from this network, which means performing the necessary updates on the lookup tables at some hosts and by redistributing some of the content in the network. Correctness only depends on maintaining the successor node correctly. The stabilization operation also handles the case of multiple node joins or multiple leaving/failing nodes.

The most glaring problem of this work seems to me that the consistent hashing scheme just places the nodes arbitrarily on the ring, not leveraging any neighborhood information about hosts. If this system would really provide world-wide lookup services on globe-spanning peer-to-peer networks, not leveraging this information leads potentially to largely inefficient communication patterns.

Future work should as indicated in the paper present a security study on this type of protocol, to make sure that it is resilient to attacks by malicious peers at least to some extent.

 

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications May 29, 2009

The paper proposes a scalable protocol Chord for lookup in a dynamic peer-to-peer system with frequent node arrivals and departures. The primary operation supported by Chord is to map a given key to a node containing a data-item. To this end the authors use consistent hashing which tends to balance load, since each node receives roughly the same number of keys, and involves relatively little movement of keys when nodes join and leave the system. Chord tries to address the problems of load balancing, decentralization, scalability and provably low complexity of implementation. The main points discussed in the paper are Chord’s consistent hashing and key assignment mechanism, the scalability arguments for dynamic node joins and its fault tolerance.

Distributed hash tables provide for easy location of data by lookup on a key in applications such as peer-to-peer file-sharing, content distribution systems, web caching, and dynamic name services, to name a few. Starting from the idea of consistent distributed hashing to assign data to nodes, Chord provides a completely decentralized mechanism that allows a node to find the receptacle for a data item in O(log N) time by tracking only O(log N) locations of other nodes. Chord can be used effectively for cooperative web caching, with data mapped to caches by consistent hashing functions and Chord used to find the cache that contains the desired web data.

Chord is fully distributed where no node is more important than any other thereby improving its robustness and making Chord appropriate for loosely-organized peer-to-peer applications. The cost of a Chord lookup only grows as the log of the number of nodes, so even very large systems are feasible and making it very scalable. Chord automatically adjusts its internal tables to reflect newly joined nodes as well as node failures, ensuring that, barring major failures in the underlying network, the node responsible for a key can always be found. This is true even if the system is in a continuous state of change and hence makes Chord a completely available system. Chord assigns identifiers to nodes by arranging them in a circular fashion and assigning keys to nodes whose identifier is the same or the successor of the current key. For join operation every successor node need only its predecessor whose tables are updated when a node joins the circle of nodes. Optimizations are made to avoid the traversing the entire circle and only stepping though having only O(log N) routes to the destinations. Chord also implements support to handle concurrent node joins and failures.

Replication for fault-tolerance is left up to the applications. This is reasonable respecting the end-to-end argument, if the rate of failure is not particularly bad, which in true for many peer-to-peer scenarios. However application writers may not be aware of just how prone individual nodes can be to failure. For example, if a failure-prone node joins the peer-to-peer system and takes over an application’s data, an application that has come to expect that its data is rarely lost can suddenly be disappointed. If replicated data is not read-only, however, it might be unreasonable to suggest that Chord be adapted to handle that issue.

A possible shortcoming of Chord is that while its load balancing ensures that no node will have to carry more than k/n items (k being the number of keys and n the number of nodes), it doesn’t consider that some data items could be substantially larger than others. In the case of web caching, most cached data could be textual web data, interspersed with some large videos, causing problems for the nodes that have to host the videos in addition to the normal load of textual data items. Whenever it is possible to structure the distribution of data across a number of hosts in a network Chord’s structure is best suited since it uses a DHT to distributing the data objects. Chord has a bright future since it removes the presence of a centralized authority, thus making the distributed nodes of a peer-to-peer system truly independent. However it is worth considering that centralized authority though likely to be more costly provides its advantages of reducing the time required to find the node for a data item from O(log N) to O(log 1).

 

Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications May 29, 2009

Thus paper proposes Chord a distributed lookup protocol that addresses the problem of efficiently locating a node in a peer-to-peer network that stores a particular data item. Chord just supports one operation: given a key it maps to a node. Data location can be easily implemented on top of Chord by associating a key with each data item and storing the key/data item pair at the node which stores the data. Chord uses a variant of consistent hashing algorithm that tends to balance the load since each node receives roughly the same number of keys, and also needs little movement of keys when nodes join and leave the system. In previous consistent hashing algorithm each node were aware of most other nodes in the system, thus making it impossible to scale to a large number of nodes. In contrast, Chord node needs the routing information of only a few other nodes. Because the routing table is distributed, a node resolves the hash function by communicating with a few other nodes. In steady state in N-node system, each node maintains information about only other O(log n) nodes, and resolves all lookups via O(log n) messages. The most important features of Chord are its simplicity, provable correctness and provable performance. Its simplicity results from the routing of key through a sequence of O(log n) other nodes towards the destination. A Chord node requires information about O(log n) other nodes for efficient routing but the performance does not degrade drastically when the information is out of date. This is important because nodes may join and leave the system arbitrarily thus making consistency hard to maintain. Hence even in case of failed nodes, the mapping of the key will eventually succeed with a little more performance overhead.

The problem existing with the Chord system is that it heavily relies on probabilistic behaviors and hence may fail in certain unlikely pathological cases. The ability to handle concurrent joining or failure of nodes efficiently is a challenge. Also in certain cases failure of nodes can entirely partition the network, thus breaking the key lookup chain and effectively making Chord to fail on a key mapping. Finally there are also a lot of security loopholes, specially if the Chord system is used for cryptographic purpose, in which case an adversary can insert a node with a convenient key that may perform a DOS attack for a particular query look up.

Future directions of the work may include building mechanisms for healing partitioned rings. One way to check global consistency would be for each node to periodically ask other nodes to do a Chord look up. If it does not yield the result, there might be an existing partition. Another approach might be for nodes to maintain a long-term memory of random nodes they have encountered in the past. In case of a partition, random sets in one partition are likely to include nodes from other partition. Also for defending security loopholes, techniques like cryptographic hashing of IP addresses can be used which makes an attack difficult. Finally the lookup latency can be improved by using a server selection technique which can measure the network delay of each of the table entries pointed to by a node and use the node with minimum delay to forward the lookups.

 

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications May 29, 2009

(i) Three most important things

1. Peer-to-peer systems and applications should be distributed systems without any centralized control or hierarchical organization, where the software running at each node is equivalent in functionality. The paper presents a scalable lookup protocol known as Chord that is fully distributed so that no node is more important than any other which improves robustness and makes Chord appropriate for loosely organized peer-to-peer applications.

2.  The load between nodes in peer-to-peer systems and applications needs to be balanced. The Chord protocol maps keys to nodes. A node might be responsible for storing a value associated with a key. Chord uses consistent hashing to assign keys to nodes and consistent hashing tends to spread keys evenly over the nodes, providing load balance.

3.  Lookup service for peer-to-peer systems and applications should be scalable to a large number of nodes. Previous work on consistent hashing assumed that nodes were aware of most other nodes in the system but that would be difficult to maintain for a large network. Each Chord node only needs routing information about only a few other nodes because the routing table is distributed so a node can resolve the hash function by communicating with a few other nodes through messages.

(ii) Most glaring problem

The most glaring problem would be that the paper doesn’t consider that data items associated with a key could be substantially larger than others. Chord load balancing ensures no node will carry more than k/n items (k being the number of keys and n the number of nodes) but a node could still be storing a much heavier load than the other nodes.

(iii) Future Research Directions

Future research directions for this work would be to improve lookup latency and robustness against malicious users. A malicious set of Chord users could present an incorrect view of the Chord ring or an attacker could insert a node into the Chord ring with an ID immediately following the item’s key and have the node return errors when asked to retrieve the data.

 

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications May 29, 2009

1. Decentralized lookup with minimal hops
Chord realizes a distributed framework mapping a piece of information to the node corresponding to the information. This allows a node on the network to identify the node maintaining the information. The “information” being spoken of could be any value expressible as a key with a hash function. The mapping is achieved by using a common hash function to map each node identified by its IP address to a key. The information being accessed is mapped to a key as well. Knowing the two keys, a node on the network can identify the node that maintains the piece of information. The framework also supports portability of data over the network and

2. Efficient implementation of the lookup algorithm
In the Chord implementation, a node A seeking information maintained by node B accesses an intermediate node C which maintains the mapping. The hash of the information provides node A with a key which can be used to identify node C. On access node C, node A can determine the node maintaining the information of interest (node B). Chord realizes this while minimizing the amount of mapping entries that need to be maintained by each node and the number of hops that node A would take reach node C, both of which are conflicting requirements. Chord realizes an algorithm that allows node A to identify node C in minimal number of hops.

3. Dynamic and scalable.
The solution provided by the paper supports nodes joining and leaving the network on the fly. The complexity, information to be maintained and delay in lookup increases logarithmically with increasing nodes. Moreover, addition or removal of a node from the network results in the maximum of O(K/N) entries migrating, where K is the number of keys and N the number of nodes on the network.

Drawback

Introduces considerable overhead in maintaining mapping information. Each piece of information of interest, introduces a new entry into the mapping table.
It also expects the node interested in access the piece of information to know a priori the key corresponding to the piece of information.
Might result in triangle routing, wherein the node being used for lookup is further away than the node of interest (i.e. the node maintaining the information).

Future Work
A solution to minimizing the possibility of triangle routing could be worked out. This would take into consideration location of nodes in addition to the keys they generate. Caching could be introduced at different stages to minimize latencies and repeated lookups.

 

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications May 29, 2009

Chord solves the problem of scalable peer-to-peer lookups for applications by providing just one operation: given a key, it maps the key onto a node. Chord is a scalable protocol for lookup in a dynamic peer-to-peer system with frequent node arrivals and departures. It attempts to solve these 5 problems: load balancing, decentralization, scalability, availability and flexible naming.

Major Points:
1.)
Consistent Hashing
Chord uses consistent hashing with sha1 in a circular fashion that allows nodes to enter and leave  with very minimal disruption. The consistent hashing is what gives Chord part of its scalability and the sha1 gives Chord a deterministic behavior. Consistent hashing also balances load with a high probability by giving all nodes roughly the same number of keys.
2.)
Scalable Key Location
It would not be scalable if every node had to maintain information about every other node. For this, Chord has every node maintain a routing table with at most m entries where m is the number of bits in the key/node identifiers. It then uses modules of 2^m to create intervals around the Chord circle so that each node only needs to know about a small number of nodes and the nodes close to them. They use successor and predecessor pointers and a concept of finger tables for maintenance. With this they prove that they can with high probability, only visit O(logN) nodes to find a successor in a N-node network.
3.)
Joining/Leaving
To handle joins and leaves, Chord maintains predecessor pointers along with the successor pointers. Every time a node joins or leaves it must update the nodes that point to it or it points to like a link list. They prove that with high probability, any node in a N-node network will use O(log(N)^2) messages to re-establish the Chord routing invariants and finger tables. It is important that this is scalable because in a peer-to-peer network, nodes leave, fail and join very frequently. To deal with failures, Chord runs a stabilization protocol periodically on every node. This finds bad pointers in the tables and tries to fix them.

Glaring Problem:

Chord’s correctness is based off of probabilistic guarantees. There can be a given time where the pointers for the successor/predecessor are incorrect and data that is in the system can’t be located. It is just prove that with high probability this will not occur. Some applications need stronger guarantees though. Chord is also not very secure because of a bad peer could give bad tables.

Future Work:
Chord is important because it provides a decentralized scalable look-up for peer-to-peer systems. Chord can also be easily used too because of it’s availability as a library and it’s simplicity. Some of the future work on Chord that is mention in the paper include, security against malicious peers and the detection/healing of nodes who know each other.

 

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications May 29, 2009

Three major contributions:

  1. A distributed lookup protocol that does not require consensus, or two-phase commit in order to work. (i.e. nodes do not need a strictly consistent view of the network)
  2. A distributed lookup protocol requiring a node to store O(log N) information about other nodes (for efficient routing).
  3. A distributed lookup protocol that can deal with a reasonable amount of network churn (i.e. nodes joining/leaving).

A major problem:

They do not address load balancing in two situations. First, their placement of nodes is based on SHA-1, an uniform hash function, meaning that nodes have an equal probability of being placed anywhere on the ring. This characteristic, in practice, allows some nodes to be responsible for a large fraction of the ring while other nodes will be responsible for far less. Also, this system assumes that nodes are homogeneous. Fifty percent of lookup requests for a node are going to pass through the nodes predecessor. Therefore, nodes are bottlenecked by the capacity of their predecessor.

Future implications:

Distributed applications now have a scalable way to lookup other resources in a network without a centralized service. Future work could include addressing the above problems.

 

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications May 29, 2009

i)

1. Chord uses consistent hashing to ensure that the number of keys at each node is relatively balanced, and only a small number of keys must be moved when a nodes joins or leaves the network. Chord does this by hashing the IP address of each node and hashing each key as well. Then every node is responsible for all keys that are on the interval from that nodes hash value to the hash value of the node with the next lower hash value (with hash values wrapping around from 0 back to the maximum value). This allows the assignments to be relatively stable since a new node joining/leaving the network will only intersect one interval and only a portion of keys on that interval will need to be transferred between nodes.

2. Chord speeds up location of keys by dividing up the keyspace into m partitions where m is equivalent to the number of bits in the hash value. Then each node n assigns the each entry in it’s table a pointer to the node that is the successor to the key value n+2^(i+1) where i the the entry’s index. When this table reflects the current state of the nodes, this allows a node servicing a request to very quickly route it close to where the key resides rather than making the request travel all around the ring.

3. One problem is dealing with nodes dynamically joining and leaving the network. Chord deals with this by having a stabilization protocol that each node runs periodically. The stabilization protocol has a node n, query it’s successor s, to see if it is that nodes predecessor p, and if not it either updates its state to reflect that p is now n’s successor or that n is now s’s predecessor. This allows the all the nodes pointers to be kept up to date.

ii) One flaw in Chord is that a malicious attacker may be able to compromise the Chord ring by having a large number of clients from different ip addresses join the ring. Those clients would then take up a large amount of the key space and allow the attacker to disrupt the system or return erroneous values.  For instance if a user is looking up the location of a file, the attacker could instead point the user to a piece of malware instead.

iii) Future work would be to investigate how Chord could be made more secure and prevent malicious attackers. One mechanism might be a node could periodically insert a key with some nonce value and then attempt to retrieve the key and check to see if it gets the same nonce value back. If not, a gossip style protocol could be used to propogate the fact that the node thinks the successor to the key it inserted may be malicious.