Graduate Networks, UCSD

CSE222 – Spring 2009

Chord a scalable peer to peer lookup service for internet applications May 29, 2009

(i) The three most important things the paper says:

1) By leveraging consistent hashing, Chord is not only able to resolve many of peer to peer’s most difficult problems (load balancing, scalability) , but do it in efficient time.  In addition to the fact that, Stoicia et al proved that this solution allowed nodes to enter and leave the network with minimal disruption as well as to have searches complete in O(log(n)).  I felt this was important, since the discovery does not involve further increasing complexity (e.g. CAN using d-dimensional Cartesian coordinate space) and thus easier from a minimalistic and security perspective.

2) The development of scalable key location in Chord allows it to only need to know about a few number of nodes.  In addition, this was further accelerated by keeping some additional routing information, so it doesn’t need to go through all the successors.       I thought keeping this additional information was important, because not only did it provide an optimization but it also added a way to check for correctness in case the successor information was not kept correctly.  Furthermore, since each node only stores a small bit of redundant information, scalability and robustness is naturally built in.

3) Chord handles failures very gracefully which contributes to the overall robustness and quickness of the system.  Whenever a node fails, a lookup quickly recovers by using information in the successors-list.  This feature, also has the benefits of allow lookups to proceed even during periods of stabilization.  I felt this is important, because without this tid-bit of insight, the failure during instability would be even exponential (currently linear).   This is yet another important feature to ensure that Chord is scalable to n nodes.

(ii) The most glaring problem with the paper:

The biggest problem with the paper is its failure to address all of the lookup failures during stabilization.  According to Figure 12, it shows that as the number of nodes increase, the number of failure due to instability increases linearly.  And although it isn’t an exponential increase, it doesn’t scale logarithmically as the tout in other areas of Chord.  This problem would be further exacerbated in today’s large networks which would spend most of the time in a state of instability.

(iii) The future research directions of the work:

The future research of the work would be to provide anonymity in Chord.  One of the most appealing uses of peer to peer is to share and spread data (e.g. bit torrent).  And these types of systems benefit the most when there are more nodes in the network.  I believe that folks would be more incline to join a peer to peer network, if there was no threat on their identity (think of all the identity theft these days).  Therefore, this system with all its benefits may never be adopted over a less efficient system (e.g. Freenet), if it doesn’t provide anonymity to its users.

 

Ten Things Lawyers Should Know About the Internet May 25, 2009

Although this paper has ten points which concern laws and the Internet, they can be boiled down to three:

  1. The laws relating to copyright, privacy, wiretapping, and common carriage are not relevant to the Internet today. Furthermore, trying to apply them to the Internet has many deleterious effects with no real gains, since the usage model of the Internet differs so much from the services provided at the time when the laws were written.
  2. Furthermore, we actually know very little about the Internet from an empirical perspective. This is due to two factors. First, much of the current legislation (see point 1) makes it impossible (or rather, illegal) to gather the very information that would illuminate the usage patterns on the Internet. This hamstrings researchers, who are able to come up with many wonderful theories about current or future traffic patterns, algorithms, etc., but are unable to test or validate them on the Internet at large. Secondly, even when legal, these same efforts are frustrated by the fact that the Internet is composed of many private companies only motivated by relatively short-term financial concerns and unconcerned with the viability of their practices in the mid- to far future.
  3. With both the legal and financial restrictions on what can and cannot be implemented, the Internet as it stands today is in a fair amount of risk. The underlying routing and naming services are insecure because they were designed for an Internet that is very different from the one present today. Not only are they insecure, but they are not designed to scale to the size of the modern Internet. Even though these problems are evident from even the limited amount of data available under current laws and from the companies operating the core routers in a secretive manner, the educated people who have exposed these problems are helpless to enact any change at a global level.

The main problem with this paper (and likely this summary) is that it views the current state of the Internet from a very pessimistic perspective. While it may be true that there are many unresolved problems on the Internet, it is risky for the researchers to look down from an ivory tower and proclaim that the Internet is doomed unless we do X, Y, and Z. What does need to happen is that government regulation needs to be designed in such a way as to not stifle the approaches that the market takes to optimizing the Internet for its current and future needs.

Future research needs to look at ways to incentivize the change that researchers think needs to be implemented. Unless they can provide economic incentives for the companies that control the Internet, then the changes will fall on deaf ears.

 

Floodless in SEATTLE: A Scalable Ethernet Architecture for Large Enterprises May 12, 2009

Three Important Things:

  • The motivating question for this work is that Ethernet networks provide several benefits such as “zero configuration” and a flat address space but do not scale well. The proposed solution operates at the network level and leverages a scalable DHT that does not require level 2 switches to maintain state for every host. Instead, host information (MAC, IP address) is mapped to a switch as the result of a distributed hash query. This removes the need for flooding in order to dissminate information, and protocols like ARP and DHCP can be replaced with on-demand queries and updates to a directory service abstraction.
  • SEATTLE was designed to provide the appearance of regular ethernet to regular hosts, while maintaining a new internal network structure. This is a key selling point and paves the way for realistic, practical deployment. This feature is achieved by making the switches “link aware.” A switch is able to detect which links are connected to hosts as opposed to other switches by sending a control messge not recognized by end host level 2 NICs. Such links are treated in a special way in order to perform discovery and configuration. DHCP and ARP packets are snooped for this purpose.
  • Experimental results were important in order to justify the claims of scalability and Ethernet class performance. Using simulation traces, the authors determined that forwarding table size is reduced by a factor of 41 in a 50K host network when compared to Ethernet. Table size and control overhead scale linearly with the number of hosts. In an Emulab implementation, SEATTLE switches significantly outperformed Ethernet in the middle of the network, however border switches required more processing time in order to deal with learning host addresses.

Glaring Problem:

Although a SEATTLE deployment would be transparent from the perspective of the end hosts, it would require a complete overhaul of the underlying network. This represents a significant cost barrier for an organization with a large existing infrastructure, the likes of which such a solution would be directed at. The authors dedicated little or no discussion to incremental deployment and coexistence with regular Ethernet switches and level 3 routers. Such a scenario has significant implications for the adoption of SEATTLE.

Future Work:

The DHT abstraction provided by the network is currently hidden from  the end hosts. Assuming the hosts can be made “SEATTLE aware,” they could make great use of the DHT for applications such as peer-to-peer content sharing. In order for this to happen future work would need to explore the necessary modifications to the host and possibly to the network.

 

Web Search for a Planet: The Google Cluster Architecture May 12, 2009

Three Important Things:

  • The Google service requires massive computational power and storage. A key insight is that handling queries is an embarrassingly parallel operation. Thus the service architecture is tailored for high request throughput. This is achieved through data distribution and replication, as well as load balancing across data centers and within data centers.
  • There are many choices for fulfilling the needs of a web search service. Google uses a cluster of commodity machines, which they have identified as the most cost effective way to build a data center. This follows the principle of aggregating large numbers of components with the best cost/performance ratio, as opposed to purchasing components with the highest absolute performance per component.
  • Web search is ideally an “always on” service, which means that it needs to be highly reliable and available. The google approach is to move reliability to software, as opposed to investing in safe hardware and high quality components. Data is replicated logically across multiple servers/clusters, and a level of indirection implemented with DNS allows catastrophic failures to be masked.

Glaring Problem:

Being a proprietary work and relatively short, this paper gives a high level overview of cluster architecture and does not delve into the implementation details. It would have benefited from mentioning the MapReduce framework as well as including more statistics on power usage, deployment numbers, and operational cost.

Future Work:

Computing clusters are becoming more and more prevalent as service  backbones. They happen to serve the needs of Google very well at the moment, but are they really the choice of the future for internet services? Future work could examine the general applicability of clusters and also alternative to them.

 

Web Search for a Planet: The Google Cluster Architecture May 12, 2009

In the paper “Web Search for a Planet: The Google Cluster Architecture”, three Google Engineers describe the architecture in place behind the most popular web search engine. The three main compelling features of this architecture are:

1. No specialized hardware is used to build the computing cluster for this architecture. Fault-tolerating hardware components are not used. The selection criteria for hardware components are focused on performance-per-watt or gigabyte-per-dollar, not on peak performance of single components. Reliability in this cluster is achieved by a software layer and massive redundancy of data and machines for the same function.

2. It is a multi-tiered architecture which has load-balancing functionality on each tier, therefore handling fail-over on each tier and reducing the overall complexity. This also addresses scalability issues when the document index grows, as machines only have to be added on one tier.

3. Since the data (term index and documents) is seldom subject to update operations, replication and distribution of this large dataset is easy, because the search queries don’t have inherent state information which needs to be passed around during query execution. The commodity machines each hold a little part of their data (index shards) and answer subqueries for it.

The most glaring problem of this paper would be the claim that this architecture is in its characteristics applicable to a wide range of web applications. As we have seen in the following years (paper is dated 2003), most web applications and web sites are not stateless anymore, as they are personalized for each user. This takes away some of the parallelism properties of this query-only architecture.

Future research could go into the power consumption of this commodity hardware to optimize the query-per-watt ratings of this architecture. This may also lead to research into specialized query-processing hardware with simpler processors, as indicated in the paper.

 

Internet Indirection Infrastructure April 16, 2009

Three Important Things:

  • The motivating observation by the authors is that the demand for additional internet abstractions is out there, and the current IP layer cannot provide things such as multicast, anycast, and mobility in a scalable way. Taking the overlay network approach to this problem is a crucial decision, and it has the advantage of working with the infrastructure in place instead of trying to replace it. Such a network could be easily deployed in today’s internet, and based on the evaluation of its merit could someday become the dominant protocol.
  • By decoupling the sender from the receiver i3 is able to implement a variety of services. Using indirection is a key design point that enables flexible communication between hosts. This transition to rendezvous-based communication  is a marked improvement over the point to point model.
  • As the way that we use the internet has evolved, security has become a primary concern because of the services that are delivered over the internet. The problem is that the protocols in use today were not designed with security in mind. Any new idea for internet architecture should make sure to address security concerns. The paper comments on issues that are particularly relevant such as eavesdropping and DoS attacks.  The ability to have private triggers, enforcing that only a host can insert triggers on its own behalf, and fair queuing are all important extensions.

Glaring Problem:
A major concern with i3 is that the end hosts are responsible for inserting triggers, which means that it would be hard to enforce order and structure in the resulting overlay network. I would have liked to see the paper address this more fully, and maybe elaborate on how a large scale network could evolve out of an ad hoc, approach. This is discussed in the context of preventing attacks, but not in terms of how the network should be structured.

Future Work:
The i3 overlay is presented to us as a general mechanism for implementing communication abstractions. It would be interesting to see what other abstractions could be implemented with i3 in its current form or with extensions.