Graduate Networks, UCSD

CSE222 – Spring 2009

A Scalable, Commodity, Data Center Network Architecture May 7, 2009

(i) the three most important things the paper says

One of the most important ideas in this paper is that, at the time it was written, a very large network with a 1:1 oversubscription ratio was much cheaper to build with a fat-tree design than with a traditional data center network architecture. Although the paper does not specify what percentage of large data centers actually employ less efficient or more expensive architectures than the one proposed, it seems reasonable to assume that the problem is sizable. A second important idea is that statically assigning paths in the fat-tree architecture (and perhaps in other architectures) is less efficient than what can be achieved for a minimal increase in cost. The paper shows this through the higher efficiency of its more dynamic protocols compared with the static assignment of port forwarding. The third important idea is that the fat-tree beats the traditional architecture on yet another metric: power usage and heat dissipation. (It may also beat some of the more complicated, non-fat-tree designs that are not detailed in the paper.) Since the fat-tree design described here uses only GigE switches, its power usage per Gbps was shown to be significantly lower than that of a design using 10Gbps switches to reach the same aggregate bandwidth. For a large-scale data center, this should be a strong argument for switching to this architecture.
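To make the scaling argument concrete, here is a minimal sketch of how the size of a fat-tree follows from the port count of its switches. It assumes the usual k-ary fat-tree construction from identical k-port switches (k pods, each with k/2 edge and k/2 aggregation switches, plus (k/2)^2 core switches). The function name and the dictionary keys are my own and do not come from the paper.

```python
def fat_tree_size(k):
    """Count the elements of a k-ary fat-tree built from identical k-port switches.

    Assumes the standard construction: k pods, each with k/2 edge switches and
    k/2 aggregation switches, and (k/2)^2 core switches. Names are illustrative.
    """
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be an even integer >= 2")
    half = k // 2
    edge = k * half          # k pods, k/2 edge switches per pod
    aggregation = k * half   # k pods, k/2 aggregation switches per pod
    core = half * half       # (k/2)^2 core switches
    hosts = edge * half      # each edge switch connects k/2 hosts
    return {
        "pods": k,
        "edge": edge,
        "aggregation": aggregation,
        "core": core,
        "switches": edge + aggregation + core,
        "hosts": hosts,
    }


if __name__ == "__main__":
    for ports in (4, 24, 48):
        print(ports, fat_tree_size(ports))
```

With 48-port switches this gives 27,648 hosts on 2,880 switches, all of the same commodity type, which is where the cost and power advantage over a design built around a few large 10Gbps switches comes from.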

(ii) the most glaring problem with the paper

The biggest problem that I found in the paper was that the Introduction and Related Work sections mention many advanced offerings from other companies, yet the paper never provides benchmark numbers for them or any direct comparison with them. I feel that the paper would have been much stronger had the authors compared the aggregate and per-link bandwidth of their system with that of the commercial systems. I say this understanding that many companies will not allow this type of testing (at least not without actually buying the solution, which is prohibitively expensive for a research project), but I still think that concrete numbers would have been a very useful supplement to the other advantages of the system described in the paper (including the advantage of being less proprietary).

(iii) the future research directions of the work.

To start, I feel that this system should be compared with some of the more proprietary systems on a whole host of metrics (cost, power, heat dissipation, complexity, aggregate bandwidth under different loads, etc.) in order to demonstrate its superiority (I repeat this here even though I mentioned it in (ii)). Another good research direction would be a case study of this design in a real-world deployment, with actual hardware running under a realistic data center load. It would also be interesting to see whether this design remains power efficient (and retains its other benefits) when built with either a subset of 10Gbps switches or entirely with 10Gbps switches.

 

Ethane: Taking Control of the Enterprise May 5, 2009

Three Important Things:

  • The authors gave an outline for what they believed should be the fundamental principles behind any network management solution. Governing policies should deal with high-level names such as users, devices etc. This is opposed to applying filters at lower levels like IP addresses, which are in constant dynamic churn. Policy enforcement entails directing the flow of packets explicitly, and ensuring a strong binding between a packet and its high-level origin. A network control architecture called Ethane is proposed to implement these ideas.
  • At the heart of the Ethane solution lies the centralized controller. The idea is that every network transaction is monitored and moderated by the controller. It handles registration and authentication of users and devices. This allows it to track the bindings between low-level entities and high-level identifiers. On the basis of these identifiers, the controller can perform access control on packet flows, enforce resource limits on users, and program switches to implement multicast and anycast.
  • A major concern for a centralized network architecture is scalability, as the controller is involved in the setup of every distinct packet flow in the network. Experimental results on a small network showed worst-case setup times of 1.5 ms. Results for a large network trace showed latencies of 0.6 ms at 6,000 setup requests per second and 0.4 ms at 2,000 requests per second. This led the authors to conclude that a single controller could handle up to 20,000 hosts.
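The first two points can be illustrated with a toy sketch: policy is written over high-level names, and the controller resolves a packet's address to a name through bindings it learned at authentication time. This is only an illustration under my own assumptions. The class, method names, and the pair-based policy format are hypothetical and are not Ethane's actual policy language or API.

```python
class ToyController:
    """Toy model of name-based access control (hypothetical, not Ethane's API)."""

    def __init__(self, allowed_pairs):
        # Policy is stated over names, e.g. ("alice", "printer"), never over addresses.
        self.allowed_pairs = set(allowed_pairs)
        self.addr_to_name = {}

    def authenticate(self, name, addr):
        # Drop any stale address previously bound to this name, then bind the new one.
        stale = [a for a, n in self.addr_to_name.items() if n == name]
        for a in stale:
            del self.addr_to_name[a]
        self.addr_to_name[addr] = name

    def allow_flow(self, src_addr, dst_addr):
        # Resolve both endpoints to names; unknown (unauthenticated) addresses are denied.
        src = self.addr_to_name.get(src_addr)
        dst = self.addr_to_name.get(dst_addr)
        if src is None or dst is None:
            return False
        return (src, dst) in self.allowed_pairs


if __name__ == "__main__":
    ctl = ToyController([("alice", "printer")])
    ctl.authenticate("alice", "10.0.0.5")
    ctl.authenticate("printer", "10.0.0.9")
    print(ctl.allow_flow("10.0.0.5", "10.0.0.9"))  # allowed by name
    ctl.authenticate("alice", "10.0.0.7")          # alice's address changes
    print(ctl.allow_flow("10.0.0.7", "10.0.0.9"))  # policy itself is untouched
    print(ctl.allow_flow("10.0.0.5", "10.0.0.9"))  # stale address is no longer bound
```

The point of the sketch is that address churn only touches the binding table. The policy, written in terms of names, never has to be edited.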

Glaring Problem:

There are many enterprise-class organizations, such as universities, corporations, and governments, that might benefit from adopting Ethane. However, many of these logically contiguous networks are physically separated across numerous offices scattered around the globe. The paper claims to present an enterprise solution but makes little or no mention of such a deployment and its associated implementation concerns.

Future Work:

The Ethane project showed that it is possible to administer a fairly large network with numerous users and device classes using a central controller. Having a viable centralized network management solution is a powerful tool, and I would like to see exploration of other possible settings for Ethane deployments. These could include high demand settings like server farms, or high security settings like internal government networks.

 

Ethane: Taking Control of the Enterprise May 5, 2009

Ethane is a network architecture for providing enterprise-wide policy enforcement. The basis of the design is to use a central Controller to authenticate and authorize each flow between hosts. To accomplish this, the design calls for the use of customized switches to route new packets to the Controller for authorization and then to route authorized packet flows between hosts. The Controller receives all new packets from a customized switch that are not part of an existing flow. The Controller identifies the packets’ source and destination, and allows or disallows the flow according to the network policy.  For the Controller to be able to do so, it must keep track of which users are on which hosts and which hosts have which addresses.
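The division of labor described above can be sketched as a switch that forwards packets of known flows from its own table and consults the Controller only for the first packet of a new flow. This is a simplified illustration under my own assumptions. The class name, the `authorize` callback standing in for the Controller, and the string return values are all hypothetical, and real flow entries carry more than a source and destination pair.

```python
class ToySwitch:
    """Toy flow-table switch; `authorize` stands in for the central Controller."""

    def __init__(self, authorize):
        self.authorize = authorize      # callable(src, dst) -> bool (hypothetical)
        self.flow_table = set()         # flows the Controller has already approved
        self.controller_requests = 0    # how often the Controller was consulted

    def handle_packet(self, src, dst):
        flow = (src, dst)
        if flow in self.flow_table:
            # Existing flow: forwarded locally without involving the Controller.
            return "forward"
        # Packet is not part of a known flow: ask the Controller.
        self.controller_requests += 1
        if self.authorize(src, dst):
            self.flow_table.add(flow)   # Controller installs a flow entry
            return "forward"
        return "drop"


if __name__ == "__main__":
    switch = ToySwitch(lambda src, dst: (src, dst) == ("h1", "h2"))
    print(switch.handle_packet("h1", "h2"))   # first packet goes to the Controller
    print(switch.handle_packet("h1", "h2"))   # later packets hit the flow table
    print(switch.handle_packet("h1", "h3"))   # disallowed flow is dropped
    print(switch.controller_requests)
```

Because only the first packet of each flow reaches the Controller, the load on it is driven by the flow request rate rather than by the packet rate.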

The prospect of a central Controller keeping track of the entire network (topology included) and enforcing a network policy is a powerful one. It seems that the main contributions of this paper are twofold. First, because the design requires a custom layer 2 switch, filtering, QoS, load balancing, and other layer 3/7 features can be implemented in layer 2. This follows from the fact that the Controller specifies if and where to route flows by adding entries to the flow tables of the custom switches. Implementing these features in layer 2 allows for very high processing speeds.

The other contribution is the case study of how well this design can perform in a real use scenario. Ethane was deployed at LBL and Stanford University. Many proposed centralized designs can argue that they can scale, but until they are deployed and used the assertions are simply estimates. The authors provide flow request rates for both deployments over a period of days (on the order of a month). This metric is the most appropriate, as it represents the heaviest load on the system's bottleneck, the Controller. The results suggest that a single Controller (on a modest PC) can accommodate 20,000 hosts and up to 10,000 flow requests per second. This provides strong evidence that a centralized design can scale to a reasonable number of nodes.

The major flaw with this design (as pointed out by the authors) is that the Controller must be familiar with all protocols running over the network. Because all packets not already authorized as part of a flow are routed to the Controller, the Controller must understand every protocol from which it might receive a packet.

I think today's networks are running a huge number of protocols, and network policy is constantly trying to keep up. Ethane seems like a good approach to enforcing a policy across the network. Further research may yield a similar approach that works in a decentralized manner, perhaps using the same constructs but virtually partitioning the network.